7. X-Symbol Internals
This section is outdated; it currently describes Version 3.4.2 of X-Symbol.
Package X-Symbol is distributed in two ways. End-users should use the binary package which contains pre-compiled files. X-Symbol developers should use the source package which contains some additional files.
7.1 Internal Representation of X-Symbol Characters   How X-Symbol represents X-Symbol chars.
7.2 Defining X-Symbol Charsets   How X-Symbol defines additional chars.
7.3 Defining Input Methods   How X-Symbol defines the input methods.
7.4 Extending Package X-Symbol   How to add fonts and token languages.
7.5 Various Internals   How X-Symbol handles other aspects.
7.6 Design Alternatives   Why X-Symbol is not designed differently.
7.7 Language Internals   How X-Symbol handles languages.
7.8 Miscellaneous Internals   Various. TODO.
7.1 Internal Representation of X-Symbol Characters
As mentioned in 6.1 Pseudo Token Language "x-symbol charsym", most functions do not operate on X-Symbol characters directly; they use "x-symbol charsyms". These charsyms have a symbol property x-symbol-cstring which points to a string, called cstring, containing the X-Symbol character.
- Under Emacs and XEmacs/Mule, the string only contains the character, which is a normal Mule character created by make-char.
- Under XEmacs/no-Mule, the string only contains the 8bit character if the X-Symbol character is an 8bit character according to x-symbol-default-coding (see section 3.2.1 Normal File and Default Encoding). Otherwise, the string consists of a leading character (in the range `\200' to `\237') and an octet. Package font-lock is used to display them correctly as X-Symbol characters (see section 8.4.3 The Buffer Contains Strange Characters). E.g., where `\251' is copyright, we get

      (get 'Idotaccent 'x-symbol-cstring) => "\235\251"

If the character is also an 8bit character in some encoding (see section 3.2.2 File Coding of 8bit Characters), the charsym also has the symbol property x-symbol-file-cstrings for the representation in the file and the property x-symbol-buffer-cstrings to recognize character aliases (see section 3.2.7 Character Aliases). E.g., under XEmacs/no-Mule, where `\335' is Yacute and `\251' is copyright, we get

    (get 'Idotaccent 'x-symbol-file-cstrings)
      => (iso-8859-9 "\335" iso-8859-3 "\251")
    (get 'Idotaccent 'x-symbol-buffer-cstrings)
      => (iso-8859-9 "\234\335" iso-8859-3 "\235\251")

The values are plists (see section `Property Lists' in XEmacs Lisp Reference Manual) mapping the file coding to the strings in the file or the buffer, respectively.
After token languages have been initialized, the charsym also has the symbol properties x-symbol-tokens (see section 3.1 Token Language) and x-symbol-classes (see section 3.6 Character Group and Token Classes):

    (get 'Idotaccent 'x-symbol-tokens)
      => (sgml "İ" tex "{\\.I}")
    (get 'Idotaccent 'x-symbol-classes)
      => (sgml (non-l1) tex (text aletter))

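The property values shown in this section are ordinary symbol properties holding plists, so they can be read with standard Emacs Lisp functions. The following is a minimal sketch that does not require X-Symbol to be loaded: it copies one of the values shown above onto a hypothetical symbol, my-demo-charsym, and looks up the file representation for one coding. The names my-demo-charsym and my-demo-file-cstring are assumptions made for illustration only.

```elisp
;; A minimal sketch; `my-demo-charsym' and `my-demo-file-cstring' are
;; hypothetical names, not part of package X-Symbol.
(put 'my-demo-charsym 'x-symbol-file-cstrings
     '(iso-8859-9 "\335" iso-8859-3 "\251"))

(defun my-demo-file-cstring (charsym coding)
  "Return the file string of CHARSYM for file CODING, or nil."
  (plist-get (get charsym 'x-symbol-file-cstrings) coding))

;; Returns the one-byte string "\251".
(my-demo-file-cstring 'my-demo-charsym 'iso-8859-3)
;; Returns nil, since the plist has no entry for this coding.
(my-demo-file-cstring 'my-demo-charsym 'iso-8859-1)
```

With X-Symbol loaded, the same lookup on a real charsym such as Idotaccent should correspond to the values printed above.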
7.2 Defining X-Symbol Charsets
An X-Symbol charset, called cset in the code and the docstrings, handles one font used by package X-Symbol. Each cset must use the same char registry-encoding as the corresponding variables for the fonts (see section 2.9 Lisp Coding when Using Other Fonts).
You have to tell X-Symbol how to define Mule charsets with Emacs or XEmacs/Mule and which leading character to use with XEmacs/no-Mule. As an example, we use the definition of the Adobe symbol font.

    (defvar x-symbol-xsymb0-cset
      '((("adobe-fontspecific") ?\233 -3600)
        (xsymb0-left "X-Symbol characters 0, left" 94 ?:) .
        (xsymb0-right "X-Symbol characters 0, right" 94 ?\;)))

Mule charsets (see section `Charsets' in XEmacs Lisp Reference Manual) may be used for 94 or 96 characters (this example: 94; only charsets with dimension 1 can be defined with X-Symbol). Thus, if your font provides more characters, you are likely to use both the left and the right half of the font to define two Mule charsets. For both of them, you have to define a unique, free final character/byte of the standard ISO 2022 escape sequence designating the charset (this example: `:' and `;'). The remaining free final characters (reserved by Emacs for users) are `>' and `?'; the latter is already used in XEmacs.
For XEmacs/no-Mule, you have to define the leading character (this example: `\233').
x-symbol-latin1-cset
x-symbol-latin2-cset
x-symbol-latin3-cset
x-symbol-latin5-cset
- Cset definitions only using the upper halves of the fonts where the corresponding Mule charsets are known and which define characters which are considered 8bit characters in the corresponding encoding, see 3.2.2 File Coding of 8bit Characters.
x-symbol-xsymb0-cset
x-symbol-xsymb1-cset
- Cset definitions using both halves of the fonts where no corresponding Mule charsets are yet known.
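To make the shape of a cset definition easier to see, the following sketch takes the Adobe symbol example from above apart with plain list functions. It is only an illustration: the accessor names are hypothetical, and the reading of the fields (registry list, leading character and a number assumed here to be the cset score, followed by the left half and, after the dot, the right half) is an assumption based on the explanation in this section, not taken from the X-Symbol sources.

```elisp
;; Copy of the example value under a hypothetical variable name.
(defvar my-demo-cset
  '((("adobe-fontspecific") ?\233 -3600)
    (xsymb0-left "X-Symbol characters 0, left" 94 ?:) .
    (xsymb0-right "X-Symbol characters 0, right" 94 ?\;)))

(defun my-demo-cset-leading-char (cset)
  "Return the leading character used under XEmacs/no-Mule."
  (nth 1 (car cset)))

(defun my-demo-cset-left (cset)
  "Return the definition of the left half of CSET."
  (nth 1 cset))

(defun my-demo-cset-right (cset)
  "Return the definition of the right half of CSET, or nil."
  ;; The right half follows the dot, so it is the tail of the list.
  (cdr (cdr cset)))

(defun my-demo-cset-final-chars (cset)
  "Return the ISO 2022 final characters of the halves defined in CSET."
  (delq nil (list (nth 3 (my-demo-cset-left cset))
                  (nth 3 (my-demo-cset-right cset)))))

;; Returns (58 59), i.e., the characters `:' and `;'.
(my-demo-cset-final-chars my-demo-cset)
```

A cset which only uses one half of a font ends in `. nil', in which case my-demo-cset-right returns nil and only one final character is listed.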
7.3 Defining Input Methods
This is probably the hardest section in this manual....
7.3.1 Defining Input Methods: Objectives   Input methods should be intuitive/consistent.
7.3.2 X-Symbol Character Descriptions: Example   An example introducing char descriptions.
7.3.3 Defining Input Methods by Character Descriptions   The aspects and the contexts of a character.
7.3.4 Defining Input Methods: Example   A complete example defining input methods.
7.3.5 Customizing Input Methods   How to customize the input methods.
7.3.1 Defining Input Methods: Objectives
Input methods should be intuitive. This requires consistency:
- Characters should be found under the same header in the Grid and in the Menu.
- If one character can be modified or rotated to another character (see section 4.7 Input Method Context: Replace Char Sequence), both should stand near each other in the Grid. E.g., since arrowsouthwest rotates to arrowdown, they stand next to each other.
- The key binding should be similar to the context of input method Context. If two characters are defined to have the same context, they should have the same key prefix and the suffix should be a number which increases with the "modify-to" behavior. E.g., reflexsubset with key binding C-= < _ 2 modifies to reflexsqsubset with key binding C-= < _ 3.
- Consistent definition of "modify-to" and "rotate-to": if A can be modified to B and rotated to C and C can be modified to D, B can be rotated to D in most cases.
- It should be possible to load character definitions later on, e.g., when new token languages get initialized.
- Existing key bindings should not be overwritten. If some of them have to change, it should be done in a uniform way (solution: key suffix `1').
- Also, modifying or rotating a new character to/from old ones should be possible without changing the input definitions of the old characters.
Observation: It is impossible, especially with the possibility to load character definitions later on, to define the input methods directly, i.e., by something like define-key. The solution is an indirect definition with "character descriptions".
7.3.2 X-Symbol Character Descriptions: Example
As an example for "character descriptions", look at the definition of longarrowright in x-symbol-xsymb1-table (`95' is the encoding in the font and not of interest here). Some terms are defined in the next section:

    (longarrowright 95 (arrow) (size big . arrowright) nil ("->" t "-->") (emdash))

With this definition, package X-Symbol automatically defines:
- Key bindings C-= - - > and C-= - > 2. The latter has suffix 2 because C-= - > is also "wanted" by arrowright, which now has the key binding C-= - > 1 (the "score" of longarrowright is higher, due to `size big'). See section 4.6 Input Method Keyboard: Compose Key Sequence.
- arrowright modifies to longarrowright, which modifies to arrowright. See section 4.7 Input Method Context: Replace Char Sequence.
- longarrowleft rotates to longarrowright, which rotates to longarrowboth (which rotates to longarrowleft). (The "rotate aspects" are inherited from arrowright.) See section 4.7 Input Method Context: Replace Char Sequence.
- The following contexts can be modified to longarrowright: `-->' or minus1/endash/macron/emdash/hyphen and `->' (since all define context `-') and emdash and `>' (since emdash defines context `--'). `->' is used for arrowright, which has a lower score, see above. See section 4.7 Input Method Context: Replace Char Sequence.
- Input method Electric will change context `-->' (which is tagged with t in the definition) to longarrowright, also emdash and `>' (only theoretically, since input method Electric will produce emdash only in TeX's text mode, and longarrowright only in TeX's math mode). See section 4.8 Input Method Electric: Automatic Context.
- The character will appear in the Grid under the header `Arrow'. You will probably recognize that the placement is based on the modify-to and rotate-to behavior above. See section 4.5 Input Method Grid: Choose Highlighted Character.
- The character will appear in the Menu under one of the headers `Arrow n'. The submenus are sorted alphabetically. See section 4.4 Input Method Menu: Select a Menu Item.
Suppose this character were missing from package X-Symbol and you wanted to define your own character (in your own font). With the current scheme, the one line above is enough! Have fun defining all the consequences directly instead....
7.3.3 Defining Input Methods by Character Descriptions
Characters are defined with character descriptions which consist of different aspects and contexts, which can also be inherited from a parent character. All characters which are connected with parents, form a component. Aspects and contexts are used to determine the modify-to and rotate-to chain for characters, the contexts for input method Context and Electric, the key bindings, and the position in the Menu and the Grid.
If you want to check the component, scores, etc. of a specific character, look at the symbol property (e.g., with M-x hyper-apropos-get-doc) of the corresponding charsym, e.g., arrowright. See also the docstrings of x-symbol-init-cset and x-symbol-init-input.
Remember, all characters which are connected with parents, form a component. Contexts are the contexts of input method Context (see section 4.7 Input Method Context: Replace Char Sequence). If a table entry of a charsym does not define its own contexts, they are the same as the contexts of the charsym in an earlier position in the modify chain (see below), or the contexts of the first charsym with defined contexts in the modify chain. The modify context of a charsym is the first context.
x-symbol-rotate-aspects-alist
- Characters in the same component whose aspects only differ by their direction (east,...), a key in this alist, are circularly connected by "rotate-to". The sequence in the rotate chain is determined by rotate scores depending on the values in the rotate aspects. Charsyms with the same "rotate-aspects" are not connected (charsyms with the smallest modify scores are preferred).

      (get 'longarrowright 'x-symbol-rotate-aspects)
        => (-1500 direction east)

x-symbol-modify-aspects-alist
- Characters in the same component whose aspects only differ by their size (big,...), shape (round, square,...) and/or shift (up, down,...), keys in this alist, are circularly connected by "modify-to", if all their modify contexts are used exclusively, i.e., no other modify chain uses any of them. The sequence in the modify chain is determined by modify scores depending on the values in the modify aspects, the charsym score defined in the definition tables and the score of the whole cset (see section 7.2 Defining X-Symbol Charsets).

      (get 'longarrowright 'x-symbol-score)
        => -3500
      (get 'longarrowright 'x-symbol-modify-aspects)
        => (1500 shift nil shape nil size big)

Otherwise, the "modify chain" is divided into modify subchains, which are those charsyms sharing the same modify context. All modify subchains using the same modify context, build a horizontal chain whose charsyms are circularly connected by "modify-to".
We build a key chain for all contexts (not just modify contexts), consisting of all charsyms (sorted according to modify scores) having the context. Input method Context modifies the context to the first charsym in the key chain.
x-symbol-key-suffix-string
- If there is only one charsym in the key chain, C-= plus the context inserts the charsym. Otherwise, we determine a suffix for each charsym in the key chain by its index and this string. C-= plus the context plus the suffix inserts the charsym.
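The rule for key suffixes can be sketched in a few lines. This is not the code used by X-Symbol; it only restates the description above. The function name is hypothetical, and the suffix string "1234567890" used in the example is an assumption, since the actual value of x-symbol-key-suffix-string is not given here.

```elisp
(defun my-demo-key-chain-keys (context chain suffixes)
  "Return an alist (CHARSYM . KEY) for the charsyms in CHAIN.
CONTEXT is the context string shared by the key chain, and SUFFIXES
is a string whose Nth character is the suffix of the Nth charsym.
KEY is the key sequence to type after C-=, written as a string."
  (if (null (cdr chain))
      ;; A single charsym gets the bare context as its key.
      (list (cons (car chain) context))
    (let ((index 0)
          (result nil))
      (dolist (charsym chain (nreverse result))
        (push (cons charsym
                    (concat context (string (aref suffixes index))))
              result)
        (setq index (1+ index))))))

;; Returns ((arrowright . "->1") (longarrowright . "->2")).
(my-demo-key-chain-keys "->" '(arrowright longarrowright) "1234567890")
```

The result agrees with the bindings C-= - > 1 and C-= - > 2 described in section 7.3.2. The sketch signals an error if the chain has more charsyms than SUFFIXES has characters; how X-Symbol handles that case is not covered here.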
7.3.4 Defining Input Methods: Example
An example:

                Modify  Modify  Rotate  Rotate  Modify   Other
                Score   Aspect  Score   Aspect  Context  Contexts
    --------------------------------------------------------------
    charsym 1w  150     nil     100     west    `a'      `c'
    charsym 2w  200     nil     100     west    `b'      -
    charsym 3w  350     big     100     west    (`b')    (-)
    charsym 1e  100     nil     200     east    (`a')    (`b')
    charsym 2e  250     big     200     east    `a'      `b'
    charsym 3e  300     big     200     east    `a'      -
    charsym 1n  100     nil     300     north   `d'      `c'
    charsym 2n  200     big     300     north   `c'      -

Assuming that all charsyms form one component, we have:

    Rotate chains:      (1w,2w)-1e-1n and 3w-(2e,3e)-2n.
    Modify chains:      1w-2w-3w and 1e-2e-3e and 1n-2n.
    Horizontal chains:  1e-1w-2e-3e (for modify context `a')
                        2w-3w (for modify context `b')
    Key chains:         1e-1w-2e-3e (for context `a')
                        1e-2w-2e-3w (for context `b')
                        1n-1w-2n (for context `c')
                        1n (for context `d')

That makes the following bindings:

    Rotate-to:  1w->1e, 2w->1e, 1e->1n, 1n->1w
                3w->2e, 2e->2n, 3e->2n, 2n->3w
    Modify-to:  1e->1w, 1w->2e, 2e->3e, 3e->1e (horizontal chain)
                2w->3w, 3w->2w (horizontal chain)
                1n->2n, 2n->1n (modify chain with exclusive modify contexts)
    CONTEXTS:   `a'->1e, `b'->1e, `c'->1n, `d'->1n
    KEY:        `a1'=1e, `a2'=1w, `a3'=2e, `a4'=3e, `b1'=1e, ..., `d'=1n

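The key chains of this example can be recomputed from the table: for each context, collect the charsyms having it and sort them by modify score. The sketch below does just that. The variable and function names are hypothetical, the charsyms are written as strings, and inherited contexts (shown in parentheses in the table) are written out explicitly, which is a simplification of how X-Symbol stores them.

```elisp
;; Each entry: (CHARSYM MODIFY-SCORE CONTEXT...), copied from the table.
(defvar my-demo-table
  '(("1w" 150 "a" "c")
    ("2w" 200 "b")
    ("3w" 350 "b")
    ("1e" 100 "a" "b")
    ("2e" 250 "a" "b")
    ("3e" 300 "a")
    ("1n" 100 "d" "c")
    ("2n" 200 "c")))

(defun my-demo-key-chain (context table)
  "Return the charsyms in TABLE having CONTEXT, sorted by modify score."
  (let (entries)
    (dolist (entry table)
      (when (member context (cddr entry))
        (push entry entries)))
    ;; ENTRIES is a fresh list, so the destructive `sort' is safe here.
    (mapcar #'car
            (sort entries (lambda (a b) (< (nth 1 a) (nth 1 b)))))))

(my-demo-key-chain "a" my-demo-table)   ; ("1e" "1w" "2e" "3e")
(my-demo-key-chain "b" my-demo-table)   ; ("1e" "2w" "2e" "3w")
(my-demo-key-chain "c" my-demo-table)   ; ("1n" "1w" "2n")
(my-demo-key-chain "d" my-demo-table)   ; ("1n")
```

The first element of each chain is the charsym which input method Context produces for that context, which matches the line starting with `CONTEXTS' above.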
7.3.5 Customizing Input Methods
When defining contexts for characters, you should try to use default contexts to make them and key bindings as consistent as possible. E.g., package X-Symbol only defines explicit contexts for 186 of the 437 characters.
x-symbol-group-input-alist
- Defines default scores and bindings for characters of a group (see section 3.6 Character Group and Token Classes). E.g., the definition (in x-symbol-latin1-table)

      (aacute 225 (acute "a" Aacute))

  defines aacute without any explicit contexts, but having the group acute and the subgroup `a'. The default input for the group is defined by the following element in this variable:

      (acute 0 "%s'" t "'%s")

  That means: 0 is added to the normal "modify-score" of the character. `%s'' and `'%s' with `%s' substituted by the subgroup, i.e., `a'' and `'a', are the contexts for aacute. The context `'a' is also used for input method Electric since it is prefixed by t.
x-symbol-key-min-length
- It is quite unlikely that a one-character context is not the prefix of another context, at least when loading additional font definitions. In order not to have to change key bindings C-= key to C-= key 1, it is required that the length of the key binding without C-= is at least 2.
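The substitution of the subgroup into the default contexts of a group can be sketched with format. The function below is hypothetical and only follows the description of the acute element above: the first two elements are the group and the score, the remaining strings are context templates, and a template preceded by t is also used for input method Electric.

```elisp
(defun my-demo-group-contexts (group-entry subgroup)
  "Expand the context templates of GROUP-ENTRY for SUBGROUP.
GROUP-ENTRY has the form (GROUP SCORE TEMPLATE...), where a template
preceded by t is also used for input method Electric.  Return a list
of (CONTEXT . ELECTRIC-P) pairs."
  (let ((electric nil)
        (result nil))
    (dolist (item (cddr group-entry) (nreverse result))
      (if (eq item t)
          ;; Remember the tag for the next template.
          (setq electric t)
        (push (cons (format item subgroup) electric) result)
        (setq electric nil)))))

;; Returns (("a'" . nil) ("'a" . t)).
(my-demo-group-contexts '(acute 0 "%s'" t "'%s") "a")
```

The two contexts are the ones named above for aacute, and only `'a' carries the Electric tag.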
7.4 Extending Package X-Symbol
In this section, you are told what to consider and what to do when extending package X-Symbol with new characters and new token languages. If you only want to define a token language using existing characters, you only have to read the last section.
7.4.1 Extending X-Symbol with New Fonts   How to add fonts to X-Symbol.
7.4.2 Guidelines for Input Definitions   Guidelines for input definitions.
7.4.3 Emacs Lisp File Defining a New Font   How to define new character in a file.
7.4.4 Emacs Lisp File Extending a Token Language   Extending an existing language.
7.4.5 Emacs Lisp File Defining a New Token Language   Defining a new language.
7.4.1 Extending X-Symbol with New Fonts
If you add a new token language to package X-Symbol which should represent tokens by characters which are not yet defined by package X-Symbol, you have to add a new font to package X-Symbol, first.
When adding new fonts to package X-Symbol, consider that X-Symbol has to run under Emacs, XEmacs/Mule and XEmacs/no-Mule.
Running under Emacs and XEmacs/Mule means that you cannot use all encodings in a font for characters: you should probably only use encodings 33 to 126 and 160 to 255. You should also use a unique pair of charset properties `CHARSET_REGISTRY' and `CHARSET_ENCODING'.
Running under XEmacs/no-Mule can lead to problems when major modes do not check whether the previous character is an escape character (in our case, a leading character, see section 7.1 Internal Representation of X-Symbol Characters) when looking at a character. Thus, you should probably not use encodings which represent characters in your default font with a special syntax.
- In general, escape sequences use the digits of the current font. Thus, you should probably define the encodings 48 to 57 as digits `0' to `9'.
- In LaTeX buffers, characters in `$%\{}' have a special syntax. Thus, you should probably not use encodings 36, 37, 92, 123 and 125 for characters which could also be useful with token languages tex and utex.
- In HTML buffers, characters in `&<>' have a special syntax. Thus, you should probably not use encodings 38, 60 and 62 for characters which could also be useful with token language sgml.
You have to tell package X-Symbol which fonts to use for the normal text, subscripts and superscripts. See section 2.9 Lisp Coding when Using Other Fonts.
You have to tell X-Symbol how to define Mule charsets with Emacs and XEmacs/Mule and which leading character to use with XEmacs/no-Mule. See section 7.2 Defining X-Symbol Charsets.
7.4.2 Guidelines for Input Definitions
Read section 7.3 Defining Input Methods. Look at the tables in `x-symbol.el'. Here are some guidelines of how to define the input methods for new characters:
- Define reasonable character groups for new characters, see 3.6 Character Group and Token Classes. E.g., if you add the IPA font for phonetic characters, you are likely to define at least one additional charset group. If you do not know whether to use one or two groups for a set of characters, use two.
- Define under which Grid/Menu header the characters of the new character group should appear. You may also want to add additional headers for these characters. See section 3.6 Character Group and Token Classes.
- If reasonable, define default contexts for characters of a group, see 7.3.5 Customizing Input Methods.
- For the other characters, define contexts by Ascii sequences which look similar to the character.
- Form a component for a set of characters which are strongly related to each other. In most cases, characters of a component are in the same group but not vice versa. E.g., the simple arrows already defined by package X-Symbol form one component. You form a component of characters by specifying parents in their definition, see 7.3.3 Defining Input Methods by Character Descriptions.
- Use aspects to describe the new characters. Add new aspects to x-symbol-modify-aspects-alist and x-symbol-rotate-aspects-alist if necessary (see section 7.3.3 Defining Input Methods by Character Descriptions).
- Finish the definition of your font file (see section 7.4.3 Emacs Lisp File Defining a New Font), load it with M-x load-file, and initialize the input methods, e.g., by invoking the grid (M-x x-symbol-grid).
- If there are no errors, you are likely to get warnings about equal modify scores. In this case, the sequence of characters in the modify-to chain is random, as are the numerical suffixes of key bindings.
  - Define a base score for the whole X-Symbol charset ("cset score") which should be a positive number in order not to change the key bindings of previously defined X-Symbol characters.
  - Define reasonable scores for newly defined aspects and character groups.
  - Finally, fine-tune your definitions by charsym scores in the tables. This should be necessary only for a few characters.
7.4.3 Emacs Lisp File Defining a New Font
Now put all things together in a separate font definition file. You should not put it in a language definition file.
Here is a tiny example using only the lower half of the font:

    (provide 'x-symbol-myfont)
    (defvar x-symbol-myfont-fonts
      '(("-xsymb-myfont-medium-r-normal--14-140-75-75-p-85-xsymb-myfont")
        ("-xsymb-myfont_sub-medium-r-normal--12-120-75-75-p-74-xsymb-myfont")
        ("-xsymb-myfont_sup-medium-r-normal--12-120-75-75-p-74-xsymb-myfont")))
    (defvar x-symbol-myfont-cset
      '((("xsymb-myfont") ?\200 1000)
        (myfont-left "My font characters, left" 94 63) .
        nil))

    (defvar x-symbol-myfont-table
      '((longarrownortheast 33 (arrow) (size big . arrownortheast))
        (koerper 34 (setsymbol "K"))
        (circleS 35 (symbol "S") nil nil "SO")))
    (x-symbol-init-cset x-symbol-myfont-cset x-symbol-myfont-fonts
                        x-symbol-myfont-table)

Due to an XEmacs bug with char syntax inherit, you should also add the following line to files `x-symbol-xmas20.el' and `x-symbol-xmas21.el':

    (modify-syntax-entry ?\200 "\\" (standard-syntax-table))

7.4.4 Emacs Lisp File Extending a Token Language
If you want to use the new font to extend an existing token language, define a new token language which inherits most variables from the "parent language". E.g., token language utex inherits most variables from tex, see `x-symbol-utex.el'.
A language must define variables for all language aspects, see 7.7 Language Internals. Our example defines a language mytex using the additional characters from 7.4.3 Emacs Lisp File Defining a New Font.
First, you have to register the language in a startup file:

    (defvar x-symbol-mytex-name "My TeX macro")
    (defvar x-symbol-mytex-modes nil)
    (x-symbol-register-language 'mytex 'x-symbol-mytex x-symbol-mytex-modes)

The language definition file should look like this (leaving out most parts which are similar to the ones in `x-symbol-utex.el'):

    (provide 'x-symbol-mytex)
    (require 'x-symbol-tex)
    (defvar x-symbol-mytex-required-fonts '(x-symbol-myfont))
    (put 'mytex 'x-symbol-font-lock-keywords 'x-symbol-tex-font-lock-keywords)

    (defvar x-symbol-mytex-user-table nil)
    (defvar x-symbol-mytex-myfont-table
      '((longarrownortheast (math arrow user) "\\longnortheastarrow")
        (koerper (math letter user) "\\setK")
        (circleS (math ordinary amssymb) "\\circledS")))
    (defvar x-symbol-mytex-table
      (append x-symbol-mytex-user-table
              '(nil)
              x-symbol-mytex-myfont-table
              x-symbol-tex-table))

It is important that you do not define a variable for the language access x-symbol-font-lock-keywords, but rather use the variable of the parent language directly, see 7.7 Language Internals.
During the testing phase, you should probably leave out the `'(nil)' which prevents warnings about redefinitions for the following elements.
7.4.5 Emacs Lisp File Defining a New Token Language
You might also want to define a new token language not based on another language.
As an example, consider a token language "My Unicode" (myuc) for buffers with major mode myuc-mode. Thus, we register the language by:

    (defvar x-symbol-myuc-name "My Unicode")
    (defvar x-symbol-myuc-modes '(myuc-mode))
    (x-symbol-register-language 'myuc 'x-symbol-myuc x-symbol-myuc-modes)

Each token of language myuc consists of `#' plus the hexadecimal representation of the Unicode value, where the case of the hexadecimal digits is not important and the preferred case is upcase. A single `#' is represented by the token ##. In order to be more flexible, we want to define the tokens by their decimal value in the table. There are no subscripts and no images. The code below (`x-symbol-myuc.el') is included in the source distribution of package X-Symbol.

    (provide 'x-symbol-myuc)
    (defvar x-symbol-myuc-required-fonts nil)
    (defvar x-symbol-myuc-modeline-name "myuc")
    (defvar x-symbol-myuc-class-alist
      '((VALID "My Unicode" (x-symbol-info-face))
        (INVALID "no My Unicode" (red x-symbol-info-face))))
    (defvar x-symbol-myuc-font-lock-keywords nil)
    (defvar x-symbol-myuc-image-keywords nil)
    ...

    (defvar x-symbol-myuc-case-insensitive 'upcase)
    (defvar x-symbol-myuc-token-shape
      '(?# "#[0-9A-Fa-f]+\\'" . "[0-9A-Fa-f]"))
    (defvar x-symbol-myuc-exec-specs '(nil (nil . "#[0-9A-Fa-f]+")))
    (defvar x-symbol-myuc-input-token-ignore nil)

    (defun x-symbol-myuc-default-token-list (tokens)
      (list (format "#%X" (car tokens))))
    (defvar x-symbol-myuc-token-list 'x-symbol-myuc-default-token-list)
    (defvar x-symbol-myuc-user-table nil)
    (defvar x-symbol-myuc-xsymb0-table
      '((alpha () 945)
        (beta () 946)))
    (defvar x-symbol-myuc-table
      (append x-symbol-myuc-user-table x-symbol-myuc-xsymb0-table))
    ...

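To see what the token function in this example produces, the conversion can be tried without loading X-Symbol. The following sketch uses hypothetical helper names; it applies the same format call as x-symbol-myuc-default-token-list to a single decimal value and checks a string against the token pattern used above, anchored at both ends.

```elisp
(defun my-demo-myuc-token (code)
  "Return the myuc token for the decimal Unicode value CODE."
  (format "#%X" code))

(defun my-demo-myuc-token-p (string)
  "Return t if STRING has the shape of a complete myuc token."
  (and (string-match "\\`#[0-9A-Fa-f]+\\'" string) t))

(my-demo-myuc-token 945)        ; "#3B1", the token for alpha
(my-demo-myuc-token 946)        ; "#3B2", the token for beta
(my-demo-myuc-token-p "#3b1")   ; t, lower-case digits are accepted
(my-demo-myuc-token-p "#xyz")   ; nil
```

The upper-case result of `%X' corresponds to the preferred case stated for the language, while the pattern accepts both cases.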
7.5 Various Internals
7.5.1 Tagging Insert Commands for Token and Electric   Don't break input methods Token and Electric.
7.5.2 Avoiding Hide/Show-Invisible Flickering   Moving cursor in invisible commands.
7.5.1 Tagging Insert Commands for Token and Electric
Input methods Token (see section 4.2 Input Method Token: Replace Token by Character) and Electric (see section 4.8 Input Method Electric: Automatic Context) stop their auto replacement if you use a command which is not an insert command.
self-insert-command
newline
newline-and-indent
reindent-then-newline-and-indent
tex-insert-quote
TeX-insert-quote
TeX-insert-punctuation
TeX-insert-dollar
sgml-close-angle
sgml-slash
- These commands and commands aliased to these are recognized as input commands by having a non-nil value of their symbol property x-symbol-input.
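Since the tag is an ordinary symbol property, your own insert command can be tagged with put. The sketch below is an illustration under stated assumptions: the command my-demo-insert-arrow and the predicate my-demo-input-command-p are hypothetical, and the predicate does not follow command aliases, which the description above says X-Symbol does.

```elisp
(defun my-demo-insert-arrow ()
  "Insert the two characters `->' at point (hypothetical command)."
  (interactive)
  (insert "->"))

;; Tag the command as an input command.
(put 'my-demo-insert-arrow 'x-symbol-input t)

(defun my-demo-input-command-p (command)
  "Return t if the symbol COMMAND is tagged as an input command."
  (and (symbolp command) (get command 'x-symbol-input) t))

(my-demo-input-command-p 'my-demo-insert-arrow)   ; t
```

Whether a particular command cooperates well with input methods Token and Electric still has to be tested in a buffer with X-Symbol turned on.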
7.5.2 Avoiding Hide/Show-Invisible Flickering
Starting a command makes a previously revealed super- or subscript command (see section 5.1 Super- and Subscripts) invisible again. Repeatedly invoking commands which move the point just by a small amount can lead to some flickering.
forward-char
forward-char-command
backward-char
backward-char-command
- If the point position after the execution of these commands is still "at" the super- or subscript command, the command won't be made invisible in the first place. Each of these four commands has a function (1+ and 1-) as the value of its symbol property x-symbol-point-function which returns the position "after" when called with the position "before".
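The property can be pictured as a small prediction function attached to a command. The following sketch uses a hypothetical command, my-demo-forward-two, and a hypothetical lookup helper. It only shows the shape of the data; it is an assumption that X-Symbol would honor the property on commands other than the four listed above.

```elisp
(defun my-demo-forward-two ()
  "Move point forward by two characters (hypothetical command)."
  (interactive)
  (forward-char 2))

;; The function maps the position "before" to the position "after".
(put 'my-demo-forward-two 'x-symbol-point-function
     (lambda (pos) (+ pos 2)))

(defun my-demo-position-after (command pos)
  "Return the position COMMAND would move to from POS, or nil if unknown."
  (let ((fn (get command 'x-symbol-point-function)))
    (and fn (funcall fn pos))))

(my-demo-position-after 'my-demo-forward-two 10)   ; 12
```

For a command without the property, the helper returns nil, which corresponds to the default case where the revealed command is made invisible again.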
7.6 Design Alternatives
This section describes potential design alternatives and why they were not used.
7.6.1 Alternative Token Representations   Why we need the conversion.
7.6.2 Alternative Ways to Turn on X-Symbol Globally   How to turn on X-Symbol globally.
7.6.3 Alternative Auto Conversion Methods   When do we convert automatically.
7.6.1 Alternative Token Representations
Package X-Symbol represents tokens in the file by characters in the buffer. This requires an automatic conversion when visiting a file or saving a buffer, see 3.2 Conversion: Decoding and Encoding.
Another possibility would be to use the tokens directly in the buffer and just display them differently. You would need no conversion and you could copy the text easily to a message buffer. This could be done by a special face and an additional font-lock keyword for every token. The disadvantages make this approach unfeasible:
- The editing commands would work on the tokens which are invisible for the user.
- Extremely resource and startup-time consuming. If as many characters should be supported as done by package X-Symbol, including superscripts and subscripts, more than 2000 faces with display tables would have to be defined even without considering char aliases!
- Time consuming. More than 2000 entries in your font-lock keywords would slow down the fontification considerably, which would be too much even when using lazy-shot!
Another possibility would be to adapt TeX to the representations of the corresponding characters in Emacs' buffer. Again, you would need no conversion. The disadvantages make this approach too restrictive:
- You cannot adapt SGML to this approach.
- You cannot read normal LaTeX files directly, and you do not write normal LaTeX files.
- You would have different TeX versions: one for X-Symbol with Emacs and XEmacs/Mule, one with XEmacs/no-Mule.
- If you are not an extremely good TeX hacker, it would be impossible to adapt this approach to support more than 256 characters.
A third alternative would be very similar to the method used in this package. There would be just a slight difference when running under XEmacs/no-Mule: the internal representation of a character is always just one character, but we would also provide font properties for characters not of your default font. The disadvantages make this approach too unsafe:
- Problems with current search/replace commands.
- Problems with the current version of font-lock (it should never overwrite the font property for this character, even if the character matches some match in font-lock-keywords and overwrite is non-nil). This gets even more difficult with superscripts/subscripts.
- Unless you can provide a syntax table for faces (you cannot), characters in different faces with the same encoding are in the same syntax class, which is irritating: e.g., \leftrightarrow and \approx would be delimiters.
7.6.2 Alternative Ways to Turn on X-Symbol Globally
This package hooks itself into hack-local-variables-hook which makes the installation very simple.
Another possibility would be to use the major-mode hooks, which is the normal way to turn on a minor mode. The disadvantages are:
- The installation is more complicated.
- Local variables in files are not yet processed (this was the main reason not to do it this way).
Another possibility would be to hook X-Symbol into find-file-hooks, as it is done in old versions of package X-Symbol. It would be as easy as the current approach but we would have to be careful with the sequence of functions in find-file-hooks, especially with the function hooked in by font-lock.
7.6.3 Alternative Auto Conversion Methods
Without package crypt, this package automatically decodes tokens when turning on the minor mode (in hack-local-variables-hook, see section 7.6.2 Alternative Ways to Turn on X-Symbol Globally) or in after-insert-file-functions. This package automatically encodes characters in write-region-annotate-functions. The disadvantage is that the possibility to change buffers in write-region-annotate-functions is not official (see section 9.2.3 Wishlist: Changes in Emacs/XEmacs), i.e., not mentioned in the docstring (only mentioned for corresponding encode-functions of package format which use a similar loop in the C code).
With package crypt, this package automatically decodes tokens when turning on the minor mode. This package automatically encodes characters in write-file-hooks. The disadvantages are that the encoding is slower (use jka-compr instead of crypt) and the problem with vc-next-action (see section 8.2 Spurious Encodings).
Without package crypt, Version 2.6 of this package automatically encoded characters in write-file-data-hooks. The advantage was that changing buffers there is official; the disadvantage is that it is also more complicated.
A totally different method would be to use package format. Unfortunately, this is not really possible, since a regexp in format-alist is much too weak, i.e., X-Symbol's decoding does not change any file headers which would represent the file format. In XEmacs, this package also fails to work properly with jka-compr and crypt.
7.7 Language Internals
In order to use a token language or to access one of the language-dependent values, the following conditions must be met:
- The language must be registered. This makes it possible to select the language in the menus. It also prevents loading a potentially dangerous file when a file specifies a buffer-local value of x-symbol-language.
  x-symbol-register-language
  - Registering a language includes stating the name of the feature (i.e., a file) which provides the language. The name of the language must have been already defined.
- The file providing the language must have been loaded. This will be done automatically when the language is initialized. Customizing X-Symbol will also load the language files.
- The language must be initialized. This will be done automatically if the language is used. This loads the language file and fails if the language has not been registered. If some minor language information is needed, e.g., in the highlight menu of the Grid (see section 4.5 Input Method Grid: Choose Highlighted Character), you should initialize the language explicitly, e.g., by the following command:
Language-dependent values are accessed by language accesses:
x-symbol-language-value
- Returns the language-dependent value. Also initializes the language if necessary. E.g., we get the name of a language by the language access x-symbol-name. With a simplified expansion, we get

      (x-symbol-language-value 'x-symbol-name 'tex)
        ==> (symbol-value (get 'tex 'x-symbol-name))
        => (symbol-value 'x-symbol-tex-name)
        => "TeX macro"

x-symbol-language-access-alist
- List of all language accesses. A token language must define all variables accessed by language accesses. A language access is a property of the language symbol; its value is the symbol naming a variable whose value is used.
  If the language is a derived language, e.g., like language utex, the language access x-symbol-font-lock-keywords should point directly to the variable of the parent language (here tex), see file `x-symbol-utex.el'.
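The simplified expansion of a language access can be tried with a toy language. The sketch below is an assumption-laden stand-in for x-symbol-language-value: the language mydemo, the variable my-demo-lang-name and the function my-demo-language-value are hypothetical, and, unlike the real function, the stand-in neither initializes nor loads anything.

```elisp
;; Hypothetical language `mydemo' with one language access.
(defvar my-demo-lang-name "My demo language")
(put 'mydemo 'x-symbol-name 'my-demo-lang-name)

(defun my-demo-language-value (access language)
  "Return the value of the variable named by ACCESS for LANGUAGE.
Return nil if LANGUAGE has no such access or the variable is unbound."
  (let ((variable (get language access)))
    (and variable (boundp variable) (symbol-value variable))))

(my-demo-language-value 'x-symbol-name 'mydemo)   ; "My demo language"
```

A derived language would simply store the parent's variable symbol under the same property, which is what the put form in section 7.4.4 does for x-symbol-font-lock-keywords.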
7.8 Miscellaneous Internals
TODO. This is currently just a collection of unrelated stuff.
Characters might also define a subgroup which is a string defining some order on characters in the same group (see section 3.6 Character Group and Token Classes) and is also used for default contexts/bindings (see section 7.3.5 Customizing Input Methods).
x-symbol-group-syntax-alist
- Lists all valid character groups. Under Emacs and XEmacs/Mule, this list also determines the syntax of characters.
The character group could probably also be used to define character categories if they are implemented in XEmacs.
This document was generated by Christoph Wedler on December, 8 2003 using texi2html