
7. X-Symbol Internals

This section is outdated: it currently describes Version 3.4.2 of X-Symbol.

Package X-Symbol is distributed in two ways. End-users should use the binary package which contains pre-compiled files. X-Symbol developers should use the source package which contains some additional files.

7.1 Internal Representation of X-Symbol Characters    How X-Symbol represents X-Symbol chars.

7.2 Defining X-Symbol Charsets    How X-Symbol defines additional chars.

7.3 Defining Input Methods    How X-Symbol defines the input methods.

7.4 Extending Package X-Symbol    How to add fonts and token languages.

7.5 Various Internals    How X-Symbol handles other aspects.

7.6 Design Alternatives    Why X-Symbol is not designed differently.

7.7 Language Internals    How X-Symbol handles languages.

7.8 Miscellaneous Internals    Various. TODO.

7.1 Internal Representation of X-Symbol Characters

As mentioned in 6.1 Pseudo Token Language "x-symbol charsym", most functions do not operate on X-Symbol characters directly; they use "x-symbol charsyms". These charsyms have a symbol property x-symbol-cstring which points to a string, called cstring, containing the X-Symbol character.

Under Emacs and XEmacs/Mule, the string only contains the character, which is a normal Mule character created by make-char.
Under XEmacs/no-Mule, the string only contains the 8bit character if the X-Symbol character is an 8bit character according to x-symbol-default-coding (see section 3.2.1 Normal File and Default Encoding). Otherwise, the string consists of a leading character (with range `\200' to `\237') and an octet. Package font-lock is used to display them correctly as X-Symbol characters (see section 8.4.3 The Buffer Contains Strange Characters). E.g., where `\251' is copyright, we get
(get 'Idotaccent 'x-symbol-cstring) => "\235\251"

If the character is also an 8bit character in some encoding (see section 3.2.2 File Coding of 8bit Characters), the charsym also has the symbol property x-symbol-file-cstrings for the representation in the file and property x-symbol-buffer-cstrings to recognize character aliases (see section 3.2.7 Character Aliases). E.g., under XEmacs/no-Mule, where `\335' is Yacute and `\251' is copyright, we get

(get 'Idotaccent 'x-symbol-file-cstrings)
  => (iso-8859-9 "\335" iso-8859-3 "\251")
(get 'Idotaccent 'x-symbol-buffer-cstrings)
  => (iso-8859-9 "\234\335" iso-8859-3 "\235\251")

The values are plists (see section `Property Lists' in XEmacs Lisp Reference Manual) mapping the file coding to the strings in the file or the buffer, respectively.
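Because both values are ordinary plists, the string for one file coding can be read with plist-get. The following is only a sketch that runs without X-Symbol being loaded: the variable my-file-cstrings is hypothetical and simply holds a copy of the XEmacs/no-Mule example value shown above.

```elisp
;; Hypothetical variable: a copy of the example value of the
;; `x-symbol-file-cstrings' property of charsym Idotaccent.
(defvar my-file-cstrings
  '(iso-8859-9 "\335" iso-8859-3 "\251"))

;; Look up the file representation for one file coding.
(plist-get my-file-cstrings 'iso-8859-3)   ; the string "\251"
(plist-get my-file-cstrings 'iso-8859-1)   ; nil, no entry for this coding
```

With X-Symbol loaded, the same lookup would presumably be written against the real property, e.g., (plist-get (get 'Idotaccent 'x-symbol-file-cstrings) 'iso-8859-9).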

After token languages have been initialized, the charsym also has the symbol properties x-symbol-tokens (see section 3.1 Token Language) and x-symbol-classes (see section 3.6 Character Group and Token Classes):

(get 'Idotaccent 'x-symbol-tokens)
  => (sgml "İ" tex "{\\.I}")
(get 'Idotaccent 'x-symbol-classes)
  => (sgml (non-l1) tex (text aletter))

7.2 Defining X-Symbol Charsets

An X-Symbol charset, called cset in the code and the docstrings, handles one font used by package X-Symbol. Each cset must use the same char registry-encoding as the corresponding variables for the fonts (see section 2.9 Lisp Coding when Using Other Fonts).

You have to tell X-Symbol how to define Mule charsets with Emacs or XEmacs/Mule and which leading character to use with XEmacs/no-Mule. As an example, we use the definition of the Adobe symbol font.

(defvar x-symbol-xsymb0-cset
  '((("adobe-fontspecific") ?\233 -3600)
    (xsymb0-left "X-Symbol characters 0, left" 94 ?:) .
    (xsymb0-right "X-Symbol characters 0, right" 94 ?\;)))

Mule charsets (see section `Charsets' in XEmacs Lisp Reference Manual) may be used for 94 or 96 characters (this example: 94; only charsets with dimension 1 can be defined with X-Symbol). Thus, if your font provides more characters, you are likely to use both the left and the right half of the font to define two Mule charsets. For both of them, you have to define a unique, free final character/byte of the standard ISO 2022 escape sequence designating the charset (this example: `:' and `;'). The remaining free final characters (reserved by Emacs for users) are `>' and `?'; the latter is already used in XEmacs.

For XEmacs/no-Mule, you have to define the leading character (this example: `\233').
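To make the nesting of such a definition easier to see, here is a small sketch that takes the example value apart. The names my-xsymb0-cset and my-cset-summary are hypothetical, and reading the third number of the first element as the score of the whole cset is an assumption based on the later remarks about the "cset score".

```elisp
;; Copy of the example definition under a hypothetical name, so that
;; evaluating this sketch does not touch X-Symbol's own variable.
(defvar my-xsymb0-cset
  '((("adobe-fontspecific") ?\233 -3600)
    (xsymb0-left "X-Symbol characters 0, left" 94 ?:) .
    (xsymb0-right "X-Symbol characters 0, right" 94 ?\;)))

(defun my-cset-summary (cset)
  "Return a plist describing the parts of the cset definition CSET."
  (let ((font-part (car cset))   ; ((REGISTRY) LEADING SCORE)
        (left (cadr cset))       ; (NAME DOCSTRING CHARS FINAL)
        (right (cddr cset)))     ; same shape, or nil if unused
    (list :registry (car font-part)
          :leading (nth 1 font-part)     ; leading char for no-Mule
          :score (nth 2 font-part)       ; assumed: the cset score
          :left-final (nth 3 left)       ; ISO 2022 final character
          :right-final (nth 3 right))))

(my-cset-summary my-xsymb0-cset)
```

Because the right half follows a dot, it is the cddr of the list rather than a third element; a cset using only the left half has nil there.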

x-symbol-latin1-cset
x-symbol-latin2-cset
x-symbol-latin3-cset
x-symbol-latin5-cset: Cset definitions only using the upper halves of the fonts where the corresponding Mule charsets are known and which define characters which are considered 8bit characters in the corresponding encoding, see 3.2.2 File Coding of 8bit Characters.
x-symbol-xsymb0-cset
x-symbol-xsymb1-cset: Cset definitions using both halves of the fonts where no corresponding Mule charsets are yet known.

7.3 Defining Input Methods

This is probably the hardest section in this manual....

7.3.1 Defining Input Methods: Objectives    Input methods should be intuitive/consistent.

7.3.2 X-Symbol Character Descriptions: Example    An example introducing char descriptions.

7.3.3 Defining Input Methods by Character Descriptions    The aspects and the contexts of a character.

7.3.4 Defining Input Methods: Example    A complete example defining input methods.

7.3.5 Customizing Input Methods    How to customize the input methods.

7.3.1 Defining Input Methods: Objectives

Input methods should be intuitive. This requires consistency:

Characters should be found under the same header in the Grid and in the Menu.
If one character can be modified or rotated to another character (see section 4.7 Input Method Context: Replace Char Sequence), both should stand near each other in the Grid. E.g., since arrowsouthwest rotates to arrowdown, they stand next to each other.
The key binding should be similar to the context of input method Context. If two characters are defined to have the same context, they should have the same key prefix and the suffix should be a number which increases with the "modify-to" behavior. E.g., reflexsubset with key binding C-= < _ 2 modifies to reflexsqsubset with key binding C-= < _ 3.
Consistent definition of "modify-to" and "rotate-to": if A can be modified to B and rotated to C and C can be modified to D, B can be rotated to D in most cases.
It should be possible to load character definitions later on, e.g., when new token languages get initialized.
- Existing key bindings should not be overwritten. If some of them have to change, it should be done in a uniform way (solution: key suffix `1').
- Also, modifying or rotating a new character to/from old ones should be possible without changing the input definitions of the old characters.

Observation: It is impossible, especially with the possibility to load character definitions later on, to define the input methods directly, i.e., by something like define-key. The solution is an indirect definition with "character descriptions".

7.3.2 X-Symbol Character Descriptions: Example

As an example for "character descriptions", look at the definition of longarrowright in x-symbol-xsymb1-table (`95' is the encoding in the font and not of interest here). Some terms are defined in the next section:

(longarrowright 95 (arrow) (size big . arrowright) nil ("->" t "-->") (emdash))

With this definition, package X-Symbol automatically defines:

Key bindings C-= - - > and C-= - > 2, the latter has suffix 2, because C-= - > is also "wanted" by arrowright which now has the key binding C-= - > 1 (the "score" of longarrowright is higher, due to `size big'). See section 4.6 Input Method Keyboard: Compose Key Sequence.
arrowright modifies to longarrowright, which modifies to arrowright. See section 4.7 Input Method Context: Replace Char Sequence.
longarrowleft rotates to longarrowright, which rotates to longarrowboth (which rotates to longarrowleft). (The "rotate aspects" are inherited from arrowright.) See section 4.7 Input Method Context: Replace Char Sequence.
The following contexts can be modified to longarrowright: `-->' or minus1 / endash / macron / emdash / hyphen and `->' (since all define context `-') and emdash and `>' (since emdash defines context `--'). `->' is used for arrowright, which has a lower score, see above. See section 4.7 Input Method Context: Replace Char Sequence.
Input method Electric will change context `-->' (is tagged with t in the definition) to longarrowright, also emdash and `>' (only theoretically, since input method Electric will produce emdash only in TeX's text mode, and longarrowright only in TeX's math mode). See section 4.8 Input Method Electric: Automatic Context.
The character will appear in the Grid under the header `Arrow'. You will probably recognize that the placement is based on the modify-to and rotate-to behavior above. See section 4.5 Input Method Grid: Choose Highlighted Character.
The character will appear in the Menu under one of the headers `Arrow n'. The submenus are sorted alphabetically. See section 4.4 Input Method Menu: Select a Menu Item.

Suppose this character were missing in package X-Symbol and you wanted to define your own character (in your own font). With the current scheme, the one line above is enough! Have fun defining all the consequences directly instead....

7.3.3 Defining Input Methods by Character Descriptions

Characters are defined with character descriptions which consist of different aspects and contexts, which can also be inherited from a parent character. All characters which are connected with parents form a component. Aspects and contexts are used to determine the modify-to and rotate-to chain for characters, the contexts for input method Context and Electric, the key bindings, and the position in the Menu and the Grid.

If you want to check the component, scores, etc. of a specific character, look at the symbol properties (e.g., with M-x hyper-apropos-get-doc) of the corresponding charsym, e.g., arrowright. See also the docstrings of x-symbol-init-cset and x-symbol-init-input.

Remember, all characters which are connected with parents form a component. Contexts are the contexts of input method Context (see section 4.7 Input Method Context: Replace Char Sequence). If a table entry of a charsym does not define its own contexts, they are the same as the contexts of the charsym in an earlier position in the modify chain (see below), or the contexts of the first charsym with defined contexts in the modify chain. The modify context of a charsym is the first context.

x-symbol-rotate-aspects-alist

Characters in the same component whose aspects only differ by their direction (east,...), a key in this alist, are circularly connected by "rotate-to". The sequence in the rotate chain is determined by rotate scores depending on the values in the rotate aspects. Charsyms with the same "rotate-aspects" are not connected (charsyms with the smallest modify scores are preferred).

(get 'longarrowright 'x-symbol-rotate-aspects) => (-1500 direction east)

x-symbol-modify-aspects-alist

Characters in the same component whose aspects only differ by their size (big,...), shape (round, square...) and/or shift (up, down,...), keys in this alist, are circularly connected by "modify-to", if all their modify contexts are used exclusively, i.e., no other modify chain uses any of them. The sequence in the modify chain is determined by modify scores depending on the values in the modify aspects, the charsym score defined in the definition tables and the score of the whole cset (see section 7.2 Defining X-Symbol Charsets).

(get 'longarrowright 'x-symbol-score)
  => -3500
(get 'longarrowright 'x-symbol-modify-aspects)
  => (1500 shift nil shape nil size big)

Otherwise, the "modify chain" is divided into modify subchains, which are those charsyms sharing the same modify context. All modify subchains using the same modify context build a horizontal chain whose charsyms are circularly connected by "modify-to".

We build a key chain for all contexts (not just modify contexts), consisting of all charsyms (sorted according to modify scores) having the context. Input method Context modifies the context to the first charsym in the key chain.

x-symbol-key-suffix-string

If there is only one charsym in the key chain, C-= plus the context inserts the charsym. Otherwise, we determine a suffix for each charsym in the key chain by its index and this string. C-= plus the context plus the suffix inserts the charsym.

7.3.4 Defining Input Methods: Example

An example:

            Modify  Modify  Rotate  Rotate  Modify   Other
            Score   Aspect  Score   Aspect  Context  Contexts
--------------------------------------------------------------
charsym 1w  150     nil     100     west    `a'      `c'
charsym 2w  200     nil     100     west    `b'      -
charsym 3w  350     big     100     west    (`b')    (-)
charsym 1e  100     nil     200     east    (`a')    (`b')
charsym 2e  250     big     200     east    `a'      `b'
charsym 3e  300     big     200     east    `a'      -
charsym 1n  100     nil     300     north   `d'      `c'
charsym 2n  200     big     300     north   `c'      -

Assuming that all charsyms form one component, we have:

Rotate chains:      (1w,2w)-1e-1n and 3w-(2e,3e)-2n.
Modify chains:      1w-2w-3w and 1e-2e-3e and 1n-2n.
Horizontal chains:  1e-1w-2e-3e (for modify context `a')
                    2w-3w (for modify context `b')
Key chains:         1e-1w-2e-3e (for context `a')
                    1e-2w-2e-3w (for context `b')
                    1n-1w-2n (for context `c')
                    1n (for context `d')

That makes the following bindings:

Rotate-to:  1w->1e, 2w->1e, 1e->1n, 1n->1w
            3w->2e, 2e->2n, 3e->2n, 2n->3w
Modify-to:  1e->1w, 1w->2e, 2e->3e, 3e->1e (horizontal chain)
            2w->3w, 3w->2w (horizontal chain)
            1n->2n, 2n->1n (modify chain with exclusive modify contexts)
CONTEXTS:   `a'->1e, `b'->1e, `c'->1n, `d'->1n
KEY:        `a1'=1e, `a2'=1w, `a3'=2e, `a4'=3e, `b1'=1e, ..., `d'=1n
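The key chains and key bindings of this example can be recomputed with a few lines of Lisp. This is only a sketch of the rule described in the previous section (collect all charsyms having a context, sort them by modify score, number them if there is more than one), not X-Symbol's actual code. The table format, the function names and the suffix string "1234567890" are assumptions made for this illustration.

```elisp
;; Hypothetical table: (NAME MODIFY-SCORE CONTEXT...), taken from the
;; example table above, with inherited contexts written out.
(defvar my-xs-example-table
  '(("1w" 150 "a" "c") ("2w" 200 "b")     ("3w" 350 "b")
    ("1e" 100 "a" "b") ("2e" 250 "a" "b") ("3e" 300 "a")
    ("1n" 100 "d" "c") ("2n" 200 "c")))

(defun my-xs-key-chain (context table)
  "Return the names in TABLE having CONTEXT, sorted by modify score."
  (let (hits)
    (dolist (entry table)
      (when (member context (cddr entry))
        (push entry hits)))
    (mapcar #'car
            (sort hits (lambda (a b) (< (cadr a) (cadr b)))))))

(defun my-xs-key-bindings (context table suffixes)
  "Return an alist (KEYS . NAME) for the key chain of CONTEXT.
SUFFIXES is a string whose characters are used as key suffixes."
  (let ((chain (my-xs-key-chain context table))
        (index 0)
        result)
    (cond ((null chain) nil)
          ;; A single charsym gets the context itself as key sequence.
          ((null (cdr chain)) (list (cons context (car chain))))
          (t (dolist (name chain)
               (push (cons (concat context
                                   (string (aref suffixes index)))
                           name)
                     result)
               (setq index (1+ index)))
             (nreverse result)))))

(my-xs-key-chain "b" my-xs-example-table)
(my-xs-key-bindings "a" my-xs-example-table "1234567890")
```

The first call yields the key chain 1e-2w-2e-3w for context `b', the second the bindings `a1' to `a4' listed above. The sketch ignores equal scores, which X-Symbol reports with a warning.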

7.3.5 Customizing Input Methods

When defining contexts for characters, you should try to use default contexts to make them and key bindings as consistent as possible. E.g., package X-Symbol only defines explicit contexts for 186 of the 437 characters.

x-symbol-group-input-alist

Defines default scores and bindings for characters of a group (see section 3.6 Character Group and Token Classes). E.g., the definition (in x-symbol-latin1-table)

(aacute 225 (acute "a" Aacute))

defines aacute without any explicit contexts, but having the group acute and the subgroup `a'. The default input for the group is defined by the following element in this variable:

(acute 0 "%s'" t "'%s")

That means: 0 is added to the normal "modify-score" of the character. `%s'' and `'%s' with `%s' substituted by the subgroup, i.e., `a'' and `'a', are the contexts for aacute. The context `'a' is also used for input method Electric since it is prefixed by t.
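The substitution described here can be tried out with format. The following sketch expands the context part of such a group element for a given subgroup. The function name my-xs-expand-group-contexts and the returned (CONTEXT . ELECTRIC) pairs are assumptions for illustration, not X-Symbol's internal representation.

```elisp
(defun my-xs-expand-group-contexts (element subgroup)
  "Expand the contexts of group ELEMENT for the string SUBGROUP.
ELEMENT looks like (GROUP SCORE CONTEXT-FORMAT...), where a t
before a format string tags that context for input method Electric.
Return a list of (CONTEXT . ELECTRIC) pairs."
  (let (electric result)
    (dolist (spec (cddr element))
      (if (eq spec t)
          (setq electric t)             ; tag applies to the next context
        (push (cons (format spec subgroup) electric) result)
        (setq electric nil)))
    (nreverse result)))

(my-xs-expand-group-contexts '(acute 0 "%s'" t "'%s") "a")
```

For the element above and subgroup `a', this returns the two contexts `a'' and `'a', with only the second one tagged for Electric.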

x-symbol-key-min-length

It is quite unlikely that a one-character context is not the prefix of another context, at least when loading additional font definitions. In order not to have to change key bindings C-= key to C-= key 1, it is required that the length of the key binding without C-= is at least 2.

7.4 Extending Package X-Symbol

In this section, you are told what to consider and what to do when extending package X-Symbol with new characters and new token languages. If you only want to define a token language using existing characters, you only have to read the last section.

7.4.1 Extending X-Symbol with New Fonts    How to add fonts to X-Symbol.

7.4.2 Guidelines for Input Definitions    Guidelines for input definitions.

7.4.3 Emacs Lisp File Defining a New Font    How to define new character in a file.

7.4.4 Emacs Lisp File Extending a Token Language    Extending an existing language.

7.4.5 Emacs Lisp File Defining a New Token Language    Defining a new language.

7.4.1 Extending X-Symbol with New Fonts

If you add a new token language to package X-Symbol which should represent tokens by characters which are not yet defined by package X-Symbol, you first have to add a new font to package X-Symbol.

When adding new fonts to package X-Symbol, consider that X-Symbol has to run under Emacs, XEmacs/Mule and XEmacs/no-Mule.

Running under Emacs and XEmacs/Mule means that you cannot use all encodings in a font for characters: you should probably only use encodings 33 to 126 and 160 to 255. You should also use a unique pair of charset properties `CHARSET_REGISTRY' and `CHARSET_ENCODING'.

Running under XEmacs/no-Mule can lead to problems when major modes do not check whether the previous character is an escape character (in our case, a leading character, see section 7.1 Internal Representation of X-Symbol Characters) when looking at a character. Thus, you should probably not use encodings which represent characters in your default font with a special syntax.

In general, escape sequences use the digits of the current font. Thus, you should probably define the encodings 48 to 57 as digits `0' to `9'.
In LaTeX buffers, characters in `$%\{}' have a special syntax. Thus, you should probably not use encodings 36, 37, 92, 123 and 125 for characters which could also be useful with token languages tex and utex.
In HTML buffers, characters in `&<>' have a special syntax. Thus, you should probably not use encodings 38, 60 and 62 for characters which could also be useful with token language sgml.
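The recommendations above can be collected in a small checking function, which may be handy when assigning encodings in a new font table. This is a sketch only: the function name my-xs-encoding-advice and its result symbols are hypothetical and not part of X-Symbol.

```elisp
(defun my-xs-encoding-advice (code)
  "Return a symbol classifying font encoding CODE (an integer).
The classification follows the recommendations for new fonts."
  (cond
   ;; Only 33..126 and 160..255 are usable with Emacs and XEmacs/Mule.
   ((not (or (<= 33 code 126) (<= 160 code 255))) 'outside-mule-range)
   ;; Escape sequences use the digits of the current font.
   ((<= ?0 code ?9) 'keep-as-digit)
   ;; Special syntax in LaTeX buffers: $ % \ { }
   ((memq code '(?$ ?% ?\\ ?{ ?})) 'avoid-for-tex)
   ;; Special syntax in HTML buffers: & < >
   ((memq code '(?& ?< ?>)) 'avoid-for-sgml)
   (t 'free)))

(my-xs-encoding-advice 33)    ; free
(my-xs-encoding-advice 36)    ; avoid-for-tex
(my-xs-encoding-advice 130)   ; outside-mule-range
```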

You have to tell package X-Symbol which fonts to use for the normal text, subscripts and superscripts. See section 2.9 Lisp Coding when Using Other Fonts.

You have to tell X-Symbol how to define Mule charsets with Emacs and XEmacs/Mule and which leading character to use with XEmacs/no-Mule. See section 7.2 Defining X-Symbol Charsets.

7.4.2 Guidelines for Input Definitions

Read section 7.3 Defining Input Methods. Look at the tables in `x-symbol.el'. Here are some guidelines of how to define the input methods for new characters:

Define reasonable character groups for new characters, see 3.6 Character Group and Token Classes. E.g., if you add the IPA font for phonetic characters, you are likely to define at least one additional charset group. If you do not know whether to use one or two groups for a set of characters, use two.
Define under which Grid/Menu header the characters of the new character group should appear. You may also want to add additional headers for these characters. See section 3.6 Character Group and Token Classes.
If reasonable, define default contexts for characters of a group, see 7.3.5 Customizing Input Methods.
For the other characters, define contexts by ASCII sequences which look similar to the character.
Form a component for a set of characters which are strongly related to each other. In most cases, characters of a component are in the same group but not vice versa. E.g., the simple arrows already defined by package X-Symbol form one component. You form a component of characters by specifying parents in their definition, see 7.3.3 Defining Input Methods by Character Descriptions.
Use aspects to describe the new characters. Add new aspects to x-symbol-modify-aspects-alist and x-symbol-rotate-aspects-alist if necessary (see section 7.3.3 Defining Input Methods by Character Descriptions).
Finish the definition of your font file (see section 7.4.3 Emacs Lisp File Defining a New Font), load it with M-x load-file, and initialize the input methods, e.g., by invoking the grid (M-x x-symbol-grid).
If there are no errors, you are likely to get warnings about equal modify scores. In this case, the sequence of characters in the modify-to chain is random, and so are the numerical suffixes of key bindings.
1. Define a base score for the whole X-Symbol charset ("cset score") which should be a positive number in order not to change the key bindings of previously defined X-Symbol characters.
2. Define reasonable scores for newly defined aspects and character groups.
3. Finally, fine-tune your definitions by charsym scores in the tables. This should be necessary only for a few characters.

7.4.3 Emacs Lisp File Defining a New Font

Now put all things together in a separate font definition file. You should not put it in a language definition file.

Here is a tiny example using only the lower half of the font:

(provide 'x-symbol-myfont)
(defvar x-symbol-myfont-fonts
  '(("-xsymb-myfont-medium-r-normal--14-140-75-75-p-85-xsymb-myfont")
    ("-xsymb-myfont_sub-medium-r-normal--12-120-75-75-p-74-xsymb-myfont")
    ("-xsymb-myfont_sup-medium-r-normal--12-120-75-75-p-74-xsymb-myfont")))
(defvar x-symbol-myfont-cset
  '((("xsymb-myfont") ?\200 1000)
    (myfont-left "My font characters, left" 94 63) .
    nil))

(defvar x-symbol-myfont-table
  '((longarrownortheast 33 (arrow) (size big . arrownortheast))
    (koerper 34 (setsymbol "K"))
    (circleS 35 (symbol "S") nil nil "SO")))
(x-symbol-init-cset x-symbol-myfont-cset x-symbol-myfont-fonts
                    x-symbol-myfont-table)

Due to an XEmacs bug with char syntax inherit, you should also add the following line to files `x-symbol-xmas20.el' and `x-symbol-xmas21.el':

(modify-syntax-entry ?\200 "\\" (standard-syntax-table))

7.4.4 Emacs Lisp File Extending a Token Language

If you want to use the new font to extend an existing token language, define a new token language which inherits most variables from the "parent language". E.g., token language utex inherits most variables from tex, see `x-symbol-utex.el'.

A language must define variables for all language aspects, see 7.7 Language Internals. Our example defines a language mytex using the additional characters from 7.4.3 Emacs Lisp File Defining a New Font.

First, you have to register the language in a startup file:

(defvar x-symbol-mytex-name "My TeX macro")
(defvar x-symbol-mytex-modes nil)
(x-symbol-register-language 'mytex 'x-symbol-mytex x-symbol-mytex-modes)

The language definition file should look like (leaving out most parts which are similar to the ones in `x-symbol-utex.el'):

(provide 'x-symbol-mytex)
(require 'x-symbol-tex)
(defvar x-symbol-mytex-required-fonts '(x-symbol-myfont))
(put 'mytex 'x-symbol-font-lock-keywords 'x-symbol-tex-font-lock-keywords)

(defvar x-symbol-mytex-user-table nil)
(defvar x-symbol-mytex-myfont-table
  '((longarrownortheast (math arrow user) "\\longnortheastarrow")
    (koerper (math letter user) "\\setK")
    (circleS (math ordinary amssymb) "\\circledS")))
(defvar x-symbol-mytex-table
  (append x-symbol-mytex-user-table
          '(nil)
          x-symbol-mytex-myfont-table
          x-symbol-tex-table))

It is important that you do not define a variable for the language access x-symbol-font-lock-keywords, but rather use the variable of the parent language directly, see 7.7 Language Internals.

During the testing phase, you should probably leave out the `'(nil)' which prevents warnings about redefinitions for the following elements.

7.4.5 Emacs Lisp File Defining a New Token Language

You might also want to define a new token language not based on another language.

As an example, consider a token language "My Unicode" (myuc) for buffers with major mode myuc-mode. Thus, we register the language by:

(defvar x-symbol-myuc-name "My Unicode")
(defvar x-symbol-myuc-modes '(myuc-mode))
(x-symbol-register-language 'myuc 'x-symbol-myuc x-symbol-myuc-modes)

Each token of language myuc consists of `#' plus the hexadecimal representation of the Unicode value, where the case of the digits is not important and the preferred case is upcase. A single `#' is represented by the token ##. In order to be more flexible, we want to define the tokens by their decimal value in the table. There are no subscripts and no images. The code below (`x-symbol-myuc.el') is included in the source distribution of package X-Symbol.

(provide 'x-symbol-myuc)
(defvar x-symbol-myuc-required-fonts nil)
(defvar x-symbol-myuc-modeline-name "myuc")
(defvar x-symbol-myuc-class-alist
  '((VALID "My Unicode" (x-symbol-info-face))
    (INVALID "no My Unicode" (red x-symbol-info-face))))
(defvar x-symbol-myuc-font-lock-keywords nil)
(defvar x-symbol-myuc-image-keywords nil)
...

(defvar x-symbol-myuc-case-insensitive 'upcase)
(defvar x-symbol-myuc-token-shape
  '(?# "#[0-9A-Fa-f]+\\'" . "[0-9A-Fa-f]"))
(defvar x-symbol-myuc-exec-specs '(nil (nil . "#[0-9A-Fa-f]+")))
(defvar x-symbol-myuc-input-token-ignore nil)

(defun x-symbol-myuc-default-token-list (tokens)
  (list (format "#%X" (car tokens))))
(defvar x-symbol-myuc-token-list 'x-symbol-myuc-default-token-list)
(defvar x-symbol-myuc-user-table nil)
(defvar x-symbol-myuc-xsymb0-table
  '((alpha () 945)
    (beta () 946)))
(defvar x-symbol-myuc-table
  (append x-symbol-myuc-user-table x-symbol-myuc-xsymb0-table))
...
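To see what the decimal table entries turn into, the conversion used in x-symbol-myuc-default-token-list can be tried in isolation. The helper name my-myuc-token below is hypothetical; it only repeats the format call from the listing and can be evaluated without X-Symbol.

```elisp
(defun my-myuc-token (code)
  "Return the myuc token for the decimal Unicode value CODE."
  (format "#%X" code))

(my-myuc-token 945)   ; "#3B1", the token for alpha
(my-myuc-token 946)   ; "#3B2", the token for beta

;; The first regexp of the token shape also accepts lower-case digits,
;; since the case of the digits is not important.
(string-match "#[0-9A-Fa-f]+\\'" "#3b1")   ; 0, i.e., a match
```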

7.5 Various Internals

7.5.1 Tagging Insert Commands for Token and Electric    Don't break input methods Token and Electric.

7.5.2 Avoiding Hide/Show-Invisible Flickering    Moving cursor in invisible commands.

7.5.1 Tagging Insert Commands for Token and Electric

Input methods Token (see section 4.2 Input Method Token: Replace Token by Character) and Electric (see section 4.8 Input Method Electric: Automatic Context) stop their auto replacement if you use a command which is not an insert command.

self-insert-command
newline
newline-and-indent
reindent-then-newline-and-indent
tex-insert-quote
TeX-insert-quote
TeX-insert-punctuation
TeX-insert-dollar
sgml-close-angle
sgml-slash: These commands and commands aliased to these are recognized as input commands by having a non-nil value of their symbol property x-symbol-input.
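Following this description, an insert command of your own would be tagged by setting the same symbol property. The command my-insert-arrow-chars below is hypothetical and only serves as an illustration. That setting the property is all that is needed is an assumption derived from the description above.

```elisp
(defun my-insert-arrow-chars ()
  "Insert the two characters `->' (hypothetical example command)."
  (interactive)
  (insert "->"))

;; Tag the command as insert command, so that input methods Token and
;; Electric do not stop their auto replacement when it is used.
(put 'my-insert-arrow-chars 'x-symbol-input t)
```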

7.5.2 Avoiding Hide/Show-Invisible Flickering

Starting a command makes a previously revealed super- or subscript command (see section 5.1 Super- and Subscripts) invisible again. Repeatedly invoking commands which move the point just by a small amount can lead to some flickering.

forward-char
forward-char-command
backward-char
backward-char-command: If the point position after the execution of these commands is still "at" the super- or subscript command, the command won't be made invisible in the first place. Each of these four commands has a function (1+ and 1-) as the value of its symbol property x-symbol-point-function which returns the position "after" when called with the position "before".
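By analogy, a small movement command of your own could carry such a property, too. The following sketch uses two hypothetical names, my-forward-two-chars and my-forward-two-chars-point. Whether X-Symbol consults the property for commands other than the four listed above is an assumption.

```elisp
(defun my-forward-two-chars ()
  "Move point two characters forward (hypothetical example command)."
  (interactive)
  (forward-char 2))

(defun my-forward-two-chars-point (pos)
  "Return the position after `my-forward-two-chars' when started at POS."
  (+ pos 2))

;; Same role as 1+ for `forward-char' and 1- for `backward-char'.
(put 'my-forward-two-chars 'x-symbol-point-function
     'my-forward-two-chars-point)
```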

7.6 Design Alternatives

This section describes potential design alternatives and why they were not used.

7.6.1 Alternative Token Representations    Why we need the conversion.

7.6.2 Alternative Ways to Turn on X-Symbol Globally    How to turn on X-Symbol globally.

7.6.3 Alternative Auto Conversion Methods    When do we convert automatically.

7.6.1 Alternative Token Representations

Package X-Symbol represents tokens in the file by characters in the buffer. This requires an automatic conversion when visiting a file or saving a buffer, see 3.2 Conversion: Decoding and Encoding.

Another possibility would be to use the tokens directly in the buffer and just display them differently. You would need no conversion and you could copy the text easily to a message buffer. This could be done by a special face and an additional font-lock keyword for every token. The disadvantages make this approach unfeasible:

The editing commands would work on the tokens which are invisible for the user.
Extremely resource and startup-time consuming. If as many characters should be supported as done by package X-Symbol, including superscripts and subscripts, more than 2000 faces with display tables would have to be defined even without considering char aliases!
Time consuming. More than 2000 entries in your font-lock keywords would slow down the fontification considerably, which would be too much even when using lazy-shot!

Another possibility would be to adapt TeX to the representations of the corresponding characters in Emacs' buffer. Again, you would need no conversion. The disadvantages make this approach too restrictive:

You cannot adapt SGML to this approach.
You cannot read normal LaTeX files directly, and you do not write normal LaTeX files.
You would have different TeX versions: one for X-Symbol with Emacs and XEmacs/Mule, one with XEmacs/no-Mule.
If you are not an extremely good TeX hacker, it would be impossible to adapt this approach to support more than 256 characters.

A third alternative would be very similar to the method used in this package. There would be just a slight difference when running under XEmacs/no-Mule: the internal representation of a character is always just one character, but we would also provide font properties for characters not of your default font. The disadvantages make this approach too unsafe:

Problems with current search/replace commands.
Problems with the current version of font-lock (it should never overwrite the font property for this character, even if the character matches some match in font-lock-keywords and overwrite is non-nil). This gets even more difficult with superscripts/subscripts.
Unless you can provide a syntax table for faces (you cannot), characters in different faces with the same encoding are in the same syntax class, which is irritating: e.g., \leftrightarrow and \approx would be delimiters.

7.6.2 Alternative Ways to Turn on X-Symbol Globally

This package hooks itself into hack-local-variables-hook which makes the installation very simple.

Another possibility would be to use the major-mode hooks, which is the normal way to turn on a minor mode. The disadvantages are:

The installation is more complicated.
Local variables in files are not yet processed (this was the main reason not to do it this way).

Another possibility would be to hook X-Symbol into find-file-hooks, as it is done in old versions of package X-Symbol. It would be as easy as the current approach but we would have to be careful with the sequence of functions in find-file-hooks, especially with the function hooked in by font-lock.

7.6.3 Alternative Auto Conversion Methods

Without package crypt, this package automatically decodes tokens when turning on the minor mode (in hack-local-variables-hook, see section 7.6.2 Alternative Ways to Turn on X-Symbol Globally) or in after-insert-file-functions. This package automatically encodes characters in write-region-annotate-functions. The disadvantage is that the possibility to change buffers in write-region-annotate-functions is not official (see section 9.2.3 Wishlist: Changes in Emacs/XEmacs), i.e., not mentioned in the docstring (only mentioned for corresponding encode-functions of package format which use a similar loop in the C code).

With package crypt, this package automatically decodes tokens when turning on the minor mode. This package automatically encodes characters in write-file-hooks. The disadvantages are that the encoding is slower (use jka-compr instead of crypt) and that there is a problem with vc-next-action (see section 8.2 Spurious Encodings).

Without package crypt, Version 2.6 of this package automatically encoded characters in write-file-data-hooks. The advantage was that changing buffers there is official; the disadvantage was that it is also more complicated.

A totally different method would be to use package format. Unfortunately, this is not really possible, since a regexp in format-alist is much too weak, i.e., X-Symbol's decoding does not change any file headers which would represent the file format. In XEmacs, this package also fails to work properly with jka-compr and crypt.

7.7 Language Internals

In order to use a token language or to access one of the language dependent values, the following conditions must be met:

The language must be registered. This makes it possible to select the language in the menus. It also prevents loading a potentially dangerous file when a file specifies a buffer-local value of x-symbol-language.

x-symbol-register-language
Registering a language includes stating the name of the feature (i.e., a file) which provides the language. The name of the language must have been already defined.
The file providing the language must have been loaded. This will be done automatically when the language is initialized. Customizing X-Symbol will also load the language files.
The language must be initialized. This will be done automatically if the language is used. This loads the language file and fails if the language has not been registered. If some minor language information is needed, e.g., in the highlight menu of the Grid (see section 4.5 Input Method Grid: Choose Highlighted Character), you should initialize the language explicitly, e.g., by the following command:

M-x x-symbol-init-language-interactive
Initializes a token language if it is not already initialized.

Language dependent values are accessed by language accesses:

x-symbol-language-value

Returns the language dependent value. Also initializes the language if necessary. E.g., we get the name of a language by the language access x-symbol-name. With a simplified expansion, we get

(x-symbol-language-value 'x-symbol-name 'tex)
  ==> (symbol-value (get 'tex 'x-symbol-name))
  => (symbol-value 'x-symbol-tex-name)
  => "TeX macro"
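The simplified expansion can be reproduced without X-Symbol by a small stand-alone sketch. All names in it (my-lang-name, my-access-name, mylang, my-language-value) are hypothetical, and unlike the real function the sketch does not initialize any language.

```elisp
;; A variable holding a language dependent value.
(defvar my-lang-name "My language")

;; The language access is a property of the language symbol; its value
;; is the symbol naming the variable.
(put 'mylang 'my-access-name 'my-lang-name)

(defun my-language-value (access language)
  "Return the value of the variable which ACCESS names for LANGUAGE."
  (symbol-value (get language access)))

(my-language-value 'my-access-name 'mylang)   ; "My language"
```

A derived language can reuse a value of its parent simply by letting the property point to the parent's variable.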

x-symbol-language-access-alist

List of all language accesses. A token language must define all variables accessed by language accesses. A language access is a property of the language symbol, its value is the symbol naming a variable whose value is used.

If the language is a derived language, e.g., like language utex, the language access x-symbol-font-lock-keywords should point directly to the variable of the parent language (here tex), see file `x-symbol-utex.el'.

7.8 Miscellaneous Internals

TODO. This is currently just a collection of unrelated stuff.

Characters might also define a subgroup which is a string defining some order on characters in the same group (see section 3.6 Character Group and Token Classes) and is also used for default contexts/bindings (see section 7.3.5 Customizing Input Methods).

x-symbol-group-syntax-alist: Lists all valid character groups. Under Emacs and XEmacs/Mule, this list also determines the syntax of characters.

The character group could probably also be used to define character categories if they are implemented in XEmacs.

This document was generated by Christoph Wedler on December, 8 2003 using texi2html
