# 6. Supported Token Languages

This chapter describes the predefined token languages. It also presents the language-specific behavior for the topics covered in 3. Concepts of Package X-Symbol, 4. X-Symbol's Input Methods, and 5. Features of Package X-Symbol.

• 6.1 Pseudo Token Language "x-symbol charsym"

• 6.2 Token Language "TeX macro" (tex)

• 6.3 Token Language "SGML entity" (sgml)

• 6.4 Token Language "BibTeX macro" (bib)

• 6.5 Token Language "TeXinfo command" (texi)

• 6.6 Languages Defined in Other Emacs Packages

## 6.1 Pseudo Token Language "x-symbol charsym"

If no (or an invalid) token language is set for a buffer, the info in the echo area (see section 5.3 Info in Echo Area) for an X-Symbol character in the buffer (if it exists) uses the name of its charsym. In this manual, we actually refer to X-Symbol characters by their charsym name, e.g., alpha.

A charsym is a symbol which is used internally to represent an X-Symbol character. Charsyms are used instead of characters in all user variables of package X-Symbol.

The highlight menu of the Grid (see section 4.5 Input Method Grid: Choose Highlighted Character) also offers to insert a charsym name. Charsyms can also be used for input method Read Token, see 4.3 Input Method Read Token: Minibuffer Completion.

You cannot use this pseudo language to turn on the X-Symbol minor mode (see section 3.3 Minor Mode), you cannot decode charsyms to their characters, and you cannot encode characters to charsyms.

## 6.2 Token Language "TeX macro" (tex)

For buffers using the major mode latex-mode, tex-mode or plain-tex-mode, we use token language TeX macro (tex). This language provides the display of super-/subscripts and images. If the buffer visits a file with extension '.tex', X-Symbol mode is automatically turned on.

• 6.2.1 Basics of Language "TeX macro"

• 6.2.2 Super-/Subscripts and Images in LaTeX

• 6.2.3 Problems with TeX Macros

• 6.2.4 The Conversion of TeX Macros

• 6.2.5 Extra Symbols of Language "TeX Macro"

### 6.2.1 Basics of Language "TeX macro"

The standard behavior can be controlled by the following variables:

x-symbol-tex-modes
x-symbol-tex-auto-style
The variables known from 3.3 Minor Mode. If the buffer visits a file with extension '.tex', super-/subscripts and images are displayed, otherwise unique decoding (see section 3.2.4 Unique Decoding) will be used.

x-symbol-tex-auto-coding-alist
Used there to automatically deduce the specific encoding of the file (see section 3.2.2 File Coding of 8bit Characters) if the file visited by the buffer has the extension '.tex'. It searches for one of the following two strings in the current buffer, including the comment:

    \usepackage[encoding]{inputenc}
    %& -translation-file=ienc

where encoding should be one of 'latin1', 'latin2', 'latin3', 'latin5', or 'latin9', and enc should be one of 'l1' or 'l2'. If the search was successful, 8bit characters in the file are not encoded (see section 3.2.3 Store or Encode 8bit Characters).

x-symbol-tex-coding-master
If one of the above strings cannot be found in the current buffer, and the current buffer has a buffer-local string value of TeX-master, also search in the file denoted by that value for the strings. (Buffer-local variables will not be inherited.)
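The search described for x-symbol-tex-auto-coding-alist can be pictured with a small sketch. It is written in Python rather than the package's Emacs Lisp and is not the package's implementation; the names deduce_tex_coding, INPUTENC_RE, TCX_RE and TCX_TO_LATIN are made up for the illustration. The mapping from 'l1'/'l2' to 'latin1'/'latin2' and the order of the two searches are assumptions, and the fallback to the TeX-master file is left out.

```python
import re

# \usepackage[encoding]{inputenc} with one of the encodings named in the text.
INPUTENC_RE = re.compile(r"\\usepackage\[(latin[12359])\]\{inputenc\}")
# The comment line "%& -translation-file=ienc" with enc being l1 or l2.
TCX_RE = re.compile(r"^%& -translation-file=i(l[12])\b", re.MULTILINE)
# Assumption: l1 and l2 stand for Latin-1 and Latin-2.
TCX_TO_LATIN = {"l1": "latin1", "l2": "latin2"}


def deduce_tex_coding(buffer_text):
    """Return the encoding named in the buffer text, or None if none is found."""
    m = INPUTENC_RE.search(buffer_text)
    if m:
        return m.group(1)
    m = TCX_RE.search(buffer_text)
    if m:
        return TCX_TO_LATIN[m.group(1)]
    return None


print(deduce_tex_coding("\\documentclass{article}\n\\usepackage[latin9]{inputenc}\n"))
```

Like the search described above, the sketch only looks for strings, so a declaration inside a comment is found as well.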

The input methods and the character info in the echo area are controlled by:

x-symbol-tex-header-groups-alist
Defines the headers and their characters for the language specific Grid and Menu.

x-symbol-tex-extra-menu-items
There is an extra menu item to remove the braces around text-mode letters and other text-mode symbols.

x-symbol-tex-electric-ignore
x-symbol-tex-electric-ignore-regexp
Input method Electric (see section 4.8 Input Method Electric: Automatic Context) is disabled if the character is not of the correct TeX mode, i.e., it only produces a math-mode character in a math area and a text-mode character in a text area (this test requires package texmathp, see 2.6.1 LaTeX Packages). Postfix tilde is not electric, because '~' produces a space in TeX.

x-symbol-tex-token-suppress-space
Input method Token (see section 4.2 Input Method Token: Replace Token by Character) only converts a token ending with a control word like \i if the character following the token is not a letter. If that token is a text-mode token and a SPC has been entered without a prefix argument, the SPC will only perform the replacement; it will not insert a space, i.e., it will act like C-u 0 SPC.

x-symbol-tex-class-alist
x-symbol-tex-class-face-alist
Various token classes (see section 3.6 Character Group and Token Classes) are defined. They are used to give some info (see section 5.3 Info in Echo Area) about the character's spacing behavior, about which LaTeX packages are necessary to use the character (see section 6.2.5 Extra Symbols of Language "TeX Macro"), and about the conversion (see section 6.2.4 The Conversion of TeX Macros). X-Symbol uses blue for text-mode only and purple for math-mode only characters in the Grid (see section 4.5 Input Method Grid: Choose Highlighted Character) and the character info.

### 6.2.2 Super-/Subscripts and Images in LaTeX

The display of super- and subscripts (see section 5.1 Super- and Subscripts) is controlled by:

x-symbol-tex-font-lock-limit-regexp
The superscript command ^ and the subscript command _ are recognized. The argument can be given with or without braces. The argument should not span more than one line and should not contain a super-/subscript command.

x-symbol-tex-font-lock-allowed-faces
The characters '^' and '_' are not always commands (see section 6.2.3 Problems with TeX Macros), e.g., in the argument of \ref. X-Symbol uses the usual syntax highlighting keywords to decide whether to recognize these characters as super-/subscript commands: they are commands if they are not highlighted or highlighted with the usual math-mode faces.

This might lead to problems: 8.4.4 I Cannot See any/some Super- or Subscripts, 8.4.5 I See Super- and Subscripts where I Don't Want Them. Using texmathp (see section 2.6.1 LaTeX Packages) has even more problems:

• The syntax highlighting (which is used for super-/subscripts) would be much too slow.

• With your own LaTeX environments, you would need to customize texmathp.

• It is actually wrong: whether '^' and '_' are super-/subscript commands does not depend on whether we are in TeX's math mode, it depends on their catcodes (which are changed by commands like \ref).

The display of images (see section 5.2 Images at the end of Image Insertion Commands) is controlled by:

x-symbol-tex-image-keywords
The following commands are recognized. Extension ext stands for 'eps' (which is the default extension for both versions of \includegraphics if the extension is omitted there), 'ps', 'gif', 'png', 'jpeg', 'jpg', or 'pdf'. Options options can be omitted with their surrounding brackets or preceding comma, respectively.

    \input{file.pstex_t}
    \includegraphics[options][options]{file.ext}
    \includegraphics*[options][options]{file.ext}
    \epsfig{file=file.ext,options}
    \psfig{file=file.ext,options}
    \epsfbox[options]{file.ext}
    \epsffile[options]{file.ext}

x-symbol-tex-master-directory
Relative file names (explicit or implicit, see section 5.2.1 Display of Images) are relative to the directory part of variable TeX-master if it is buffer-local and a string. Otherwise, they are relative to the directory of the current file.

x-symbol-tex-image-searchpath
Files with implicitly relative names are meant to be searched in a search path. It defaults to the list of directories specified by the environment variable TEXPICTS or TEXINPUTS (see section 'TeX environment variables' in Kpathsea Manual), extended by './' if necessary.

Each directory in this list is used to expand the file name. The first expansion naming a readable file is used. Relative directories in this list are expanded in the master directory mentioned above.

This mimics the standard behavior of TeX, omitting the "built-in" directories of the search path (see section 'Path sources' in Kpathsea Manual).

x-symbol-tex-image-cached-dirs
If the image should be cached in the memory cache, the file name in the image command should either have no directory part or have the directory part 'figures/'.
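The lookup described for x-symbol-tex-master-directory and x-symbol-tex-image-searchpath can be sketched as follows. This is an illustration in Python, not the package's Emacs Lisp code; the function name find_image and its parameters are hypothetical, and the search path is simply passed in as a list instead of being read from TEXPICTS or TEXINPUTS.

```python
import os


def find_image(name, searchpath, master_dir):
    """Return the first expansion of NAME that names a readable file, or None."""
    for directory in searchpath:
        # Relative directories in the search path are expanded in the master directory.
        if os.path.isabs(directory):
            base = directory
        else:
            base = os.path.join(master_dir, directory)
        candidate = os.path.normpath(os.path.join(base, name))
        if os.path.isfile(candidate) and os.access(candidate, os.R_OK):
            return candidate
    return None
```

For example, find_image("a.eps", ["./", "figures/"], "/home/me/paper") would try /home/me/paper/a.eps first and /home/me/paper/figures/a.eps second (the directory names are placeholders).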

### 6.2.3 Problems with TeX Macros

As with other token languages, the conversion between characters and TeX macros raises the problem that we have two conflicting requirements: we would like X-Symbol not to change the file when visiting and saving a file, and we would like X-Symbol to use characters for all corresponding macros. See section 3.2.4 Unique Decoding.

The additional problem with TeX macros is that there is no fixed and simple definition of TeX macros, and many users have their personal TeX style, while many users are probably not aware that the style also influences TeX's typesetting:

• The tokens in TeX are not ended by a dedicated character (like SGML entities are ended by ';'). Instead, we need the next char to decide whether a macro ends, which would be no problem if TeX had a character which has no meaning except separating tokens (like space in most programming languages). Unfortunately, this is not the case: after a control word (an all-letter macro), a space has no meaning, but it does produce a space in the output after characters and other macros, except in math mode.

During decoding, a text-mode control word has to be replaced either with its trailing spaces or not be replaced at all. Since the number of spaces can vary and X-Symbol does not remember the original TeX sequence of a character, X-Symbol would change the file if it used characters for all sequences.

• During encoding, a space after a character in the buffer must produce a space in the document output, since users normally do not care whether the character is represented by a control word or not. Let us assume that we (Bavarians) want to produce the output 'Maß Bier'. In the info file, you will probably not see any 8bit characters (the 'sharp s' is shown as 'ß').

• Many people would use 'Ma\ss\ Bier'. This is (almost always) fine in text mode, but a '\ ' in math mode is not ignored (whereas the spaces after characters are). If we have both text-mode and math-mode control words, we have a problem, since math-mode detection cannot work properly without TeX processing.

• Many people would use 'Ma\ss{} Bier'. This has fewer problems and is therefore used by X-Symbol. The '{}' at the end of the control word is not used if the character is not followed by a space, e.g., to produce 'Straße', we use 'Stra\ss e'. Consequently, 'Ma\ss\ Bier' in the file would be decoded to 'Maß\ Bier', which would be encoded to the original sequence in the file.

• Some people would always use '{}' after a text-mode control word, even if it is not followed by a space, like 'Stra\ss{}e'. This is wrong, since it breaks ligatures and kerns. For example, compare the output of '\L V' with '\L{}V' using 'T1' font encoding.

• Up to Version 4.1, X-Symbol surrounded a text-mode control word with braces, like 'Stra{\ss}e'. This was probably even worse than always adding '{}' at the end of the control word. It was used because it is required by BibTeX (see section 6.4 Token Language "BibTeX macro" (bib)). Unfortunately, BibTeX sends this bad sequence directly to LaTeX, but this has nothing to do with X-Symbol.

• The accented characters are not represented by one token in TeX. Most people use '\"a' to produce an 'ä', while some use '\"{a}'. X-Symbol uses the former; it does not even decode the latter automatically. Up to Version 4.1, X-Symbol used '{\"a}', having the same problems as using 'Stra{\ss}e'.

• Around a dozen characters can be produced by more than one TeX macro, like \neq and \ne. Here, X-Symbol decodes both forms, because it is probably a bad idea to redefine standard TeX macros. This will not be done within style files (see section 3.2.4 Unique Decoding).

• In TeX, you can change the lexer on the fly, i.e., in a strict sense, any conversion is unsafe without TeX processing. Since the most likely change is to change the catcode of the character '@' to a letter (used in LaTeX's style files), this character is considered a letter by X-Symbol. This means that although both '\ss @' and '\ss@' usually produce the same output, only the first is decoded to 'ß@'.

• In TeX, the definitions of macros can also change on the fly, i.e., in a strict sense, any conversion is unsafe without TeX processing. X-Symbol assumes that you do not do something like that except as done by the standard LaTeX \verb command, and the verbatim and tabbing environments.

### 6.2.4 The Conversion of TeX Macros

The TeX macros for Latin characters follow the LaTeX package 'inputenc.sty', v0.97+. Package X-Symbol uses U00B5 for \mathmicro, not for \mu, though! See section 9.2.4 Wishlist: Changes in LaTeX.

It is assumed that you do not redefine standard TeX macros like \ne (see section 6.2.3 Problems with TeX Macros); if you do, you should use unique decoding (see section 3.2.4 Unique Decoding).

The encoding of characters to TeX macros works as follows:

• If the character is preceded by an odd number of backslashes, insert a space before the character.

• Accented characters are encoded without braces, e.g., we encode 'ç' to '\c c'. Accents are encoded with braces, e.g., we use '\c{ }' and '\u{}'.

Additionally, the encoding of characters to TeX macros which are control words (all-letter macros), or whose TeX representation ends with a control word (like \'\i) works as follows:

• If the character is followed by a letter, replace the character by the macro and insert a space.

• If the macro is a text-mode macro and followed by one or more blanks, replace the character and insert '{}'.

• Otherwise, just replace the character.
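The backslash rule and the three control-word rules above can be sketched as follows. This is an illustration in Python, not the package's Emacs Lisp code. The two-entry TABLE and the names encode and encode_control_word are hypothetical, and "letter" and "blank" are taken as 'A'-'Z', 'a'-'z', '@' and as space, newline or end of buffer, respectively.

```python
import string

LETTERS = set(string.ascii_letters + "@")
BLANKS = {" ", "\n", ""}          # "" stands for the end of the buffer
# Hypothetical two-entry table: character -> (control word, is_text_mode).
TABLE = {"ß": ("\\ss", True), "α": ("\\alpha", False)}


def encode_control_word(macro, text_mode, next_char):
    """Return the text that replaces a character whose token is a control word."""
    if next_char in LETTERS:
        return macro + " "        # a letter follows: separate it with a space
    if text_mode and next_char in BLANKS:
        return macro + "{}"       # text-mode macro before a blank: append {}
    return macro                  # otherwise, just the macro


def encode(text):
    out = []
    for i, ch in enumerate(text):
        if ch not in TABLE:
            out.append(ch)
            continue
        # Count the backslashes directly before the character.
        k = i
        while k > 0 and text[k - 1] == "\\":
            k -= 1
        if (i - k) % 2 == 1:
            out.append(" ")       # odd number of backslashes: insert a space first
        macro, text_mode = TABLE[ch]
        out.append(encode_control_word(macro, text_mode, text[i + 1:i + 2]))
    return "".join(out)


print(encode("Maß Bier"))   # Ma\ss{} Bier
print(encode("Straße"))     # Stra\ss e
```

Accented characters and the special handling of \verb, verbatim and tabbing are left out of the sketch.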

The decoding of TeX macros which are control words to characters works as follows:

• If the macro is a text-mode macro and followed by '{}' which is followed by a blank, replace the macro and delete the braces.

• If the macro is a text-mode macro and followed by one or more blanks, we have the following rule:

• If we have exactly one blank, the blank is a space, and it is not followed by a '%' (comment character), replace the macro by the corresponding character and delete the space. (With unique decoding, the character following the space must be a letter.)

• Otherwise, do not decode the macro!

• Otherwise, just replace the macro.

To clarify, letter means 'A'-'Z', 'a'-'z', or '@'; blank means a space, a newline, or the end of the buffer (therefore, the last character in the buffer is always followed by a blank).
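The decoding rules for control words can be sketched in the same way. This is again an illustration in Python, not the package's implementation. The table TOKENS and the names decode, decode_control_word and blank_at are hypothetical. The sketch assumes that the end of the buffer counts as a further blank after a single space, and it ignores escaped backslashes as well as \verb, verbatim and tabbing.

```python
import re
import string

LETTERS = set(string.ascii_letters + "@")
# Hypothetical two-entry table: control word -> (character, is_text_mode).
TOKENS = {"ss": ("ß", True), "alpha": ("α", False)}
# A control word is a backslash followed by letters ('@' counts as a letter).
CONTROL_WORD = re.compile(r"\\([A-Za-z@]+)")


def blank_at(text, i):
    """True if position i holds a space or newline, or lies at the buffer end."""
    return i >= len(text) or text[i] in " \n"


def decode_control_word(char, text_mode, following, unique=False):
    """Return (replacement, consumed), or None if the macro must stay as it is."""
    if text_mode:
        if following.startswith("{}") and blank_at(following, 2):
            return char, 2                 # delete the braces, keep the blank
        if blank_at(following, 0):
            n = 0
            while n < len(following) and following[n] in " \n":
                n += 1
            # Assumed reading: the buffer end counts as one more blank.
            one_space = n == 1 and following[0] == " " and n < len(following)
            nxt = following[1:2]
            if one_space and nxt != "%" and (not unique or nxt in LETTERS):
                return char, 1             # delete the single space
            return None                    # do not decode the macro
    return char, 0                         # otherwise, just replace the macro


def decode(text, unique=False):
    out, pos = [], 0
    while True:
        m = CONTROL_WORD.search(text, pos)
        if m is None:
            out.append(text[pos:])
            return "".join(out)
        out.append(text[pos:m.start()])
        result = None
        if m.group(1) in TOKENS:
            char, text_mode = TOKENS[m.group(1)]
            result = decode_control_word(char, text_mode, text[m.end():], unique)
        if result is None:
            out.append(m.group(0))         # leave the macro untouched
            pos = m.end()
        else:
            out.append(result[0])
            pos = m.end() + result[1]


print(decode("Ma\\ss{} Bier"))   # Maß Bier
print(decode("Stra\\ss e"))      # Straße
```

With unique=True, a sequence such as '\ss .' stays untouched, because decoding it and encoding the result again would not restore the space.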

There are three control words which are both text-mode and math-mode macros: \ldots, \vdots, and (by accident) \angle. They are all treated like math-mode characters, but their minibuffer info (see section 5.3 Info in Echo Area) includes 'gobbles space' (spaces in the buffer after the character have no impact on the document).

Additionally, the following commands and environments are processed during decoding (but we are just looking for strings, i.e., they are also processed in comments):

x-symbol-tex-verb-delimiter-regexp
If the command \verb is found, its argument is not decoded if it is delimited by one of the following characters: '-', '!', '#', '$', '&', '*', '+', '/', '=', '?', '^', '|', or '!'.

x-symbol-tex-env-verbatim-regexp
The contents of the verbatim environment are not decoded. To produce accented characters inside this environment, use the LaTeX package 'inputenc.sty'.

x-symbol-tex-env-tabbing-regexp
Inside a tabbing environment, the macro sequences starting with \', \`, \= and \- are not decoded. It is probably better (with or without X-Symbol) to use the LaTeX package 'inputenc.sty' or to use the Tabbing environment, to be found in the CTAN archives.

During encoding, these commands and environments are not respected, since it does not make any sense to have X-Symbol's private characters in the TeX file.

Final note: in the info file, you will probably not see any 8bit characters.

You might want to change the conversion between characters and tokens in language tex by changing:

x-symbol-tex-user-table
You can define your own tokens for X-Symbol characters. E.g., if you like to have the command \sqrt represented by a character (shadowing the entry for \surd), add the following to your '~/.emacs':

    (setq x-symbol-tex-user-table '((radical (math special) "\\sqrt")))

### 6.2.5 Extra Symbols of Language "TeX Macro"

This section describes what you should put into your private style file or your document if you want to use extra symbols, i.e., characters whose info in the echo area (see section 5.3 Info in Echo Area) contains something like 'package.sty' or 'user'. If you do not use the corresponding characters, you do not have to do anything, of course.

The TeX macros \Box, \Diamond, \leadsto, \Join, \lhd, \mho, \rhd, \sqsupset, \sqsubset, \unlhd, \unrhd are defined in LaTeX package 'latexsym.sty':

    \usepackage{latexsym}

Note that these macros are also defined in 'amssymb.sty'. Since the first four macros are defined differently (better) in 'latexsym.sty', it does make sense to load both LaTeX packages (e.g., 'amssymb.sty' simply defines \Diamond to be the same as \lozenge).

The TeX macros \boldsymbol, \circledast, \circledcirc, \circleddash, \digamma, \gtrapprox, \gtrsim, \lessapprox, \lesssim, \triangleq, \varkappa are defined in AMS LaTeX package 'amssymb.sty':

    \usepackage{amssymb}

The TeX macros \bigsqcap, \llbracket, \rrbracket, \llparenthesis, \rrparenthesis are defined in the LaTeX package 'stmaryrd.sty':

    \usepackage{stmaryrd}

The TeX macros \guilsinglleft, \guilsinglright, \NG, \ng, \DH, \DJ, \dh, \dj, \TH, \th, \guillemotleft, \guillemotright and the ogonek characters are only defined if you use T1 font encoding:

    \usepackage[T1]{fontenc}

The TeX macro \mathmicro for U00B5 can be defined by (see section 9.2.4 Wishlist: Changes in LaTeX):

    \let\mathmicro\mu

You should define the following in your LaTeX file (if you use the corresponding characters); the first can only be used with T1 font encoding.


The TeX macros \textordfeminine, \textordmasculine, \textdegree, \textonequarter, \textonehalf, \textthreequarters, \mathonesuperior, \mathtwosuperior, \maththreesuperior, \textcopyright are only defined when using LaTeX package 'inputenc.sty':

    \usepackage[latin1]{inputenc}

The TeX macros \textcent, \textcurrency, \textyen, \textbrokenbar, \textmalteseH, \textmalteseh are defined as not available in LaTeX package 'inputenc.sty'. See section 9.2.4 Wishlist: Changes in LaTeX. If you use this package and want to define these commands, use \renewcommand (or \def) afterwards, e.g.:


## 6.3 Token Language "SGML entity" (sgml)

For buffers using the major mode html-mode, hm--html-mode, html-helper-mode, sgml-mode or xml-mode, we use token language SGML entity (sgml). This language provides the display of super-/subscripts and images. If the buffer visits a file and uses an HTML mode, X-Symbol mode is automatically turned on.

• 6.3.1 Basics of Language "SGML entity"

• 6.3.2 Super-/Subscripts and Images in HTML

• 6.3.3 The Conversion of SGML Entities

### 6.3.1 Basics of Language "SGML entity"

The standard behavior can be controlled by the following variables:

x-symbol-sgml-modes
x-symbol-sgml-auto-style
The variables known from 3.3 Minor Mode. If the buffer uses an HTML mode, super-/subscripts and images are displayed, otherwise unique decoding (see section 3.2.4 Unique Decoding) will be used.

x-symbol-sgml-auto-coding-alist
Used there to automatically deduce the specific encoding of the file (see section 3.2.2 File Coding of 8bit Characters). It searches for the following string in the current buffer, including the comment:

  

where encoding should be one of 'iso-8859-1', 'iso-8859-2', 'iso-8859-3', 'iso-8859-9', or 'iso-8859-15'. If the search was successful, 8bit characters in the file are not encoded (see section 3.2.3 Store or Encode 8bit Characters).

The input methods and the character info in the echo area are controlled by:

x-symbol-sgml-header-groups-alist
Defines the headers and their characters for the language specific Grid and Menu.

x-symbol-sgml-extra-menu-items
There are no special entries in the X-Symbol menu.

x-symbol-sgml-electric-ignore
There is no additional constraint to the ones mentioned in 4.8 Input Method Electric: Automatic Context.

x-symbol-sgml-class-alist
x-symbol-sgml-class-face-alist
Token classes (see section 3.6 Character Group and Token Classes) are only used to define a coloring scheme. X-Symbol uses dark orange or dark red for non-Latin-1 characters in the Grid (see section 4.5 Input Method Grid: Choose Highlighted Character) and the character info (see section 5.3 Info in Echo Area), with dark red for characters without defined entity names in HTML (see section 6.3.3 The Conversion of SGML Entities).

### 6.3.2 Super-/Subscripts and Images in HTML

The display of super- and subscripts (see section 5.1 Super- and Subscripts) is controlled by:

x-symbol-sgml-font-lock-regexp
x-symbol-sgml-font-lock-limit-regexp
x-symbol-sgml-font-lock-alist
x-symbol-sgml-font-lock-contents-regexp
The superscript command <sup>...</sup> and the subscript command <sub>...</sub> are recognized. The contents should contain at least one character which is not a space or a nobreakspace.

The display of images (see section 5.2 Images at the end of Image Insertion Commands) is controlled by:

x-symbol-sgml-image-keywords
The following commands are recognized. Extension ext stands for 'gif', 'png', 'jpeg' or 'jpg'.

  

x-symbol-sgml-master-directory
x-symbol-sgml-image-searchpath
Relative file names (see section 5.2.1 Display of Images) are relative to the directory of the current file.

x-symbol-sgml-image-file-truename-alist
The file name prefix 'file:' is ignored. For any other file name which starts with letters and then a colon, e.g., with 'http:' or 'C:\' (which is no URL anyway), the image insertion command will be skipped. By changing this variable, you could specify that the prefix 'http://www.fmi.uni-passau.de/~wedler/' corresponds to '~/public_html/'.

x-symbol-sgml-image-cached-dirs
If the image should be cached in the memory cache, the file name in the image command should either have no directory part or have the directory part 'images/' or 'pictures/'.
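The file name handling described for x-symbol-sgml-image-file-truename-alist can be sketched as follows. This is an illustration in Python, not the package's code. The function name image_truename and the list of (prefix, replacement) pairs are hypothetical and do not reflect the actual format of the variable; the order of the checks is also an assumption.

```python
import re


def image_truename(name, prefix_map):
    """Return a local file name for NAME, or None if the image command is skipped."""
    if name.startswith("file:"):
        name = name[len("file:"):]          # the prefix 'file:' is ignored
    for prefix, replacement in prefix_map:
        if name.startswith(prefix):
            return replacement + name[len(prefix):]
    if re.match(r"[A-Za-z]+:", name):
        return None                         # e.g. 'http:' or 'C:\': skip the command
    return name


mapping = [("http://www.fmi.uni-passau.de/~wedler/", "~/public_html/")]
print(image_truename("http://www.fmi.uni-passau.de/~wedler/img/a.png", mapping))
```

The result of the last call is a name below '~/public_html/', which would then be treated like any other file name.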

### 6.3.3 The Conversion of SGML Entities

Most character entities of HTML-4.0 are supported, except the following: uppercase Greek letters which look like uppercase Latin letters, "markup-significant and internationalization" characters, and some quotes. See http://www.w3.org/TR/REC-html40/sgml/entities.html.

By default, we encode to entity references like &amp;, and decode from both entity references and character references like &#38;. For Latin-N characters without defined entity names in HTML (e.g. scedilla), we can only use character references.
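The difference between the two kinds of references can be illustrated with a short sketch. It is written in Python and uses the HTML 4 entity table from the standard module html.entities as a stand-in for the package's own table, which is an assumption; the names decode_references and encode_char are hypothetical.

```python
import re
from html.entities import codepoint2name, name2codepoint

# Matches character references like &#38; and entity references like &amp;.
REFERENCE = re.compile(r"&(?:#(\d+)|([A-Za-z][A-Za-z0-9]*));")


def decode_references(text):
    """Decode both entity references and decimal character references."""
    def replace(match):
        if match.group(1):
            return chr(int(match.group(1)))
        codepoint = name2codepoint.get(match.group(2))
        # Unknown entity names are left as they are.
        return chr(codepoint) if codepoint is not None else match.group(0)
    return REFERENCE.sub(replace, text)


def encode_char(char):
    """Use the entity name if one is defined, else a character reference."""
    name = codepoint2name.get(ord(char))
    return "&%s;" % name if name else "&#%d;" % ord(char)


print(decode_references("&auml; &#228;"))
print(encode_char("\u015f"))   # scedilla has no entity name in HTML 4
```

Unlike this sketch, the package does not support every HTML-4.0 entity, as noted above.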

Do not expect Netscape before v6 to display non-Latin-1 characters correctly (this might work by specifying the charset UTF-8 and using character references).

You might want to change the conversion between characters and tokens in language sgml by changing:

x-symbol-sgml-token-list
A symbol, which defines whether to use entity references, character references, or entity references for Latin-1 characters and character references for others.

x-symbol-sgml-user-table
It is probably not a good idea to change the defined tokens (except via the variable above), but you might want to add some definitions:

    (setq x-symbol-sgml-user-table '((circ () 999 "&bcomp;")))

## 6.4 Token Language "BibTeX macro" (bib)

For buffers using the major mode bibtex-mode, we use token language BibTeX macro (bib). This language does not provide the display of super-/subscripts and images. If the buffer visits a file, X-Symbol mode is automatically turned on. It is controlled by:

x-symbol-bib-modes
x-symbol-bib-auto-style
The variables known from 3.3 Minor Mode. There is no automatic deduction of the file encoding, 8bit characters are usually encoded, and there is usually no unique decoding. See section 3.2 Conversion: Decoding and Encoding.

The major difference between this language and the token language tex is that the tokens for text-mode characters are most likely enclosed by braces. This has some problems (see section 6.2.3 Problems with TeX Macros), but is required by the program bibtex.

The input methods and most features except super-/subscripts and images work like in token language tex (see section 6.2 Token Language "TeX macro" (tex)):

x-symbol-bib-header-groups-alist
x-symbol-bib-electric-ignore
x-symbol-bib-class-alist
x-symbol-bib-class-face-alist
As in 6.2.1 Basics of Language "TeX macro".

x-symbol-bib-extra-menu-items
There are no special entries in the X-Symbol menu.

You might want to change the conversion between characters and tokens in language bib by changing:

x-symbol-bib-user-table
x-symbol-tex-user-table
Use the former for bib-only changes, the latter also influences the conversion with token language tex.

## 6.5 Token Language "TeXinfo command" (texi)

For buffers using the major mode texinfo-mode, we use token language TeXinfo command (texi). This language does not provide the display of super-/subscripts and images. If the buffer visits a file, X-Symbol mode is automatically turned on. It is controlled by:

x-symbol-texi-modes
x-symbol-texi-auto-style
The variables known from 3.3 Minor Mode. There is no automatic deduction of the file encoding, 8bit characters are usually encoded, and there is usually no unique decoding. See section 3.2 Conversion: Decoding and Encoding.

With x-symbol-8bits having value nil (the default), it might still happen that the saved file contains 8bit characters, since token language texi does not define tokens for all characters in the Latin charsets supported by X-Symbol. See section 3.2.3 Store or Encode 8bit Characters.

With x-symbol-unique having value nil (the default), we have unique decoding anyway, since token language texi defines only one token per character, i.e., the value is not important if x-symbol-8bits is nil. See section 3.2.4 Unique Decoding.

The input methods and the character info in the echo area are controlled by:

x-symbol-texi-header-groups-alist
Defines the headers and their characters for the language specific Grid and Menu.

x-symbol-texi-extra-menu-items
There are no special entries in the X-Symbol menu.

x-symbol-texi-electric-ignore
There is no additional constraint to the ones mentioned in 4.8 Input Method Electric: Automatic Context.

x-symbol-texi-class-alist
x-symbol-texi-class-face-alist
Only a few token classes (see section 3.6 Character Group and Token Classes) are defined; the most interesting one makes the character info (see section 5.3 Info in Echo Area) display 'not as code' for @minus{} (@minus{} should not be used inside @code and @example). No coloring scheme is defined.

At least with makeinfo-4.0, you do not get accented characters in the info file for the corresponding TeXinfo commands in the '.texi' file, and the HTML output might contain illegal "SGML entities" like &140;.

At least with texi2html-1.62, you see accented characters in the HTML output for the corresponding TeXinfo commands in the '.texi' file, but the output might also contain illegal "SGML entities" like &140;.

You might want to change the conversion between characters and tokens in language texi by changing:

x-symbol-texi-user-table
Extra entries for the conversion.

## 6.6 Languages Defined in Other Emacs Packages

It is no problem for other Emacs packages to define their own token language (see section 7.4 Extending Package X-Symbol).

I know of the following package--please check its manual for details.

• Package ProofGeneral defines token language "Isabelle symbol".

This document was generated by Christoph Wedler on December, 8 2003 using texi2html