
6. Supported Token Languages

This chapter describes the predefined token languages. It also presents their language-specific behavior for 3. Concepts of Package X-Symbol, 4. X-Symbol's Input Methods, and 5. Features of Package X-Symbol.

6.1 Pseudo Token Language "x-symbol charsym"  Token language "x-symbol charsym".
6.2 Token Language "TeX macro" (tex)  Token language tex.
6.3 Token Language "SGML entity" (sgml)  Token language sgml.
6.4 Token Language "BibTeX macro" (bib)  Token language bib.
6.5 Token Language "TeXinfo command" (texi)  Token language texi.
6.6 Languages Defined in Other Emacs Packages  Languages defined in other Emacs Packages.


6.1 Pseudo Token Language "x-symbol charsym"

If no token language (or an invalid one) is set for a buffer, the info in the echo area (see section 5.3 Info in Echo Area) for an X-Symbol character in the buffer (if any) uses the name of its charsym. In this manual, we also refer to X-Symbol characters by their charsym names, e.g., alpha.

A charsym is a symbol used internally to represent an X-Symbol character. Charsyms are used instead of characters in all user variables of package X-Symbol.

The highlight menu of the Grid (see section 4.5 Input Method Grid: Choose Highlighted Character) also offers to insert a charsym name. Charsyms can also be used for input method Read Token, see 4.3 Input Method Read Token: Minibuffer Completion.

You cannot use this pseudo language to turn on the X-Symbol minor mode (see section 3.3 Minor Mode), you cannot decode charsyms to their characters, and you cannot encode characters to charsyms.


6.2 Token Language "TeX macro" (tex)

For buffers using the major modes latex-mode, tex-mode, or plain-tex-mode, we use the token language TeX macro (tex). This language provides the display of super-/subscripts and images. If the buffer visits a file with extension `.tex', X-Symbol mode is automatically turned on.

6.2.1 Basics of Language "TeX macro"  Basics of language "TeX macro".
6.2.2 Super-/Subscripts and Images in LaTeX  Super-/subscripts and images in LaTeX.
6.2.3 Problems with TeX Macros  Problems with TeX macros.
6.2.4 The Conversion of TeX Macros  How the conversion of TeX macros works.
6.2.5 Extra Symbols of Language "TeX Macro"  


6.2.1 Basics of Language "TeX macro"

The standard behavior can be controlled by the following variables:

x-symbol-tex-modes
x-symbol-tex-auto-style
The variables known from 3.3 Minor Mode. If the buffer visits a file with extension `.tex', super-/subscripts and images are displayed, otherwise unique decoding (see section 3.2.4 Unique Decoding) will be used.

x-symbol-tex-auto-coding-alist
Used to automatically deduce the specific encoding of the file (see section 3.2.2 File Coding of 8bit Characters) if the file visited by the buffer has the extension `.tex'. X-Symbol searches the current buffer, including comments, for one of the following two strings:

 
\usepackage[encoding]{inputenc}
%& -translation-file=ienc

where encoding should be one of `latin1', `latin2', `latin3', `latin5', or `latin9', and enc should be one of `l1' or `l2'. If the search is successful, 8bit characters are not encoded (see section 3.2.3 Store or Encode 8bit Characters).

x-symbol-tex-coding-master
If neither of the above strings is found in the current buffer, and the current buffer has a buffer-local string value of TeX-master, the file denoted by that value is also searched for the strings. (Buffer-local variables will not be inherited.)
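The buffer search can be illustrated with a short Python sketch. It is not X-Symbol's Emacs Lisp implementation: the function and constant names are hypothetical, it assumes the \usepackage form is tried before the %& form, and it does not follow TeX-master into another file.

```python
import re

# Assumption: the accepted option values are exactly those listed above.
INPUTENC_OPTIONS = {"latin1", "latin2", "latin3", "latin5", "latin9"}
TRANSLATION_FILES = {"l1": "latin1", "l2": "latin2"}

USEPACKAGE_RE = re.compile(r"\\usepackage\[([^\]]*)\]\{inputenc\}")
TRANSLATION_RE = re.compile(r"%&\s*-translation-file=i(l[12])\b")


def deduce_tex_coding(text):
    """Return the encoding named in TEXT, or None if no marker is found.

    Like the search described above, this also looks inside comments."""
    m = USEPACKAGE_RE.search(text)
    if m and m.group(1) in INPUTENC_OPTIONS:
        return m.group(1)
    m = TRANSLATION_RE.search(text)
    if m:
        return TRANSLATION_FILES[m.group(1)]
    return None
```

For example, a buffer starting with `%& -translation-file=il1` yields `latin1`, while a buffer with neither marker yields None and would fall back to the TeX-master lookup.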

The input methods and the character info in the echo area are controlled by:

x-symbol-tex-header-groups-alist
We use the standard Grid and Menu headers.

x-symbol-tex-extra-menu-items
There is an extra menu item to remove the braces around text-mode letters and other text-mode symbols.

x-symbol-tex-electric-ignore
x-symbol-tex-electric-ignore-regexp
Input method Electric (see section 4.8 Input Method Electric: Automatic Context) is disabled if the character is not of the correct TeX mode, i.e., it only produces a math-mode character in a math area and a text-mode character in a text area (this test requires package texmathp, see 2.6.1 LaTeX Packages). Postfix tilde is not electric, because `~' produces a space in TeX.

x-symbol-tex-token-suppress-space
Input method Token (see section 4.2 Input Method Token: Replace Token by Character) only converts a token ending with a control word, like \i, if the character following the token is not a letter. If that token is a text-mode token and SPC has been entered without a prefix argument, SPC only performs the replacement and does not insert a space, i.e., it acts like C-u 0 SPC.

x-symbol-tex-class-alist
x-symbol-tex-class-face-alist
Various token classes (see section 3.6 Character Group and Token Classes) are defined. They are used to give some info (see section 5.3 Info in Echo Area) about the characters' spacing behavior, about which LaTeX packages are necessary to use the character (see section 6.2.5 Extra Symbols of Language "TeX Macro"), and about the conversion (see section 6.2.4 The Conversion of TeX Macros). X-Symbol uses blue for text-mode-only and purple for math-mode-only characters in the Grid (see section 4.5 Input Method Grid: Choose Highlighted Character) and in the character info.


6.2.2 Super-/Subscripts and Images in LaTeX

The display of super- and subscripts (see section 5.1 Super- and Subscripts) is controlled by:

x-symbol-tex-font-lock-limit-regexp
The superscript command ^ and the subscript command _ are recognized. The argument can be provided with or without braces. It should not span more than one line and should not contain a super-/subscript command.

x-symbol-tex-font-lock-allowed-faces
The characters `^' and `_' are not always commands (see section 6.2.3 Problems with TeX Macros), e.g., in the argument of \ref. X-Symbol uses the usual syntax highlighting keywords to decide whether to recognize these characters as super-/subscript commands: they are commands if they are not highlighted or highlighted with the usual math-mode faces.

This might lead to problems, see 8.4.4 I Cannot See any/some Super- or Subscripts and 8.4.5 I See Super- and Subscripts where I Don't Want Them. Using texmathp (see section 2.6.1 LaTeX Packages) instead would cause even more problems:

  • The syntax highlighting (which is used for super-/subscripts) would be much too slow.

  • With your own LaTeX environments, you would need to customize texmathp.

  • It is actually wrong: whether `^' and `_' are super-/subscript commands does not depend on whether we are in TeX's math mode; it depends on their catcodes (which are changed by commands like \ref).
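The recognition rules for ^ and _ given above can be approximated by one regular expression. The following Python sketch is hypothetical and is not the value of x-symbol-tex-font-lock-limit-regexp: it accepts a braced argument on one line, an unbraced control word, or an unbraced single character, rejects nested braces as a simplification, and ignores the face check.

```python
import re

# Hypothetical approximation of the rules above; nested braces are not handled.
SCRIPT_RE = re.compile(
    r"(?P<cmd>[\^_])"                  # ^ = superscript, _ = subscript
    r"(?:\{(?P<braced>[^{}\n^_]*)\}"   # braced argument: one line, no ^ or _
    r"|(?P<macro>\\[A-Za-z]+)"         # unbraced control word, e.g. x^\alpha
    r"|(?P<char>[^\s{}\\^_]))"         # unbraced single character, e.g. x^2
)


def find_scripts(line):
    """Yield (kind, argument) pairs for each super-/subscript in LINE."""
    for m in SCRIPT_RE.finditer(line):
        kind = "sup" if m.group("cmd") == "^" else "sub"
        arg = m.group("braced")
        if arg is None:
            arg = m.group("macro") or m.group("char")
        yield kind, arg
```

For `x^2 + a_{ij}` this yields a superscript `2` and a subscript `ij`; an argument broken across two lines is not recognized.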

The display of images (see section 5.2 Images at the end of Image Insertion Commands) is controlled by:

x-symbol-tex-image-keywords
The following commands are recognized. Extension ext stands for `eps' (which is the default extension for both versions of \includegraphics if the extension is omitted there), `ps', `gif', `png', `jpeg', `jpg', or `pdf'. Options options can be omitted with their surrounding brackets or preceding comma, respectively.

 
\input{file.pstex_t}
\includegraphics[options][options]{file.ext}
\includegraphics*[options][options]{file.ext}
\epsfig{file=file.ext,options}
\psfig{file=file.ext,options}
\epsfbox[options]{file.ext}
\epsffile[options]{file.ext}

x-symbol-tex-master-directory
Relative file names (see section 5.2.1 Display of Images, explicitly or implicitly) are relative to the directory part of variable TeX-master if it is buffer-local and a string. Otherwise, they are relative to the directory of the current file.

x-symbol-tex-image-searchpath
Files with implicitly relative names are meant to be searched in a search path. It defaults to the list of directories specified by the environment variable TEXPICTS or TEXINPUTS (see section `TeX environment variables' in Kpathsea Manual), extended by `./' if necessary.

Each directory in this list is used to expand the file name. The first expansion naming a readable file is used. Relative directories in this list are expanded in the master directory mentioned above.

This mimics the standard behavior of TeX, omitting the "built-in" directories of the search path (see section `Path sources' in Kpathsea Manual).

x-symbol-tex-image-cached-dirs
The file name in the image command should not have a directory part or the directory part should be `figures/' if the image should be cached in the memory cache.
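The image recognition and file lookup described in these entries can be sketched in Python. This is a hypothetical illustration, not X-Symbol's Emacs Lisp code: the regexes only model the command list above, every name is treated as implicitly relative, './' is assumed to be appended (not prepended) to the search path, and all function names are invented for the example.

```python
import os
import re

EXT = r"\.(?:eps|ps|gif|png|jpeg|jpg|pdf)"
OPT = r"\[[^\]]*\]"  # one bracketed option group

# Hypothetical regexes modelled on the command list above; they are not
# the actual value of x-symbol-tex-image-keywords.
INCLUDEGRAPHICS = re.compile(
    r"\\includegraphics\*?(?:" + OPT + r"){0,2}\{(?P<file>[^}]+)\}")
OTHER_COMMANDS = [
    re.compile(r"\\input\{(?P<file>[^}]+\.pstex_t)\}"),
    re.compile(r"\\(?:epsfig|psfig)\{file=(?P<file>[^,}]+" + EXT + r")(?:,[^}]*)?\}"),
    re.compile(r"\\(?:epsfbox|epsffile)(?:" + OPT + r")?\{(?P<file>[^}]+" + EXT + r")\}"),
]


def image_files(text):
    """Return the image file names referenced in TEXT, in buffer order."""
    found = []
    for m in INCLUDEGRAPHICS.finditer(text):
        name = m.group("file")
        if not os.path.splitext(name)[1]:
            name += ".eps"  # \includegraphics defaults to eps
        found.append((m.start(), name))
    for pattern in OTHER_COMMANDS:
        found.extend((m.start(), m.group("file")) for m in pattern.finditer(text))
    return [name for _, name in sorted(found)]


def image_searchpath(environ):
    """Directories from TEXPICTS, else TEXINPUTS, with './' appended if missing.

    Empty components (Kpathsea's hook for its built-in directories) are
    dropped, like the built-in directories themselves."""
    value = environ.get("TEXPICTS") or environ.get("TEXINPUTS") or ""
    dirs = [d for d in value.split(os.pathsep) if d]
    if "./" not in dirs and "." not in dirs:
        dirs.append("./")
    return dirs


def find_image(name, master_dir, searchpath):
    """Return the first readable expansion of NAME, or None."""
    for directory in searchpath:
        if not os.path.isabs(directory):
            directory = os.path.join(master_dir, directory)  # relative to master dir
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.R_OK):
            return candidate
    return None


def is_cached(name):
    """True if NAME has no directory part or its directory part is figures/."""
    directory, _, _ = name.rpartition("/")
    return directory in ("", "figures")
```

A caller would pass `os.environ` to `image_searchpath` and the directory of TeX-master (or of the current file) as `master_dir`.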


6.2.3 Problems with TeX Macros

As with other token languages, the conversion between characters and TeX macros induces two conflicting requirements: we would like X-Symbol not to change the file when visiting and saving it, and we would like X-Symbol to use characters for all corresponding macros. See section 3.2.4 Unique Decoding.

The additional problem with TeX macros is that there is no fixed and simple definition of TeX macros, and many users have their personal TeX style, while many users are probably not aware that the style also influences TeX's typesetting:


6.2.4 The Conversion of TeX Macros

The TeX macros for Latin characters follow the LaTeX package `inputenc.sty', v0.97+. Package X-Symbol uses U00B5 for \mathmicro, not for \mu, though! See section 9.2.4 Wishlist: Changes in LaTeX.

It is assumed that you do not redefine standard TeX macros like \ne (see section 6.2.4 The Conversion of TeX Macros); if you do, you should use unique decoding (see section 3.2.4 Unique Decoding).

The encoding of characters to TeX macros works as follows:

Additionally, the encoding of characters to TeX macros which are control words (all-letter macros), or whose TeX representation ends with a control word (like `\'\i') works as follows:

The decoding of TeX macros which are control words to characters works as follows:

To clarify, letter means `A'-`Z', `a'-`z', or `@', blank means a space, newline or the end of the buffer (therefore, the last character in the buffer is always followed by a blank).
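These definitions, together with the Token input method's rule that a control word is only replaced if it is not followed by a letter (see section 6.2.1), can be restated as small predicates. The Python sketch below is hypothetical (X-Symbol implements this in Emacs Lisp) and only encodes the definitions given in this section.

```python
# Letters as defined above: A-Z, a-z and @ (ASCII only).
LETTERS = frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz@")


def is_letter(ch):
    return ch in LETTERS


def is_blank(text, pos):
    """A blank is a space, a newline, or the end of the buffer."""
    return pos >= len(text) or text[pos] in " \n"


def may_replace_control_word(text, end):
    """True if a token ending with a control word at index END may be
    replaced by its character: the next character must not be a letter."""
    return end >= len(text) or not is_letter(text[end])
```

So `\alpha` followed by `+` may be replaced, whereas the `\alpha` prefix of `\alphabet` or `\alpha@` may not.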

There are three control words which are both text-mode and math-mode macros: \ldots, \vdots, and (by accident) \angle. They are all treated like math-mode characters, but their minibuffer info (see section 5.3 Info in Echo Area) includes `gobbles space' (spaces in the buffer after the character have no impact on the document).

Additionally, the following commands and environments are processed during decoding (but we are just looking for strings, i.e., they are also processed in comments):

x-symbol-tex-verb-delimiter-regexp
If the command \verb is found, its argument is not decoded if it is delimited by one of the following characters: `-', `!', `#', `$', `&', `*', `+', `/', `=', `?', `^', `|', or `!'.

x-symbol-tex-env-verbatim-regexp
The contents of the verbatim environment are not decoded. To produce accented characters inside this environment, use the LaTeX package `inputenc.sty'.

x-symbol-tex-env-tabbing-regexp
Inside a tabbing environment, the macro sequences starting with `\`', `\'', `\=' and `\-' are not decoded. With or without X-Symbol, it is probably better to use the LaTeX package `inputenc.sty' or the Tabbing environment, which can be found in the CTAN archives.

During encoding, these commands and environments are not respected, since it does not make any sense to have X-Symbol's private characters in the TeX file.
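The string-based search for regions that decoding must skip can be sketched as follows. This is a hypothetical Python illustration, not the actual values of x-symbol-tex-verb-delimiter-regexp or x-symbol-tex-env-verbatim-regexp; it covers \verb and the verbatim environment, but not the tabbing rule.

```python
import re

# Delimiters accepted for \verb, taken from the list above.
VERB_DELIMITERS = "-!#$&*+/=?^|"
VERB_RE = re.compile(r"\\verb([" + re.escape(VERB_DELIMITERS) + r"])(.*?)\1")
VERBATIM_RE = re.compile(r"\\begin\{verbatim\}(.*?)\\end\{verbatim\}", re.DOTALL)


def protected_texts(text):
    """Return the substrings of TEXT that decoding must leave alone.

    Like X-Symbol, this only looks for strings, so matches inside TeX
    comments are protected too. Tabbing environments are not handled."""
    spans = [m.span(2) for m in VERB_RE.finditer(text)]
    spans += [m.span(1) for m in VERBATIM_RE.finditer(text)]
    return [text[start:end] for start, end in sorted(spans)]
```

With this sketch, `\verb|\alpha|` keeps `\alpha` as a token, while `\verb"x"` is not protected because `"` is not one of the listed delimiters.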

Final note: in the info file, you will probably not see any 8bit characters.

You might want to change the conversion between characters and tokens in language tex by changing:

x-symbol-tex-user-table
You can define your own tokens for X-Symbol characters. E.g., if you would like to have the command \sqrt represented by a character (shadowing the entry for \surd), add the following to your `~/.emacs':

 
(setq x-symbol-tex-user-table '((radical (math special) "\\sqrt")))


6.2.5 Extra Symbols of Language "TeX Macro"

This section describes what you should put into your private style file or your document if you want to use extra symbols, i.e., characters whose info in the echo area (see section 5.3 Info in Echo Area) contains something like `package.sty' or `user'. If you do not use the corresponding characters, you do not have to do anything, of course.

The TeX macros \Box, \Diamond, \leadsto, \Join, \lhd, \mho, \rhd, \sqsupset, \sqsubset, \unlhd, \unrhd, are defined in LaTeX package `latexsym.sty':

 
\usepackage{latexsym}

Note that these macros are also defined in `amssymb.sty'. Since the first four macros are defined differently (better) in `latexsym.sty', it does make sense to load both LaTeX packages (e.g., `amssymb.sty' simply defines \Diamond to be the same as \lozenge).

The TeX macros \boldsymbol, \circledast, \circledcirc, \circleddash, \digamma, \gtrapprox, \gtrsim, \lessapprox, \lesssim, \triangleq, \varkappa are defined in AMS LaTeX package `amssymb.sty':

 
\usepackage{amssymb}

The TeX macros \bigsqcap, \llbracket, \rrbracket, \llparenthesis, \rrparenthesis are defined in the LaTeX package `stmaryrd.sty':

 
\usepackage{stmaryrd}

The TeX macros \guilsinglleft, \guilsinglright, \NG, \ng, \DH, \DJ, \dh, \dj, \TH, \th, \guillemotleft, \guillemotright and the ogonek characters are only defined if you use T1 font encoding:

 
\usepackage[T1]{fontenc}

The TeX macro \mathmicro for U00B5 can be defined by (see section 9.2.4 Wishlist: Changes in LaTeX):

 
\let\mathmicro\mu

You should define the following in your LaTeX file (if you use the corresponding characters), the first can only be used with T1 font encoding.

 
\DeclareTextSymbol{\textbackslash}{T1}{92}
\newcommand{\nsubset}{\not\subset}
\newcommand{\textflorin}{\textit{f}}
\newcommand{\setB}{{\mathord{\mathbb B}}}
\newcommand{\setC}{{\mathord{\mathbb C}}}
\newcommand{\setN}{{\mathord{\mathbb N}}}
\newcommand{\setQ}{{\mathord{\mathbb Q}}}
\newcommand{\setR}{{\mathord{\mathbb R}}}
\newcommand{\setZ}{{\mathord{\mathbb Z}}}
\newcommand{\coloncolon}{\mathrel{::}}

The TeX macros \textordfeminine, \textordmasculine, \textdegree, \textonequarter, \textonehalf, \textthreequarters, \mathonesuperior, \mathtwosuperior, \maththreesuperior, \textcopyright are only defined when using LaTeX package `inputenc.sty':

 
\usepackage[latin1]{inputenc}

The TeX macros \textcent, \textcurrency, \textyen, \textbrokenbar, \textmalteseH, \textmalteseh are defined as not available in LaTeX package `inputenc.sty'. See section 9.2.4 Wishlist: Changes in LaTeX. If you use this package and want to define these commands, use \renewcommand (or \def) after loading it, e.g.:

 
\usepackage[latin1]{inputenc}
\usepackage{wasysym}  %% defines \cent, \currency, \brokenvert
\usepackage{amssymb}  %% defines \yen
\renewcommand{\textcent}{\cent}
\renewcommand{\textcurrency}{\currency}
\renewcommand{\textyen}{\yen}
\renewcommand{\textbrokenbar}{\brokenvert}


6.3 Token Language "SGML entity" (sgml)

For buffers using the major modes html-mode, hm--html-mode, html-helper-mode, sgml-mode, or xml-mode, we use the token language SGML entity (sgml). This language provides the display of super-/subscripts and images. If the buffer visits a file and uses an HTML mode, X-Symbol mode is automatically turned on.

6.3.1 Basics of Language "SGML entity"  
6.3.2 Super-/Subscripts and Images in HTML  
6.3.3 The Conversion of SGML Entities  How the conversion of SGML entities works.


6.3.1 Basics of Language "SGML entity"

The standard behavior can be controlled by the following variables:

x-symbol-sgml-modes
x-symbol-sgml-auto-style
The variables known from 3.3 Minor Mode. If the buffer uses an HTML mode, super-/subscripts and images are displayed, otherwise unique decoding (see section 3.2.4 Unique Decoding) will be used.

x-symbol-sgml-auto-coding-alist
Used to automatically deduce the specific encoding of the file (see section 3.2.2 File Coding of 8bit Characters). X-Symbol searches the current buffer, including comments, for the following string:

 
<meta http-equiv="content-type"
      content="text/html; charset=encoding">

where encoding should be one of `iso-8859-1', `iso-8859-2', `iso-8859-3', `iso-8859-9', or `iso-8859-15'. If the search is successful, 8bit characters are not encoded (see section 3.2.3 Store or Encode 8bit Characters).
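A minimal Python sketch of this search, assuming the attribute order shown above and case-insensitive matching (both assumptions; this is not the value of x-symbol-sgml-auto-coding-alist, and the function name is hypothetical):

```python
import re

# Charsets listed above.
CHARSETS = {"iso-8859-1", "iso-8859-2", "iso-8859-3", "iso-8859-9", "iso-8859-15"}

# Assumption: attribute order as shown above, matched case-insensitively.
META_RE = re.compile(
    r'<meta\s+http-equiv="content-type"\s+'
    r'content="text/html;\s*charset=([^"]+)"',
    re.IGNORECASE,
)


def deduce_sgml_coding(text):
    """Return the charset declared in TEXT if it is a supported one, else None."""
    m = META_RE.search(text)
    if m and m.group(1).lower() in CHARSETS:
        return m.group(1).lower()
    return None
```

A declaration of `charset=utf-8` is found by the regex but rejected, because it is not among the listed charsets.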

The input methods and the character info in the echo area are controlled by:

x-symbol-sgml-header-groups-alist
Defines the headers and their characters for the language specific Grid and Menu.

x-symbol-sgml-extra-menu-items
There are no special entries in the X-Symbol menu.

x-symbol-sgml-electric-ignore
There is no additional constraint to the ones mentioned in 4.8 Input Method Electric: Automatic Context.

x-symbol-sgml-class-alist
x-symbol-sgml-class-face-alist
Token classes (see section 3.6 Character Group and Token Classes) are only used to define a coloring scheme. X-Symbol uses dark orange or dark red for non-Latin-1 characters in the Grid (see section 4.5 Input Method Grid: Choose Highlighted Character) and in the character info (see section 5.3 Info in Echo Area), with dark red for characters without defined entity names in HTML (see section 6.3.3 The Conversion of SGML Entities).


6.3.2 Super-/Subscripts and Images in HTML

The display of super- and subscripts (see section 5.1 Super- and Subscripts) is controlled by:

x-symbol-sgml-font-lock-regexp
x-symbol-sgml-font-lock-limit-regexp
x-symbol-sgml-font-lock-alist
x-symbol-sgml-font-lock-contents-regexp
The superscript command <sup>...</sup> and the subscript command <sub>...</sub> are recognized. The contents should contain at least one character which is not a space or a no-break space.
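The contents rule can be illustrated with a hypothetical Python approximation. It is not the value of the x-symbol-sgml-font-lock-* variables: as simplifications, it only matches lowercase tags with contents on one line, and it treats both the decoded no-break space character and the &nbsp; entity as no-break spaces.

```python
import re

# Hypothetical approximation: lowercase <sup>/<sub> with contents on one line.
SCRIPT_RE = re.compile(r"<(sup|sub)>(.*?)</\1>")
NBSP = "\u00a0"


def html_scripts(text):
    """Return (tag, contents) for each <sup>/<sub> whose contents contain
    at least one character other than a space or a no-break space."""
    result = []
    for m in SCRIPT_RE.finditer(text):
        contents = m.group(2)
        # Treat both the decoded character and the &nbsp; entity as no-break space.
        if contents.replace("&nbsp;", "").strip(" " + NBSP):
            result.append((m.group(1), contents))
    return result
```

Thus `H<sub>2</sub>O` is displayed with a subscript, while `<sub> </sub>` is left alone.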

The display of images (see section 5.2 Images at the end of Image Insertion Commands) is controlled by:

x-symbol-sgml-image-keywords
The following commands are recognized. Extension ext stands for `gif', `png', `jpeg' or `jpg'.

 
<img ... src="file.ext" ...>

x-symbol-sgml-master-directory
x-symbol-sgml-image-searchpath
Relative file names (see section 5.2.1 Display of Images) are relative to the directory of the current file.

x-symbol-sgml-image-file-truename-alist
The file name prefix `file:' is ignored. For any other file name which starts with letters followed by a colon, e.g., `http:' or `C:\' (which is not a URL anyway), the image insertion command is skipped. By changing this variable, you could specify that the prefix `http://www.fmi.uni-passau.de/~wedler/' corresponds to `~/public_html/'.

x-symbol-sgml-image-cached-dirs
The file name in the image command should not have a directory part or the directory part should be `images/' or `pictures/' if the image should be cached in the memory cache.
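The HTML image rules above can be sketched in Python. This is a hypothetical illustration, not X-Symbol's code: the regex only models `<img ... src="file.ext" ...>`, the `truename_alist` parameter is an invented stand-in for x-symbol-sgml-image-file-truename-alist (a list of prefix/directory pairs), and user-specified prefixes are assumed to be checked before the default `file:` and scheme rules.

```python
import re

# Hypothetical approximation of the <img> recognition described above.
IMG_RE = re.compile(r'<img\b[^>]*?\bsrc="([^"]+\.(?:gif|png|jpeg|jpg))"', re.IGNORECASE)
SCHEME_RE = re.compile(r"[A-Za-z]+:")


def img_sources(text):
    """Return the src values of recognized <img> elements in TEXT."""
    return IMG_RE.findall(text)


def local_file(src, truename_alist=()):
    """Map a src value to a local file name, or None to skip the image.

    TRUENAME_ALIST is a hypothetical list of (prefix, directory) pairs
    standing in for x-symbol-sgml-image-file-truename-alist."""
    for prefix, directory in truename_alist:
        if src.startswith(prefix):
            return directory + src[len(prefix):]
    if src.startswith("file:"):
        return src[len("file:"):]  # the file: prefix is ignored
    if SCHEME_RE.match(src):
        return None  # http:, C:\ and similar prefixes: skip the command
    return src


def is_cached(name):
    """True if NAME has no directory part or lies in images/ or pictures/."""
    directory, _, _ = name.rpartition("/")
    return directory in ("", "images", "pictures")
```

The relative names returned by `local_file` would then be resolved against the directory of the current file.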


6.3.3 The Conversion of SGML Entities

Most character entities of HTML 4.0 are supported, except the following: uppercase Greek letters which look like uppercase Latin letters, the "markup-significant and internationalization" characters, and some quotes. See http://www.w3.org/TR/REC-html40/sgml/entities.html.

By default, we encode to entity references like &amp;, and decode from both entity references and character references like &#38;. For Latin-N characters without defined entity names in HTML (e.g. scedilla), we can only use character references.
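Python's standard html.entities module carries the HTML 4 entity table, so the default behavior (encode to entity references, decode both entity and character references) can be sketched with it. This is only an illustration: X-Symbol uses its own table, which omits some HTML 4 entities as noted above, and the function names are hypothetical.

```python
import html.entities
import re

# Entity references like &amp; and decimal character references like &#38;.
REF_RE = re.compile(r"&(?:#(\d+)|([A-Za-z][A-Za-z0-9]*));")


def decode_refs(text):
    """Replace known entity references and character references by characters."""
    def replace(m):
        if m.group(1):
            codepoint = int(m.group(1))
            return chr(codepoint) if codepoint <= 0x10FFFF else m.group(0)
        codepoint = html.entities.name2codepoint.get(m.group(2))
        return chr(codepoint) if codepoint is not None else m.group(0)
    return REF_RE.sub(replace, text)


def encode_char(ch):
    """Prefer the entity reference; fall back to a character reference."""
    name = html.entities.codepoint2name.get(ord(ch))
    return f"&{name};" if name else f"&#{ord(ch)};"
```

For scedilla (U+015F), which has no HTML 4 entity name, `encode_char` falls back to the character reference `&#351;`.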

Do not expect Netscape before v6 to display non-Latin-1 characters correctly (this might work by specifying the charset UTF-8 and using character references).

You might want to change the conversion between characters and tokens in language sgml by changing:

x-symbol-sgml-token-list
A symbol, which defines whether to use entity references, character references, or entity references for Latin-1 characters and character references for others.

x-symbol-sgml-user-table
It is probably not a good idea to change the defined tokens (except via the variable above), but you might want to add some definitions:

 
(setq x-symbol-sgml-user-table '((circ () 999 "&bcomp;")))


6.4 Token Language "BibTeX macro" (bib)

For buffers using the major mode bibtex-mode, we use token language BibTeX macro (bib). This language does not provide the display of super-/subscripts and images. If the buffer visits a file, X-Symbol mode is automatically turned on. It is controlled by:

x-symbol-bib-modes
x-symbol-bib-auto-style
The variables known from 3.3 Minor Mode. There is no automatic deduction of the file encoding, 8bit characters are usually encoded, and there is usually no unique decoding. See section 3.2 Conversion: Decoding and Encoding.

The major difference between this language and the token language tex is that the tokens for text-mode characters are most likely enclosed by braces. This has some problems (see section 6.2.3 Problems with TeX Macros), but is required by the program bibtex.

The input methods and most features except super-/subscripts and images work like in token language tex (see section 6.2 Token Language "TeX macro" (tex)):

x-symbol-bib-header-groups-alist
x-symbol-bib-electric-ignore
x-symbol-bib-class-alist
x-symbol-bib-class-face-alist
Like in 6.2.1 Basics of Language "TeX macro".

x-symbol-bib-extra-menu-items
There are no special entries in the X-Symbol menu.

You might want to change the conversion between characters and tokens in language bib by changing:

x-symbol-bib-user-table
x-symbol-tex-user-table
Use the former for bib-only changes, the latter also influences the conversion with token language tex.


6.5 Token Language "TeXinfo command" (texi)

For buffers using the major mode texinfo-mode, we use token language TeXinfo command (texi). This language does not provide the display of super-/subscripts and images. If the buffer visits a file, X-Symbol mode is automatically turned on. It is controlled by:

x-symbol-texi-modes
x-symbol-texi-auto-style
The variables known from 3.3 Minor Mode. There is no automatic deduction of the file encoding, 8bit characters are usually encoded, and there is usually no unique decoding. See section 3.2 Conversion: Decoding and Encoding.

With x-symbol-8bits having value nil (the default), it might still happen that the saved file contains 8bit characters, since token language texi does not define tokens for all characters in the Latin charsets supported by X-Symbol. See section 3.2.3 Store or Encode 8bit Characters.

With x-symbol-unique having value nil (the default), we have unique decoding anyway, since token language texi defines only one token per character; i.e., the value is not important if x-symbol-8bits is nil. See section 3.2.4 Unique Decoding.

The input methods and the character info in the echo area are controlled by:

x-symbol-texi-header-groups-alist
Defines the headers and their characters for the language specific Grid and Menu.

x-symbol-texi-extra-menu-items
There are no special entries in the X-Symbol menu.

x-symbol-texi-electric-ignore
There is no additional constraint to the ones mentioned in 4.8 Input Method Electric: Automatic Context.

x-symbol-texi-class-alist
x-symbol-texi-class-face-alist
Only a few token classes (see section 3.6 Character Group and Token Classes) are defined; the most interesting one makes the character info (see section 5.3 Info in Echo Area) display `not as code' for @minus{} (@minus{} should not be used inside @code and @example). No coloring scheme is defined.

At least with makeinfo-4.0, you do not get accented characters in the info file for the corresponding TeXinfo commands in the `.texi' file, and the HTML output might contain illegal "SGML entities" like &140;.

At least with texi2html-1.62, you see accented characters in the HTML output for the corresponding TeXinfo commands in the `.texi' file, but the output might also contain illegal "SGML entities" like &140;.

You might want to change the conversion between characters and tokens in language texi by changing:

x-symbol-texi-user-table
Extra entries for the conversion.


6.6 Languages Defined in Other Emacs Packages

It is no problem for other Emacs packages to define their own token language (see section 7.4 Extending Package X-Symbol).

I know of the following package--please check its manual for details.



This document was generated by Christoph Wedler on December 8, 2003 using texi2html