
3. Concepts of Package X-Symbol

This chapter describes the concepts of package X-Symbol. It contains quite a few forward references to features that are based on these concepts, such as 4. X-Symbol's Input Methods, and 5. Features of Package X-Symbol.

3.1 Token Language  What does an X-Symbol character represent?
3.2 Conversion: Decoding and Encoding  Decoding tokens, encoding characters.
3.3 Minor Mode  How to control the behavior of X-Symbol.
3.4 Poor Man's Mule: Running Under XEmacs/no-Mule  Running X-Symbol under XEmacs/no-Mule.
3.5 The Role of font-lock  Why does X-Symbol need font-lock?
3.6 Character Group and Token Classes  Character group and token classes.


3.1 Token Language

As mentioned in the overview, "X-Symbol Characters" in the buffer are represented by "tokens" in the file. The correspondence between the two is determined by the token language, which is closely related to the major mode of the current buffer. E.g., character alpha stands for \alpha in LaTeX buffers.

For details of predefined token languages "TeX macro" (tex), "SGML entity" (sgml), "BibTeX macro" (bib), and "TeXinfo command" (texi), see 6. Supported Token Languages.

The token language determines the conversion between X-Symbol characters and tokens (see section 3.2 Conversion: Decoding and Encoding), the input methods (see section 4. X-Symbol's Input Methods), and various other features (see section 5. Features of Package X-Symbol).

The token language is defined by the following buffer-local variable:

x-symbol-language
Token language used in current buffer. You can set this variable in the "local variables list" near the end of the file (see section `File Variables' in XEmacs User's Manual), e.g.:

 
%% Local Variables:
%% x-symbol-language: tex
%% End:

Package X-Symbol uses a reasonable value according to the major mode and the file name of a buffer if the variable is not already buffer-local. A valid token language is required to turn on X-Symbol Minor mode, see 3.3 Minor Mode.

A token language must be registered before you can use it. By default, the token languages mentioned above are registered.


3.2 Conversion: Decoding and Encoding

As mentioned, X-Symbol characters in the buffer are represented by tokens in the file. Thus, we need some conversion from tokens to characters, called decoding, and some conversion from characters to tokens, called encoding.

We have the additional problem that some characters are not only represented by tokens, but also via some 8bit character encoding.

Package X-Symbol supports the following 8bit character encodings: Latin-1 (iso-8859-1), Latin-2 (iso-8859-2), Latin-3 (iso-8859-3), Latin-5 (iso-8859-9), and Latin-9 (iso-8859-15). It currently supports fewer encodings with XEmacs on Windows (see section 2.1 Requirements).

3.2.1 Normal File and Default Encoding  
3.2.2 File Coding of 8bit Characters  Specific encoding of a file.
3.2.3 Store or Encode 8bit Characters  Do you want to store 8bit characters?
3.2.4 Unique Decoding  Restrict decoding to avoid normalization?
3.2.5 Conversion Commands  Interactive encoding and decoding.
3.2.6 Copy & Paste with Conversion  Copy & paste with conversion.
3.2.7 Character Aliases  Different charsets include the same chars.


3.2.1 Normal File and Default Encoding

As mentioned, some characters have an 8bit file encoding, and X-Symbol needs to know which 8bit file encoding you normally use when visiting a file and saving a buffer.

With Mule support, Emacs/XEmacs can recognize the normal file encoding, also called a coding system (see section `Recognize Coding' in XEmacs User's Manual).

Without Mule support, XEmacs can usually only support 8bit characters of one encoding; this encoding corresponds to the charset/registry of your default font. Here, the normal file encoding is the default encoding:

x-symbol-default-coding
The default encoding. The value must be a symbol denoting one of the supported encodings or nil. The variable must be set before X-Symbol has been initialized. See section 2.4 Make XEmacs Initialize X-Symbol During Startup.

The default encoding is not only used to determine the normal file encoding without Mule, but also for the following:

To deduce the default value, X-Symbol inspects the Mule language environment and the output of the shell command locale, or to be more exact:

 
locale -ck code_set_name charmap

Without Mule support, you get a warning if the command does not exist on your system or lists an encoding which is not supported by X-Symbol, such as some Asian encoding. Value nil is the same as iso-8859-1.

With Mule support, you get a warning if the command lists a supported encoding which is different from the encoding deduced from the Mule language environment. Value nil makes sure that X-Symbol file encoding detection (see section 3.2.2 File Coding of 8bit Characters) only works if Emacs has detected the same encoding; it works like iso-8859-1 otherwise.


3.2.2 File Coding of 8bit Characters

X-Symbol can use a different encoding for single buffers/files, even if you use X-Symbol on XEmacs without Mule support. To do so, set the following buffer-local variable:

x-symbol-coding
8bit character encoding in the file visited by the current buffer. Value nil represents the normal file encoding (see section 3.2.1 Normal File and Default Encoding).

With Mule support, any value other than nil is considered invalid if the normal file encoding is neither the same as this value nor the same as the default encoding. In other words, if your default encoding is nil, X-Symbol's file encoding detection never takes precedence over Emacs's own detection, i.e., the normal file encoding.

You can set this variable in the "local variables list" near the end of the file (see section `File Variables' in XEmacs User's Manual), e.g.:

 
<!-- Local Variables: -->
<!-- x-symbol-coding: iso-8859-2 -->
<!-- End: -->

If the variable is not already buffer-local, a reasonable value is deduced when turning on X-Symbol (see section 3.3 Minor Mode) by searching for some language dependent headers at the beginning of the file:

x-symbol-auto-coding-search-limit
X-Symbol usually searches for something like `\usepackage[...]{inputenc}' (see section 6.2 Token Language "TeX macro" (tex)) or `<meta ... charset=...>' (see section 6.3 Token Language "SGML entity" (sgml)) in the first 10000 characters.
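
The kind of header search described above can be pictured with a small sketch. The following Python code is only an illustration written for this section, not X-Symbol's implementation: the function name guess_coding and the table INPUTENC are hypothetical, only a single inputenc option is handled, and the limit of 10000 characters is the figure quoted above.

```python
import re

SEARCH_LIMIT = 10000  # number of leading characters to inspect (figure quoted above)

# Assumption: the usual LaTeX inputenc option names for the encodings
# that X-Symbol supports.
INPUTENC = {
    "latin1": "iso-8859-1",
    "latin2": "iso-8859-2",
    "latin3": "iso-8859-3",
    "latin5": "iso-8859-9",
    "latin9": "iso-8859-15",
}

def guess_coding(text, limit=SEARCH_LIMIT):
    """Return the encoding named by an inputenc header, or None."""
    # Only the first `limit` characters are searched.
    match = re.search(r"\\usepackage\[([^\]]*)\]\{inputenc\}", text[:limit])
    if match is None:
        return None
    # Unknown option names yield None as well.
    return INPUTENC.get(match.group(1).strip())
```

A header that appears after the search limit is not found in this sketch, so such a file would fall back to the normal file encoding.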

Even if you choose not to store 8bit characters when saving a file (see section 3.2.3 Store or Encode 8bit Characters), the file encoding is still important, since the file might contain 8bit characters when you visit it.

If the file encoding is different to the normal file encoding, X-Symbol performs the necessary recoding itself. Recoding changes a character with code position pos in one charset to a character with the same code position pos in another charset. If the normal file encoding is different to the default encoding, X-Symbol also resolves character aliases (see section 3.2.7 Character Aliases).

If you have specified an invalid file encoding (including an encoding different to a non-default normal file encoding), we have the following cases:

We end with a little example: if your normal file encoding and default encoding are both Latin-1, and you visit a file with `\usepackage[latin9]{inputenc}' producing some document containing the Euro sign, you see the Euro character in Emacs when X-Symbol is enabled, but you see the currency character without X-Symbol.
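
The same effect can be reproduced outside Emacs. The following Python sketch is only an illustration, not X-Symbol's implementation; the helper name recode is hypothetical. It relies on the fact that code position 0xA4 is the currency sign in Latin-1 and the Euro sign in Latin-9.

```python
def recode(char, from_charset, to_charset):
    """Map CHAR to the character at the same code position in TO_CHARSET."""
    position = char.encode(from_charset)  # the single byte, i.e. the code position
    return position.decode(to_charset)

raw = b"\xa4"  # one byte as stored in the file

# Read with the normal file encoding Latin-1: the currency sign.
as_latin1 = raw.decode("iso-8859-1")

# Recoded to the file encoding Latin-9: the Euro sign.
as_latin9 = recode(as_latin1, "iso-8859-1", "iso-8859-15")
```

Here as_latin1 is U+00A4 (currency sign) and as_latin9 is U+20AC (Euro sign), although the byte in the file is the same.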


3.2.3 Store or Encode 8bit Characters

You can specify that 8bit characters (according to the coding in your file, see 3.2.2 File Coding of 8bit Characters) are not encoded to tokens when saving a file, by setting the following buffer-local variable:

x-symbol-8bits
Whether to store 8bit characters when saving the current buffer.

You can set this variable in the "local variables list" near the end of the file (see section `File Variables' in XEmacs User's Manual), e.g.:

 
%% Local Variables:
%% x-symbol-8bits: t
%% End:
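
As a rough model of what this setting means when saving, consider the following Python sketch. It is not X-Symbol's code: the token table TOKENS and the function encode_for_file are invented for illustration, and the two tex tokens are just examples. A character is stored as an 8bit character only if storing is requested and the character exists in the file coding; otherwise it is encoded to its token.

```python
# Hypothetical miniature token table for language tex.
TOKENS = {
    "\u00b1": r"\pm",   # plusminus, exists in Latin-1
    "\u2260": r"\neq",  # notequal, exists in no Latin charset
}

def encode_for_file(text, coding="iso-8859-1", store_8bits=False):
    """Encode characters to tokens, optionally keeping 8bit characters."""
    out = []
    for ch in text:
        token = TOKENS.get(ch)
        if token is None:
            out.append(ch)  # plain character without a token
            continue
        try:
            ch.encode(coding)
            is_8bit = True   # character exists in the file coding
        except UnicodeEncodeError:
            is_8bit = False
        out.append(ch if store_8bits and is_8bit else token)
    return "".join(out)
```

With store_8bits set, plusminus stays a character in a Latin-1 file while notequal is still written as \neq; without it, both are written as tokens.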

If the variable is not already buffer-local, a reasonable value is deduced when turning on X-Symbol (see section 3.3 Minor Mode) by setting it to the value of x-symbol-coding, or searching in the file for 8bit characters:

x-symbol-auto-8bit-search-limit
If there is an 8bit character in the file when visiting it, X-Symbol will also store 8bit characters when saving the buffer.

If the file encoding is invalid (see section 3.2.2 File Coding of 8bit Characters), we always search for 8bit characters in the complete document and set x-symbol-8bits accordingly. Then, a non-nil value also implies unique decoding (see section 3.2.4 Unique Decoding).

While the variable x-symbol-8bits usually only influences the encoding, it also influences the decoding if you choose to decode uniquely (see section 3.2.4 Unique Decoding).

Setting variable x-symbol-8bits to nil does not necessarily mean that the file will not contain 8bit characters: the characters might have no token representation in the current token language (see section 6.5 Token Language "TeXinfo command" (texi)), or they are glyphs for unused code points in the Latin-3 charset. In both cases, it is unlikely that you have inserted these invalid characters via X-Symbol's input methods (see section 4.1 Common Behavior of All Input Methods); you have probably copied them into the current buffer.


3.2.4 Unique Decoding

Token languages might define more than one token representing the same character. When decoding and encoding these tokens, they will be normalized to one form, the canonical representation. E.g., with language tex, visiting a file with tokens \neq and \ne converts both tokens to character notequal; saving the buffer stores the character as token \neq in both occurrences.

It can also happen that a file contains both an 8bit character and a token which would be converted to exactly that character. When saving the file, both characters are either not encoded, or both are encoded to the same token.

Normally, this is no problem. But if you redefine standard TeX macros, it certainly could become one (see section 6.2.3 Problems with TeX Macros)! For this reason, package X-Symbol provides the following buffer-local variable:

x-symbol-unique
Whether to limit the decoding in such a way that no normalization will happen. That means: only decode canonical tokens, and, if x-symbol-8bits is non-nil (see section 3.2.3 Store or Encode 8bit Characters), do not decode tokens which would be decoded to 8bit characters (according to the coding in your file, see 3.2.2 File Coding of 8bit Characters).

You can set this variable in the "local variables list" near the end of the file (see section `File Variables' in XEmacs User's Manual), e.g., together with a setting for x-symbol-8bits:

 
%% Local Variables:
%% x-symbol-8bits: t
%% x-symbol-unique: t
%% End:
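
The difference between normal and unique decoding can be modeled with a few lines of Python. This is a toy model under stated assumptions, not X-Symbol's implementation: the tables DECODE and CANONICAL and the functions decode and encode are hypothetical, and the interaction with x-symbol-8bits is left out.

```python
import re

# Hypothetical miniature token table: two tokens decode to notequal.
DECODE = {r"\neq": "\u2260", r"\ne": "\u2260", r"\pm": "\u00b1"}
# The canonical token used when encoding a character.
CANONICAL = {"\u2260": r"\neq", "\u00b1": r"\pm"}

def decode(text, unique=False):
    """Decode TeX-like tokens; with UNIQUE, decode canonical tokens only."""
    def replace(match):
        token = match.group(0)
        char = DECODE.get(token)
        if char is None:
            return token  # unknown macro, keep as is
        if unique and CANONICAL[char] != token:
            return token  # non-canonical token stays untouched
        return char
    # Match whole macro names so that \ne is not found inside \neq.
    return re.sub(r"\\[A-Za-z]+", replace, text)

def encode(text):
    """Encode every known character to its canonical token."""
    return "".join(CANONICAL.get(ch, ch) for ch in text)
```

In this model, encode(decode(r"a \ne b \neq c")) yields the normalized text with \neq twice, whereas decoding with unique=True leaves \ne alone, so the round trip returns the original text unchanged.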

If the variable is not already buffer-local, a reasonable value is deduced when turning on X-Symbol (see section 3.3 Minor Mode): it will be set to t if X-Symbol mode is not automatically turned on.

If the file encoding is invalid (see section 3.2.2 File Coding of 8bit Characters) and x-symbol-8bits is non-nil (see section 3.2.3 Store or Encode 8bit Characters), X-Symbol always uses unique decoding (see section 3.2.4 Unique Decoding).


3.2.5 Conversion Commands

First the good news: most of the time, the necessary conversions are performed automatically when you would expect them to be performed:

Nevertheless, you might want to perform the conversions explicitly in some situations by using one of the following commands (also to be found in the menu):

M-x x-symbol-decode-recode
Recode all characters (if necessary) and decode all tokens to characters.

M-x x-symbol-decode
Decode all tokens to characters, do not recode characters.

M-x x-symbol-encode-recode
Encode all characters in buffer to tokens or recode them.

M-x x-symbol-encode
Encode all characters in buffer to tokens. No recoding is performed: because x-symbol-8bits is relative to the file coding (see 3.2.3 Store or Encode 8bit Characters), 8bit characters are always encoded if the file coding is different from the default coding.

All commands work on the region if it is active, or the (narrowed part of the) buffer if no region is active.

If the file coding is the same as the default coding, the variants with and without recoding (see section 3.2.2 File Coding of 8bit Characters) do the same. The variants with recodings are the ones used when doing the conversion automatically. The variants without recodings are the ones used when using the special Copy & Paste commands presented in the next subsection.


3.2.6 Copy & Paste with Conversion

You probably use X-Symbol because you want to produce some non-ASCII characters in your final document, but you are not really interested in what kind of token you would need to write. (After all, you do not use a hex editor to produce documents using some non-ASCII encoding in the file, since you are not interested in the byte sequence of individual characters.)

Consequently, all editing operations really work on characters, not on the corresponding tokens for the token language of the current buffer. This includes copying and pasting: if you copy the character plusminus from a LaTeX buffer to an HTML buffer, you really copy that character and not the three characters of the TeX macro \pm.

If you copy text to a buffer where X-Symbol is not enabled, like a mail buffer, that is probably not what you want. Similarly, you would probably like to see the X-Symbol characters for tokens in a text which you have copied from such a buffer. Therefore, X-Symbol provides the following commands (also to be found in the menu):

M-x x-symbol-copy-region-encoded
Save the region in the kill-ring with all X-Symbol characters encoded as by M-x x-symbol-encode, i.e., without recoding.

M-x x-symbol-yank-decoded
Reinsert the last text in the kill-ring and decode the inserted text like M-x x-symbol-decode, i.e., without recoding.

You could get the same result with the usual copy & paste commands and the conversion commands from the previous section (see section 3.2.5 Conversion Commands), but this would clutter the undo information of the current buffer and would require an additional undo operation for the copy.


3.2.7 Character Aliases

A character alias or char alias is a character which is also a character in a font with another registry, e.g., adiaeresis is defined in all supported Latin fonts. Emacs distinguishes between these five characters. In package X-Symbol, one of them, the one in x-symbol-default-coding (see section 3.2.1 Normal File and Default Encoding) if possible, is supported by the input methods; the other ones are char aliases to the supported one.

The reason is that it would be confusing for the user to choose among different adiaeresises and that there are neither different adiaeresises in Unicode nor in the token representations of languages tex and sgml.

8bit characters in files with a file coding x-symbol-coding other than x-symbol-default-coding are converted to the "normal" form. E.g., if you have a Latin-1 font by default, the adiaeresis in a Latin-2 encoded file is a Latin-1 adiaeresis in the buffer. When saving the buffer, it is again the right 8bit character in the Latin-2 encoded file.
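
To make the idea of the "normal" form concrete, the following Python sketch models a character as a pair of charset and code position, roughly the way a Mule character belongs to one charset. It is a simplified illustration and not how X-Symbol resolves aliases internally; the function unalias is hypothetical, and Unicode is used as the pivot only because it contains each of these characters exactly once.

```python
def unalias(charset, position, default="iso-8859-1"):
    """Return the (charset, position) pair of the normal form.

    If the character also exists in the default charset, the pair in
    the default charset is returned; otherwise the pair is unchanged.
    """
    char = bytes([position]).decode(charset)  # pivot via Unicode
    try:
        return (default, char.encode(default)[0])
    except UnicodeEncodeError:
        return (charset, position)  # no alias in the default charset

# adiaeresis from a Latin-2 file becomes the Latin-1 adiaeresis.
adiaeresis = unalias("iso-8859-2", 0xE4)

# aogonek exists in Latin-2 but not in Latin-1, so it stays as it is.
aogonek = unalias("iso-8859-2", 0xB1)
```

In this model adiaeresis is ("iso-8859-1", 0xE4) and aogonek remains ("iso-8859-2", 0xB1).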

Thus, in normal cases, buffers do not have char aliases. In Emacs with Mule support, this is only possible if you copy characters from buffers with characters considered as char aliases by package X-Symbol, e.g., from the Mule file `european.el'. In XEmacs without Mule support, this is only possible if you use commands like C-q 2 3 4.

If you have char aliases in the current buffer, you might want to use the following command (it is not strictly necessary; it mainly helps when searching for characters):

M-x x-symbol-unalias
Resolve all character aliases in buffer. If the region is active, only resolve char aliases in the region.

A single char alias before point can be resolved by the commands x-symbol-modify-key and x-symbol-rotate-key, see 4.7 Input Method Context: Replace Char Sequence.

The XEmacs package latin-unity provides a command to "remap" characters to one character set (if possible). X-Symbol's unaliasing can be seen as remap operations to a fixed sequence of character sets.


3.3 Minor Mode

X-Symbol is a minor mode (see section `Minor Modes' in XEmacs User's Manual) which enables the features mentioned in this manual:

With the default installation, X-Symbol mode is automatically turned on when it is appropriate to do so (see below for details). You can control it for individual buffers with the following command:

M-x x-symbol-mode
Toggle X-Symbol mode. If provided with a prefix argument, turn X-Symbol mode on if the numeric value of the argument is positive, else turn it off. If no token language can be deduced, ask for a token language; if provided with a non-numeric prefix argument (C-u M-x x-symbol-mode), always ask.

By default, X-Symbol mode is disabled in special major modes visiting a file, e.g., vm-mode (see section 8.4.12 How to Use X-Symbol with Gnus or VM). Use a prefix argument to be asked whether to turn it on anyway.

Turning X-Symbol mode on requires that you have a valid token language for the current buffer. Since turning X-Symbol mode on also decodes tokens, it is also useful to set the variables which control the conversion (see section 3.2 Conversion: Decoding and Encoding).

Since people usually do not want to write some Emacs Lisp functions to do some customizations, X-Symbol provides the following variables which induce X-Symbol to set the necessary buffer-local variables when X-Symbol is turned on:

x-symbol-auto-style-alist
You can use the major mode and/or the name of the buffer or visited file, and specific functions to set the following variables (if not already buffer-local):

x-symbol-lang-modes
Major modes which use token language lang by default. See section 6. Supported Token Languages. The languages are checked in registration order (the order shown in the language selection submenus).

x-symbol-lang-auto-style
Default values for the above mentioned variables x-symbol-mode, x-symbol-coding, x-symbol-8bits, x-symbol-unique, x-symbol-subscripts, and x-symbol-image if not already buffer-local.

x-symbol-auto-mode-suffixes
Regular expression matching file suffixes to be ignored when checking file names for the derivation above, e.g., extension `.orig'.

x-symbol-modeline-state-list
This variable controls the modeline appearance just mentioned.

The menu might also include individual entries for a token language (see section 6.2.1 Basics of Language "TeX macro"):

x-symbol-lang-extra-menu-items
Extra menu items for each token language lang (see section 6.2.1 Basics of Language "TeX macro").


3.4 Poor Man's Mule: Running Under XEmacs/no-Mule

Using XEmacs/no-Mule normally means that you are restricted to using no more than 256 different characters in your documents.

Package X-Symbol provides a lot more characters which can also be used with XEmacs/no-Mule. Internally, all X-Symbol characters except the ones of your default font (see section 3.2.1 Normal File and Default Encoding) are represented by two characters, see 7.1 Internal Representation of X-Symbol Characters.

This can lead to a lot of problems, which are resolved by the following methods (some annoyances remain, see section 8.1 Problems under XEmacs/no-Mule) when X-Symbol mode is turned on (see section 3.3 Minor Mode):


3.5 The Role of font-lock

Package X-Symbol uses package font-lock to display super- and subscripts (see section 5.1 Super- and Subscripts) and to display its special characters under XEmacs/no-Mule (see section 3.4 Poor Man's Mule: Running Under XEmacs/no-Mule). Thus, you should enable font-lock in buffers where you want to use X-Symbol (it is by default). See section 2.6.2 Syntax Highlighting Packages (font-lock and add-ons).

When X-Symbol mode is turned on, it automatically adds the necessary font-lock keywords to the buffer-local value of font-lock-keywords and all font-lock keywords which are commonly used with the current token language.

Setting all font-lock keywords is important since font-lock might not yet have been turned on, or since you might want to change font-lock's decoration of the current buffer after X-Symbol has been turned on.

Please note that switching the mode by typing M-x latex-mode does not set LaTeX's font-lock keywords! They are set at the end of C-x C-f. If you switch the mode, turn on font-lock yourself.

Independently from package X-Symbol, the following command might be useful in some situations:

M-x x-symbol-fontify
Refontify buffer.


3.6 Character Group and Token Classes

Each X-Symbol character belongs to a character group, e.g., natnums belongs to setsymbol. A character group should consist of similar characters, where "similar" means similar meaning, not similar appearance. Two characters which have nearly the same appearance should be in the same group, though. The group determines:

The character group is independent from any token language, but is probably somewhat related to some of its token classes. For each token language, each character is assigned to a list of token classes, which can be used for the following:

The token classes for individual token languages are explained in the corresponding sections of 6. Supported Token Languages:

x-symbol-lang-header-groups-alist
The Grid and Menu headers for each token language lang.

x-symbol-lang-class-alist
Strings for the character info in the echo area for each token language lang.

x-symbol-lang-class-face-alist
The coloring scheme for each token language lang.



This document was generated by Christoph Wedler on December, 8 2003 using texi2html