3. Concepts of Package X-Symbol
This chapter describes the concepts of package X-Symbol. It contains quite a few forward references to features which are based on these concepts, such as 4. X-Symbol's Input Methods, and 5. Features of Package X-Symbol.
3.1 Token Language  What does an X-Symbol character represent?
3.2 Conversion: Decoding and Encoding  Decoding tokens, encoding characters.
3.3 Minor Mode  How to control the behavior of X-Symbol.
3.4 Poor Man's Mule: Running Under XEmacs/no-Mule  Running X-Symbol under XEmacs/no-Mule.
3.5 The Role of font-lock  Why does X-Symbol need font-lock?
3.6 Character Group and Token Classes  Character group and token classes.
3.1 Token Language
As mentioned in the overview, "X-Symbol Characters" in the buffer are
represented by "tokens" in the file. The correspondence between these
is determined by the token language, which is closely related to
the major mode of the current buffer. E.g., character alpha stands for
\alpha in LaTeX buffers.
For details of the predefined token languages "TeX macro" (tex), "SGML entity" (sgml), "BibTeX macro" (bib), and "TeXinfo command" (texi), see 6. Supported Token Languages.
The token language determines the conversion between X-Symbol characters and tokens (see section 3.2 Conversion: Decoding and Encoding), the input methods (see section 4. X-Symbol's Input Methods), and various other features (see section 5. Features of Package X-Symbol).
The token language is defined by the following buffer-local variable:
x-symbol-language
-
Token language used in current buffer. You can set this variable in the
"local variables list" near the end of the file (see section `File Variables' in XEmacs User's Manual), e.g.:
%% Local Variables:
%% x-symbol-language: tex
%% End:
Package X-Symbol uses a reasonable value according to the major mode and the file name of a buffer if the variable is not already buffer-local. A valid token language is required to turn on X-Symbol Minor mode, see 3.3 Minor Mode.
A token language must be registered before you can use it. By default, the token languages mentioned above are registered.
3.2 Conversion: Decoding and Encoding
As mentioned, X-Symbol characters in the buffer are represented by tokens in the file. Thus, we need some conversion from tokens to characters, called decoding, and some conversion from characters to tokens, called encoding.
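The two directions can be pictured with a small Python sketch. It is only an illustration of the idea, not X-Symbol's implementation: the two-entry table and the names `TOKEN_TO_CHAR`, `decode` and `encode` are hypothetical.

```python
import re

# Hypothetical two-entry excerpt of a token language table (token -> character).
TOKEN_TO_CHAR = {r"\alpha": "\u03b1", r"\pm": "\u00b1"}
CHAR_TO_TOKEN = {char: token for token, char in TOKEN_TO_CHAR.items()}

# A TeX-like token: a backslash followed by letters.
TOKEN_RE = re.compile(r"\\[A-Za-z]+")


def decode(text):
    """Replace every known token by its character; leave unknown tokens alone."""
    return TOKEN_RE.sub(lambda m: TOKEN_TO_CHAR.get(m.group(0), m.group(0)), text)


def encode(text):
    """Replace every character that has a token by that token.

    A real encoder must also keep a token from merging with a following
    letter; this sketch ignores that problem.
    """
    return "".join(CHAR_TO_TOKEN.get(ch, ch) for ch in text)
```

Decoding corresponds to what happens when a file is visited, encoding to what happens when the buffer is saved.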
We have the additional problem that some characters are not only represented by tokens, but also via some 8bit character encoding.
Package X-Symbol supports the following 8bit character encodings: Latin-1 (iso-8859-1), Latin-2 (iso-8859-2), Latin-3 (iso-8859-3), Latin-5 (iso-8859-9), and Latin-9 (iso-8859-15). It currently supports fewer encodings with XEmacs on Windows (see section 2.1 Requirements).
3.2.1 Normal File and Default Encoding
3.2.2 File Coding of 8bit Characters  Specific encoding of a file.
3.2.3 Store or Encode 8bit Characters  Do you want to store 8bit characters?
3.2.4 Unique Decoding  Restrict decoding to avoid normalization?
3.2.5 Conversion Commands  Interactive encoding and decoding.
3.2.6 Copy & Paste with Conversion  Copy & paste with conversion.
3.2.7 Character Aliases  Different charsets include the same chars.
3.2.1 Normal File and Default Encoding
As mentioned, some characters have an 8bit file encoding, and X-Symbol needs to know which 8bit file encoding you normally use when visiting a file and saving a buffer.
With Mule support, Emacs/XEmacs can recognize the normal file encoding, also called a coding system (see section `Recognize Coding' in XEmacs User's Manual).
Without Mule support, XEmacs can usually only support 8bit characters of one encoding; this encoding corresponds to the charset/registry of your default font. Here, the normal file encoding is the default encoding:
x-symbol-default-coding
-
The default encoding. The value must be a symbol denoting one of the supported encodings or nil. The variable must be set before X-Symbol has been initialized. See section 2.4 Make XEmacs Initialize X-Symbol During Startup.
The default encoding is not only used to determine the normal file encoding without Mule, but also for the following:
-
X-Symbol has its own mechanism to recognize a file encoding which only
works with a specified default encoding. See section 3.2.2 File Coding of 8bit Characters.
-
The same character can be included in various Latin charsets and
X-Symbol needs to know which of the instances (which Emacs views as
different characters) to support. See section 3.2.7 Character Aliases.
- Without Mule support, the default encoding is also needed to decide which characters have to be faked by 2 characters internally: exactly the characters in those charsets which do not correspond to the default encoding. See section 3.4 Poor Man's Mule: Running Under XEmacs/no-Mule.
To deduce the default value, X-Symbol inspects the Mule language environment and the output of the shell command locale, or to be more exact:
locale -ck code_set_name charmap
Without Mule support, you get a warning if the command does not exist on your system or lists an encoding which is not supported by X-Symbol, such as some Asian encoding. Value nil is the same as iso-8859-1.
With Mule support, you get a warning if the command lists a supported encoding which is different from the encoding deduced from the Mule language environment. Value nil makes sure that X-Symbol file encoding detection (see section 3.2.2 File Coding of 8bit Characters) only works if Emacs has detected the same encoding; it works like iso-8859-1 otherwise.
3.2.2 File Coding of 8bit Characters
X-Symbol can use a different encoding for single buffers/files, even if you use X-Symbol on XEmacs without Mule support. To do so, set the following buffer-local variable:
x-symbol-coding
-
8bit character encoding in the file visited by the current buffer. Value nil represents the normal file encoding (see section 3.2.1 Normal File and Default Encoding).
With Mule support, any value other than nil is considered invalid if the normal file encoding is neither the same as this value nor the same as the default encoding. In particular, if your default encoding is nil, X-Symbol's file encoding detection never takes precedence over Emacs' own, i.e., the normal file encoding.
You can set this variable in the "local variables list" near the end of the file (see section `File Variables' in XEmacs User's Manual), e.g.:
<!-- Local Variables: -->
<!-- x-symbol-coding: iso-8859-2 -->
<!-- End: -->
If the variable is not already buffer-local, a reasonable value is deduced when turning on X-Symbol (see section 3.3 Minor Mode) by searching for some language dependent headers at the beginning of the file:
x-symbol-auto-coding-search-limit
-
X-Symbol usually searches for something like `\usepackage[...]{inputenc}' (see section 6.2 Token Language "TeX macro" (tex)) or `<meta ... charset=...>' (see section 6.3 Token Language "SGML entity" (sgml)) in the first 10000 characters.
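The header search for the tex case can be sketched in Python as follows. This is a hedged illustration, not X-Symbol's code: the function name `guess_coding`, the regular expression and the table name are assumptions, while the option names latin1 to latin9 are the usual inputenc names for the five supported encodings.

```python
import re

# inputenc option names and the 8bit encodings they denote.
INPUTENC_TO_CODING = {
    "latin1": "iso-8859-1",
    "latin2": "iso-8859-2",
    "latin3": "iso-8859-3",
    "latin5": "iso-8859-9",
    "latin9": "iso-8859-15",
}

# Matches e.g. \usepackage[latin9]{inputenc} and captures the option.
INPUTENC_RE = re.compile(r"\\usepackage\[([A-Za-z0-9]+)\]\{inputenc\}")


def guess_coding(text, search_limit=10000):
    """Return the encoding named in an inputenc header, or None.

    Only the first SEARCH_LIMIT characters are inspected; unknown
    options also yield None.
    """
    match = INPUTENC_RE.search(text[:search_limit])
    if match is None:
        return None
    return INPUTENC_TO_CODING.get(match.group(1))
```

A result of None here corresponds to leaving the decision to the normal file encoding.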
If you choose not to save a file containing 8bit characters (see section 3.2.3 Store or Encode 8bit Characters), the file encoding is still important, since the file might contain 8bit characters when you visit it.
If the file encoding is different to the normal file encoding, X-Symbol performs the necessary recoding itself. Recoding changes a character with code position pos in one charset to a character with the same code position pos in another charset. If the normal file encoding is different to the default encoding, X-Symbol also resolves character aliases (see section 3.2.7 Character Aliases).
If you have specified an invalid file encoding (including an encoding different to a non-default normal file encoding), we have the following cases:
- If the normal file encoding is unsupported (any file encoding is invalid in this case) or if the normal file encoding is supported and the file does not contain 8bit characters, we always encode all X-Symbol characters (see section 3.2.3 Store or Encode 8bit Characters). The modeline includes `-i' to represent the file encoding (see section 3.3 Minor Mode), except if the default encoding is nil, the normal file encoding is unsupported, and the variable x-symbol-coding is not specified.
- If the normal file encoding is supported and the file contains at least one 8bit character, X-Symbol does not touch 8bit characters and never produces them, neither via decoding (see section 3.2.4 Unique Decoding) nor via input methods. The modeline includes `-err' to represent the file encoding (see section 3.3 Minor Mode).
We end with a little example: if your normal file encoding and default encoding are both Latin-1, and you visit a file with `\usepackage[latin9]{inputenc}' producing some document containing the Euro sign, you see the Euro character in Emacs when X-Symbol is enabled, but you see the currency character without X-Symbol.
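Recoding as described above (same code position, different charset) can be reproduced with Python's standard codecs. The helper name `recode` is hypothetical, and since Python strings are Unicode the sketch goes through the byte value to model the code position.

```python
def recode(char, from_coding, to_coding):
    """Map CHAR to the character at the same code position in TO_CODING."""
    position = char.encode(from_coding)  # a single byte: the code position
    return position.decode(to_coding)


# Code position 0xA4 is the currency sign in Latin-1 and the Euro sign in Latin-9.
currency = "\u00a4"
euro = recode(currency, "iso-8859-1", "iso-8859-15")
```

Here `euro` is the Euro sign U+20AC, which matches the example: the byte that a Latin-1 reading shows as the currency character is the Euro character once the file is treated as Latin-9.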
3.2.3 Store or Encode 8bit Characters
You can specify that 8bit characters (according to the coding in your file, see 3.2.2 File Coding of 8bit Characters) are not encoded to tokens when saving a file by setting the following buffer-local variable:
x-symbol-8bits
-
Whether to store 8bit characters when saving the current buffer.
You can set this variable in the "local variables list" near the end of the file (see section `File Variables' in XEmacs User's Manual), e.g.:
%% Local Variables:
%% x-symbol-8bits: t
%% End:
If the variable is not already buffer-local, a reasonable value is deduced when turning on X-Symbol (see section 3.3 Minor Mode) by setting it to the value of x-symbol-coding, or by searching the file for 8bit characters:
x-symbol-auto-8bit-search-limit
- If there is an 8bit character in the file when visiting it, X-Symbol will also store 8bit characters when saving the buffer.
If the file encoding is invalid (see section 3.2.2 File Coding of 8bit Characters), we always search for 8bit characters in the complete document and set x-symbol-8bits accordingly. Then, a non-nil value also implies unique decoding (see section 3.2.4 Unique Decoding).
While the variable x-symbol-8bits usually only influences the encoding, it also influences the decoding if you choose to decode uniquely (see section 3.2.4 Unique Decoding).
Setting variable x-symbol-8bits to nil does not necessarily mean that the file will not contain 8bit characters: the characters might have no token representation in the current token language (see section 6.5 Token Language "TeXinfo command" (texi)), or they are glyphs for unused code points in the Latin-3 charset. In both cases, it is unlikely that you have inserted these invalid characters via X-Symbol's input methods (see section 4.1 Common Behavior of All Input Methods); you have probably copied them into the current buffer.
3.2.4 Unique Decoding
Token languages might define more than one token representing the same character. When decoding and encoding these tokens, they will be normalized to one form, the canonical representation. E.g., with language tex, visiting a file with tokens \neq and \ne converts both tokens to character notequal; saving the buffer stores the character as token \neq in both occurrences.
It can also happen that a file contains both an 8bit character and a token which would be converted to exactly that character. When saving the file, both characters are either not encoded, or both are encoded to the same token.
Normally, this is no problem. But if you redefine standard TeX macros, it certainly could be the case (see section 6.2.3 Problems with TeX Macros)! For this reason, package X-Symbol provides the following buffer-local variable:
x-symbol-unique
-
Whether to limit the decoding in such a way that no normalization will happen. That means: only decode canonical tokens, and, if x-symbol-8bits is non-nil (see section 3.2.3 Store or Encode 8bit Characters), do not decode tokens which would be decoded to 8bit characters (according to the coding in your file, see 3.2.2 File Coding of 8bit Characters).
You can set this variable in the "local variables list" near the end of the file (see section `File Variables' in XEmacs User's Manual), e.g., together with a setting for x-symbol-8bits:
%% Local Variables:
%% x-symbol-8bits: t
%% x-symbol-unique: t
%% End:
If the variable is not already buffer-local, a reasonable value is deduced when turning on X-Symbol (see section 3.3 Minor Mode): it will be set to t if X-Symbol mode is not automatically turned on.
If the file encoding is invalid (see section 3.2.2 File Coding of 8bit Characters) and x-symbol-8bits is non-nil (see section 3.2.3 Store or Encode 8bit Characters), X-Symbol always uses unique decoding.
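The effect of normalization, and how unique decoding avoids it, can be sketched in Python. This is an illustration under stated assumptions only: the one-entry table, the convention that the first listed token is the canonical one, and the function names are hypothetical, and the 8bit part of unique decoding is left out.

```python
import re

# Hypothetical excerpt of a tex table: the first token listed for a
# character is treated as its canonical token.
CHAR_TOKENS = {"\u2260": [r"\neq", r"\ne"]}  # NOT EQUAL TO

TOKEN_TO_CHAR = {tok: ch for ch, toks in CHAR_TOKENS.items() for tok in toks}
CANONICAL = {ch: toks[0] for ch, toks in CHAR_TOKENS.items()}
TOKEN_RE = re.compile(r"\\[A-Za-z]+")


def decode(text, unique=False):
    """Decode tokens; with UNIQUE, decode canonical tokens only."""
    def replace(match):
        token = match.group(0)
        char = TOKEN_TO_CHAR.get(token)
        if char is None:
            return token  # not a known token
        if unique and CANONICAL[char] != token:
            return token  # non-canonical token stays untouched
        return char
    return TOKEN_RE.sub(replace, text)


def encode(text):
    """Encode every known character to its canonical token."""
    return "".join(CANONICAL.get(ch, ch) for ch in text)
```

Without `unique`, a round trip rewrites `\ne` as `\neq`; with `unique=True`, the text survives the round trip unchanged.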
3.2.5 Conversion Commands
First the good news: most of the time, the necessary conversions are performed automatically when you would expect them to be performed:
-
Turning X-Symbol minor mode (see section 3.3 Minor Mode) on/off also performs
decoding/encoding.
-
Saving a buffer where X-Symbol is enabled will encode the characters to tokens in the file (of course, the characters remain in the buffer).
- Inserting a file into a buffer where X-Symbol is enabled will decode the tokens in the inserted region.
Nevertheless, you might want to perform the conversions explicitly in some situations by using one of the following commands (also to be found in the menu):
- M-x x-symbol-decode-recode
-
Recode all characters (if necessary) and decode all tokens to
characters.
- M-x x-symbol-decode
-
Decode all tokens to characters, do not recode characters.
- M-x x-symbol-encode-recode
-
Encode all characters in buffer to tokens or recode them.
- M-x x-symbol-encode
-
Encode all characters in buffer to tokens. No recoding is performed: if the file coding is different to the default coding, 8bit characters will always be encoded, since x-symbol-8bits is relative to the file coding, see 3.2.3 Store or Encode 8bit Characters.
All commands work on the region if it is active, or the (narrowed part of the) buffer if no region is active.
If the file coding is the same as the default coding, the variants with and without recoding (see section 3.2.2 File Coding of 8bit Characters) do the same. The variants with recodings are the ones used when doing the conversion automatically. The variants without recodings are the ones used when using the special Copy & Paste commands presented in the next subsection.
3.2.6 Copy & Paste with Conversion
You probably use X-Symbol because you want to produce some non-ASCII characters in your final document, but you are not really interested in what kind of token you would need to write. (After all, you do not use a hex editor to produce documents using some non-ASCII encoding in the file, since you are not interested in the byte sequence of individual characters.)
Consequently, all editing operations really work on characters, not on the corresponding tokens for the token language of the current buffer. This includes copying and pasting: if you copy the character plusminus from a LaTeX buffer to a HTML buffer, you really copy that character and not the three characters of the TeX macro \pm.
If you copy text to a buffer where X-Symbol is not enabled, like a mail buffer, that is probably not what you want. Similarly, you would probably like to see the X-Symbol characters for tokens in a text which you have copied from such a buffer. Therefore, X-Symbol provides the following commands (also to be found in the menu):
- M-x x-symbol-copy-region-encoded
-
Save the region in the kill-ring with all X-Symbol characters encoded as by M-x x-symbol-encode, i.e., without recoding.
- M-x x-symbol-yank-decoded
-
Reinsert the last text in the kill-ring and decode the inserted text as by M-x x-symbol-decode, i.e., without recoding.
You could get the same result with the usual copy & paste commands and the conversion commands from the previous section (see section 3.2.5 Conversion Commands), but this would clutter the undo information of the current buffer and would require an additional undo operation for the copy.
3.2.7 Character Aliases
A character alias or char alias is a character which is also a character in a font with another registry, e.g., adiaeresis is defined in all supported Latin fonts. Emacs distinguishes between these five characters. In package X-Symbol, one of them, the one with x-symbol-default-coding (see section 3.2.1 Normal File and Default Encoding) if possible, is supported by the input methods; the other ones are char aliases of the supported one.
The reason is that it would be confusing for the user to choose among different adiaeresises and that there are no different adiaeresises, neither in Unicode nor in the token representations of languages tex and sgml.
8bit characters in files with a file coding x-symbol-coding other than x-symbol-default-coding are converted to the "normal" form. E.g., if you have a Latin-1 font by default, the adiaeresis in a Latin-2 encoded file is a Latin-1 adiaeresis in the buffer. When saving the buffer, it is again the right 8bit character in the Latin-2 encoded file.
Thus, in normal cases, buffers do not have char aliases. In Emacs with Mule support, this is only possible if you copy characters from buffers with characters considered as char aliases by package X-Symbol, e.g., from the Mule file `european.el'. In XEmacs without Mule support, this is only possible if you use commands like C-q 2 3 4.
If you have char aliases in the current buffer, you might want to use the following command (it is not really necessary, but it helps when searching for characters):
- M-x x-symbol-unalias
- Resolve all character aliases in buffer. If the region is active, only resolve char aliases in the region.
A single char alias before point can be resolved by the commands x-symbol-modify-key and x-symbol-rotate-key, see 4.7 Input Method Context: Replace Char Sequence.
The XEmacs package latin-unity provides a command to "remap" characters to one character set (if possible). X-Symbol's unaliasing can be seen as remap operations to a fixed sequence of character sets.
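Resolving a char alias can be pictured with the following Python sketch. It is not how X-Symbol implements unaliasing: a character is modelled here as a hypothetical (charset, code position) pair, and Python's codecs stand in for the knowledge of which characters the Latin charsets share.

```python
def unalias(charset, position, default_coding="iso-8859-1"):
    """Return (charset, position) with a char alias resolved to DEFAULT_CODING.

    A character is modelled as a (charset, code position) pair, roughly
    the way an Emacs that keeps the Latin charsets apart sees it.
    """
    char = bytes([position]).decode(charset)
    try:
        default_position = char.encode(default_coding)[0]
    except UnicodeEncodeError:
        return (charset, position)  # no counterpart there: not an alias
    return (default_coding, default_position)


# adiaeresis sits at code position 0xE4 in both Latin-2 and Latin-1.
resolved = unalias("iso-8859-2", 0xE4)
```

Here `resolved` is the Latin-1 adiaeresis, while a character that exists only in Latin-2 is returned unchanged.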
3.3 Minor Mode
X-Symbol is a minor mode (see section `Minor Modes' in XEmacs User's Manual) which enables the features mentioned in this manual:
-
X-Symbol mode is required to do the conversions. Turning the minor mode
on/off also includes decoding/encoding (see section 3.2.5 Conversion Commands).
-
X-Symbol mode provides the minor mode menu which includes: various
commands, commands to insert characters (see section 4.4 Input Method Menu: Select a Menu Item), and
entries to change some global and buffer-local variables mentioned in
this manual.
- X-Symbol mode is required for most input methods (see section 4. X-Symbol's Input Methods) and other features (see section 5. Features of Package X-Symbol).
With the default installation, X-Symbol mode is automatically turned on when it is appropriate to do so (see below for details). You can control it individually with the following command:
- M-x x-symbol-mode
-
Toggle X-Symbol mode. If provided with a prefix argument, turn X-Symbol mode on if the numeric value of the argument is positive, else turn it off. If no token language can be deduced, ask for a token language; if provided with a non-numeric prefix argument (C-u M-x x-symbol-mode), always ask.
By default, X-Symbol mode is disabled in special major-modes visiting a file, e.g., vm-mode (see section 8.4.12 How to Use X-Symbol with Gnus or VM). Use a prefix argument to be asked whether to turn it on anyway.
Turning X-Symbol mode on requires that you have a valid token language for the current buffer. Since turning X-Symbol mode on also decodes tokens, it is also useful to set the variables which control the conversion (see section 3.2 Conversion: Decoding and Encoding).
Since people usually do not want to write some Emacs Lisp functions to do some customizations, X-Symbol provides the following variables which induce X-Symbol to set the necessary buffer-local variables when X-Symbol is turned on:
x-symbol-auto-style-alist
-
You can use the major mode and/or the name of the buffer or visited file, and specific functions to set the following variables (if not already buffer-local):
- x-symbol-language (see section 3.1 Token Language), indicated in the modeline, e.g. `tex',
- x-symbol-mode, i.e., whether it is appropriate to turn on X-Symbol mode automatically,
- x-symbol-coding (see section 3.2.2 File Coding of 8bit Characters), indicated in the modeline if different from the default coding, e.g. `-l2' for Latin-2,
- x-symbol-8bits (see section 3.2.3 Store or Encode 8bit Characters), indicated in the modeline by `8',
- x-symbol-unique (see section 3.2.4 Unique Decoding), indicated in the modeline by `*',
- x-symbol-subscripts (see section 5.1 Super- and Subscripts), indicated in the modeline by `s',
- x-symbol-image (see section 5.2 Images at the end of Image Insertion Commands), indicated in the modeline by `i'.
-
x-symbol-lang-modes
- Major modes which use token language lang by default. See section 6. Supported Token Languages. The languages are checked in registration order (the order shown in the language selection submenus).
x-symbol-lang-auto-style
-
Default values for the above mentioned variables x-symbol-mode, x-symbol-coding, x-symbol-8bits, x-symbol-unique, x-symbol-subscripts, and x-symbol-image if not already buffer-local.
x-symbol-auto-mode-suffixes
- Regular expression matching file suffixes to be ignored when checking file names for the derivation above, e.g., extension `.orig'.
x-symbol-modeline-state-list
-
This variable controls the modeline appearance just mentioned.
The menu might also include individual entries for a token language (see section 6.2.1 Basics of Language "TeX macro"):
x-symbol-lang-extra-menu-items
- Extra menu items for each token language lang (see section 6.2.1 Basics of Language "TeX macro").
3.4 Poor Man's Mule: Running Under XEmacs/no-Mule
Using XEmacs/no-Mule normally means that you are restricted to no more than 256 different characters in your documents.
Package X-Symbol provides a lot more characters which can also be used with XEmacs/no-Mule. Internally, all X-Symbol characters except the ones of your default font (see section 3.2.1 Normal File and Default Encoding) are represented by two characters, see 7.1 Internal Representation of X-Symbol Characters.
This can lead to a lot of problems, which are resolved by the following methods (some annoyances remain, see section 8.1 Problems under XEmacs/no-Mule) when X-Symbol mode is turned on (see section 3.3 Minor Mode):
-
After each editing command, i.e., point movement, deletion of text and
insertion of text, package X-Symbol checks whether just one of the two
internal characters of an X-Symbol character has been affected.
-
Package font-lock is used to display these two-character sequences with the correct fonts. The potential problem lies in the set-up of the corresponding font-lock keywords, see 3.5 The Role of font-lock.
3.5 The Role of font-lock
Package X-Symbol uses package font-lock to display super- and subscripts (see section 5.1 Super- and Subscripts) and to display its special characters under XEmacs/no-Mule (see section 3.4 Poor Man's Mule: Running Under XEmacs/no-Mule). Thus, you should enable font-lock in buffers where you want to use X-Symbol (it is by default). See section 2.6.2 Syntax Highlighting Packages (font-lock and add-ons).
When X-Symbol mode is turned on, it automatically adds the necessary font-lock keywords to the buffer-local value of font-lock-keywords and all font-lock keywords which are commonly used with the current token language.
Setting all font-lock keywords is important since font-lock might not yet have been turned on or since you might want to change font-lock's decoration of the current buffer after X-Symbol has been turned on.
Please note that switching the mode by typing M-x latex-mode does not set LaTeX's font-lock keywords! They are set at the end of C-x C-f. If you switch the mode, turn on font-lock yourself.
Independently from package X-Symbol, the following command might be useful in some situations:
3.6 Character Group and Token Classes
Each X-Symbol character belongs to a character group, e.g., natnums belongs to setsymbol. A character group should consist of similar characters, where "similar" means similar meaning, not similar appearance. Two characters which have nearly the same appearance should be in the same group, though. The group determines:
-
The Grid and submenu header under which the character can be found
(see section 4.5 Input Method Grid: Choose Highlighted Character, 4.4 Input Method Menu: Select a Menu Item).
-
The default bindings of characters (see section 4.6 Input Method Keyboard: Compose Key Sequence) of
some groups.
-
Whether to show the context info for a character (see section 5.3 Info in Echo Area).
-
The default ASCII representation of a character (see section 5.4 Ascii Representation of Strings).
- When using Emacs/XEmacs with Mule support, the syntax of a character (see section `Syntax' in XEmacs User's Manual).
The character group is independent from any token language, but is probably somewhat related to some of its token classes. For each token language, each character is assigned to a list of token classes, which can be used for the following:
-
Information in the echo area (see section 5.3 Info in Echo Area); for example, it could inform users to include a specific LaTeX package when they want to use that character in the document.
-
Using a coloring scheme when displaying the characters in the echo
area (see section 5.3 Info in Echo Area) or the Grid of characters (see section 4.5 Input Method Grid: Choose Highlighted Character), useful for characters which can just be used in a specific
context, like TeX's math-mode characters.
- Restricting the "electricity" of input method Electric (see section 4.8 Input Method Electric: Automatic Context), useful to disable this input method for TeX's math-mode characters if we are in text-mode.
The token classes for individual token languages are explained in the corresponding sections of 6. Supported Token Languages:
x-symbol-lang-header-groups-alist
- The Grid and Menu headers for each token language lang.
x-symbol-lang-class-alist
- Strings for the character info in the echo area for each token language lang.
x-symbol-lang-class-face-alist
- The coloring scheme for each token language lang.
This document was generated by Christoph Wedler on December, 8 2003 using texi2html