4. Unicode symbols and special layout support

Proof General inherits support for displaying Unicode (and any other) fonts from the underlying Emacs program. If you are lucky, your system will be able to use or synthesise a font that provides a rich set of mathematical symbols. To store symbols directly in files you need to use a particular coding, for example UTF-8. Newer Emacs versions can handle a multitude of different coding systems and will try to automatically detect an appropriate one; consult the Emacs documentation for more details. Of course, the prover that you are using will need to understand the same encodings and symbol meanings.

Alternatively, you can use the Unicode Tokens mode provided in Proof General to display mathematical symbols in place of sequences of other characters (usually plain ASCII). This can provide better compatibility, portability, and flexibility. Even if you use real Unicode characters as prover input, the Unicode Tokens mode can provide some helpful facilities for input shorthands and giving special layout.

4.1 Maths menu

The Maths Menu minor mode (adapted from a menu by Dave Love) simply adds a menu Maths to the main menubar for inserting common mathematical symbols. You can enable or disable it via the menu

  Proof-General -> Quick Options -> Minor Modes -> Unicode Maths Menu

(proof-maths-menu-toggle). Whether or not the symbols display well the menus depends on the font used to display the menus (which depends on the Emacs version, toolkit and platform). Ordinarily, the symbols inserted into the text will be Unicode characters which will be saved in the file using the encoding selected by standard Emacs mechanisms.

4.2 Unicode Tokens mode

The Unicode Tokens minor mode has been written specially for Proof General (with thanks to Stefan Monnier for providing inspiration and a starting point). It supports the display of symbols when the underlying text of the file and buffer actually contains something else, typically, plain ASCII text. It provides backward compatibility with the older X-Symbol mode.

Unicode Tokens can be enabled or disabled using the menu:

  Proof-General -> Quick Options -> Display -> Unicode Tokens

The mode to allows ASCII tokens (i.e., sequences of plain ASCII characters) to be displayed as Unicode character compositions, perhaps with additional text properties. The additional text properties allow the use of tokens to cause font changes (bold, italic), text size changes, and sub-script/super-script.

For example, the ASCII sequences /\ or \<And> could be displayed as a conjunction symbol. The sequence x __ y might be written to display y as subscript. This allows a file to be stored in perfectly portable plain ASCII encoding, but be displayed and edited with real symbols and appealing layout. Of course, the proof assistant needs to understand the underlying tokens in each case.

Technically, the mechanism is based on Emacs Font Lock facility, using the composition text property to display ASCII character sequence tokens as something else. This means that the underlying buffer text is not altered. This is a major advantage over the older X-Symbol (and the experimental version of Unicode Tokens in PG 3.7.1), which had the annoying risk of saving your buffer text in a corrupted format. This can never happen with the new mode.

When the Unicode Tokens mode is enabled, Maths Menu is automatically modified to insert tokenised versions of the Unicode characters (whenever a reverse mapping can be found). This means that you can still use the Maths Menu to conveniently input symbols. You can easily add custom key bindings for particular symbols you need to enter often (see section Adding your own keybindings for examples).

The Unicode Tokens mode also allows short-cut sequences of ordinary characters to quickly type tokens (similarly to the facility provided by X-Symbol). These, along with the token settings themselves, are configured on a per-prover basis.

4.3 Configuring tokens symbols and shortcuts

To edit the strings used to display tokens, or the collection of short-cuts, you can edit the file PA-unicode-tokens.el, or customize the main variables it contains, for example PA-token-name-alist and PA-shortcut-alist.

E.g., for Isabelle

  M-x customize-variable isar-token-name-alist RET

provides an interface to the tokens, and

  M-x customize-variable isar-shortcut-alist

an interface to the shortcuts.

Where possible, it is better to use the more fine grained way is available to do this, which edits the separate tables which are combine to form the big list of tokens. This is available via the menus, e.g., in Isabelle, use

  Tokens -> Customize -> Extended Symbols

to customize the symbols used for the “extended” (non standard) symbol list.

4.4 Special layout

The Unicode Tokens mode supports both symbol tokens used to display character sequences in different ways and control tokens used to control the layout of the text in various ways, such as superscript, subscript, large, small, bold, italic, etc. (The size and position layout is managed using Emacs’s display text property)

As well as displaying token sequences as special symbols, symbol tokens themselves can define layout options as well; for example you might define a token \<hugeOplus> to display a large circled-plus glyph. If you try the customization mentioned in the section above you will see the options available when defining symbols.

These options are fixed layout schemes which also make layout tokens easy to configure for provers. The layout possibilities include the ones shown in the table below. There are two ways of configuring control tokens for layout: character controls and region controls. The character controls apply to the next “character”, although this is a prover-specific notion and might actually mean the next word or identifier. An example might be writing BOLDCHAR x to make a bold x. Similarly the region controls apply to a delineated region of text, for example, writing BEGINBOLD this is bold ENDBOLD could cause the enclosed text this is bold to be displayed in a bold font.

The control tokens that have been configured populate the Tokens menu, so, for example, you may be able to select a region of text and then use the menu item:

  Tokens -> Format Region -> Bold

to cause the bold region tokens to be inserted around the selected text, which should cause the buffer presentation to show the text in a bold format (hiding the tokens).

Here is the table of layout controls available. What you actually can use will depend on the configuration for the underlying prover.

sub: lower the text (subscript)
sup: raise the text (superscript)
bold: make the text be in the bold weight of the current font
italic: make the text be in the italic variant of the current font
big: make the text be in a bigger size of the current font
small: make the text be in a smaller size of the current font
underline: underline the text
overline: overline the text
script: display the text in a “script” font
frakt: display the text in a “fraktur” font
serif: display the text in a serif font
sans: display the text in a sans serif font
keyword: display the text in the keyword face (font-lock-keyword-face)
function: display the text in the function name face (font-lock-function-name-face)
type: display the text in the type name face (font-lock-type-face)
preprocessor: display the text in the preprocessor face (font-lock-preprocessor-face)
doc: display the text in the documentation face (font-lock-doc-face)
builtin: display the text in the builtin face (font-lock-builtin-face)

Notice that the fonts can be set conveniently by the menu commands

        Tokens -> Set Fonts -> Script

etc. See section Selecting suitable fonts, for more.

The symbols used to select the various font-lock faces (see M-x list-faces-display to show them) allow you to define custom colouring of text for proof assistant input and output, exploiting rich underlying syntax mechanisms of the prover.

Face: unicode-tokens-serif-font-face: Serif (roman) font face.

Face: unicode-tokens-sans-font-face: Sans serif font face.

Face: unicode-tokens-fraktur-font-face: Fraktur font face.

Face: unicode-tokens-script-font-face: Script font face.

4.5 Moving between Unicode and tokens

If you want to share text between applications (e.g., email some text from an Isabelle theory file which heavily uses symbols), it is useful to convert to and from Unicode with cut-and-paste operations. The default buffer cut and paste functions will copy the underlying text, which contains the tokens (ASCII format). To copy and convert or paste then convert back, use these commands:

  Tokens -> Copy as unicode
  Tokens -> Paste from unicode

Both of these are necessarily approximate. The buffer presentation may use additional controls (for super/subscript layout or bold fonts, etc), which cannot be converted. Pasting relies on being able to identify a unique token mapped from a single Unicode character; the token table may not include such an entry, or may be ambiguous.

Command: unicode-tokens-copy beg end: Copy presentation of region between beg and end.
This is an approximation; it makes assumptions about the behaviour of symbol compositions, and will lose layout information.

Command: unicode-tokens-paste: Paste text from clipboard, converting Unicode to tokens where possible.

If you are using a mixture of “real” Unicode and tokens like this you may want to be careful to check the buffer contents: the command unicode-tokens-highlight-unicode helps you to manage this. It is available on the Tokens menu as

  Tokens -> Highlight Real Unicode Chars

Alternative ways to check are to toggle the display of tokens using

  Tokens -> Reveal Symbol Tokens

(the similar entry for Control Tokens displays tokens being used to control layout). Or simply toggle the tokens mode, which will leave the true Unicode tokens untouched.

Variable: unicode-tokens-highlight-unicode: Non-nil to highlight Unicode characters.

4.6 Finding available tokens shortcuts and symbols

Two commands (both on the Tokens menu) allow you to see the tokens and shortcuts available:

        Tokens -> List Tokens
        Tokens -> List Shortcuts

Additionally, you can view the complete Unicode character set available in the default Emacs font, with

        Tokens -> List Unicode Characters

(this uses a list adapted from Norman Walsh’s unichars.el).

Note that the Unicode Tokens modes displays symbols defined by symbol tokens in a special font.

Command: unicode-tokens-list-tokens: Show a buffer of all tokens.

Command: unicode-tokens-list-shortcuts: Show a buffer of all the shortcuts available.

Command: unicode-tokens-list-unicode-chars: Insert each Unicode character into a buffer.
Lets you see which characters are available for literal display in your Emacs font.

4.7 Selecting suitable fonts

The precise set of symbol glyphs that are available to you will depend in complicated ways on your operating system, Emacs version, configuration options used when Emacs was compiled, installed font sets, and (even) command line options used to start Emacs. So it is hard to give comprehensive and accurate advice in this manual. In general, things work much better with Emacs 23 than earlier versions.

To improve flexibility, Unicode Tokens mode allows you to select another font to display symbols from the default font that is used to display text in the buffer. This is the font that is configured by the menu

        Tokens -> Set Fonts -> Symbol

its customization name is unicode-tokens-symbol-font-face, but notice that only the font family aspect of the face is used. Similarly, other fonts can be configured for controlling different font families (script, fraktur, etc).

For symbols, good results are possible by using a proportional font for displaying symbols that has many symbol glyphs, for example the main font StixGeneral font from the Stix Fonts project (http://www.stixfonts.org/). At the time of writing you can obtain a beta version of these fonts in TTF format from http://olegueret.googlepages.com/stixfonts-ttf. On recent Linux distributions and with an Emacs 23 build that uses Xft, simply copy these ttf files into the .fonts directory inside your home directory to make them available.

Another font I like is DejaVu Sans Mono. It covers all of the standard Isabelle symbols. Some of the symbols are currently not perfect; however this font is an open source effort so users can contribute or suggest improvements. See http://dejavu-fonts.org.

If you are stuck with Emacs 22, you need to use the fontset mechanism which configures sets of fonts to use for display. The default font sets may not include enough symbols (typical symptom: symbols display as empty boxes). To address this, the menu command

        Tokens -> Set Fonts -> Make Fontsets

constructs a number of fontsets at particular point sizes, based on several widely available fonts. See pg-fontsets.el for the code. After running this command, you can select from additional fontsets from the menus for changing fonts.

For further suggestions, please search (and contribute!) to the Proof General wiki at http://proofgeneral.inf.ed.ac.uk/wiki.

Face: unicode-tokens-symbol-font-face: The default font used for symbols. Only :family and :slant attributes are used.

Variable: unicode-tokens-font-family-alternatives: Not documented.

This document was generated on July 15, 2025 using texi2html 1.82.