4. Unicode symbols and special layout support
Proof General inherits support for displaying Unicode (and any other) fonts from the underlying Emacs program. If you are lucky, your system will be able to use or synthesise a font that provides a rich set of mathematical symbols. To store symbols directly in files you need to use a particular coding, for example UTF-8. Newer Emacs versions can handle a multitude of different coding systems and will try to automatically detect an appropriate one; consult the Emacs documentation for more details. Of course, the prover that you are using will need to understand the same encodings and symbol meanings.
Alternatively, you can use the Unicode Tokens mode provided in Proof General to display mathematical symbols in place of sequences of other characters (usually plain ASCII). This can provide better compatibility, portability, and flexibility. Even if you use real Unicode characters as prover input, the Unicode Tokens mode can provide some helpful facilities for input shorthands and giving special layout.
4.1 Maths menu
The Maths Menu minor mode (adapted from a menu by Dave Love) simply
adds a menu Maths
to the main menubar for inserting common
mathematical symbols. You can enable or disable it via the menu
Proof-General -> Quick Options -> Minor Modes -> Unicode Maths Menu |
(proof-maths-menu-toggle
). Whether or not the symbols display
well the menus depends on the font used to display the menus (which
depends on the Emacs version, toolkit and platform). Ordinarily, the
symbols inserted into the text will be Unicode characters which will be
saved in the file using the encoding selected by standard Emacs
mechanisms.
4.2 Unicode Tokens mode
The Unicode Tokens minor mode has been written specially for Proof General (with thanks to Stefan Monnier for providing inspiration and a starting point). It supports the display of symbols when the underlying text of the file and buffer actually contains something else, typically, plain ASCII text. It provides backward compatibility with the older X-Symbol mode.
Unicode Tokens can be enabled or disabled using the menu:
Proof-General -> Quick Options -> Display -> Unicode Tokens |
The mode to allows ASCII tokens (i.e., sequences of plain ASCII characters) to be displayed as Unicode character compositions, perhaps with additional text properties. The additional text properties allow the use of tokens to cause font changes (bold, italic), text size changes, and sub-script/super-script.
For example, the ASCII sequences /\
or \<And>
could be
displayed as a conjunction symbol.
The sequence x __ y
might be written to display y
as subscript. This allows a file to be stored in perfectly portable plain ASCII
encoding, but be displayed and edited with real symbols and
appealing layout. Of course, the proof assistant needs to understand
the underlying tokens in each case.
Technically, the mechanism is based on Emacs Font Lock facility, using
the composition
text property to display ASCII character sequence
tokens as something else. This means that the underlying buffer text is
not altered. This is a major advantage over the older X-Symbol (and
the experimental version of Unicode Tokens in PG 3.7.1), which had the
annoying risk of saving your buffer text in a corrupted format. This
can never happen with the new mode.
When the Unicode Tokens mode is enabled, Maths Menu is automatically modified to insert tokenised versions of the Unicode characters (whenever a reverse mapping can be found). This means that you can still use the Maths Menu to conveniently input symbols. You can easily add custom key bindings for particular symbols you need to enter often (see section Adding your own keybindings for examples).
The Unicode Tokens mode also allows short-cut sequences of ordinary characters to quickly type tokens (similarly to the facility provided by X-Symbol). These, along with the token settings themselves, are configured on a per-prover basis.
4.3 Configuring tokens symbols and shortcuts
To edit the strings used to display tokens, or the collection of
short-cuts, you can edit the
file PA-unicode-tokens.el
, or customize the main
variables it contains, for example PA-token-name-alist
and
PA-shortcut-alist
.
E.g., for Isabelle
M-x customize-variable isar-token-name-alist RET |
provides an interface to the tokens, and
M-x customize-variable isar-shortcut-alist |
an interface to the shortcuts.
Where possible, it is better to use the more fine grained way is available to do this, which edits the separate tables which are combine to form the big list of tokens. This is available via the menus, e.g., in Isabelle, use
Tokens -> Customize -> Extended Symbols |
to customize the symbols used for the “extended” (non standard) symbol list.
4.4 Special layout
The Unicode Tokens mode supports both symbol tokens used to display
character sequences in different ways and control tokens used to
control the layout of the text in various ways, such as superscript,
subscript, large, small, bold, italic, etc. (The size and position
layout is managed using Emacs’s display
text property)
As well as displaying token sequences as special symbols, symbol tokens
themselves can define layout options as well; for example you might
define a token \<hugeOplus>
to display a large circled-plus
glyph. If you try the customization mentioned in the section above you
will see the options available when defining symbols.
These options are fixed layout schemes which also make layout tokens
easy to configure for provers. The layout possibilities include the
ones shown in the table below. There are two ways of configuring
control tokens for layout: character controls and region
controls. The character controls apply to the next “character”,
although this is a prover-specific notion and might actually mean the
next word or identifier. An example might be writing BOLDCHAR x
to make a bold x. Similarly the region controls apply to a
delineated region of text, for example, writing BEGINBOLD this is
bold ENDBOLD
could cause the enclosed text this is bold to be
displayed in a bold font.
The control tokens that have been configured populate the Tokens menu, so, for example, you may be able to select a region of text and then use the menu item:
Tokens -> Format Region -> Bold |
to cause the bold region tokens to be inserted around the selected text, which should cause the buffer presentation to show the text in a bold format (hiding the tokens).
Here is the table of layout controls available. What you actually can use will depend on the configuration for the underlying prover.
-
sub
lower the text (subscript)
-
sup
raise the text (superscript)
-
bold
make the text be in the bold weight of the current font
-
italic
make the text be in the italic variant of the current font
-
big
make the text be in a bigger size of the current font
-
small
make the text be in a smaller size of the current font
-
underline
underline the text
-
overline
overline the text
-
script
display the text in a “script” font
-
frakt
display the text in a “fraktur” font
-
serif
display the text in a serif font
-
sans
display the text in a sans serif font
-
keyword
display the text in the keyword face (
font-lock-keyword-face
)-
function
display the text in the function name face (
font-lock-function-name-face
)-
type
display the text in the type name face (
font-lock-type-face
)-
preprocessor
display the text in the preprocessor face (
font-lock-preprocessor-face
)-
doc
display the text in the documentation face (
font-lock-doc-face
)-
builtin
display the text in the builtin face (
font-lock-builtin-face
)
Notice that the fonts can be set conveniently by the menu commands
Tokens -> Set Fonts -> Script |
etc. See section Selecting suitable fonts, for more.
The symbols used to select the various font-lock faces (see M-x
list-faces-display
to show them) allow you to define custom colouring
of text for proof assistant input and output, exploiting rich underlying
syntax mechanisms of the prover.
4.5 Moving between Unicode and tokens
If you want to share text between applications (e.g., email some text from an Isabelle theory file which heavily uses symbols), it is useful to convert to and from Unicode with cut-and-paste operations. The default buffer cut and paste functions will copy the underlying text, which contains the tokens (ASCII format). To copy and convert or paste then convert back, use these commands:
Tokens -> Copy as unicode Tokens -> Paste from unicode |
Both of these are necessarily approximate. The buffer presentation may use additional controls (for super/subscript layout or bold fonts, etc), which cannot be converted. Pasting relies on being able to identify a unique token mapped from a single Unicode character; the token table may not include such an entry, or may be ambiguous.
- Command: unicode-tokens-copy beg end
Copy presentation of region between beg and end.
This is an approximation; it makes assumptions about the behaviour of symbol compositions, and will lose layout information.
- Command: unicode-tokens-paste
Paste text from clipboard, converting Unicode to tokens where possible.
If you are using a mixture of “real” Unicode and tokens like this
you may want to be careful to check the buffer contents: the command
unicode-tokens-highlight-unicode
helps you to manage this.
It
is available on the Tokens menu as
Tokens -> Highlight Real Unicode Chars |
Alternative ways to check are to toggle the display of tokens using
Tokens -> Reveal Symbol Tokens |
(the similar entry for Control Tokens
displays tokens being used
to control layout). Or simply toggle the tokens mode, which will leave
the true Unicode tokens untouched.
4.6 Finding available tokens shortcuts and symbols
Two commands (both on the Tokens menu) allow you to see the tokens and shortcuts available:
Tokens -> List Tokens Tokens -> List Shortcuts |
Additionally, you can view the complete Unicode character set available in the default Emacs font, with
Tokens -> List Unicode Characters |
(this uses a list adapted from Norman Walsh’s unichars.el
).
Note that the Unicode Tokens modes displays symbols defined by symbol tokens in a special font.
- Command: unicode-tokens-list-unicode-chars
Insert each Unicode character into a buffer.
Lets you see which characters are available for literal display in your Emacs font.
4.7 Selecting suitable fonts
The precise set of symbol glyphs that are available to you will depend in complicated ways on your operating system, Emacs version, configuration options used when Emacs was compiled, installed font sets, and (even) command line options used to start Emacs. So it is hard to give comprehensive and accurate advice in this manual. In general, things work much better with Emacs 23 than earlier versions.
To improve flexibility, Unicode Tokens mode allows you to select another font to display symbols from the default font that is used to display text in the buffer. This is the font that is configured by the menu
Tokens -> Set Fonts -> Symbol |
its customization name is unicode-tokens-symbol-font-face
, but
notice that only the font family aspect of the face is used. Similarly,
other fonts can be configured for controlling different font families
(script, fraktur, etc).
For symbols, good results are possible by using a proportional font for
displaying symbols that has many symbol glyphs, for example the main font
StixGeneral font from the Stix Fonts project
(http://www.stixfonts.org/). At the time of writing you can
obtain a beta version of these fonts in TTF format from
http://olegueret.googlepages.com/stixfonts-ttf. On recent Linux
distributions and with an Emacs 23 build that uses Xft, simply copy
these ttf
files into the .fonts
directory inside your home
directory to make them available.
Another font I like is DejaVu Sans Mono. It covers all of the standard Isabelle symbols. Some of the symbols are currently not perfect; however this font is an open source effort so users can contribute or suggest improvements. See http://dejavu-fonts.org.
If you are stuck with Emacs 22, you need to use the fontset mechanism which configures sets of fonts to use for display. The default font sets may not include enough symbols (typical symptom: symbols display as empty boxes). To address this, the menu command
Tokens -> Set Fonts -> Make Fontsets |
constructs a number of fontsets at particular point sizes, based on
several widely available fonts. See pg-fontsets.el
for the code.
After running this command, you can select from additional fontsets from
the menus for changing fonts.
For further suggestions, please search (and contribute!) to the Proof General wiki at http://proofgeneral.inf.ed.ac.uk/wiki.
- Face: unicode-tokens-symbol-font-face
The default font used for symbols. Only :family and :slant attributes are used.
This document was generated on November 26, 2024 using texi2html 1.82.