Sunday, December 23, 2007

Adding extended character support

If you need to type a diacritical mark such as an acute "e" (é) -- let alone a character not found in a Western European language -- the standard English keyboard layouts for GNU/Linux users are barely ahead of those of typewriters. However, adding support for both extended characters and multiple keyboards has become much easier in the last few years. These days, you can quickly add extended character support from both GNOME and KDE, and, should either desktop fail you for any reason, you can fall back on other methods to improve your input.

Before you add the support you need, you should know the jargon of extended character support so you can navigate the available options. Extended character support depends on several different types of keys:


Deadkeys: Keys that do not print anything by themselves, but modify the next key that you press, usually by adding a diacritical mark to it. For example, in many layouts, pressing the grave/tilde key (`/~) and then a second key produces such characters as à or ã.
Compose, AltGraph, and multi_keys: Keys that change what is typed when you press the next key, just as the Shift key ordinarily prints upper-case letters instead of lower-case ones. Usually, the AltGraph key is the right Alt key, while the Compose key is the right Windows or Menu key. The term "multi_key" is often used for the Compose key in the X Window System. Generally only one of these keys is needed, although you could configure your system to have all three. The characters you can type with these keys are defined in /usr/X11R6/lib/X11/locale/<locale name>/Compose, but only some of the listings in the file will work on your particular system, so be prepared to experiment.
Third level chooser: In a standard English layout, all keys have a character they print normally, and one they print when the Shift key is held down. The third level chooser is the key you press to print a third character with a key. For instance, a layout might require you to press the third level chooser followed by the "y" key to type the sign for the yen (¥), or the "l" key for the British pound (£).

Any of these keys may be defined by the layout you select, or by your configuration choices. In both GNOME and KDE, the interface assumes you know what they mean.
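
Since only some of the listings in a Compose file may work on a given system, it helps to see which sequences the file offers for a character before you start experimenting. The following is a minimal sketch, assuming the usual Compose file format in which an entry reads like `<Multi_key> <apostrophe> <e> : "é" eacute`. The function name compose_sequences is hypothetical, and the location of the Compose file varies between systems.

```shell
#!/bin/sh
# compose_sequences FILE CHAR
# Print the Multi_key sequences in FILE that produce CHAR.
# Assumes entries of the form:  <Multi_key> <apostrophe> <e> : "é" eacute
compose_sequences() {
    file=$1
    char=$2
    # Keep entries whose result is "CHAR" and that start with <Multi_key>,
    # then strip the colon and everything after it, leaving only the keys.
    grep -F ": \"$char\"" "$file" \
        | grep '^<Multi_key>' \
        | sed 's/[[:space:]]*:.*$//'
}
```

Called as `compose_sequences /path/to/Compose é`, it lists each key sequence for é on its own line. Entries that begin with a deadkey rather than the Compose key are deliberately left out.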

GNOME and KDE support
The latest versions of GNOME and KDE have similar support for extended characters. In GNOME, keyboard options are available from the System -> Administration or System -> Preferences menu, the exact location depending on your distribution. In KDE, go to Regional and Accessibility -> Keyboard layout.

In either desktop, go to the Layout/Layouts tab to enable multiple keyboard support for any of dozens of languages (the exact number depends on the distribution). You also need a font that includes the characters of the language you want. Changing keyboards requires that you log out and in again before the new choice takes effect.

If you are an English speaker and your main interest is being able to use the diacritical marks common to western European languages, you might want to select your default layout and look at the variants. For instance, in addition to Dvorak and other keyboard layouts, the variants for US English include three international variants: one with AltGraph and deadkeys, another with only deadkeys, and a third simply listed as Alternative International.

Unfortunately, none of these options may be useful. On my Fedora 8 installation, I could not get the AltGraph and deadkeys variant or the Alternative International options to work. The option with deadkeys worked, but had the disadvantage of requiring that you press the space bar if you want a plain apostrophe instead of an acute accent -- a change that can be remarkably hard to learn if you are a touch-typist.

Instead, I recommend you keep your existing layout and configure it from the Layout Options tab in GNOME or the Xkb Options tab in KDE. Besides options to change the behavior of the Caps Lock key and numeric keypad, and for repositioning the Ctrl key, the Layout Options tab also contains several options that are useful for extended character support. One sets the key for switching between multiple layouts. Another defines the third level chooser, and can be used with another option to add support for typing the euro sign to one of several keys.

Most importantly, from either of these tabs, you can define your Compose key. This option works instantly without logging out. In my experience, it is the most useful option for extended character support.

The .Xmodmap solution
Until a couple of years ago, when extended character support started being built into desktops, using .Xmodmap was the easiest way to add extended character support. It is still useful in cases where the GNOME and KDE settings all fail, as they did for me in Debian Lenny, or when you are using another graphical interface.

This solution requires a UTF-8 locale loaded on your system or desktop, and specified in /etc/X11/xorg.conf under the InputDevice section for your keyboard with a line like:

Option "XkbLayout" "en_US.utf8"

Then you need to define the multi_key. Open a command line, enter the xev command, then press the key you are configuring as a multi_key. Note its keycode in the command line; the keycode will vary with the keyboard.

Armed with the keycode, open a text editor and type the following line:

keycode <keycode> = Multi_key

Replace <keycode> with the number reported by xev. Save the file under the name of .Xmodmap in your home directory, then either restart the X Window System or reboot to enable extended character support.
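
The xev output is verbose, so the keycode is easy to miss. The following is a rough sketch of the two steps above as shell functions, assuming that xev reports key events on a line containing text like `keycode 116 (keysym 0xffec, Super_R)`. The function names xev_keycodes and xmodmap_multi_key_line are hypothetical, and the keycode 116 in the comments is only an illustration; yours will probably differ.

```shell
#!/bin/sh
# xev_keycodes: read xev output on stdin and print "KEYCODE KEYSYM"
# for every line that reports a key event.
xev_keycodes() {
    sed -n 's/.*keycode \([0-9][0-9]*\) (keysym 0x[0-9a-fA-F]*, \([^)]*\)).*/\1 \2/p'
}

# xmodmap_multi_key_line KEYCODE: print the .Xmodmap line that turns
# the key with that keycode into the multi_key.
xmodmap_multi_key_line() {
    printf 'keycode %s = Multi_key\n' "$1"
}

# Typical use (needs a running X session, so it is left commented out):
#   xev | xev_keycodes                            # press the key, note the number
#   xmodmap_multi_key_line 116 > "$HOME/.Xmodmap"
```

Running `xmodmap ~/.Xmodmap` should load the file into the current session, which is a quick way to check the mapping before you restart X.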

If it is not enabled after a reboot, check that you have the correct keyboard layout and a UTF-8 language defined in /etc/X11/xorg.conf. You should also remove the "nodeadkeys" option. You may need to define the multi_key in xorg.conf. For instance, to use the left Windows key, add the line:

Option "XkbOptions" "compose:lwin,grp:switch"

If you are using another key, all you have to change is the part that defines the key, replacing "lwin" with "rwin" for the right Windows key, or "ralt" for the right Alt key.
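
Because only the key name changes between these variants, the line can be generated rather than retyped. This is a small sketch, not part of any X tool; the function name xkb_compose_option is hypothetical, and it accepts only the three key names mentioned above.

```shell
#!/bin/sh
# xkb_compose_option KEY
# Print the xorg.conf line that makes KEY the Compose key.
# Only the key names discussed in the text are accepted.
xkb_compose_option() {
    case $1 in
        lwin|rwin|ralt)
            printf 'Option "XkbOptions" "compose:%s,grp:switch"\n' "$1"
            ;;
        *)
            echo "unsupported key: $1" >&2
            return 1
            ;;
    esac
}
```

For example, `xkb_compose_option rwin` prints the line for the right Windows key, ready to paste into the InputDevice section. To try an option in a running session before editing xorg.conf, `setxkbmap -option compose:rwin` should have the same effect until you log out.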

Advanced solutions
If none of these solutions gives you the character support you need, you have at least two alternatives. The classic support for multiple keyboards comes from SCIM (Smart Common Input Method). It is available in the repositories of many distributions, along with a selection of keyboards to use with it. SCIM is especially strong for Asian languages such as Chinese, Japanese, Korean, and Thai. Its main disadvantage is its involved setup, which is not helped by the fact that much of the project documentation is several years out of date -- although a guide for using SCIM with the latest Ubuntu release is available online.

A similar solution is KMFL (Keyboard Mapping for Linux), which Linux.com reviewed a couple of years ago. Based on SCIM, KMFL has an equally arduous setup. However, since it is written by the same company that produces the proprietary Keyman for Windows, KMFL may give access to a wider variety of layouts, especially if you need support for a lesser-known language.

At any rate, with all these solutions, extended character support for GNU/Linux is finally coming of age. Considering the imminent arrival of international Web addresses, it's about time.