GNU gettext utilities

1. Introduction  
2. PO Files and PO Mode Basics  
3. Preparing Program Sources  
4. Making the PO Template File  
5. Creating a New PO File  
6. Updating Existing PO Files  
7. Manipulating PO Files  
8. Producing Binary MO Files  
9. The User's View  
10. The Programmer's View  
11. The Translator's View  
12. The Maintainer's View  
13. Other Programming Languages  
14. Concluding Remarks  

A. Language Codes  ISO 639 language codes
B. Country Codes  ISO 3166 country codes

Program Index  Index of Programs
Option Index  Index of Command-Line Options
Variable Index  Index of Environment Variables
PO Mode Index  Index of Emacs PO Mode Commands
Autoconf Macro Index  Index of Autoconf Macros
General Index  

--- The Detailed Node Listing ---

Introduction

1.1 The Purpose of GNU gettext  
1.2 I18n, L10n, and Such  
1.3 Aspects in Native Language Support  
1.4 Files Conveying Translations  
1.5 Overview of GNU gettext  

PO Files and PO Mode Basics

2.1 Completing GNU gettext Installation  
2.2 The Format of PO Files  
2.3 Main PO mode Commands  Main Commands
2.4 Entry Positioning  
2.5 Normalizing Strings in Entries  

Preparing Program Sources

3.1 Triggering gettext Operations  
3.2 Preparing Translatable Strings  
3.3 How Marks Appear in Sources  
3.4 Marking Translatable Strings  
3.5 Special Comments preceding Keywords  Telling something about the following string
3.6 Special Cases of Translatable Strings  

Making the PO Template File

4.1 Invoking the xgettext Program  

Creating a New PO File

5.1 Invoking the msginit Program  
5.2 Filling in the Header Entry  

Updating Existing PO Files

6.1 Invoking the msgmerge Program  
6.2 Translated Entries  
6.3 Fuzzy Entries  
6.4 Untranslated Entries  
6.5 Obsolete Entries  
6.6 Modifying Translations  
6.7 Modifying Comments  
6.8 Details of Sub Edition  Mode for Editing Translations
6.9 C Sources Context  
6.10 Consulting Auxiliary PO Files  
6.11 Using Translation Compendia  

Using Translation Compendia

6.11.1 Creating Compendia  Merging translations for later use
6.11.2 Using Compendia  Using older translations if they fit

Manipulating PO Files

7.1 Invoking the msgcat Program  
7.2 Invoking the msgconv Program  
7.3 Invoking the msggrep Program  
7.4 Invoking the msgfilter Program  
7.5 Invoking the msguniq Program  
7.6 Invoking the msgcomm Program  
7.7 Invoking the msgcmp Program  
7.8 Invoking the msgattrib Program  
7.9 Invoking the msgen Program  
7.10 Invoking the msgexec Program  

Producing Binary MO Files

8.1 Invoking the msgfmt Program  
8.2 Invoking the msgunfmt Program  
8.3 The Format of GNU MO Files  

The User's View

9.1 The Current `ABOUT-NLS' Matrix  
9.2 Magic for Installers  
9.3 Magic for End Users  

The Programmer's View

10.1 About catgets  
10.2 About gettext  
10.3 Comparing the Two Interfaces  Comparing the two interfaces
10.4 Using libintl.a in own programs  
10.5 Being a gettext grok  
10.6 Temporary Notes for the Programmers Chapter  

About catgets

10.1.1 The Interface  The interface
10.1.2 Problems with the catgets Interface?!  Problems with the catgets interface?!

About gettext

10.2.1 The Interface  The interface
10.2.2 Solving Ambiguities  Solving ambiguities
10.2.3 Locating Message Catalog Files  Locating message catalog files
10.2.4 How to specify the output character set gettext uses  How to request conversion to Unicode
10.2.5 Additional functions for plural forms  Additional functions for handling plurals
10.2.6 How to use gettext in GUI programs  Another technique for solving ambiguities
10.2.7 Optimization of the *gettext functions  

Temporary Notes for the Programmers Chapter

10.6.1 Temporary - Two Possible Implementations  
10.6.2 Temporary - About catgets  
10.6.3 Temporary - Why a single implementation  
10.6.4 Temporary - Notes  

The Translator's View

11.1 Introduction 0  
11.2 Introduction 1  
11.3 Discussions  
11.4 Organization  
11.5 Information Flow  

Organization

11.4.1 Central Coordination  
11.4.2 National Teams  
11.4.3 Mailing Lists  

National Teams

11.4.2.1 Sub-Cultures  
11.4.2.2 Organizational Ideas  

The Maintainer's View

12.1 Flat or Non-Flat Directory Structures  
12.2 Prerequisite Works  
12.3 Invoking the gettextize Program  
12.4 Files You Must Create or Alter  
12.5 Autoconf macros for use in `configure.in'  

Files You Must Create or Alter

12.4.1 `POTFILES.in' in `po/'  
12.4.2 `LINGUAS' in `po/'  
12.4.3 `Makefile' pieces in `po/'  
12.4.4 `configure.in' at top level  
12.4.5 `config.guess', `config.sub' at top level  
12.4.6 `aclocal.m4' at top level  
12.4.7 `acconfig.h' at top level  
12.4.8 `Makefile.in' at top level  
12.4.9 `Makefile.in' in `src/'  
12.4.10 `gettext.h' in `lib/'  

Autoconf macros for use in `configure.in'

12.5.1 AM_GNU_GETTEXT in `gettext.m4'  
12.5.2 AM_ICONV in `iconv.m4'  

Other Programming Languages

13.1 The Language Implementor's View  
13.2 The Programmer's View  
13.3 The Translator's View  
13.4 The Maintainer's View  
13.5 Individual Programming Languages  
13.6 Internationalizable Data  

Individual Programming Languages

13.5.1 C, C++, Objective C  
13.5.2 sh - Shell Script  
13.5.3 bash - Bourne-Again Shell Script  
13.5.4 Python  
13.5.5 GNU clisp - Common Lisp  
13.5.6 GNU clisp C sources  
13.5.7 Emacs Lisp  
13.5.8 librep  
13.5.9 GNU Smalltalk  
13.5.10 Java  
13.5.11 GNU awk  
13.5.12 Pascal - Free Pascal Compiler  
13.5.13 wxWindows library  
13.5.14 YCP - YaST2 scripting language  
13.5.15 Tcl - Tk's scripting language  
13.5.16 Perl  
13.5.17 PHP Hypertext Preprocessor  
13.5.18 Pike  

Internationalizable Data

13.6.1 POT - Portable Object Template  
13.6.2 Resource String Table  
13.6.3 Glade - GNOME user interface description  

Concluding Remarks

14.1 History of GNU gettext  
14.2 Related Readings  


1. Introduction

This manual is still in DRAFT state. Some sections are still empty, or almost. We keep merging material from other sources (essentially e-mail folders) while the proper integration of this material is delayed.

In this manual, we use he when speaking of the programmer or maintainer, she when speaking of the translator, and they when speaking of the installers or end users of the translated program. This is only a convenience for clarifying the documentation. It is absolutely not meant to imply that some roles are more appropriate to males or females. Besides, as you might guess, GNU gettext is meant to be useful for people using computers, whatever their sex, race, religion or nationality!

This chapter explains the goals sought in the creation of GNU gettext and the free Translation Project. Then, it explains a few broad concepts around Native Language Support, and positions message translation with regard to other aspects of national and cultural variance, as they apply to programs. It also surveys those files used to convey the translations. It explains how the various tools interact in the initial generation of these files, and later, how the maintenance cycle should usually operate.

Please send suggestions and corrections to:

 
Internet address:
    bug-gnu-gettext@gnu.org

Please include the manual's edition number and update date in your messages.

1.1 The Purpose of GNU gettext  
1.2 I18n, L10n, and Such  
1.3 Aspects in Native Language Support  
1.4 Files Conveying Translations  
1.5 Overview of GNU gettext  


1.1 The Purpose of GNU gettext

Usually, programs are written and documented in English, and use English at execution time to interact with users. This is true not only of GNU software, but also of a great deal of commercial and free software. Using a common language is quite handy for communication between developers, maintainers and users from all countries. On the other hand, most people are less comfortable with English than with their own native language, and would prefer to use their mother tongue for day-to-day work, as far as possible. Many would simply love to see their computer screen showing a lot less English, and far more of their own language.

However, to many people, this dream might appear so far fetched that they may believe it is not even worth spending time thinking about it. They have no confidence at all that the dream might ever become true. Yet some have not lost hope, and have organized themselves. The Translation Project is a formalization of this hope into a workable structure, which has a good chance to get all of us nearer the achievement of a truly multi-lingual set of programs.

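GNU gettext is an important step for the Translation Project, as it is an asset on which we may build many other steps. This package offers to programmers, translators and even users, a well integrated set of tools and documentation. Specifically, the GNU gettext utilities are a set of tools that provides a framework within which other free packages may produce multi-lingual messages. These tools include a runtime library for retrieving translated messages, stand-alone programs such as xgettext, msgmerge and msgfmt for handling sets of translatable or already translated strings, and a special mode for Emacs which helps in preparing these sets and bringing them up to date.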

GNU gettext is designed to minimize the impact of internationalization on program sources, keeping this impact as small and hardly noticeable as possible. Internationalization has better chances of succeeding if it is very lightweight, or at least appears to be so, when looking at program sources.

The Translation Project also uses the GNU gettext distribution as a vehicle for documenting its structure and methods. This goes beyond the strict technicalities of documenting GNU gettext proper. This way, translators will find in a single place, as far as possible, all they need to know for properly doing their translating work. This supplemental documentation might also help programmers, and even curious users, in understanding how GNU gettext is related to the remainder of the Translation Project, and consequently, have a glimpse at the big picture.


1.2 I18n, L10n, and Such

Two long words appear all the time when we discuss support of native language in programs, and these words have a precise meaning, worth being explained here, once and for all in this document. The words are internationalization and localization. Many people, tired of writing these long words over and over again, took up the habit of writing i18n and l10n instead, keeping the first and last letter of each word, and replacing the run of intermediate letters by a number merely telling how many such letters there are. But in this manual, for the sake of clarity, we will patiently write the names in full, each time...
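
The abbreviation rule just described is simple enough to sketch in C. The function name abbreviate below is made up for this illustration and is not part of GNU gettext; treating words shorter than three letters as having nothing to abbreviate is also an assumption of the sketch.

```c
#include <stdio.h>
#include <string.h>

/* Write the abbreviation of WORD into BUF (capacity SIZE): the first
   letter, the count of the letters in between, and the last letter.
   Words shorter than three letters are copied unchanged (an assumption
   of this sketch).  Returns what snprintf returns. */
int abbreviate(const char *word, char *buf, size_t size)
{
  size_t len = strlen(word);

  if (len < 3)
    return snprintf(buf, size, "%s", word);
  return snprintf(buf, size, "%c%zu%c", word[0], len - 2, word[len - 1]);
}
```

Applied to the twenty letters of internationalization, the function keeps the i and the final n and counts the eighteen letters between them.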

By internationalization, one refers to the operation by which a program, or a set of programs turned into a package, is made aware of and able to support multiple languages. This is a generalization process, by which the programs are untied from calling only English strings or other English specific habits, and connected to generic ways of doing the same, instead. Program developers may use various techniques to internationalize their programs. Some of these have been standardized. GNU gettext offers one of these standards. See section 10. The Programmer's View.

By localization, one means the operation by which, in a set of programs already internationalized, one gives the program all needed information so that it can adapt itself to handle its input and output in a fashion which is correct for some native language and cultural habits. This is a particularisation process, by which generic methods already implemented in an internationalized program are used in specific ways. The programming environment puts several functions at the programmer's disposal which allow this runtime configuration. The formal description of a specific set of cultural habits for some country, together with all associated translations targeted to the same native language, is called the locale for this language or country. Users achieve localization of programs by setting proper values to special environment variables, prior to executing those programs, identifying which locale should be used.
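
As a minimal sketch of this runtime configuration, the ISO C function setlocale, called with an empty locale name, adopts whatever locale the environment variables (such as LANG or LC_ALL) select. The wrapper name adopt_user_locale is hypothetical and chosen only for this example.

```c
#include <locale.h>
#include <stddef.h>

/* Adopt the locale selected by the user's environment.  Returns the
   name of the locale now in effect.  If the environment names a locale
   the system does not know, setlocale returns NULL and leaves the
   current locale unchanged, so we query and report that one instead. */
const char *adopt_user_locale(void)
{
  const char *name = setlocale(LC_ALL, "");

  if (name == NULL)
    name = setlocale(LC_ALL, NULL);   /* query only, changes nothing */
  return name;
}
```

A program that never makes such a call keeps running in the default "C" locale, whatever the environment says.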

In fact, locale message support is only one component of the cultural data that makes up a particular locale. There are a whole host of routines and functions provided to aid programmers in developing internationalized software and which allow them to access the data stored in a particular locale. When someone presently refers to a particular locale, they are obviously referring to the data stored within that particular locale. Similarly, if a programmer is referring to "accessing the locale routines", they are referring to the complete suite of routines that access all of the locale's information.

One uses the expression Native Language Support, or merely NLS, for speaking of the overall activity or feature encompassing both internationalization and localization, allowing for multi-lingual interactions in a program. In a nutshell, one could say that internationalization is the operation by which further localizations are made possible.

Also, very roughly said, when it comes to multi-lingual messages, internationalization is usually taken care of by programmers, and localization is usually taken care of by translators.


1.3 Aspects in Native Language Support

For a totally multi-lingual distribution, there are many things to translate beyond output messages.

As we already stressed, translation is only one aspect of locales. Other internationalization aspects are system services and are handled in GNU libc. There are many attributes that are needed to define a country's cultural conventions. These attributes include, besides the country's native language, the formatting of the date and time, the representation of numbers, the symbols for currency, etc. These local rules are termed the country's locale. The locale represents the knowledge needed to support the country's native attributes.

There are a few major areas which may vary between countries and hence, define what a locale must describe. The following list helps put multi-lingual messages into the proper context of other tasks related to locales. See the GNU libc manual for details.

Characters and Codesets

The codeset most commonly used throughout the USA and most English speaking parts of the world is the ASCII codeset. However, there are many characters needed by various locales that are not found within this codeset. The 8-bit ISO 8859-1 codeset has most of the special characters needed to handle the major European languages. However, in many cases, the ISO 8859-1 codeset is not adequate: it doesn't even handle the major European currency. Hence each locale will need to specify which codeset it needs to use and will need to have the appropriate character handling routines to cope with the codeset.

Currency

The symbols used vary from country to country as does the position used by the symbol. Software needs to be able to transparently display currency figures in the native mode for each locale.

Dates

The format of dates varies between locales. For example, Christmas day in 1994 is written as 12/25/94 in the USA and as 25/12/94 in Australia. Other countries might use ISO 8601 dates, etc.

Time of the day may be noted as hh:mm, hh.mm, or otherwise. Some locales require time to be specified in 24-hour mode rather than as AM or PM. Further, the nature and yearly extent of the Daylight Saving correction vary widely between countries.
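
Programs normally leave such choices to the C library. The following sketch, with a hypothetical helper named format_local_date, uses the ISO C function strftime and its %x conversion, which stands for the date representation of the current locale.

```c
#include <locale.h>
#include <string.h>
#include <time.h>

/* Format a calendar date the way the current LC_TIME locale writes it.
   Returns the number of characters stored in BUF, or 0 if BUF (capacity
   SIZE) is too small. */
size_t format_local_date(char *buf, size_t size, int year, int month, int day)
{
  struct tm when;

  memset(&when, 0, sizeof when);
  when.tm_year = year - 1900;   /* struct tm counts years from 1900 */
  when.tm_mon = month - 1;      /* and months from 0 */
  when.tm_mday = day;
  return strftime(buf, size, "%x", &when);
}
```

In the default "C" locale, %x is defined as month/day/year, so Christmas day of 1994 comes out as 12/25/94. Once the program has adopted the user's locale, the same call yields whatever order and separators that locale prescribes.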

Numbers

Numbers can be represented differently in different locales. For example, the following numbers are all written correctly for their respective locales:

 
12,345.67       English
12.345,67       German
 12345,67       French
1,2345.67       Asia

Some programs could go further and use different unit systems, like English units or Metric units, or even take into account variants about how numbers are spelled in full.
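
A program can ask the C library which conventions are in force instead of hard-coding them. This sketch relies on the ISO C function localeconv; the helper names decimal_separator and format_amount are invented for the example.

```c
#include <locale.h>
#include <stdio.h>

/* Return the decimal separator of the current LC_NUMERIC locale. */
const char *decimal_separator(void)
{
  return localeconv()->decimal_point;
}

/* Format VALUE with two decimals into BUF (capacity SIZE).  The printf
   family already uses the locale's decimal separator; this plain format
   does not insert any grouping of thousands. */
int format_amount(char *buf, size_t size, double value)
{
  return snprintf(buf, size, "%.2f", value);
}
```

In the default "C" locale the separator is a period. Under a German locale, if one is installed and selected, the same functions would report and print a comma.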

Messages

The most obvious area is the language support within a locale. This is where GNU gettext provides the means for developers and users to easily change the language that the software uses to communicate to the user.

Components of locale outside of message handling are standardized in the ISO C standard and the SUSv2 specification. GNU libc fully implements this, and most other modern systems provide a more or less reasonable support for at least some of the missing components.


1.4 Files Conveying Translations

The letters PO in `.po' files mean Portable Object, to distinguish them from `.mo' files, where MO stands for Machine Object. This paradigm, as well as the PO file format, is inspired by the NLS standard developed by Uniforum, and first implemented by Sun in their Solaris system.

PO files are meant to be read and edited by humans, and associate each original, translatable string of a given package with its translation in a particular target language. A single PO file is dedicated to a single target language. If a package supports many languages, there is one such PO file per language supported, and each package has its own set of PO files. These PO files are best created by the xgettext program, and later updated or refreshed through the msgmerge program. Program xgettext extracts all marked messages from a set of C files and initializes a PO file with empty translations. Program msgmerge takes care of adjusting PO files between releases of the corresponding sources, commenting obsolete entries, initializing new ones, and updating all source line references. Files ending with `.pot' are a kind of base translation file found in distributions, in PO file format.

MO files are meant to be read by programs, and are binary in nature. A few systems already offer tools for creating and handling MO files as part of the Native Language Support coming with the system, but the format of these MO files is often different from system to system, and non-portable. The tools already provided with these systems don't support all the features of GNU gettext. Therefore GNU gettext uses its own format for MO files. Files ending with `.gmo' are really MO files, when it is known that these files use the GNU format.


1.5 Overview of GNU gettext

The following diagram summarizes the relation between the files handled by GNU gettext and the tools acting on these files. It is followed by somewhat detailed explanations, which you should read while keeping an eye on the diagram. Having a clear understanding of these interrelations will surely help programmers, translators and maintainers.

 
Original C Sources ---> PO mode ---> Marked C Sources ---.
                                                         |
              .---------<--- GNU gettext Library         |
.--- make <---+                                          |
|             `---------<--------------------+-----------'
|                                            |
|   .-----<--- PACKAGE.pot <--- xgettext <---'   .---<--- PO Compendium
|   |                                            |             ^
|   |                                            `---.         |
|   `---.                                            +---> PO mode ---.
|       +----> msgmerge ------> LANG.po ---->--------'                |
|   .---'                                                             |
|   |                                                                 |
|   `-------------<---------------.                                   |
|                                 +--- New LANG.po <------------------'
|   .--- LANG.gmo <--- msgfmt <---'
|   |
|   `---> install ---> /.../LANG/PACKAGE.mo ---.
|                                              +---> "Hello world!"
`-------> install ---> /.../bin/PROGRAM -------'

The indication `PO mode' appears in two places in this picture, and you may safely read it as merely meaning "hand editing", using any editor of your choice, really. However, for those of you being the lucky users of Emacs, PO mode has been specifically created for providing a cozy environment for editing or modifying PO files. While editing a PO file, PO mode allows for the easy browsing of auxiliary and compendium PO files, as well as for following references into the set of C program sources from which PO files have been derived. It has a few special features, among which are the interactive marking of program strings as translatable, and the validation of PO files with easy repositioning to PO file lines showing errors.

As a programmer, the first step to bringing GNU gettext into your package is identifying, right in the C sources, those strings which are meant to be translatable, and those which are untranslatable. This tedious job can be done a little more comfortably using Emacs PO mode, but you can use any means familiar to you for modifying your C sources. Besides this, some other simple, standard changes are needed to properly initialize the translation library. See section 3. Preparing Program Sources, for more information about all this.

For newly written software the strings of course can and should be marked while writing it. The gettext approach makes this very easy. Simply put the following lines at the beginning of each file or in a central header file:

 
#define _(String) (String)
#define N_(String) String
#define textdomain(Domain)
#define bindtextdomain(Package, Directory)

Doing this allows you to prepare the sources for internationalization. Later when you feel ready for the step to use the gettext library simply replace these definitions by the following:

 
#include <libintl.h>
#define _(String) gettext (String)
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)

and link against `libintl.a' or `libintl.so'. Note that on GNU systems, you don't need to link with libintl because the gettext library functions are already contained in GNU libc. That is all you have to change.
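
To show how the two marks are used in practice, here is a minimal sketch built on the macro definitions above. The ENABLE_NLS switch is an assumption of this example (a package would define it when it wants the real library), and the names weekday_names and weekday_name are invented for the illustration.

```c
#include <stddef.h>

#ifdef ENABLE_NLS                 /* assumed switch, set by the package */
# include <libintl.h>
# define _(String) gettext (String)
#else
# define _(String) (String)       /* placeholder: no translation yet */
#endif
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)

/* A static initializer cannot call a function, so N_() only marks
   these strings as translatable and leaves them untouched. */
static const char *const weekday_names[] = {
  N_("Monday"),
  N_("Tuesday")
};

/* The lookup happens later, at the point of use, through _(). */
const char *weekday_name(size_t index)
{
  return _(weekday_names[index]);
}
```

With the placeholder definitions, or when no message catalog is found, weekday_name simply returns the original English string, so the program behaves the same before and after the switch.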

Once the C sources have been modified, the xgettext program is used to find and extract all translatable strings, and create a PO template file out of all these. This `package.pot' file contains all original program strings. It has sets of pointers to exactly where in C sources each string is used. All translations are set to empty. The letter t in `.pot' marks this as a Template PO file, not yet oriented towards any particular language. See section 4.1 Invoking the xgettext Program, for more details about how one calls the xgettext program. If you are really lazy, you might be interested in working a lot more right away, and preparing the whole distribution setup (see section 12. The Maintainer's View). By doing so, you spare yourself typing the xgettext command, as make should now generate the proper things automatically for you!

The first time through, there is no `lang.po' yet, so the msgmerge step may be skipped and replaced by a mere copy of `package.pot' to `lang.po', where lang represents the target language. See section 5. Creating a New PO File, for details.

Then comes the initial translation of messages. Translation in itself is a whole matter, still exclusively meant for humans, and whose complexity far overwhelms the level of this manual. Nevertheless, a few hints are given in some other chapter of this manual (see section 11. The Translator's View). You will also find there indications about how to contact translating teams, or becoming part of them, for sharing your translating concerns with others who target the same native language.

While adding the translated messages into the `lang.po' PO file, if you do not have Emacs handy, you are on your own for ensuring that your efforts fully respect the PO file format, and quoting conventions (see section 2.2 The Format of PO Files). This is surely not an impossible task, as this is the way many people have handled PO files already for Uniforum or Solaris. On the other hand, by using PO mode in Emacs, most details of PO file format are taken care of for you, but you have to acquire some familiarity with PO mode itself. Besides main PO mode commands (see section 2.3 Main PO mode Commands), you should know how to move between entries (see section 2.4 Entry Positioning), and how to handle untranslated entries (see section 6.4 Untranslated Entries).

If some common translations have already been saved into a compendium PO file, translators may use PO mode for initializing untranslated entries from the compendium, and also save selected translations into the compendium, updating it (see section 6.11 Using Translation Compendia). Compendium files are meant to be exchanged between members of a given translation team.

Programs, or packages of programs, are dynamic in nature: users write bug reports and suggestions for improvement, maintainers react by modifying programs in various ways. The fact that a package has already been internationalized should not make maintainers shy of adding new strings, or modifying strings already translated. They just do their job the best they can. For the Translation Project to work smoothly, it is important that maintainers do not carry translation concerns on their already loaded shoulders, and that translators be kept as free as possible of programming concerns.

The only concern maintainers should have is carefully marking new strings as translatable, when they should be, and not otherwise worrying about them being translated, as this will come in proper time. Consequently, when programs and their strings are adjusted in various ways by maintainers, and for matters usually unrelated to translation, xgettext constructs `package.pot' files which evolve over time, so the translations carried by `lang.po' slowly fade out of date.

It is important for translators (and even maintainers) to understand that package translation is a continuous process in the lifetime of a package, and not something which is done once and for all at the start. After an initial burst of translation activity for a given package, interventions are needed once in a while, because here and there, translated entries become obsolete, and new untranslated entries appear, needing translation.

The msgmerge program has the purpose of refreshing an already existing `lang.po' file, by comparing it with a newer `package.pot' template file, extracted by xgettext out of recent C sources. The refreshing operation adjusts all references to C source locations for strings, since these strings move as programs are modified. Also, msgmerge comments out as obsolete, in `lang.po', those already translated entries which are no longer used in the program sources (see section 6.5 Obsolete Entries). It finally discovers new strings and inserts them in the resulting PO file as untranslated entries (see section 6.4 Untranslated Entries). See section 6.1 Invoking the msgmerge Program, for more information about what msgmerge really does.

Whatever route or means taken, the goal is to obtain an updated `lang.po' file offering translations for all strings.

The temporal mobility, or fluidity of PO files, is an integral part of the translation game, and should be well understood, and accepted. People resisting it will have a hard time participating in the Translation Project, or will give a hard time to other participants! In particular, maintainers should relax and include all available official PO files in their distributions, even if these have not recently been updated, without exerting pressure on the translator teams to get the job done. The pressure should rather come from the community of users speaking a particular language, and maintainers should consider themselves fairly relieved of any concern about the adequacy of translation files. On the other hand, translators should reasonably try updating the PO files they are responsible for, while the package is undergoing pretest, prior to an official distribution.

Once the PO file is complete and dependable, the msgfmt program is used for turning the PO file into a machine-oriented format, which may yield efficient retrieval of translations by the programs of the package, whenever needed at runtime (see section 8.3 The Format of GNU MO Files). See section 8.1 Invoking the msgfmt Program, for more information about all modes of execution for the msgfmt program.

Finally, the modified and marked C sources are compiled and linked with the GNU gettext library, usually through the operation of make, given a suitable `Makefile' exists for the project, and the resulting executable is installed somewhere users will find it. The MO files themselves should also be properly installed. Given the appropriate environment variables are set (see section 9.3 Magic for End Users), the program should localize itself automatically, whenever it executes.
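
The runtime side of this last step can be sketched as follows. The package name and catalog directory are hypothetical values chosen for the example, and the ENABLE_NLS switch is again an assumption; without it, the sketch falls back to do-nothing placeholder definitions in the style shown earlier.

```c
#include <locale.h>

#ifdef ENABLE_NLS                 /* assumed switch, set by the package */
# include <libintl.h>
#else
# define gettext(String) (String)
# define textdomain(Domain)
# define bindtextdomain(Package, Directory)
#endif

#define PACKAGE "example-package"               /* hypothetical name */
#define LOCALEDIR "/usr/local/share/locale"     /* hypothetical path */

/* Call once at program start: adopt the user's locale, tell the
   library where the MO files of this package are installed, and
   select the package's catalog as the default one. */
void init_nls(void)
{
  setlocale(LC_ALL, "");
  bindtextdomain(PACKAGE, LOCALEDIR);
  textdomain(PACKAGE);
}

/* Any later lookup returns the translation if the catalog has one,
   and the original string otherwise. */
const char *greeting(void)
{
  return gettext("Hello world!");
}
```

A real package would normally receive both values from its build system instead of hard-coding them.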

The remainder of this manual has the purpose of explaining in depth the various steps outlined above.


2. PO Files and PO Mode Basics

The GNU gettext toolset helps programmers and translators at producing, updating and using translation files, mainly those PO files which are textual, editable files. This chapter stresses the format of PO files, and contains a PO mode starter. PO mode description is spread throughout this manual instead of being concentrated in one place. Here we present only the basics of PO mode.

2.1 Completing GNU gettext Installation  
2.2 The Format of PO Files  
2.3 Main PO mode Commands  Main Commands
2.4 Entry Positioning  
2.5 Normalizing Strings in Entries  


2.1 Completing GNU gettext Installation

Once you have received, unpacked, configured and compiled the GNU gettext distribution, the `make install' command puts in place the programs xgettext, msgfmt, gettext, and msgmerge, as well as their available message catalogs. To top off a comfortable installation, you might also want to make the PO mode available to your Emacs users.

During the installation of the PO mode, you might want to modify your file `.emacs', once and for all, so it contains a few lines looking like:

 
(setq auto-mode-alist
      (cons '("\\.po\\'\\|\\.po\\." . po-mode) auto-mode-alist))
(autoload 'po-mode "po-mode" "Major mode for translators to edit PO files" t)

Later, whenever you edit some `.po' file, or any file having the string `.po.' within its name, Emacs loads `po-mode.elc' (or `po-mode.el') as needed, and automatically activates PO mode commands for the associated buffer. The string PO appears in the mode line for any buffer for which PO mode is active. Many PO files may be active at once in a single Emacs session.

If you are using Emacs version 20 or newer, and have already installed the appropriate international fonts on your system, you may also tell Emacs how to determine automatically the coding system of every PO file. This will often (but not always) cause the necessary fonts to be loaded and used for displaying the translations on your Emacs screen. For this to happen, add the lines:

 
(modify-coding-system-alist 'file "\\.po\\'\\|\\.po\\."
                            'po-find-file-coding-system)
(autoload 'po-find-file-coding-system "po-mode")

to your `.emacs' file. If, with this, you still see boxes instead of international characters, try a different font set (via Shift Mouse button 1).


2.2 The Format of PO Files

A PO file is made up of many entries, each entry holding the relation between an original untranslated string and its corresponding translation. All entries in a given PO file usually pertain to a single project, and all translations are expressed in a single target language. One PO file entry has the following schematic structure:

 
white-space
#  translator-comments
#. automatic-comments
#: reference...
#, flag...
msgid untranslated-string
msgstr translated-string

The general structure of a PO file should be well understood by the translator. When using PO mode, very little has to be known about the format details, as PO mode takes care of them for her.

Entries begin with some optional white space. Usually, when generated through GNU gettext tools, there is exactly one blank line between entries. Then comments follow, on lines all starting with the character #. There are two kinds of comments: those which have some white space immediately following the #, which comments are created and maintained exclusively by the translator, and those which have some non-white character just after the #, which comments are created and maintained automatically by GNU gettext tools. All comments, of either kind, are optional.

After white space and comments, entries show two strings, namely first the untranslated string as it appears in the original program sources, and then, the translation of this string. The original string is introduced by the keyword msgid, and the translation, by msgstr. The two strings, untranslated and translated, are quoted in various ways in the PO file, using " delimiters and \ escapes, but the translator does not really have to pay attention to the precise quoting format, as PO mode fully takes care of quoting for her.

The msgid strings, as well as automatic comments, are produced and managed by other GNU gettext tools, and PO mode does not provide means for the translator to alter these. The most she can do is merely deleting them, and only by deleting the whole entry. On the other hand, the msgstr string, as well as translator comments, are really meant for the translator, and PO mode gives her the full control she needs.

The comment lines beginning with `#,' are special because they are not completely ignored by the programs as comments generally are. The comma-separated list of flags is used by the msgfmt program to give the user some better diagnostic messages. Currently there are two forms of flags defined:

fuzzy
This flag can be generated by the msgmerge program or it can be inserted by the translator herself. It shows that the msgstr string might not be a correct translation (anymore). Only the translator can judge if the translation requires further modification, or is acceptable as is. Once satisfied with the translation, she then removes this fuzzy attribute. The msgmerge program inserts this when it combined the msgid and msgstr entries after fuzzy search only. See section 6.3 Fuzzy Entries.

c-format
no-c-format
These flags should not be added by a human. Instead only the xgettext program adds them. In an automated PO file processing system as proposed here the user changes would be thrown away again as soon as the xgettext program generates a new template file.

In case the c-format flag is given for a string, the msgfmt program does some more tests to check the validity of the translation. See section 8.1 Invoking the msgfmt Program.

A different kind of entry is used for translations which involve plural forms.

 
white-space
#  translator-comments
#. automatic-comments
#: reference...
#, flag...
msgid untranslated-string-singular
msgid_plural untranslated-string-plural
msgstr[0] translated-string-case-0
...
msgstr[N] translated-string-case-n

It happens that some lines, usually whitespace or comments, follow the very last entry of a PO file. Such lines are not part of any entry, and PO mode is unable to take action on those lines. By using the PO mode function M-x po-normalize, the translator may get rid of those spurious lines. See section 2.5 Normalizing Strings in Entries.

The remainder of this section may be safely skipped by those using PO mode, yet it may be interesting for everybody to have a better idea of the precise format of a PO file. On the other hand, those not having Emacs handy should carefully continue reading on.

Each of untranslated-string and translated-string respects the C syntax for a character string, including the surrounding quotes and embedded backslashed escape sequences. When the time comes to write multi-line strings, one should not use escaped newlines. Instead, a closing quote should follow the last character on the line to be continued, and an opening quote should resume the string at the beginning of the following PO file line. For example:

 
msgid ""
"Here is an example of how one might continue a very long string\n"
"for the common case the string represents multi-line output.\n"

In this example, the empty string is used on the first line, to allow better alignment of the H from the word `Here' over the f from the word `for'. In this example, the msgid keyword is followed by three strings, which are meant to be concatenated. Concatenating the empty string does not change the resulting overall string, but it is a way for us to comply with the necessity of msgid to be followed by a string on the same line, while keeping the multi-line presentation left-justified, as we find this to be a cleaner disposition. The empty string could have been omitted, but only if the string starting with `Here' was promoted on the first line, right after msgid.(2) It was not really necessary either to switch between the two last quoted strings immediately after the newline `\n', the switch could have occurred after any other character, we just did it this way because it is neater.

One should carefully distinguish between end of lines marked as `\n' inside quotes, which are part of the represented string, and end of lines in the PO file itself, outside string quotes, which have no effect on the represented string.
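Since PO strings follow the C syntax, the same behavior can be observed in a C compiler. The following is only an illustrative sketch: the names split_form, joined_form, same_string and count_newlines are ours and are not part of GNU gettext.

```c
#include <assert.h>
#include <string.h>

/* The pieces a PO file would put after msgid, written as adjacent
   C string literals; the compiler concatenates them into one string.  */
static const char split_form[] =
  ""
  "Here is an example of how one might continue a very long string\n"
  "for the common case the string represents multi-line output.\n";

/* The same string written as one single literal.  */
static const char joined_form[] =
  "Here is an example of how one might continue a very long string\nfor the common case the string represents multi-line output.\n";

/* Return 1 if both spellings denote the same string, 0 otherwise.  */
int
same_string (void)
{
  return strcmp (split_form, joined_form) == 0;
}

/* Count the newline characters that are part of the string itself.  */
int
count_newlines (const char *s)
{
  int n = 0;

  assert (s != NULL);
  for (; *s != '\0'; s++)
    if (*s == '\n')
      n++;
  return n;
}
```

Although split_form spans three source lines, only the two `\n' escapes inside the quotes end up in the string.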

Outside strings, white lines and comments may be used freely. Comments start at the beginning of a line with `#' and extend until the end of the PO file line. Comments written by translators should have the initial `#' immediately followed by some white space. If the `#' is not immediately followed by white space, this comment is most likely generated and managed by specialized GNU tools, and might disappear or be replaced unexpectedly when the PO file is given to msgmerge.



2.3 Main PO mode Commands

After setting up Emacs with something similar to the lines in 2.1 Completing GNU gettext Installation, PO mode is activated for a window when Emacs finds a PO file in that window. This puts the window read-only and establishes a po-mode-map, which is a genuine Emacs mode, in a way that is not derived from text mode in any way. Functions found on po-mode-hook, if any, will be executed.

When PO mode is active in a window, the letters `PO' appear in the mode line for that window. The mode line also displays how many entries of each kind are held in the PO file. For example, the string `132t+3f+10u+2o' would tell the translator that the PO file contains 132 translated entries (see section 6.2 Translated Entries), 3 fuzzy entries (see section 6.3 Fuzzy Entries), 10 untranslated entries (see section 6.4 Untranslated Entries) and 2 obsolete entries (see section 6.5 Obsolete Entries). Zero-coefficients items are not shown. So, in this example, if the fuzzy entries were unfuzzied, the untranslated entries were translated and the obsolete entries were deleted, the mode line would merely display `145t' for the counters.
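The rule for building the counter string can be modelled in a few lines of C. This is a toy model written for this manual, not the Emacs Lisp code PO mode actually uses; format_counters and append_counter are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append "<count><tag>" to buf, preceded by '+' if buf is not empty.
   Counts of zero are skipped entirely.  */
static void
append_counter (char *buf, size_t size, int count, char tag)
{
  size_t len = strlen (buf);

  if (count == 0 || len >= size)
    return;
  snprintf (buf + len, size - len, "%s%d%c", len > 0 ? "+" : "", count, tag);
}

/* Hypothetical helper: build a counter string in the style of the
   PO mode mode line from the four entry counts.  */
void
format_counters (char *buf, size_t size,
                 int translated, int fuzzy, int untranslated, int obsolete)
{
  assert (buf != NULL && size > 0);
  buf[0] = '\0';
  append_counter (buf, size, translated, 't');
  append_counter (buf, size, fuzzy, 'f');
  append_counter (buf, size, untranslated, 'u');
  append_counter (buf, size, obsolete, 'o');
}
```

Called with the counts 132, 3, 10 and 2, the helper yields `132t+3f+10u+2o'; called with 145, 0, 0 and 0, it yields `145t'.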

The main PO commands are those which do not fit into the other categories of subsequent sections. These allow for quitting PO mode or for managing windows in special ways.

_
Undo last modification to the PO file (po-undo).

Q
Quit processing and save the PO file (po-quit).

q
Quit processing, possibly after confirmation (po-confirm-and-quit).

0
Temporarily leave the PO file window (po-other-window).

?
h
Show help about PO mode (po-help).

=
Give some PO file statistics (po-statistics).

V
Batch validate the format of the whole PO file (po-validate).

The command _ (po-undo) interfaces to the Emacs undo facility. See section `Undoing Changes' in The Emacs Editor. Each time _ is typed, modifications which the translator did to the PO file are undone a little more. For the purpose of undoing, each PO mode command is atomic. This is especially true for the RET command: the whole editing done through a single use of this command is undone at once, even if the editing itself implied several actions. However, while in the editing window, one can undo the editing work in much smaller steps.

The commands Q (po-quit) and q (po-confirm-and-quit) are used when the translator is done with the PO file. The former is a bit less verbose than the latter. If the file has been modified, it is saved to disk first. In both cases, and prior to all this, the commands check if any untranslated messages remain in the PO file and, if so, the translator is asked if she really wants to leave off working with this PO file. This is the preferred way of getting rid of an Emacs PO file buffer. Merely killing it through the usual command C-x k (kill-buffer) is not the tidiest way to proceed.

The command 0 (po-other-window) is another, softer way, to leave PO mode, temporarily. It just moves the cursor to some other Emacs window, and pops one if necessary. For example, if the translator just got PO mode to show some source context in some other window, she might discover some apparent bug in the program source that needs correction. This command allows the translator to change sex, become a programmer, and have the cursor right in the window containing the program she (or rather he) wants to modify. By later getting the cursor back in the PO file window, or by asking Emacs to edit this file once again, PO mode is then recovered.

The command h (po-help) displays a summary of all available PO mode commands. The translator should then type any character to resume normal PO mode operations. The command ? has the same effect as h.

The command = (po-statistics) computes the total number of entries in the PO file, the ordinal of the current entry (counted from 1), the number of untranslated entries, the number of obsolete entries, and displays all these numbers.

The command V (po-validate) launches msgfmt in checking and verbose mode over the current PO file. This command first offers to save the current PO file on disk. The msgfmt tool, from GNU gettext, has the purpose of creating a MO file out of a PO file, and PO mode uses the features of this program for checking the overall format of a PO file, as well as all individual entries.

The program msgfmt runs asynchronously with Emacs, so the translator regains control immediately while her PO file is being studied. Error output is collected in the Emacs `*compilation*' buffer, displayed in another window. The regular Emacs command C-x` (next-error), as well as other usual compile commands, allow the translator to reposition quickly to the offending parts of the PO file. Once the cursor is on the line in error, the translator may decide on any PO mode action which would help correct the error.



2.4 Entry Positioning

The cursor in a PO file window is almost always part of an entry. The only exceptions are the special case when the cursor is after the last entry in the file, or when the PO file is empty. The entry where the cursor is found to be is said to be the current entry. Many PO mode commands operate on the current entry, so moving the cursor does more than allow the translator to browse the PO file; it also selects the entry on which commands operate.

Some PO mode commands alter the position of the cursor in a specialized way. A few of those special-purpose positioning commands are described here, the others are described in following sections (for a complete list try C-h m):

.
Redisplay the current entry (po-current-entry).

n
Select the entry after the current one (po-next-entry).

p
Select the entry before the current one (po-previous-entry).

<
Select the first entry in the PO file (po-first-entry).

>
Select the last entry in the PO file (po-last-entry).

m
Record the location of the current entry for later use (po-push-location).

r
Return to a previously saved entry location (po-pop-location).

x
Exchange the current entry location with the previously saved one (po-exchange-location).

Any Emacs command able to reposition the cursor may be used to select the current entry in PO mode, including commands which move by characters, lines, paragraphs, screens or pages, and search commands. However, there is a kind of standard way to display the current entry in PO mode, which usual Emacs commands moving the cursor do not especially try to enforce. The command . (po-current-entry) has the sole purpose of redisplaying the current entry properly, after the current entry has been changed by means external to PO mode, or the Emacs screen otherwise altered.

It is yet to be decided if PO mode helps the translator, or otherwise irritates her, by forcing a rigid window disposition while she is doing her work. We originally had quite precise ideas about how windows should behave, but on the other hand, anyone used to Emacs is often happy to keep full control. Maybe a fixed window disposition might be offered as a PO mode option that the translator might activate or deactivate at will, so it could be offered on an experimental basis. If nobody feels a real need for using it, or a compulsion for writing it, we should drop this whole idea. The incentive for doing it should come from translators rather than programmers, as opinions from an experienced translator are surely worth more to me than opinions from programmers thinking about how others should do translation.

The commands n (po-next-entry) and p (po-previous-entry) move the cursor to the entry following, or preceding, the current one. If n is given while the cursor is on the last entry of the PO file, or if p is given while the cursor is on the first entry, no move is done.

The commands < (po-first-entry) and > (po-last-entry) move the cursor to the first entry, or last entry, of the PO file. When the cursor is located past the last entry in a PO file, most PO mode commands will return an error saying `After last entry'. Moreover, the commands < and > have the special property of being able to work even when the cursor is not inside any PO file entry, and one may use them for nicely correcting this situation. But even these commands will fail on a truly empty PO file. There are development plans for the PO mode for it to interactively fill an empty PO file from sources. See section 3.4 Marking Translatable Strings.

The translator may decide, before working at the translation of a particular entry, that she needs to browse the remainder of the PO file, maybe for finding the terminology or phraseology used in related entries. She can of course use the standard Emacs idioms for saving the current cursor location in some register, and use that register for getting back, or else, use the location ring.

PO mode offers another approach, by which cursor locations may be saved onto a special stack. The command m (po-push-location) merely adds the location of current entry to the stack, pushing the already saved locations under the new one. The command r (po-pop-location) consumes the top stack element and repositions the cursor to the entry associated with that top element. This position is then lost, for the next r will move the cursor to the previously saved location, and so on until no locations remain on the stack.

If the translator wants the position to be kept on the location stack, maybe for taking a look at the entry associated with the top element, then go elsewhere with the intent of getting back later, she ought to use m immediately after r.

The command x (po-exchange-location) simultaneously repositions the cursor to the entry associated with the top element of the stack of saved locations, and replaces that top element with the location of the current entry before the move. Consequently, repeating the x command toggles alternatively between two entries. For achieving this, the translator will position the cursor on the first entry, use m, then position to the second entry, and merely use x for making the switch.
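The behavior of m, r and x can be summarized by a small stack model. The C sketch below is only an illustration of the semantics described above; PO mode itself is written in Emacs Lisp, and the structure and function names here are hypothetical. An entry location is reduced to a plain entry number.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOCATIONS 32

/* Toy model of a PO mode session.  */
struct po_session
{
  int current;                  /* entry under the cursor */
  int stack[MAX_LOCATIONS];     /* saved locations, top at depth - 1 */
  size_t depth;                 /* number of saved locations */
};

/* m: save the current location on top of the stack.
   Returns 0 on success, -1 if the stack is full.  */
int
push_location (struct po_session *s)
{
  assert (s != NULL);
  if (s->depth == MAX_LOCATIONS)
    return -1;
  s->stack[s->depth++] = s->current;
  return 0;
}

/* r: move to the top saved location and remove it from the stack.
   Returns 0 on success, -1 if no location is saved.  */
int
pop_location (struct po_session *s)
{
  assert (s != NULL);
  if (s->depth == 0)
    return -1;
  s->current = s->stack[--s->depth];
  return 0;
}

/* x: swap the current location with the top saved location.
   Returns 0 on success, -1 if no location is saved.  */
int
exchange_location (struct po_session *s)
{
  int previous;

  assert (s != NULL);
  if (s->depth == 0)
    return -1;
  previous = s->current;
  s->current = s->stack[s->depth - 1];
  s->stack[s->depth - 1] = previous;
  return 0;
}
```

In this model, pushing on entry 5, moving to entry 9, and then calling exchange_location twice visits entry 5 and comes back to entry 9, which is the toggling effect of repeating x.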



2.5 Normalizing Strings in Entries

There are many different ways for encoding a particular string into a PO file entry, because there are so many different ways to split and quote multi-line strings, and even, to represent special characters by backslashed escaped sequences. Some features of PO mode rely on the ability for PO mode to scan an already existing PO file for a particular string encoded into the msgid field of some entry. Even if PO mode has internally all the built-in machinery for implementing this recognition easily, doing it fast is technically difficult. To facilitate a solution to this efficiency problem, we decided on a canonical representation for strings.

A conventional representation of strings in a PO file is currently under discussion, and PO mode experiments with a canonical representation. Having both xgettext and PO mode converging towards a uniform way of representing equivalent strings would be useful, as the internal normalization needed by PO mode could be automatically satisfied when using xgettext from GNU gettext. An explicit PO mode normalization should then be only necessary for PO files imported from elsewhere, or for when the convention itself evolves.

So, for achieving normalization of at least the strings of a given PO file needing a canonical representation, the following PO mode command is available:

M-x po-normalize
Tidy the whole PO file by making entries more uniform.

The special command M-x po-normalize, which has no associated keys, revises all entries, ensuring that strings of both original and translated entries use uniform internal quoting in the PO file. It also removes any crumb after the last entry. This command may be useful for PO files freshly imported from elsewhere, or if we ever improve on the canonical quoting format we use. This canonical format is not only meant for getting cleaner PO files, but also for greatly speeding up msgid string lookup for some other PO mode commands.

M-x po-normalize presently makes three passes over the entries. The first implements heuristics for converting PO files for GNU gettext 0.6 and earlier, in which msgid and msgstr fields were using K&R style C string syntax for multi-line strings. These heuristics may fail for comments not related to obsolete entries and ending with a backslash; they also depend on subsequent passes for finalizing the proper commenting of continued lines for obsolete entries. This first pass might disappear once all old PO files have been adjusted. The second and third passes normalize all msgid and msgstr strings respectively. They also clean out those trailing backslashes used by XView's msgfmt for continued lines.

Having such an explicit normalizing command allows for importing PO files from other sources, but also eases the evolution of the current convention, evolution driven mostly by aesthetic concerns, as of now. It is easy to make suggested adjustments at a later time, as the normalizing command and eventually, other GNU gettext tools should greatly automate conformance. A description of the canonical string format is given below, for the particular benefit of those not having Emacs handy, and who would nevertheless want to handcraft their PO files in nice ways.

Right now, in PO mode, strings are single line or multi-line. A string goes multi-line if and only if it has embedded newlines, that is, if it matches `[^\n]\n+[^\n]'. So, we would have:

 
msgstr "\n\nHello, world!\n\n\n"

but, replacing the space by a newline, this becomes:

 
msgstr ""
"\n"
"\n"
"Hello,\n"
"world!\n"
"\n"
"\n"

We are deliberately using a caricatural example, here, to make the point clearer. Usually, multi-lines are not that bad looking. It is probable that we will implement the following suggestion. We might lump together all initial newlines into the empty string, and also all newlines introducing empty lines (that is, for n > 1, the n-1'th last newlines would go together on a separate string), so making the previous example appear:

 
msgstr "\n\n"
"Hello,\n"
"world!\n"
"\n\n"

There are a few yet undecided little points about string normalization, to be documented in this manual, once these questions settle.
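For those handcrafting PO files, the test deciding whether a string goes multi-line can be sketched in C as follows. The function name has_embedded_newline is hypothetical, and the code is our own rendering of the pattern `[^\n]\n+[^\n]' quoted above, not code taken from PO mode.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if s contains a non-newline character, then one or more
   newlines, then another non-newline character; return 0 otherwise.
   Leading and trailing newlines alone do not count as embedded.  */
int
has_embedded_newline (const char *s)
{
  const char *p;

  assert (s != NULL);
  for (p = s; *p != '\0'; p++)
    {
      if (*p != '\n' && p[1] == '\n')
        {
          const char *q = p + 1;

          /* Skip the whole run of newlines.  */
          while (*q == '\n')
            q++;
          /* The run is embedded only if something follows it.  */
          if (*q != '\0')
            return 1;
        }
    }
  return 0;
}
```

With this test, "\n\nHello, world!\n\n\n" stays on a single line, while "\n\nHello,\nworld!\n\n\n" goes multi-line, as in the two examples above.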



3. Preparing Program Sources

For the programmer, changes to the C source code fall into three categories. First, you have to make the localization functions known to all modules needing message translation. Second, you should properly trigger the operation of GNU gettext when the program initializes, usually from the main function. Last, you should identify and especially mark all constant strings in your program needing translation.

Presuming that your set of programs, or package, has been adjusted so all needed GNU gettext files are available, and your `Makefile' files are adjusted (see section 12. The Maintainer's View), each C module having translated C strings should contain the line:

 
#include <libintl.h>

The remaining changes to your C sources are discussed in the further sections of this chapter.

3.1 Triggering gettext Operations  
3.2 Preparing Translatable Strings  
3.3 How Marks Appear in Sources  
3.4 Marking Translatable Strings  
3.5 Special Comments preceding Keywords  Telling something about the following string
3.6 Special Cases of Translatable Strings  



3.1 Triggering gettext Operations

The initialization of locale data should be done with more or less the same code in every program, as demonstrated below:

 
int
main (int argc, char **argv)
{
  ...
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
  ...
}

PACKAGE and LOCALEDIR should be provided either by `config.h' or by the Makefile. For now consult the gettext or hello sources for more information.
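As a rough sketch of how these pieces fit together, the initialization can be gathered into one function. The function name init_i18n and the fallback values "hello" and "/usr/share/locale" are placeholders chosen for this example; a real package gets PACKAGE and LOCALEDIR from its own build system.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Normally supplied by config.h or by -D flags in the Makefile.
   The fallback values below are placeholders for this sketch.  */
#ifndef PACKAGE
# define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/share/locale"
#endif

/* Hypothetical helper: set up the locale and the message domain.
   Returns 0 on success, -1 if one of the gettext calls failed.  */
int
init_i18n (void)
{
  /* An empty domain name would be a configuration error here.  */
  assert (PACKAGE[0] != '\0');

  /* A null result only means the environment names a locale that is
     not installed; the program then keeps running in the "C" locale.  */
  setlocale (LC_ALL, "");

  if (bindtextdomain (PACKAGE, LOCALEDIR) == NULL)
    return -1;
  if (textdomain (PACKAGE) == NULL)
    return -1;
  return 0;
}
```

The main function would call init_i18n once, before the first message is printed.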

The use of LC_ALL might not be appropriate for you. LC_ALL includes all locale categories and especially LC_CTYPE. This latter category is responsible for determining character classes with the isalnum etc. functions from `ctype.h', which could be wrong, especially for programs that process some kind of input language. For example, this would mean that source code using the ç (c-cedilla character) is runnable in France but not in the U.S.

Some systems also have problems with parsing numbers using the scanf functions if a locale other than the "C" locale is selected through LC_ALL. The standards say that additional formats besides the one known in the "C" locale might be recognized. But some systems seem to reject numbers in the "C" locale format. In some situations, it might also be a problem with the notation itself, which makes it impossible to recognize whether the number is in the "C" locale or the local format. This can happen if thousands separator characters are used. Some locales define this character, according to the national conventions, to be '.', which is the same character used in the "C" locale to denote the decimal point.

So it is sometimes necessary to replace the LC_ALL line in the code above by a sequence of setlocale lines

 
{
  ...
  setlocale (LC_CTYPE, "");
  setlocale (LC_MESSAGES, "");
  ...
}

On all POSIX conformant systems the locale categories LC_CTYPE, LC_COLLATE, LC_MONETARY, LC_NUMERIC, and LC_TIME are available. On some modern systems there is also a locale category LC_MESSAGES, which is called LC_RESPONSES on some old, XPG2 compliant systems.
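A minimal sketch of this selective setup follows, assuming a system where LC_MESSAGES may or may not be defined. The function names are hypothetical. Because LC_NUMERIC is never touched, it keeps its startup value, the "C" locale, so numbers written with a decimal point are still parsed as expected.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>

/* Adopt the user's character classification and message language,
   but leave LC_NUMERIC and the other categories in the "C" locale.  */
void
init_selected_locale_categories (void)
{
  setlocale (LC_CTYPE, "");
#ifdef LC_MESSAGES
  /* LC_MESSAGES is not part of ISO C, hence the guard.  */
  setlocale (LC_MESSAGES, "");
#endif
}

/* Parse a number written in "C" locale notation, such as "1.5".
   Returns 1 on success, 0 on failure.  */
int
parse_c_number (const char *text, double *value)
{
  assert (text != NULL && value != NULL);
  return sscanf (text, "%lf", value) == 1;
}
```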

Note that changing the LC_CTYPE also affects the functions declared in the <ctype.h> standard header. If this is not desirable in your application (for example in a compiler's parser), you can use a set of substitute functions which hardwire the C locale, such as found in the <c-ctype.h> and <c-ctype.c> files in the gettext source distribution.

It is also possible to switch the locale back and forth between the environment dependent locale and the C locale, but this approach is normally avoided because a setlocale call is expensive, because it is tedious to determine the places where a locale switch is needed in a large program's source, and because switching a locale is not multithread-safe.
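To give an idea of what substitute functions hardwiring the C locale may look like, here is a small sketch. It is not the code from the gettext distribution; the names ascii_isdigit, ascii_isalpha, ascii_isalnum and identifier_length are ours, and the range comparisons assume an ASCII-compatible execution character set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for isdigit, isalpha and isalnum that give
   the same answer whatever locale setlocale has selected.
   Assumes an ASCII-compatible character set.  */
int
ascii_isdigit (int c)
{
  return c >= '0' && c <= '9';
}

int
ascii_isalpha (int c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

int
ascii_isalnum (int c)
{
  return ascii_isdigit (c) || ascii_isalpha (c);
}

/* Length of the leading run of ASCII letters and digits in s, as a
   parser might need when scanning a keyword or identifier.  */
size_t
identifier_length (const char *s)
{
  size_t n = 0;

  assert (s != NULL);
  while (ascii_isalnum ((unsigned char) s[n]))
    n++;
  return n;
}
```

A byte such as 0xE7 (ç in ISO-8859-1) is never treated as a letter by these functions, whatever LC_CTYPE says.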



3.2 Preparing Translatable Strings

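Before strings can be marked for translations, they sometimes need to be adjusted. Usually preparing a string for translation is done right before marking it, during the marking phase which is described in the next sections. What you have to keep in mind while doing that is the following: translatable strings should be in good English style, should be entire sentences, should be limited to one paragraph, and should be built with format strings instead of string concatenation.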

Let's look at some examples of these guidelines.

Translatable strings should be in good English style. If slang language with abbreviations and shortcuts is used, often translators will not understand the message and will produce very inappropriate translations.

 
"%s: is parameter\n"

This is nearly untranslatable: Is the displayed item a parameter or the parameter?

 
"No match"

The ambiguity in this message makes it incomprehensible: Is the program attempting to set something on fire? Does it mean "The given object does not match the template"? Does it mean "The template does not fit for any of the objects"?

In both cases, adding more words to the message will help both the translator and the English speaking user.

Translatable strings should be entire sentences. It is often not possible to translate single verbs or adjectives in a substitutable way.

 
printf ("File %s is %s protected", filename, rw ? "write" : "read");

Most translators will not look at the source and will thus only see the string "File %s is %s protected", which is unintelligible. Change this to

 
printf (rw ? "File %s is write protected" : "File %s is read protected",
        filename);

This way the translator will not only understand the message, she will also be able to find the appropriate grammatical construction. The French translator for example translates "write protected" like "protected against writing".

Often sentences don't fit into a single line. If a sentence is output using two subsequent printf statements, like this

 
printf ("Locale charset \"%s\" is different from\n", lcharset);
printf ("input file charset \"%s\".\n", fcharset);

the translator would have to translate two half sentences, but nothing in the POT file would tell her that the two half sentences belong together. It is necessary to merge the two printf statements so that the translator can handle the entire sentence at once and decide at which place to insert a line break in the translation (if at all):

 
printf ("Locale charset \"%s\" is different from\n\
input file charset \"%s\".\n", lcharset, fcharset);

You may now ask: how about two or more adjacent sentences? Like in this case:

 
puts ("Apollo 13 scenario: Stack overflow handling failed.");
puts ("On the next stack overflow we will crash!!!");

Should these two statements be merged into a single one? I would recommend merging them if the two sentences are related to each other, because then it makes it easier for the translator to understand and translate both. On the other hand, if one of the two messages is a stereotypic one, occurring in other places as well, you will do a favour to the translator by not merging the two. (Identical messages occurring in several places are combined by xgettext, so the translator has to handle them once only.)

Translatable strings should be limited to one paragraph; don't let a single message be longer than ten lines. The reason is that when the translatable string changes, the translator is faced with the task of updating the entire translated string. Maybe only a single word will have changed in the English string, but the translator doesn't see that (with the current translation tools), therefore she has to proofread the entire message.

Many GNU programs have a `--help' output that extends over several screen pages. It is a courtesy towards the translators to split such a message into several ones of five to ten lines each. While doing that, you can also attempt to split the documented options into groups, such as the input options, the output options, and the informative output options. This will help every user to find the option he is looking for.

Hardcoded string concatenation is sometimes used to construct English strings:

 
strcpy (s, "Replace ");
strcat (s, object1);
strcat (s, " with ");
strcat (s, object2);
strcat (s, "?");

In order to present to the translator only entire sentences, and also because in some languages the translator might want to swap the order of object1 and object2, it is necessary to change this to use a format string:

 
sprintf (s, "Replace %s with %s?", object1, object2);
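The same idea can be wrapped in a small helper that also bounds the write with snprintf. This is only a sketch: the function name format_replace_question is hypothetical, and the marking of the format string for translation is left out here since it is covered in the following sections.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write the question into buf, which holds size
   bytes.  Returns the number of characters written (not counting the
   terminating NUL), or -1 if the result did not fit.  */
int
format_replace_question (char *buf, size_t size,
                         const char *object1, const char *object2)
{
  int n;

  assert (buf != NULL && size > 0);
  assert (object1 != NULL && object2 != NULL);
  /* The whole sentence lives in one format string, so a translator
     would see it as a single unit.  */
  n = snprintf (buf, size, "Replace %s with %s?", object1, object2);
  return (n >= 0 && (size_t) n < size) ? n : -1;
}
```

Compared with the strcpy and strcat sequence, the caller now also learns when the destination buffer was too small.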

A similar case is compile time concatenation of strings. The ISO C 99 include file <inttypes.h> contains a macro PRId64 that can be used as a formatting directive for outputting an `int64_t' integer through printf. It expands to a constant string, usually "d" or "ld" or "lld" or something like this, depending on the platform. Assume you have code like

 
printf ("The amount is %0" PRId64 "\n", number);

After marking, this cannot become

 
printf (gettext ("The amount is %0") PRId64 "\n", number);

because it would simply be invalid C syntax. It cannot become

 
printf (gettext ("The amount is %0" PRId64 "\n")), number);

because the value of PRId64 is not known to xgettext, and even if it were, there would be three or more possibilities, and the translator would have to translate three or more strings that differ in a single letter.

The solution for this problem is to change the code like this:

 
char buf1[100];
sprintf (buf1, "%0" PRId64, number);
printf (gettext ("The amount is %s\n"), buf1);

This means, you put the platform dependent code in one statement, and the internationalization code in a different statement. Note that a buffer length of 100 is safe, because all available hardware integer types are limited to 128 bits, and to print a 128 bit integer one needs at most 54 characters, regardless of whether in decimal, octal or hexadecimal.

All this applies to other programming languages as well. For example, in Java, string concatenation is very frequently used, because it is a compiler built-in operator. Like in C, in Java, you would change
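Put together as a self-contained function, the two steps might look like the sketch below. The name format_amount is hypothetical, snprintf replaces sprintf to bound the writes, and the plain directive "%" PRId64 is used instead of "%0" PRId64; these are choices made for this example only.

```c
#include <assert.h>
#include <inttypes.h>
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: format "The amount is N" plus a newline into
   out.  Returns the number of characters written, or -1 if out is
   too small.  */
int
format_amount (char *out, size_t size, int64_t number)
{
  char digits[100];
  int n;

  assert (out != NULL && size > 0);
  /* Platform dependent step: PRId64 is used here, outside gettext.  */
  snprintf (digits, sizeof digits, "%" PRId64, number);
  /* Internationalization step: the msgid contains only %s, so it is
     the same string on every platform.  */
  n = snprintf (out, size, gettext ("The amount is %s\n"), digits);
  return (n >= 0 && (size_t) n < size) ? n : -1;
}
```

When no translation is available, gettext returns its argument unchanged, so the English message is produced.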

 
System.out.println("Replace "+object1+" with "+object2+"?");

into a statement involving a format string:

 
System.out.println(
    MessageFormat.format("Replace {0} with {1}?",
                         new Object[] { object1, object2 }));



3.3 How Marks Appear in Sources

All strings requiring translation should be marked in the C sources. Marking is done in such a way that each translatable string appears to be the sole argument of some function or preprocessor macro. There are only a few such possible functions or macros meant for translation, and their names are said to be marking keywords. The marking is attached to strings themselves, rather than to what we do with them. This approach has more uses. A blatant example is an error message produced by formatting. The format string needs translation, as well as some strings inserted through some `%s' specification in the format, while the result from sprintf may have so many different instances that it is impractical to list them all in some `error_string_out()' routine, say.

This marking operation has two goals. The first goal of marking is for triggering the retrieval of the translation, at run time. The keyword are possibly resolved into a routine able to dynamically return the proper translation, as far as possible or wanted, for the argument string. Most localizable strings are found in executable positions, that is, attached to variables or given as parameters to functions. But this is not universal usage, and some translatable strings appear in structured initializations. See section 3.6 Special Cases of Translatable Strings.

The second goal of the marking operation is to help xgettext properly extract all translatable strings when it scans a set of program sources and produces PO file templates.

The canonical keyword for marking translatable strings is `gettext'; it gave its name to the whole GNU gettext package. For packages making only light use of the `gettext' keyword, macro or function, it is easily used as is. However, for packages using the gettext interface more heavily, it is usually more convenient to give the main keyword a shorter, less obtrusive name. Indeed, the keyword might appear on a lot of strings all over the package, and programmers usually neither want nor need their program sources to remind them forcefully, all the time, that they are internationalized. Further, a long keyword has the disadvantage of using more horizontal space, forcing more indentation work on sources for those trying to keep them within 79 or 80 columns.

Many packages use `_' (a simple underline) as a keyword, and write `_("Translatable string")' instead of `gettext ("Translatable string")'. Further, the coding rule from the GNU standards, which calls for a space between the keyword and the opening parenthesis, is relaxed in practice for this particular usage. So, the textual overhead per translatable string is reduced to only three characters: the underline and the two parentheses. However, even if GNU gettext uses this convention internally, it does not offer it officially. The real, genuine keyword is truly `gettext' indeed. It is fairly easy for those wanting to use `_' instead of `gettext' to declare:

 
#include <libintl.h>
#define _(String) gettext (String)

instead of merely using `#include <libintl.h>'.

Later on, the maintenance is relatively easy. If, as a programmer, you add or modify a string, you will have to ask yourself if the new or altered string requires translation, and include it within `_()' if you think it should be translated. `"%s: %d"' is an example of a string not requiring translation!
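As a small illustration of the above, here is a sketch of how marked and unmarked strings might sit side by side in a source file. The function names `format_open_error' and `format_position', as well as the message texts, are hypothetical and chosen only for this example. The first function marks both the format string and a string inserted through `%s', since both are natural language; the second leaves `"%s: %d"' unmarked.

```c
#include <libintl.h>
#include <stdio.h>

#define _(String) gettext (String)

/* Hypothetical helper: format a diagnostic into BUF.  The format string
   and the inserted phrase are both marked, because both are natural
   language shown to the user.  */
int
format_open_error (char *buf, size_t size, const char *file_name)
{
  return snprintf (buf, size, _("Cannot open %s: %s"),
                   file_name, _("permission denied"));
}

/* Hypothetical helper: format a "name: number" pair.  The format string
   contains no natural language, so it is deliberately left unmarked.  */
int
format_position (char *buf, size_t size, const char *file_name, int line)
{
  return snprintf (buf, size, "%s: %d", file_name, line);
}
```

When no message catalog is in effect, `gettext' returns its argument unchanged, so a program marked this way keeps producing the original English messages.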



3.4 Marking Translatable Strings

In PO mode, one set of features is meant more for the programmer than for the translator, and allows him to interactively mark which strings, in a set of program sources, are translatable, and which are not. Even if it is a fairly easy job for a programmer to find and mark such strings by other means, using any editor of his choice, PO mode makes this work more comfortable. Further, this gives translators who feel a little like programmers, or programmers who feel a little like translators, a tool letting them work at marking translatable strings in the program sources, while simultaneously producing a set of translations in some language, for the package being internationalized.

The set of program sources targeted by the PO mode commands described here should have an Emacs tags table constructed for your project, prior to using these PO file commands. This is easy to do. In any shell window, change the directory to the root of your project, then execute a command resembling:

 
etags src/*.[hc] lib/*.[hc]

presuming here you want to process all `.h' and `.c' files from the `src/' and `lib/' directories. This command will explore all said files and create a `TAGS' file in your root directory, somewhat summarizing the contents using a special file format Emacs can understand.

For packages following the GNU coding standards, there is a make goal tags or TAGS which constructs the tag files in all directories and for all files containing source code.

Once your `TAGS' file is ready, the following commands assist the programmer in marking translatable strings in his set of sources. But these commands are necessarily driven from within a PO file window, and it is likely that you do not even have such a PO file yet. This is not a problem at all, as you may safely open a new, empty PO file, mainly for using these commands. This empty PO file will slowly fill in while you mark strings as translatable in your program sources.

,
Search through program sources for a string which looks like a candidate for translation (po-tags-search).

M-,
Mark the last string found with `_()' (po-mark-translatable).

M-.
Mark the last string found with a keyword taken from a set of possible keywords. This command with a prefix allows some management of these keywords (po-select-mark-and-mark).

The , (po-tags-search) command searches for the next occurrence of a string which looks like a possible candidate for translation, and displays the program source in another Emacs window, positioned in such a way that the string is near the top of this other window. If the string is too big to fit whole in this window, it is positioned so only its end is shown. In any case, the cursor is left in the PO file window. If the shown string would be better presented differently in different native languages, you may mark it using M-, or M-.. Otherwise, you might rather ignore it and skip to the next string by merely repeating the , command.

A string is a good candidate for translation if it contains a sequence of three or more letters. A string containing at most two letters in a row will be considered as a candidate if it has more letters than non-letters. The command disregards strings containing no letters, or isolated letters only. It also disregards strings within comments, or strings already marked with some keyword PO mode knows (see below).
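The letter-counting rule just described can be sketched in C as follows. This is only one reading of the prose above, not the code PO mode actually runs (PO mode is written in Emacs Lisp), and the function name `looks_translatable' is hypothetical. The sketch assumes that a string whose longest run of letters is exactly two is accepted only when letters outnumber non-letters, and that strings with isolated letters only are always rejected. It does not attempt the other checks, such as skipping comments or strings already marked with a keyword.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if STRING looks like a candidate for
   translation, following the letter-counting rule described in the text.  */
bool
looks_translatable (const char *string)
{
  size_t letters = 0;      /* total number of letters */
  size_t non_letters = 0;  /* total number of other characters */
  size_t run = 0;          /* length of the current run of letters */
  size_t longest_run = 0;  /* length of the longest run seen so far */
  const unsigned char *p;

  for (p = (const unsigned char *) string; *p != '\0'; p++)
    {
      if (isalpha (*p))
        {
          letters++;
          run++;
          if (run > longest_run)
            longest_run = run;
        }
      else
        {
          non_letters++;
          run = 0;
        }
    }

  if (longest_run >= 3)
    return true;                    /* three or more letters in a row */
  if (longest_run == 2)
    return letters > non_letters;   /* at most two letters in a row */
  return false;                     /* no letters, or isolated letters only */
}
```

Under this reading, `"Hello"' and `"OK"' are candidates, while `"%s: %d"', `"0x1F"' and `"ab12"' are not.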

If you have never told Emacs about some `TAGS' file to use, the command will request that you specify one from the minibuffer, the first time you use the command. You may later change your `TAGS' file by using the regular Emacs command M-x visit-tags-table, which will ask you to name the precise `TAGS' file you want to use. See section `Tag Tables' in The Emacs Editor.

Each time you use the , command, the search resumes from where it was left by the previous search, and goes through all program sources, obeying the `TAGS' file, until all sources have been processed. However, by giving a prefix argument to the command (C-u ,), you may request that the search be restarted all over again from the first program source; but in this case, strings that you recently marked as translatable will be automatically skipped.

Using this , command does not prevent the use of other regular Emacs tags commands. For example, regular tags-search or tags-query-replace commands may be used without disrupting the independent , search sequence. However, as implemented, the initial , command (or the , command used with a prefix) might also reinitialize the regular Emacs tags searching to the first tags file; this reinitialization might be considered spurious.

The M-, (po-mark-translatable) command will mark the recently found string with the `_' keyword. The M-. (po-select-mark-and-mark) command will request that you type one keyword from the minibuffer and use that keyword for marking the string. Both commands will automatically create a new PO file untranslated entry for the string being marked, and make it the current entry (maki