Diffstat (limited to 'doc')
-rw-r--r-- | doc/gettext_1.html | 660
-rw-r--r-- | doc/gettext_10.html | 509
-rw-r--r-- | doc/gettext_11.html | 707
-rw-r--r-- | doc/gettext_12.html | 160
-rw-r--r-- | doc/gettext_13.html | 523
-rw-r--r-- | doc/gettext_14.html | 745
-rw-r--r-- | doc/gettext_2.html | 685
-rw-r--r-- | doc/gettext_3.html | 620
-rw-r--r-- | doc/gettext_4.html | 208
-rw-r--r-- | doc/gettext_5.html | 188
-rw-r--r-- | doc/gettext_6.html | 930
-rw-r--r-- | doc/gettext_7.html | 268
-rw-r--r-- | doc/gettext_8.html | 119
-rw-r--r-- | doc/gettext_9.html | 1410
-rw-r--r-- | doc/gettext_foot.html | 42
-rw-r--r-- | doc/gettext_toc.html | 146
16 files changed, 7920 insertions, 0 deletions
diff --git a/doc/gettext_1.html b/doc/gettext_1.html new file mode 100644 index 0000000..f7793fe --- /dev/null +++ b/doc/gettext_1.html @@ -0,0 +1,660 @@ +<HTML> +<HEAD> +<!-- This HTML file has been created by texi2html 1.51 + from gettext.texi on 19 April 2001 --> + +<TITLE>GNU gettext utilities - 1 Introduction</TITLE> +</HEAD> +<BODY> +Go to the first, previous, <A HREF="gettext_2.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. +<P><HR><P> + + +<H1><A NAME="SEC1" HREF="gettext_toc.html#TOC1">1 Introduction</A></H1> + + +<BLOCKQUOTE> +<P> +This manual is still in <EM>DRAFT</EM> state. Some sections are still +empty, or almost. We keep merging material from other sources +(essentially e-mail folders) while the proper integration of this +material is delayed. +</BLOCKQUOTE> + +<P> +In this manual, we use <EM>he</EM> when speaking of the programmer or +maintainer, <EM>she</EM> when speaking of the translator, and <EM>they</EM> +when speaking of the installers or end users of the translated program. +This is only a convenience for clarifying the documentation. It is +<EM>absolutely</EM> not meant to imply that some roles are more appropriate +to males or females. Besides, as you might guess, GNU <CODE>gettext</CODE> +is meant to be useful for people using computers, whatever their sex, +race, religion or nationality! + +</P> +<P> +This chapter explains the goals sought in the creation +of GNU <CODE>gettext</CODE> and the free Translation Project. +Then, it explains a few broad concepts around +Native Language Support, and positions message translation with regard +to other aspects of national and cultural variance, as they apply +to programs. It also surveys those files used to convey the +translations. It explains how the various tools interact in the +initial generation of these files, and later, how the maintenance +cycle should usually operate.
+ +</P> +<P> +Please send suggestions and corrections to: + +</P> + +<PRE> +Internet address: + bug-gnu-utils@gnu.org +</PRE> + +<P> +Please include the manual's edition number and update date in your messages. + +</P> + + + +<H2><A NAME="SEC2" HREF="gettext_toc.html#TOC2">1.1 The Purpose of GNU <CODE>gettext</CODE></A></H2> + +<P> +Usually, programs are written and documented in English, and use +English at execution time to interact with users. This is true +not only of GNU software, but also of a great deal of commercial +and free software. Using a common language is quite handy for +communication between developers, maintainers and users from all +countries. On the other hand, most people are less comfortable with +English than with their own native language, and would prefer to +use their mother tongue for day to day's work, as far as possible. +Many would simply <EM>love</EM> to see their computer screen showing +a lot less of English, and far more of their own language. + +</P> +<P> +However, to many people, this dream might appear so far fetched that +they may believe it is not even worth spending time thinking about +it. They have no confidence at all that the dream might ever +become true. Yet some have not lost hope, and have organized themselves. +The Translation Project is a formalization of this hope into a +workable structure, which has a good chance to get all of us nearer +the achievement of a truly multi-lingual set of programs. + +</P> +<P> +GNU <CODE>gettext</CODE> is an important step for the Translation Project, +as it is an asset on which we may build many other steps. This package +offers to programmers, translators and even users, a well integrated +set of tools and documentation. Specifically, the GNU <CODE>gettext</CODE> +utilities are a set of tools that provides a framework within which +other free packages may produce multi-lingual messages. 
These tools +include + +</P> + +<UL> +<LI> + +A set of conventions about how programs should be written to support +message catalogs. + +<LI> + +A directory and file naming organization for the message catalogs +themselves. + +<LI> + +A runtime library supporting the retrieval of translated messages. + +<LI> + +A few stand-alone programs to massage in various ways the sets of +translatable strings, or already translated strings. + +<LI> + +A special mode for Emacs<A NAME="DOCF1" HREF="gettext_foot.html#FOOT1">(1)</A> which helps preparing these sets +and bringing them up to date. +</UL> + +<P> +GNU <CODE>gettext</CODE> is designed to minimize the impact of +internationalization on program sources, keeping this impact as small +and hardly noticeable as possible. Internationalization has better +chances of succeeding if it is very light weighted, or at least, +appear to be so, when looking at program sources. + +</P> +<P> +The Translation Project also uses the GNU <CODE>gettext</CODE> distribution +as a vehicle for documenting its structure and methods. This goes +beyond the strict technicalities of documenting the GNU <CODE>gettext</CODE> +proper. By so doing, translators will find in a single place, as +far as possible, all they need to know for properly doing their +translating work. Also, this supplemental documentation might also +help programmers, and even curious users, in understanding how GNU +<CODE>gettext</CODE> is related to the remainder of the Translation +Project, and consequently, have a glimpse at the <EM>big picture</EM>. + +</P> + + +<H2><A NAME="SEC3" HREF="gettext_toc.html#TOC3">1.2 I18n, L10n, and Such</A></H2> + +<P> +Two long words appear all the time when we discuss support of native +language in programs, and these words have a precise meaning, worth +being explained here, once and for all in this document. The words are +<EM>internationalization</EM> and <EM>localization</EM>. 
Many people, +tired of writing these long words over and over again, took up the +habit of writing <STRONG>i18n</STRONG> and <STRONG>l10n</STRONG> instead, quoting the first +and last letter of each word, and replacing the run of intermediate +letters by a number merely telling how many such letters there are. +But in this manual, for the sake of clarity, we will patiently write +the names in full, each time... + +</P> +<P> +By <STRONG>internationalization</STRONG>, one refers to the operation by which a +program, or a set of programs turned into a package, is made aware of and +able to support multiple languages. This is a generalization process, +by which the programs are untied from calling only English strings or +other English-specific habits, and connected to generic ways of doing +the same, instead. Program developers may use various techniques to +internationalize their programs. Some of these have been standardized. +GNU <CODE>gettext</CODE> offers one of these standards. See section <A HREF="gettext_9.html#SEC41">9 The Programmer's View</A>. + +</P> +<P> +By <STRONG>localization</STRONG>, one means the operation by which, in a set +of programs already internationalized, one gives the program all +needed information so that it can adapt itself to handle its input +and output in a fashion which is correct for some native language and +cultural habits. This is a particularisation process, by which generic +methods already implemented in an internationalized program are used +in specific ways. The programming environment puts several functions +at the programmer's disposal which allow this runtime configuration. +The formal description of a specific set of cultural habits for some +country, together with all associated translations targeted to the +same native language, is called the <STRONG>locale</STRONG> for this language +or country.
Users achieve localization of programs by setting proper +values to special environment variables, prior to executing those +programs, identifying which locale should be used. + +</P> +<P> +In fact, locale message support is only one component of the cultural +data that makes up a particular locale. There are a whole host of +routines and functions provided to aid programmers in developing +internationalized software and which allow them to access the data +stored in a particular locale. When someone presently refers to a +particular locale, they are obviously referring to the data stored +within that particular locale. Similarly, if a programmer is referring +to "accessing the locale routines", they are referring to the +complete suite of routines that access all of the locale's information. + +</P> +<P> +One uses the expression <STRONG>Native Language Support</STRONG>, or merely NLS, +for speaking of the overall activity or feature encompassing both +internationalization and localization, allowing for multi-lingual +interactions in a program. In a nutshell, one could say that +internationalization is the operation by which further localizations +are made possible. + +</P> +<P> +Also, very roughly said, when it comes to multi-lingual messages, +internationalization is usually taken care of by programmers, and +localization is usually taken care of by translators. + +</P> + + +<H2><A NAME="SEC4" HREF="gettext_toc.html#TOC4">1.3 Aspects in Native Language Support</A></H2> + +<P> +For a totally multi-lingual distribution, there are many things to +translate beyond output messages. + +</P> + +<UL> +<LI> + +As of today, GNU <CODE>gettext</CODE> offers a complete toolset for +translating messages output by C programs. Perl scripts and shell +scripts will also need to be translated. Even if there are today some hooks +by which this can be done, these hooks are not integrated as well as they +should be. 
+ +<LI> + +Some programs, like <CODE>autoconf</CODE> or <CODE>bison</CODE>, are able +to produce other programs (or scripts). Even if the generating +programs themselves are internationalized, the generated programs they +produce may need internationalization on their own, and this indirect +internationalization could be automated right from the generating +program. In fact, quite usually, generating and generated programs +could be internationalized independently, as the effort needed is +fairly orthogonal. + +<LI> + +A few programs include textual tables which might need translation +themselves, independently of the strings contained in the program +itself. For example, RFC 1345 gives an English description for each +character which the <CODE>recode</CODE> program is able to reconstruct at execution. +Since these descriptions are extracted from the RFC by mechanical means, +translating them properly would require a prior translation of the RFC +itself. + +<LI> + +Almost all programs accept options, which are often worded out so to +be descriptive for the English readers; one might want to consider +offering translated versions for program options as well. + +<LI> + +Many programs read, interpret, compile, or are somewhat driven by +input files which are texts containing keywords, identifiers, or +replies which are inherently translatable. For example, one may want +<CODE>gcc</CODE> to allow diacriticized characters in identifiers or use +translated keywords; <SAMP>`rm -i'</SAMP> might accept something else than +<SAMP>`y'</SAMP> or <SAMP>`n'</SAMP> for replies, etc. Even if the program will +eventually make most of its output in the foreign languages, one has +to decide whether the input syntax, option values, etc., are to be +localized or not. + +<LI> + +The manual accompanying a package, as well as all documentation files +in the distribution, could surely be translated, too. 
Translating a +manual, with the intent of later keeping up with updates, is a major +undertaking in itself, generally. + +</UL> + +<P> +As we already stressed, translation is only one aspect of locales. +Other internationalization aspects are system services and are handled +in GNU <CODE>libc</CODE>. There +are many attributes that are needed to define a country's cultural +conventions. These attributes include beside the country's native +language, the formatting of the date and time, the representation of +numbers, the symbols for currency, etc. These local <STRONG>rules</STRONG> are +termed the country's locale. The locale represents the knowledge +needed to support the country's native attributes. + +</P> +<P> +There are a few major areas which may vary between countries and +hence, define what a locale must describe. The following list helps +putting multi-lingual messages into the proper context of other tasks +related to locales. See the GNU <CODE>libc</CODE> manual for details. + +</P> +<DL COMPACT> + +<DT><EM>Characters and Codesets</EM> +<DD> +The codeset most commonly used through out the USA and most English +speaking parts of the world is the ASCII codeset. However, there are +many characters needed by various locales that are not found within +this codeset. The 8-bit ISO 8859-1 code set has most of the special +characters needed to handle the major European languages. However, in +many cases, the ISO 8859-1 font is not adequate. Hence each locale +will need to specify which codeset they need to use and will need +to have the appropriate character handling routines to cope with +the codeset. + +<DT><EM>Currency</EM> +<DD> +The symbols used vary from country to country as does the position +used by the symbol. Software needs to be able to transparently +display currency figures in the native mode for each locale. + +<DT><EM>Dates</EM> +<DD> +The format of date varies between locales. 
For example, Christmas day +in 1994 is written as 12/25/94 in the USA and as 25/12/94 in Australia. +Other countries might use ISO 8601 dates, etc. + +Time of the day may be noted as <VAR>hh</VAR>:<VAR>mm</VAR>, <VAR>hh</VAR>.<VAR>mm</VAR>, +or otherwise. Some locales require time to be specified in 24-hour +mode rather than as AM or PM. Further, the nature and yearly extent +of the Daylight Saving correction vary widely between countries. + +<DT><EM>Numbers</EM> +<DD> +Numbers can be represented differently in different locales. +For example, the following numbers are all written correctly for +their respective locales: + + +<PRE> +12,345.67 English +12.345,67 French +1,2345.67 Asia +</PRE> + +Some programs could go further and use different unit systems, like +English units or Metric units, or even take into account variants +about how numbers are spelled in full. + +<DT><EM>Messages</EM> +<DD> +The most obvious area is the language support within a locale. This is +where GNU <CODE>gettext</CODE> provides the means for developers and users to +easily change the language that the software uses to communicate with +the user. + +</DL> + +<P> +Components of locale outside of message handling are standardized in +the ISO C standard and the SUSV2 specification. GNU <CODE>libc</CODE> +fully implements this, and most other modern systems provide a more +or less reasonable support for at least some of the missing components. + +</P> + + +<H2><A NAME="SEC5" HREF="gettext_toc.html#TOC5">1.4 Files Conveying Translations</A></H2> + +<P> +The letters PO in <TT>`.po'</TT> files mean Portable Object, to +distinguish them from <TT>`.mo'</TT> files, where MO stands for Machine +Object. This paradigm, as well as the PO file format, is inspired +by the NLS standard developed by Uniforum, and implemented by Sun +in their Solaris system.
+ +</P> +<P> +PO files are meant to be read and edited by humans, and associate each +original, translatable string of a given package with its translation +in a particular target language. A single PO file is dedicated to +a single target language. If a package supports many languages, +there is one such PO file per language supported, and each package +has its own set of PO files. These PO files are best created by +the <CODE>xgettext</CODE> program, and later updated or refreshed through +the <CODE>msgmerge</CODE> program. Program <CODE>xgettext</CODE> extracts all +marked messages from a set of C files and initializes a PO file with +empty translations. Program <CODE>msgmerge</CODE> takes care of adjusting +PO files between releases of the corresponding sources, commenting +obsolete entries, initializing new ones, and updating all source +line references. Files ending with <TT>`.pot'</TT> are kind of base +translation files found in distributions, in PO file format, and +<TT>`.pox'</TT> files are often temporary PO files. + +</P> +<P> +MO files are meant to be read by programs, and are binary in nature. +A few systems already offer tools for creating and handling MO files +as part of the Native Language Support coming with the system, but the +format of these MO files is often different from system to system, +and non-portable. The tools already provided with these systems don't +support all the features of GNU <CODE>gettext</CODE>. Therefore GNU +<CODE>gettext</CODE> uses its own format for MO files. Files ending with +<TT>`.gmo'</TT> are really MO files, when it is known that these files use +the GNU format. + +</P> + + +<H2><A NAME="SEC6" HREF="gettext_toc.html#TOC6">1.5 Overview of GNU <CODE>gettext</CODE></A></H2> + +<P> +The following diagram summarizes the relation between the files +handled by GNU <CODE>gettext</CODE> and the tools acting on these files. 
+It is followed by a somewhat detailed explanation, which you should +read while keeping an eye on the diagram. Having a clear understanding +of these interrelations would surely help programmers, translators +and maintainers. + +</P> + +<PRE> +Original C Sources ---> PO mode ---> Marked C Sources ---. + | + .---------<--- GNU gettext Library | +.--- make <---+ | +| `---------<--------------------+-----------' +| | +| .-----<--- PACKAGE.pot <--- xgettext <---' .---<--- PO Compendium +| | | ^ +| | `---. | +| `---. +---> PO mode ---. +| +----> msgmerge ------> LANG.pox --->--------' | +| .---' | +| | | +| `-------------<---------------. | +| +--- LANG.po <--- New LANG.pox <----' +| .--- LANG.gmo <--- msgfmt <---' +| | +| `---> install ---> /.../LANG/PACKAGE.mo ---. +| +---> "Hello world!" +`-------> install ---> /.../bin/PROGRAM -------' +</PRE> + +<P> +The indication <SAMP>`PO mode'</SAMP> appears in two places in this picture, +and you may safely read it as merely meaning "hand editing", using +any editor of your choice, really. However, for those of you being +the lucky users of Emacs, PO mode has been specifically created +for providing a cozy environment for editing or modifying PO files. +While editing a PO file, PO mode allows for the easy browsing of +auxiliary and compendium PO files, as well as for following references into +the set of C program sources from which PO files have been derived. +It has a few special features, among which are the interactive marking +of program strings as translatable, and the validation of PO files +with easy repositioning to PO file lines showing errors. + +</P> +<P> +As a programmer, the first step to bringing GNU <CODE>gettext</CODE> +into your package is identifying, right in the C sources, those strings +which are meant to be translatable, and those which are untranslatable.
+This tedious job can be done a little more comfortably using emacs PO +mode, but you can use any means familiar to you for modifying your +C sources. Beside this some other simple, standard changes are needed to +properly initialize the translation library. See section <A HREF="gettext_3.html#SEC13">3 Preparing Program Sources</A>, for +more information about all this. + +</P> +<P> +For newly written software the strings of course can and should be +marked while writing it. The <CODE>gettext</CODE> approach makes this +very easy. Simply put the following lines at the beginning of each file +or in a central header file: + +</P> + +<PRE> +#define _(String) (String) +#define N_(String) (String) +#define textdomain(Domain) +#define bindtextdomain(Package, Directory) +</PRE> + +<P> +Doing this allows you to prepare the sources for internationalization. +Later when you feel ready for the step to use the <CODE>gettext</CODE> library +simply replace these definitions by the following: + +</P> + +<PRE> +#include <libintl.h> +#define _(String) gettext (String) +#define gettext_noop(String) (String) +#define N_(String) gettext_noop (String) +</PRE> + +<P> +and link against <TT>`libintl.a'</TT> or <TT>`libintl.so'</TT>. Note that on +GNU systems, you don't need to link with <CODE>libintl</CODE> because the +<CODE>gettext</CODE> library functions are already contained in GNU libc. +That is all you have to change. + +</P> +<P> +Once the C sources have been modified, the <CODE>xgettext</CODE> program +is used to find and extract all translatable strings, and create a +PO template file out of all these. This <TT>`<VAR>package</VAR>.pot'</TT> file +contains all original program strings. It has sets of pointers to +exactly where in C sources each string is used. All translations +are set to empty. The letter <KBD>t</KBD> in <TT>`.pot'</TT> marks this as +a Template PO file, not yet oriented towards any particular language. 
+See section <A HREF="gettext_4.html#SEC20">4.1 Invoking the <CODE>xgettext</CODE> Program</A>, for more details about how one calls the +<CODE>xgettext</CODE> program. If you are <EM>really</EM> lazy, you might +be interested at working a lot more right away, and preparing the +whole distribution setup (see section <A HREF="gettext_11.html#SEC72">11 The Maintainer's View</A>). By doing so, you +spare yourself typing the <CODE>xgettext</CODE> command, as <CODE>make</CODE> +should now generate the proper things automatically for you! + +</P> +<P> +The first time through, there is no <TT>`<VAR>lang</VAR>.po'</TT> yet, so the +<CODE>msgmerge</CODE> step may be skipped and replaced by a mere copy of +<TT>`<VAR>package</VAR>.pot'</TT> to <TT>`<VAR>lang</VAR>.pox'</TT>, where <VAR>lang</VAR> +represents the target language. + +</P> +<P> +Then comes the initial translation of messages. Translation in +itself is a whole matter, still exclusively meant for humans, +and whose complexity far overwhelms the level of this manual. +Nevertheless, a few hints are given in some other chapter of this +manual (see section <A HREF="gettext_10.html#SEC61">10 The Translator's View</A>). You will also find there indications +about how to contact translating teams, or becoming part of them, +for sharing your translating concerns with others who target the same +native language. + +</P> +<P> +While adding the translated messages into the <TT>`<VAR>lang</VAR>.pox'</TT> +PO file, if you do not have Emacs handy, you are on your own +for ensuring that your efforts fully respect the PO file format, and quoting +conventions (see section <A HREF="gettext_2.html#SEC9">2.2 The Format of PO Files</A>). This is surely not an impossible task, +as this is the way many people have handled PO files already for Uniforum or +Solaris. On the other hand, by using PO mode in Emacs, most details +of PO file format are taken care of for you, but you have to acquire +some familiarity with PO mode itself. 
Besides main PO mode commands +(see section <A HREF="gettext_2.html#SEC10">2.3 Main PO mode Commands</A>), you should know how to move between entries +(see section <A HREF="gettext_2.html#SEC11">2.4 Entry Positioning</A>), and how to handle untranslated entries +(see section <A HREF="gettext_6.html#SEC26">6.4 Untranslated Entries</A>). + +</P> +<P> +If some common translations have already been saved into a compendium +PO file, translators may use PO mode for initializing untranslated +entries from the compendium, and also save selected translations into +the compendium, updating it (see section <A HREF="gettext_6.html#SEC33">6.11 Using Translation Compendiums</A>). Compendium files +are meant to be exchanged between members of a given translation team. + +</P> +<P> +Programs, or packages of programs, are dynamic in nature: users write +bug reports and suggestion for improvements, maintainers react by +modifying programs in various ways. The fact that a package has +already been internationalized should not make maintainers shy +of adding new strings, or modifying strings already translated. +They just do their job the best they can. For the Translation +Project to work smoothly, it is important that maintainers do not +carry translation concerns on their already loaded shoulders, and that +translators be kept as free as possible of programmatic concerns. + +</P> +<P> +The only concern maintainers should have is carefully marking new +strings as translatable, when they should be, and do not otherwise +worry about them being translated, as this will come in proper time. +Consequently, when programs and their strings are adjusted in various +ways by maintainers, and for matters usually unrelated to translation, +<CODE>xgettext</CODE> would construct <TT>`<VAR>package</VAR>.pot'</TT> files which are +evolving over time, so the translations carried by <TT>`<VAR>lang</VAR>.po'</TT> +are slowly fading out of date. 
+ +</P> +<P> +It is important for translators (and even maintainers) to understand +that package translation is a continuous process in the lifetime of a +package, and not something which is done once and for all at the start. +After an initial burst of translation activity for a given package, +interventions are needed once in a while, because here and there, +translated entries become obsolete, and new untranslated entries +appear, needing translation. + +</P> +<P> +The <CODE>msgmerge</CODE> program has the purpose of refreshing an already +existing <TT>`<VAR>lang</VAR>.po'</TT> file, by comparing it with a newer +<TT>`<VAR>package</VAR>.pot'</TT> template file, extracted by <CODE>xgettext</CODE> +out of recent C sources. The refreshing operation adjusts all +references to C source locations for strings, since these strings +move as programs are modified. Also, <CODE>msgmerge</CODE> comments out as +obsolete, in <TT>`<VAR>lang</VAR>.pox'</TT>, those already translated entries +which are no longer used in the program sources (see section <A HREF="gettext_6.html#SEC27">6.5 Obsolete Entries</A>). It finally discovers new strings and inserts them in +the resulting PO file as untranslated entries (see section <A HREF="gettext_6.html#SEC26">6.4 Untranslated Entries</A>). See section <A HREF="gettext_6.html#SEC23">6.1 Invoking the <CODE>msgmerge</CODE> Program</A>, for more information about what +<CODE>msgmerge</CODE> really does. + +</P> +<P> +Whatever route or means taken, the goal is to obtain an updated +<TT>`<VAR>lang</VAR>.pox'</TT> file offering translations for all strings. +When this is properly achieved, this file <TT>`<VAR>lang</VAR>.pox'</TT> may +take the place of the previous official <TT>`<VAR>lang</VAR>.po'</TT> file. + +</P> +<P> +The temporal mobility, or fluidity of PO files, is an integral part of +the translation game, and should be well understood, and accepted. 
+People resisting it will have a hard time participating in the +Translation Project, or will give a hard time to other participants! In +particular, maintainers should relax and include all available official +PO files in their distributions, even if these have not recently been +updated, without banging or otherwise trying to exert pressure on the +translator teams to get the job done. The pressure should rather come +from the community of users speaking a particular language, and +maintainers should consider themselves fairly relieved of any concern +about the adequacy of translation files. On the other hand, translators +should reasonably try updating the PO files they are responsible for, +while the package is undergoing pretest, prior to an official +distribution. + +</P> +<P> +Once the PO file is complete and dependable, the <CODE>msgfmt</CODE> program +is used for turning the PO file into a machine-oriented format, which +may yield efficient retrieval of translations by the programs of the +package, whenever needed at runtime (see section <A HREF="gettext_7.html#SEC36">7.2 The Format of GNU MO Files</A>). See section <A HREF="gettext_7.html#SEC35">7.1 Invoking the <CODE>msgfmt</CODE> Program</A>, for more information about all modalities of execution +for the <CODE>msgfmt</CODE> program. + +</P> +<P> +Finally, the modified and marked C sources are compiled and linked +with the GNU <CODE>gettext</CODE> library, usually through the operation of +<CODE>make</CODE>, given a suitable <TT>`Makefile'</TT> exists for the project, +and the resulting executable is installed somewhere users will find it. +The MO files themselves should also be properly installed. Given the +appropriate environment variables are set (see section <A HREF="gettext_8.html#SEC40">8.3 Magic for End Users</A>), the +program should localize itself automatically, whenever it executes. 
+ +</P> +<P> +The remainder of this manual has the purpose of explaining in depth the various +steps outlined above. + +</P> +<P><HR><P> +Go to the first, previous, <A HREF="gettext_2.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. +</BODY> +</HTML> diff --git a/doc/gettext_10.html b/doc/gettext_10.html new file mode 100644 index 0000000..1129441 --- /dev/null +++ b/doc/gettext_10.html @@ -0,0 +1,509 @@ +<HTML> +<HEAD> +<!-- This HTML file has been created by texi2html 1.51 + from gettext.texi on 19 April 2001 --> + +<TITLE>GNU gettext utilities - 10 The Translator's View</TITLE> +</HEAD> +<BODY> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_9.html">previous</A>, <A HREF="gettext_11.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. +<P><HR><P> + + +<H1><A NAME="SEC61" HREF="gettext_toc.html#TOC61">10 The Translator's View</A></H1> + + + +<H2><A NAME="SEC62" HREF="gettext_toc.html#TOC62">10.1 Introduction 0</A></H2> + +<P> +Free software is going international! The Translation Project is a way +to get maintainers, translators and users all together, so free software +will gradually become able to speak many native languages. + +</P> +<P> +The GNU <CODE>gettext</CODE> tool set contains <EM>everything</EM> maintainers +need for internationalizing their packages for messages. It also +contains quite useful tools for helping translators at localizing +messages to their native language, once a package has already been +internationalized. + +</P> +<P> +To achieve the Translation Project, we need many interested +people who like their own language and write it well, and who are also +able to synergize with other translators speaking the same language. +If you'd like to volunteer to <EM>work</EM> at translating messages, +please send mail to your translating team. 
+ +</P> +<P> +Each team has its own mailing list, courtesy of Linux +International. You may reach your translating team at the address +<TT>`<VAR>ll</VAR>@li.org'</TT>, replacing <VAR>ll</VAR> by the two-letter ISO 639 +code for your language. Language codes are <EM>not</EM> the same as +country codes given in ISO 3166. The following translating teams +exist: + +</P> + +<BLOCKQUOTE> +<P> +Chinese <CODE>zh</CODE>, Czech <CODE>cs</CODE>, Danish <CODE>da</CODE>, Dutch <CODE>nl</CODE>, +Esperanto <CODE>eo</CODE>, Finnish <CODE>fi</CODE>, French <CODE>fr</CODE>, Irish +<CODE>ga</CODE>, German <CODE>de</CODE>, Greek <CODE>el</CODE>, Italian <CODE>it</CODE>, +Japanese <CODE>ja</CODE>, Indonesian <CODE>in</CODE>, Norwegian <CODE>no</CODE>, Polish +<CODE>pl</CODE>, Portuguese <CODE>pt</CODE>, Russian <CODE>ru</CODE>, Spanish <CODE>es</CODE>, +Swedish <CODE>sv</CODE> and Turkish <CODE>tr</CODE>. +</BLOCKQUOTE> + +<P> +For example, you may reach the Chinese translating team by writing to +<TT>`zh@li.org'</TT>. When you become a member of the translating team +for your own language, you may subscribe to its list. For example, +Swedish people can send a message to <TT>`sv-request@li.org'</TT>, +having this message body: + +</P> + +<PRE> +subscribe +</PRE> + +<P> +Keep in mind that team members should be interested in <EM>working</EM> +at translations, or at solving translational difficulties, rather than +merely lurking around. If your team does not exist yet and you want to +start one, please write to <TT>`translation@iro.umontreal.ca'</TT>; +you will then reach the coordinator for all translator teams. + +</P> +<P> +A handful of GNU packages have already been adapted and provided +with message translations for several languages. Translation +teams have begun to organize, using these packages as a starting +point. But there are many more packages and many languages for +which we have no volunteer translators. 
If you would like to +volunteer to work at translating messages, please send mail to +<TT>`translation@iro.umontreal.ca'</TT> indicating what language(s) +you can work on. + +</P> + + +<H2><A NAME="SEC63" HREF="gettext_toc.html#TOC63">10.2 Introduction 1</A></H2> + +<P> +This is now official: GNU is going international! Here is the +announcement submitted for the January 1995 GNU Bulletin: + +</P> + +<BLOCKQUOTE> +<P> +A handful of GNU packages have already been adapted and provided +with message translations for several languages. Translation +teams have begun to organize, using these packages as a starting +point. But there are many more packages and many languages +for which we have no volunteer translators. If you'd like to +volunteer to work at translating messages, please send mail to +<SAMP>`translation@iro.umontreal.ca'</SAMP> indicating what language(s) +you can work on. +</BLOCKQUOTE> + +<P> +This document should answer many questions for those who are curious about +the process or would like to contribute. Please at least skim over it; +we hope this will cut down a little on the high volume of e-mail generated by this +collective effort towards internationalization of free software. + +</P> +<P> +Most free programming which is widely shared is done in English, and +currently, English is used as the main communicating language between +national communities collaborating on free software. This very document +is written in English. This will not change in the foreseeable future. + +</P> +<P> +However, there is a strong appetite from national communities for +having more software able to write using national language and habits, +and there is an on-going effort to modify free software in such a way +that it becomes able to do so. The experiments conducted so far raised +an enthusiastic response from pretesters, so we believe that +internationalization of free software is destined to succeed. 
+ +</P> +<P> +For suggested clarifications, additions or corrections to this +document, please e-mail <TT>`translation@iro.umontreal.ca'</TT>. + +</P> + + +<H2><A NAME="SEC64" HREF="gettext_toc.html#TOC64">10.3 Discussions</A></H2> + +<P> +Facing this internationalization effort, a few users expressed their +concerns. Some of these doubts are presented and discussed here. + +</P> + +<UL> +<LI>Smaller groups + +Some languages are not spoken by a very large number of people, so people +speaking them sometimes consider that there may not be all that much +demand for such versions of free software packages. Moreover, many people +being <EM>into computers</EM>, in some countries, generally seem to prefer +English versions of their software. + +On the other hand, people might enjoy their own language a lot, and be +very motivated to give themselves the pleasure of having their +beloved free software speaking their mother tongue. They do themselves +a personal favor, and do not pay that much attention to the number of +people benefiting from their work. + +<LI>Misinterpretation + +Other users are shy to push forward their own language, seeing in this +some kind of misplaced propaganda. Someone thought there must be some +users of the language over the networks pestering other people with it. + +But any spoken language is worth localization, because there are +people behind the language for whom the language is important and +dear to their hearts. + +<LI>Odd translations + +The biggest problem is to find the right translations so that +everybody can understand the messages. Translations are usually a +little odd. Some people get used to English, to the extent they may +find translations into their own language "rather pushy, obnoxious +and sometimes even hilarious." As a French speaking man, I have +the experience of those instruction manuals for goods, so poorly +translated into French in Korea or Taiwan... 
+ +The fact is that we sometimes have to create a kind of national +computer culture, and this is not easy without the collaboration of +many people liking their mother tongue. This is why translations are +better achieved by people knowing and loving their own language, and +ready to work together at improving the results they obtain. + +<LI>Dependencies over the GPL + +Some people wonder if using GNU <CODE>gettext</CODE> necessarily brings their package +under the protective wing of the GNU General Public License, when they +do not want to make their program free, or want other kinds of freedom. +The simplest answer is yes. + +The mere marking of localizable strings in a package, or conditional +inclusion of a few lines for initialization, is not really including +GPL'ed code. However, the localization routines themselves are under +the GPL and would bring the remainder of the package under the GPL +if they were distributed with it. So, I presume that, for those +for which this is a problem, it could be circumvented by letting to +the end installers the burden of assembling a package prepared for +localization, but not providing the localization routines themselves. + +</UL> + + + +<H2><A NAME="SEC65" HREF="gettext_toc.html#TOC65">10.4 Organization</A></H2> + +<P> +On a larger scale, the true solution would be to organize some kind of +fairly precise set up in which volunteers could participate. I gave +some thought to this idea lately, and realize there will be some +touchy points. I thought of writing to Richard Stallman to launch +such a project, but feel it might be good to shake out the ideas +between ourselves first. Most probably that Linux International has +some experience in the field already, or would like to orchestrate +the volunteer work, maybe. Food for thought, in any case! 
+ +</P> +<P> +I guess we have to set up something early, somehow, that will help +many possible contributors of the same language to interlock and avoid +work duplication, and further be put in contact for solving together +problems particular to their tongue (in most languages, there are many +difficulties peculiar to translating technical English). My Swedish +contributor acknowledged these difficulties, and I'm well aware of +them for French. + +</P> +<P> +This is surely not a technical issue, but we should manage so the +effort of local contributors is maximally useful, despite the national +team layer interface between contributors and maintainers. + +</P> +<P> +The Translation Project needs some setup for coordinating language +coordinators. Localizing evolving programs will surely +become a permanent and continuous activity in the free software community, +once well started. +The setup should be minimally completed and tested before GNU +<CODE>gettext</CODE> becomes an official reality. The e-mail address +<TT>`translation@iro.umontreal.ca'</TT> has been set up for receiving +offers from volunteers and general e-mail on these topics. This address +reaches the Translation Project coordinator. + +</P> + + + +<H3><A NAME="SEC66" HREF="gettext_toc.html#TOC66">10.4.1 Central Coordination</A></H3> + +<P> +I also think GNU will need, sooner than it thinks, someone to set up +a way to organize and coordinate these groups. Some kind of group +of groups. My opinion is that it would be good that GNU delegates +this task to a small group of collaborating volunteers, shortly. +Perhaps a list of these national committees +could be published in <TT>`gnu.announce'</TT>. + +</P> +<P> +My role as coordinator would simply be to refer to Ulrich any German +speaking volunteer interested in localization of free software packages, and +maybe helping national groups to initially organize, while maintaining +national registries until national groups are ready to take over. 
+In fact, the coordinator should make it easy for volunteers to get in contact with +one another for creating national teams, which should then select +one coordinator per language, or country (regionalized language). +If well done, the coordination should be useful without being an +overwhelming task, for the time it takes to put delegations in place. + +</P> + + +<H3><A NAME="SEC67" HREF="gettext_toc.html#TOC67">10.4.2 National Teams</A></H3> + +<P> +I suggest we look for volunteer coordinators/editors for individual +languages. These people will scan contributions of translation files +for various programs, for their own languages, and will ensure high +and uniform standards of diction. + +</P> +<P> +From my current experience with other people in these days, those who +provide localizations are very enthusiastic about the process, and are +more interested in the localization process than in the program they +localize, and want to do many programs, not just one. This seems +to confirm that having a coordinator/editor for each language is a +good idea. + +</P> +<P> +We need to choose someone who is good at writing clear and concise +prose in the language in question. That is hard--we can't check +it ourselves. So we need to ask a few people to judge each others' +writing and select the one who is best. + +</P> +<P> +I announced my prerelease to a few dozen people, and you would not +believe all the discussions it generated already. I shudder to think +what will happen when this is launched for real, officially, +world wide. Who am I to arbitrate between two Czechoslovak users +contradicting each other, for example? + +</P> +<P> +I assume that your German is not much better than my French so that +I would not be able to judge these formulations. What I would +suggest is that for each language there is a group of people who +maintain the PO files and judge changes. I suspect there will +be cultural differences between how such groups of people will behave. 
+Some will have relaxed ways, reach consensus easily, and have anyone +of the group relate to the maintainers, while others will fight to +death, organize heavy administrations up to national standards, and +use strict channels. + +</P> +<P> +The German team is setting a good example. Right now, they are +maybe half a dozen people revising translations of each other and +discussing the linguistic issues. I do not even have all the names. +Ulrich Drepper is taking care of coordinating the German team. +He subscribed to all my pretest lists, so I do not even have to warn +him specifically of incoming releases. + +</P> +<P> +I'm sure that it is a good idea to get teams for each language working +on translations. That will make the translations better and more +consistent. + +</P> + + + +<H4><A NAME="SEC68" HREF="gettext_toc.html#TOC68">10.4.2.1 Sub-Cultures</A></H4> + +<P> +Taking French for example, there are a few sub-cultures around computers +which developed diverging vocabularies. Picking volunteers here and +there without addressing this problem in an organized way, early in the +project, might produce a distasteful mix of internationalized programs, +and possibly trigger endless quarrels among those who really care. + +</P> +<P> +Keeping some kind of unity in the way French localization of +internationalized programs is achieved is a difficult (and delicate) job. +Knowing the Latin character of French people (:-), if we take this +the wrong way, we could end up nowhere, or waste a lot of energy. +Maybe we should begin to address this problem seriously <EM>before</EM> +GNU <CODE>gettext</CODE> becomes officially published. And I suspect that this +means soon! + +</P> + + +<H4><A NAME="SEC69" HREF="gettext_toc.html#TOC69">10.4.2.2 Organizational Ideas</A></H4> + +<P> +I expect the next big changes after the official release. Please note +that I use the German translation of the short GPL message. 
We need +to set a few good examples before the localization goes out for real +in the free software community. Here are a few points to discuss: + +</P> + +<UL> +<LI> + +Each group should have one FTP server (at least one master). + +<LI> + +The files on the server should reflect the latest version (of +course!) and it should also contain an RCS directory with the +corresponding archives (I don't have this now). + +<LI> + +There should also be a ChangeLog file (this is more useful than the +RCS archive but can be generated automatically from the latter by +Emacs). + +<LI> + +A <STRONG>core group</STRONG> should judge questionable changes (for now +this group consists solely of me but I ask some others occasionally; +this also seems to work). + +</UL> + + + +<H3><A NAME="SEC70" HREF="gettext_toc.html#TOC70">10.4.3 Mailing Lists</A></H3> + +<P> +If we get any inquiries about GNU <CODE>gettext</CODE>, send them on to: + +</P> + +<PRE> +<TT>`translation@iro.umontreal.ca'</TT> +</PRE> + +<P> +The <TT>`*-pretest'</TT> lists are quite useful to me, maybe the idea could +be generalized to many GNU, and non-GNU packages. But to each maintainer +his or her way! + +</P> +<P> +Fran&ccedil;ois, we have a mechanism in place here at +<TT>`gnu.ai.mit.edu'</TT> to track teams, support mailing lists for +them and log members. We have a slight preference that you use it. +If this is OK with you, I can get you clued in. + +</P> +<P> +Things are changing! A few years ago, when Daniel Fekete and I +asked for a mailing list for GNU localization, nested at the FSF, we +were politely invited to organize it anywhere else, and so we did. +For communicating with my pretesters, I later made a handful of +mailing lists located at iro.umontreal.ca and administered by +<CODE>majordomo</CODE>. These lists have been <EM>very</EM> dependable +so far... + +</P> +<P> +I suspect that the German team will organize itself a mailing list +located in Germany, and so forth for other countries. 
But before they +organize for real, it could surely be useful to offer mailing lists +located at the FSF to each national team. So yes, please explain to me +how I should proceed to create and handle them. + +</P> +<P> +We should create temporary mailing lists, one per country, to help +people organize. Temporary, because once regrouped and structured, it +would be fair that the volunteers from each country bring <EM>their</EM> list +back there and manage it as they want. My feeling is that, in the long +run, each team should run its own list, from within their country. +There also should be some central list to which all teams could +subscribe as they see fit, as long as each team is represented in it. + +</P> + + +<H2><A NAME="SEC71" HREF="gettext_toc.html#TOC71">10.5 Information Flow</A></H2> + +<P> +There will surely be some discussion about these messages after the +packages are finally released. If people now send you some proposals +for better messages, how do you proceed? Jim, please note that +right now, as I put forward nearly a dozen localizable programs, I +receive both the translations and the coordination concerns about them. + +</P> +<P> +If I put one of my things to pretest, Ulrich receives the announcement +and passes it on to the German team, who make last minute revisions. +Then he submits the translation files to me <EM>as the maintainer</EM>. +For free packages I do not maintain, I would not even hear about it. +This scheme could be made to work for the whole Translation Project, +I think. For security reasons, maybe Ulrich (national coordinators, +in fact) should update the central registry kept at the Translation Project +(Jim, me, or Len's recruits) once in a while. + +</P> +<P> +In December/January, I was aggressively ready to internationalize +all of GNU, giving myself the duty of one small GNU package per week +or so, taking many weeks or months for bigger packages. But it does +not work this way. I first did all the things I'm responsible for. 
+I've nothing against some missionary work on other maintainers, but +I'm also loosing a lot of energy over it--same debates over again. + +</P> +<P> +And when the first localized packages are released we'll get a lot of +responses about ugly translations :-). Surely, and we need to have +beforehand a fairly good idea about how to handle the information +flow between the national teams and the package maintainers. + +</P> +<P> +Please start saving somewhere a quick history of each PO file. I know +for sure that the file format will change, allowing for comments. +It would be nice that each file has a kind of log, and references for +those who want to submit comments or gripes, or otherwise contribute. +I sent a proposal for a fast and flexible format, but it is not +receiving acceptance yet by the GNU deciders. I'll tell you when I +have more information about this. + +</P> +<P><HR><P> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_9.html">previous</A>, <A HREF="gettext_11.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. +</BODY> +</HTML> diff --git a/doc/gettext_11.html b/doc/gettext_11.html new file mode 100644 index 0000000..ea125d0 --- /dev/null +++ b/doc/gettext_11.html @@ -0,0 +1,707 @@ +<HTML> +<HEAD> +<!-- This HTML file has been created by texi2html 1.51 + from gettext.texi on 19 April 2001 --> + +<TITLE>GNU gettext utilities - 11 The Maintainer's View</TITLE> +</HEAD> +<BODY> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_10.html">previous</A>, <A HREF="gettext_12.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. +<P><HR><P> + + +<H1><A NAME="SEC72" HREF="gettext_toc.html#TOC72">11 The Maintainer's View</A></H1> + +<P> +The maintainer of a package has many responsibilities. 
One of them +is ensuring that the package will install easily on many platforms, +and that the magic we described earlier (see section <A HREF="gettext_8.html#SEC37">8 The User's View</A>) will work +for installers and end users. + +</P> +<P> +Of course, there are many possible ways by which GNU <CODE>gettext</CODE> +might be integrated in a distribution, and this chapter does not cover +them in all generality. Instead, it details one possible approach which +is especially adequate for many free software distributions following GNU +standards, or even better, Gnits standards, because GNU <CODE>gettext</CODE> +is intended to help the internationalization of the whole GNU +project, and as many other good free packages as possible. So, the +maintainer's view presented here presumes that the package already has +a <TT>`configure.in'</TT> file and uses GNU Autoconf. + +</P> +<P> +Nevertheless, GNU <CODE>gettext</CODE> may surely be useful for free packages +not following GNU standards and conventions, but the maintainers of such +packages might have to show imagination and initiative in organizing +their distributions so <CODE>gettext</CODE> works for them in all situations. +There are surely many such packages out there. + +</P> +<P> +Even if <CODE>gettext</CODE> methods are now stabilizing, slight adjustments +might be needed between successive <CODE>gettext</CODE> versions, so you +should ideally review this chapter in subsequent releases, looking +for changes. + +</P> + + + +<H2><A NAME="SEC73" HREF="gettext_toc.html#TOC73">11.1 Flat or Non-Flat Directory Structures</A></H2> + +<P> +Some free software packages are distributed as <CODE>tar</CODE> files which unpack +in a single directory; these are said to be <STRONG>flat</STRONG> distributions. 
+Other free software packages have a one level hierarchy of subdirectories, using +for example a subdirectory named <TT>`doc/'</TT> for the Texinfo manual and +man pages, another called <TT>`lib/'</TT> for holding functions meant to +replace or complement C libraries, and a subdirectory <TT>`src/'</TT> for +holding the proper sources for the package. These other distributions +are said to be <STRONG>non-flat</STRONG>. + +</P> +<P> +We cannot say much about flat distributions. A flat +directory structure has the disadvantage of increasing the difficulty +of updating to a new version of GNU <CODE>gettext</CODE>. Also, if you have +many PO files, this could somewhat pollute your single directory. +Also, GNU <CODE>gettext</CODE>'s libintl sources consist of C sources, shell +scripts, <CODE>sed</CODE> scripts and complicated Makefile rules, which don't +fit well into an existing flat structure. For these reasons, we +recommend using the non-flat approach in this case as well. + +</P> +<P> +Maybe because GNU <CODE>gettext</CODE> itself has a non-flat structure, +we have more experience with this approach, and this is what will be +described in the remainder of this chapter. Some maintainers might +use this as an opportunity to unflatten their package structure. + +</P> + + +<H2><A NAME="SEC74" HREF="gettext_toc.html#TOC74">11.2 Prerequisite Works</A></H2> + +<P> +Some work is required before using GNU <CODE>gettext</CODE> +in one of your packages. This work has a kind of generality +that escapes the point by point descriptions used in the remainder +of this chapter. So, we describe it here. + +</P> + +<UL> +<LI> + +Before attempting to use <CODE>gettextize</CODE> you should install some +other packages first. +Ensure that recent versions of GNU <CODE>m4</CODE>, GNU Autoconf and GNU +<CODE>gettext</CODE> are already installed at your site, and if not, proceed +to do this first. 
If you have to install these things, beware that +GNU <CODE>m4</CODE> must be fully installed before GNU Autoconf is even +<EM>configured</EM>. + +To further ease the task of a package maintainer the <CODE>automake</CODE> +package was designed and implemented. GNU <CODE>gettext</CODE> now uses this +tool and the <TT>`Makefile'</TT>s in the <TT>`intl/'</TT> and <TT>`po/'</TT> +therefore know about all the goals necessary for using <CODE>automake</CODE> +and <TT>`libintl'</TT> in one project. + +Those four packages are only needed by you, as a maintainer; the +installers of your own package and end users do not really need any of +GNU <CODE>m4</CODE>, GNU Autoconf, GNU <CODE>gettext</CODE>, or GNU <CODE>automake</CODE> +for successfully installing and running your package, with messages +properly translated. But this is not completely true if you provide +internationalized shell scripts within your own package: GNU +<CODE>gettext</CODE> shall then be installed at the user site if the end users +want to see the translation of shell script messages. + +<LI> + +Your package should use Autoconf and have a <TT>`configure.in'</TT> file. +If it does not, you have to learn how. The Autoconf documentation +is quite well written; it is a good idea to print it and get +familiar with it. + +<LI> + +Your C sources should have already been modified according to +instructions given earlier in this manual. See section <A HREF="gettext_3.html#SEC13">3 Preparing Program Sources</A>. + +<LI> + +Your <TT>`po/'</TT> directory should receive all PO files submitted to you +by the translator teams, each having <TT>`<VAR>ll</VAR>.po'</TT> as a name. +It is not usually easy to get translation +work done before your package gets internationalized and available! +Since the cycle has to start somewhere, the easiest course for the maintainer +is to start with absolutely no PO files, and wait until various +translator teams get interested in your package, and submit PO files. 
+ +</UL> + +<P> +It is worth adding here a few words about how the maintainer should +ideally behave with PO file submissions. As a maintainer, your role is +to authenticate the origin of the submission as being the representative +of the appropriate translating teams of the Translation Project (forward +the submission to <TT>`translation@iro.umontreal.ca'</TT> in case of doubt), +to ensure that the PO file format is not severely broken and does not +prevent successful installation, and for the rest, merely to put these +PO files in <TT>`po/'</TT> for distribution. + +</P> +<P> +As a maintainer, you do not have to take on your shoulders the +responsibility of checking if the translations are adequate or +complete, and should avoid diving into linguistic matters. Translation +teams drive themselves and are fully responsible for their linguistic +choices for the Translation Project. Keep in mind that translator teams are <EM>not</EM> +driven by maintainers. You can help by carefully redirecting all +communications and reports from users about linguistic matters to the +appropriate translation team, or explain to users how to reach or join +their team. The simplest might be to send them the <TT>`ABOUT-NLS'</TT> file. + +</P> +<P> +Maintainers should <EM>never ever</EM> apply PO file bug reports +themselves, short-cutting translation teams. If some translator has +difficulty getting some of her points through her team, it should not be +an option for her to directly negotiate translations with maintainers. +Teams ought to settle their problems themselves, if any. If you, as +a maintainer, ever think there is a real problem with a team, please +never try to <EM>solve</EM> a team's problem on your own. + +</P> + + +<H2><A NAME="SEC75" HREF="gettext_toc.html#TOC75">11.3 Invoking the <CODE>gettextize</CODE> Program</A></H2> + +<P> +Some files are consistently and identically needed in every package +internationalized through GNU <CODE>gettext</CODE>. 
As a matter of +convenience, the <CODE>gettextize</CODE> program puts all these files right +in your package. This program has the following synopsis: + +</P> + +<PRE> +gettextize [ <VAR>option</VAR>... ] [ <VAR>directory</VAR> ] +</PRE> + +<P> +and accepts the following options: + +</P> +<DL COMPACT> + +<DT><SAMP>`-c'</SAMP> +<DD> +<DT><SAMP>`--copy'</SAMP> +<DD> +Copy the needed files instead of making symbolic links. Using links +would allow the package to always use the latest <CODE>gettext</CODE> code +available on the system, but it might disturb some mechanism the +maintainer is used to apply to the sources. Because running +<CODE>gettextize</CODE> is easy there shouldn't be problems with using copies. + +<DT><SAMP>`-f'</SAMP> +<DD> +<DT><SAMP>`--force'</SAMP> +<DD> +Force replacement of files which already exist. + +<DT><SAMP>`-h'</SAMP> +<DD> +<DT><SAMP>`--help'</SAMP> +<DD> +Display this help and exit. + +<DT><SAMP>`--version'</SAMP> +<DD> +Output version information and exit. + +</DL> + +<P> +If <VAR>directory</VAR> is given, this is the top level directory of a +package to prepare for using GNU <CODE>gettext</CODE>. If not given, it +is assumed that the current directory is the top level directory of +such a package. + +</P> +<P> +The program <CODE>gettextize</CODE> provides the following files. However, +no existing file will be replaced unless the option <CODE>--force</CODE> +(<CODE>-f</CODE>) is specified. + +</P> + +<OL> +<LI> + +The <TT>`ABOUT-NLS'</TT> file is copied in the main directory of your package, +the one being at the top level. This file gives the main indications +about how to install and use the Native Language Support features +of your program. You might elect to use a more recent copy of this +<TT>`ABOUT-NLS'</TT> file than the one provided through <CODE>gettextize</CODE>, +if you have one handy. You may also fetch a more recent copy of file +<TT>`ABOUT-NLS'</TT> from Translation Project sites, and from most GNU +archive sites. 
+ +<LI> + +A <TT>`po/'</TT> directory is created for eventually holding +all translation files, but initially only containing the file +<TT>`po/Makefile.in.in'</TT> from the GNU <CODE>gettext</CODE> distribution +(beware the double <SAMP>`.in'</SAMP> in the file name). If the <TT>`po/'</TT> +directory already exists, it will be preserved along with the files +it contains, and only <TT>`Makefile.in.in'</TT> will be overwritten. + +<LI> + +An <TT>`intl/'</TT> directory is created and filled with most of the files +originally in the <TT>`intl/'</TT> directory of the GNU <CODE>gettext</CODE> +distribution. Also, if option <CODE>--force</CODE> (<CODE>-f</CODE>) is given, +the <TT>`intl/'</TT> directory is emptied first. + +</OL> + +<P> +If your site supports symbolic links, <CODE>gettextize</CODE> will not +actually copy the files into your package, but establish symbolic +links instead. This avoids duplicating the disk space needed in +all packages. Merely using the <SAMP>`-h'</SAMP> option while creating the +<CODE>tar</CODE> archive of your distribution will resolve each link by an +actual copy in the distribution archive. So, to insist, you really +should use the <SAMP>`-h'</SAMP> option with <CODE>tar</CODE> within the <CODE>dist</CODE> +goal of your main <TT>`Makefile.in'</TT>. + +</P> +<P> +It is interesting to understand that most new files for supporting +GNU <CODE>gettext</CODE> facilities in one package go in <TT>`intl/'</TT> +and <TT>`po/'</TT> subdirectories. One distinction between these two +directories is that <TT>`intl/'</TT> is meant to be completely identical +in all packages using GNU <CODE>gettext</CODE>, while all newly created +files, which have to be different, go into <TT>`po/'</TT>. There is a +common <TT>`Makefile.in.in'</TT> in <TT>`po/'</TT>, because the <TT>`po/'</TT> +directory needs its own <TT>`Makefile'</TT>, and it has been designed so +it can be identical in all packages. 
+ +</P> + + +<H2><A NAME="SEC76" HREF="gettext_toc.html#TOC76">11.4 Files You Must Create or Alter</A></H2> + +<P> +Besides files which are automatically added through <CODE>gettextize</CODE>, +there are many files needing revision for properly interacting with +GNU <CODE>gettext</CODE>. If you are closely following GNU standards for +Makefile engineering and auto-configuration, the adaptations should +be easier to achieve. Here is a point by point description of the +changes needed in each. + +</P> +<P> +So, here comes a list of files, each one followed by a description of +all alterations it needs. Many examples are taken out from the GNU +<CODE>gettext</CODE> 0.10.37 distribution itself. You may indeed +refer to the source code of the GNU <CODE>gettext</CODE> package, as it +is intended to be a good example and master implementation for using +its own functionality. + +</P> + + + +<H3><A NAME="SEC77" HREF="gettext_toc.html#TOC77">11.4.1 <TT>`POTFILES.in'</TT> in <TT>`po/'</TT></A></H3> + +<P> +The <TT>`po/'</TT> directory should receive a file named +<TT>`POTFILES.in'</TT>. This file tells which files, among all program +sources, have marked strings needing translation. Here is an example +of such a file: + +</P> + +<PRE> +# List of source files containing translatable strings. +# Copyright (C) 1995 Free Software Foundation, Inc. + +# Common library files +lib/error.c +lib/getopt.c +lib/xmalloc.c + +# Package source files +src/gettext.c +src/msgfmt.c +src/xgettext.c +</PRE> + +<P> +Hash-marked comments and white lines are ignored. All other lines +list those source files containing strings marked for translation +(see section <A HREF="gettext_3.html#SEC15">3.2 How Marks Appear in Sources</A>), in a notation relative to the top level +of your whole distribution, rather than the location of the +<TT>`POTFILES.in'</TT> file itself. 
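The reading rule just stated (comments and blank lines dropped, every other line taken as a path relative to the top level) can be sketched in a few lines. This is only an illustration of the rule, not the way GNU gettext itself processes the file; the function name read_potfiles and the SAMPLE text are hypothetical and appear nowhere in the gettext distribution.

```python
def read_potfiles(text):
    """Return the source paths listed in POTFILES.in-style text.

    Hypothetical helper, for illustration only: it applies the rule
    described in the manual, nothing more.
    """
    paths = []
    for line in text.splitlines():
        entry = line.strip()
        # Hash-marked comments and white (blank) lines are ignored.
        if not entry or entry.startswith("#"):
            continue
        # Every remaining line names one source file, relative to the
        # top level of the distribution (not to the po/ directory).
        paths.append(entry)
    return paths


# A shortened copy of the example file shown in the manual.
SAMPLE = """\
# Common library files
lib/error.c

# Package source files
src/gettext.c
"""

if __name__ == "__main__":
    for path in read_potfiles(SAMPLE):
        print(path)
```

Run as a script, the sketch prints the two paths lib/error.c and src/gettext.c, one per line, which is the list a tool extracting marked strings would then have to visit.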
+ +</P> + + +<H3><A NAME="SEC78" HREF="gettext_toc.html#TOC78">11.4.2 <TT>`configure.in'</TT> at top level</A></H3> + + +<OL> +<LI>Declare the package and version. + +This is done by a set of lines like these: + + +<PRE> +PACKAGE=gettext +VERSION=0.10.37 +AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE") +AC_DEFINE_UNQUOTED(VERSION, "$VERSION") +AC_SUBST(PACKAGE) +AC_SUBST(VERSION) +</PRE> + +Of course, you replace <SAMP>`gettext'</SAMP> with the name of your package, +and <SAMP>`0.10.37'</SAMP> by its version numbers, exactly as they +should appear in the packaged <CODE>tar</CODE> file name of your distribution +(<TT>`gettext-0.10.37.tar.gz'</TT>, here). + +<LI>Declare the available translations. + +This is done by defining <CODE>ALL_LINGUAS</CODE> to the white separated, +quoted list of available languages, in a single line, like this: + + +<PRE> +ALL_LINGUAS="de fr" +</PRE> + +This example means that German and French PO files are available, so +that these languages are currently supported by your package. If you +want to further restrict, at installation time, the set of installed +languages, this should not be done by modifying <CODE>ALL_LINGUAS</CODE> in +<TT>`configure.in'</TT>, but rather by using the <CODE>LINGUAS</CODE> environment +variable (see section <A HREF="gettext_8.html#SEC39">8.2 Magic for Installers</A>). + +<LI>Check for internationalization support. + +Here is the main <CODE>m4</CODE> macro for triggering internationalization +support. Just add this line to <TT>`configure.in'</TT>: + + +<PRE> +AM_GNU_GETTEXT +</PRE> + +This call is purposely simple, even if it generates a lot of configure +time checking and actions. + +<LI>Have output files created. 
+ +The <CODE>AC_OUTPUT</CODE> directive, at the end of your <TT>`configure.in'</TT> +file, needs to be modified in two ways: + + +<PRE> +AC_OUTPUT([<VAR>existing configuration files</VAR> intl/Makefile po/Makefile.in], +[<VAR>existing additional actions</VAR>]) +</PRE> + +The modification to the first argument to <CODE>AC_OUTPUT</CODE> asks +for substitution in the <TT>`intl/'</TT> and <TT>`po/'</TT> directories. +Note the <SAMP>`.in'</SAMP> suffix used for <TT>`po/'</TT> only. This is because +the distributed file is really <TT>`po/Makefile.in.in'</TT>. + +</OL> + + + +<H3><A NAME="SEC79" HREF="gettext_toc.html#TOC79">11.4.3 <TT>`config.guess'</TT>, <TT>`config.sub'</TT> at top level</A></H3> + +<P> +You need to add the GNU <TT>`config.guess'</TT> and <TT>`config.sub'</TT> files +to your distribution. They are needed because the <TT>`intl/'</TT> directory +has platform-dependent support for determining the locale's character +encoding and therefore needs to identify the platform. + +</P> +<P> +You can obtain the newest version of <TT>`config.guess'</TT> and +<TT>`config.sub'</TT> from <TT>`ftp://ftp.gnu.org/pub/gnu/config/'</TT>. +Less recent versions are also contained in the GNU <CODE>automake</CODE> and +GNU <CODE>libtool</CODE> packages. + +</P> +<P> +Normally, <TT>`config.guess'</TT> and <TT>`config.sub'</TT> are put at the +top level of a distribution. But it is also possible to put them in a +subdirectory, together with other configuration support files like +<TT>`install-sh'</TT>, <TT>`ltconfig'</TT>, <TT>`ltmain.sh'</TT>, +<TT>`mkinstalldirs'</TT> or <TT>`missing'</TT>. All you need to do, other than +moving the files, is to add the following line to your +<TT>`configure.in'</TT>. 
+ +</P> + +<PRE> +AC_CONFIG_AUX_DIR([<VAR>subdir</VAR>]) +</PRE> + + + +<H3><A NAME="SEC80" HREF="gettext_toc.html#TOC80">11.4.4 <TT>`aclocal.m4'</TT> at top level</A></H3> + +<P> +If you do not have an <TT>`aclocal.m4'</TT> file in your distribution, +the simplest approach is to concatenate the files <TT>`codeset.m4'</TT>, +<TT>`gettext.m4'</TT>, <TT>`iconv.m4'</TT>, <TT>`isc-posix.m4'</TT>, +<TT>`lcmessage.m4'</TT>, <TT>`progtest.m4'</TT> from GNU <CODE>gettext</CODE>'s +<TT>`m4/'</TT> directory into a single file. + +</P> +<P> +If you already have an <TT>`aclocal.m4'</TT> file, then you will have +to merge the said macro files into your <TT>`aclocal.m4'</TT>. Note that if +you are upgrading from a previous release of GNU <CODE>gettext</CODE>, you +should most probably <EM>replace</EM> the macros (<CODE>AM_GNU_GETTEXT</CODE>, +<CODE>AM_WITH_NLS</CODE>, etc.), as they usually +change a little from one release of GNU <CODE>gettext</CODE> to the next. +Their contents may vary as we get more experience with strange systems +out there. + +</P> +<P> +These macros check for the internationalization support functions +and related information. Hopefully, once stabilized, these macros +might be integrated in the standard Autoconf set, because this +piece of <CODE>m4</CODE> code will be the same for all projects using GNU +<CODE>gettext</CODE>. + +</P> + + +<H3><A NAME="SEC81" HREF="gettext_toc.html#TOC81">11.4.5 <TT>`acconfig.h'</TT> at top level</A></H3> + +<P> +Earlier GNU <CODE>gettext</CODE> releases required you to put definitions for +<CODE>ENABLE_NLS</CODE>, <CODE>HAVE_GETTEXT</CODE>, <CODE>HAVE_LC_MESSAGES</CODE>, +<CODE>HAVE_STPCPY</CODE>, <CODE>PACKAGE</CODE> and <CODE>VERSION</CODE> into an +<TT>`acconfig.h'</TT> file. This is not needed any more; you can remove +them from your <TT>`acconfig.h'</TT> file unless your package uses them +independently from the <TT>`intl/'</TT> directory. 
+ +</P> + + +<H3><A NAME="SEC82" HREF="gettext_toc.html#TOC82">11.4.6 <TT>`Makefile.in'</TT> at top level</A></H3> + +<P> +Here are a few modifications you need to make to your main, top-level +<TT>`Makefile.in'</TT> file. + +</P> + +<OL> +<LI> + +Add the following lines near the beginning of your <TT>`Makefile.in'</TT>, +so the <SAMP>`dist:'</SAMP> goal will work properly (as explained further down): + + +<PRE> +PACKAGE = @PACKAGE@ +VERSION = @VERSION@ +</PRE> + +<LI> + +Add file <TT>`ABOUT-NLS'</TT> to the <CODE>DISTFILES</CODE> definition, so the file gets +distributed. + +<LI> + +Wherever you process subdirectories in your <TT>`Makefile.in'</TT>, be sure +you also process the subdirectories <SAMP>`intl'</SAMP> and <SAMP>`po'</SAMP>. Special +rules in the <TT>`Makefiles'</TT> take care of the case where no +internationalization is wanted. + +If you are using Makefiles, either generated by automake, or hand-written +so they carefully follow the GNU coding standards, the affected goals for +which the new subdirectories must be handled include <SAMP>`installdirs'</SAMP>, +<SAMP>`install'</SAMP>, <SAMP>`uninstall'</SAMP>, <SAMP>`clean'</SAMP>, <SAMP>`distclean'</SAMP>. + +Here is an example of a canonical order of processing. In this +example, we also define <CODE>SUBDIRS</CODE> in <CODE>Makefile.in</CODE> for it +to be further used in the <SAMP>`dist:'</SAMP> goal. + + +<PRE> +SUBDIRS = doc intl lib src @POSUB@ +</PRE> + +Note that you must arrange for <SAMP>`make'</SAMP> to descend into the +<CODE>intl</CODE> directory before descending into other directories containing +code which makes use of the <CODE>libintl.h</CODE> header file. For this +reason, here we mention <CODE>intl</CODE> before <CODE>lib</CODE> and <CODE>src</CODE>. + +You will have to adapt this order to your own package. 
+ +<LI> + +A delicate point is the <SAMP>`dist:'</SAMP> goal, as both +<TT>`intl/Makefile'</TT> and <TT>`po/Makefile'</TT> will later assume that the +proper directory has been set up from the main <TT>`Makefile'</TT>. Here is +an example of what the <SAMP>`dist:'</SAMP> goal might look like: + + +<PRE> +distdir = $(PACKAGE)-$(VERSION) +dist: Makefile + rm -fr $(distdir) + mkdir $(distdir) + chmod 777 $(distdir) + for file in $(DISTFILES); do \ + ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir); \ + done + for subdir in $(SUBDIRS); do \ + mkdir $(distdir)/$$subdir || exit 1; \ + chmod 777 $(distdir)/$$subdir; \ + (cd $$subdir && $(MAKE) $@) || exit 1; \ + done + tar chozf $(distdir).tar.gz $(distdir) + rm -fr $(distdir) +</PRE> + +</OL> + + + +<H3><A NAME="SEC83" HREF="gettext_toc.html#TOC83">11.4.7 <TT>`Makefile.in'</TT> in <TT>`src/'</TT></A></H3> + +<P> +Some of the modifications made in the main <TT>`Makefile.in'</TT> will +also be needed in the <TT>`Makefile.in'</TT> from your package sources, +which we assume here to be in the <TT>`src/'</TT> subdirectory. Here are +all the modifications needed in <TT>`src/Makefile.in'</TT>: + +</P> + +<OL> +<LI> + +In view of the <SAMP>`dist:'</SAMP> goal, you should have these lines near the +beginning of <TT>`src/Makefile.in'</TT>: + + +<PRE> +PACKAGE = @PACKAGE@ +VERSION = @VERSION@ +</PRE> + +<LI> + +If not done already, you should guarantee that <CODE>top_srcdir</CODE> +gets defined. This will serve for <CODE>cpp</CODE> include files. Just add +the line: + + +<PRE> +top_srcdir = @top_srcdir@ +</PRE> + +<LI> + +You might also want to define <CODE>subdir</CODE> as <SAMP>`src'</SAMP>, later +allowing for almost uniform <SAMP>`dist:'</SAMP> goals in all your +<TT>`Makefile.in'</TT>. 
At least, the <SAMP>`dist:'</SAMP> goal below assumes that +you used: + + +<PRE> +subdir = src +</PRE> + +<LI> + +The <CODE>main</CODE> function of your program will normally call +<CODE>bindtextdomain</CODE> (see section <A HREF="gettext_3.html#SEC14">3.1 Triggering <CODE>gettext</CODE> Operations</A>), like this: + + +<PRE> +bindtextdomain (<VAR>PACKAGE</VAR>, LOCALEDIR); +</PRE> + +To make LOCALEDIR known to the program, add the following lines to +Makefile.in: + + +<PRE> +datadir = @datadir@ +localedir = $(datadir)/locale +DEFS = -DLOCALEDIR=\"$(localedir)\" @DEFS@ +</PRE> + +Note that <CODE>@datadir@</CODE> defaults to <SAMP>`$(prefix)/share'</SAMP>, thus +<CODE>$(localedir)</CODE> defaults to <SAMP>`$(prefix)/share/locale'</SAMP>. + +<LI> + +You should ensure that the final linking will use <CODE>@INTLLIBS@</CODE> as +a library. An easy way to achieve this is to arrange for it to get into +<CODE>LIBS</CODE>, like this: + + +<PRE> +LIBS = @INTLLIBS@ @LIBS@ +</PRE> + +In most packages internationalized with GNU <CODE>gettext</CODE>, one will +find a directory <TT>`lib/'</TT> in which a library containing some helper +functions will be built. (You need at least the few functions which the +GNU <CODE>gettext</CODE> Library itself needs.) However, some of the functions +in the <TT>`lib/'</TT> directory also give messages to the user, which of course should be +translated, too. To take care of this, it is not enough to place the support +library (say <TT>`libsupport.a'</TT>) just between the <CODE>@INTLLIBS@</CODE> +and <CODE>@LIBS@</CODE> in the above example. Instead one has to write this: + + +<PRE> +LIBS = ../lib/libsupport.a @INTLLIBS@ ../lib/libsupport.a @LIBS@ +</PRE> + +<LI> + +You should also ensure that directory <TT>`intl/'</TT> will be searched for +C preprocessor include files in all circumstances. So, you have to +arrange for both <SAMP>`-I../intl'</SAMP> and <SAMP>`-I$(top_srcdir)/intl'</SAMP> to +be given to the C compiler. 
+ +<LI> + +Your <SAMP>`dist:'</SAMP> goal has to conform with the others. Here is a +reasonable definition for it: + + +<PRE> +distdir = ../$(PACKAGE)-$(VERSION)/$(subdir) +dist: Makefile $(DISTFILES) + for file in $(DISTFILES); do \ + ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir); \ + done +</PRE> + +</OL> + +<P><HR><P> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_10.html">previous</A>, <A HREF="gettext_12.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. +</BODY> +</HTML> diff --git a/doc/gettext_12.html b/doc/gettext_12.html new file mode 100644 index 0000000..e069839 --- /dev/null +++ b/doc/gettext_12.html @@ -0,0 +1,160 @@ +<HTML> +<HEAD> +<!-- This HTML file has been created by texi2html 1.51 + from gettext.texi on 19 April 2001 --> + +<TITLE>GNU gettext utilities - 12 Concluding Remarks</TITLE> +</HEAD> +<BODY> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_11.html">previous</A>, <A HREF="gettext_13.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. +<P><HR><P> + + +<H1><A NAME="SEC84" HREF="gettext_toc.html#TOC84">12 Concluding Remarks</A></H1> + +<P> +We would like to conclude this GNU <CODE>gettext</CODE> manual by presenting +a history of the Translation Project so far. We finally give +a few pointers for those who want to do further research or reading +about Native Language Support matters. + +</P> + + + +<H2><A NAME="SEC85" HREF="gettext_toc.html#TOC85">12.1 History of GNU <CODE>gettext</CODE></A></H2> + +<P> +Internationalization concerns and algorithms have been informally +and casually discussed for years in GNU, sometimes around GNU +<CODE>libc</CODE>, maybe around the incoming <CODE>Hurd</CODE>, or otherwise +(nobody clearly remembers). And even then, when the work started for +real, this was somewhat independent of these previous discussions. 
+ +</P> +<P> +This all began in July 1994, when Patrick D'Cruze had the idea and +initiative of internationalizing version 3.9.2 of GNU <CODE>fileutils</CODE>. +He then asked Jim Meyering, the maintainer, how to get those changes +folded into an official release. That first draft was full of +<CODE>#ifdef</CODE>s and somewhat disconcerting, and Jim wanted to find +nicer ways. Patrick and Jim shared some attempts and experiments +in this area. Then, feeling that this might eventually have a deeper +impact on GNU, Jim wanted to know what the standards were, and contacted +Richard Stallman, who very quickly and verbally described an overall +design for what was meant to become <CODE>glocale</CODE>, at that time. + +</P> +<P> +Jim implemented <CODE>glocale</CODE> and got a lot of exhausting feedback +from Patrick and Richard, of course, but also from Mitchum DSouza +(who wrote a <CODE>catgets</CODE>-like package), Roland McGrath, maybe David +MacKenzie, François Pinard, and Paul Eggert, all pushing and +pulling in various directions, not always compatible, to the extent +that after a couple of test releases, <CODE>glocale</CODE> was torn apart. + +</P> +<P> +While Jim took some distance and time and became a dad for a second +time, Roland wanted to get GNU <CODE>libc</CODE> internationalized, and +got Ulrich Drepper involved in that project. Instead of starting +from <CODE>glocale</CODE>, Ulrich rewrote something from scratch, but +more conformant to the set of guidelines that emerged out of the +<CODE>glocale</CODE> effort. Then, Ulrich got people from the previous +forum to involve themselves in this new project, and the switch +from <CODE>glocale</CODE> to what was first named <CODE>msgutils</CODE>, renamed +<CODE>nlsutils</CODE>, and later <CODE>gettext</CODE>, became officially accepted +by Richard in May 1995 or so. + +</P> +<P> +Let's summarize by saying that Ulrich Drepper wrote GNU <CODE>gettext</CODE> +in April 1995. 
The first official release of the package, including +PO mode, occurred in July 1995, and was numbered 0.7. Other people +contributed to the effort by providing a discussion forum around +Ulrich, writing little pieces of code, or testing. These people are listed +in the <CODE>THANKS</CODE> file which comes with the GNU <CODE>gettext</CODE> +distribution. + +</P> +<P> +While this was being done, François adapted half a dozen +GNU packages to <CODE>glocale</CODE> first, then later to <CODE>gettext</CODE>, +putting them in pretest, thus providing along the way an effective +user environment for fine-tuning the evolving tools. He also took +the responsibility of organizing and coordinating the Translation +Project. After nearly a year of informal exchanges between people from +many countries, translator teams started to exist in May 1995, through +the creation and support by Patrick D'Cruze of twenty unmoderated +mailing lists for that many native languages, and two moderated +lists: one for reaching all teams at once, the other for reaching +all willing maintainers of internationalized free software packages. + +</P> +<P> +François also wrote PO mode in June 1995 with the collaboration +of Greg McGary, as a kind of contribution to Ulrich's package. +He also gave a hand with the GNU <CODE>gettext</CODE> Texinfo manual. + +</P> + + +<H2><A NAME="SEC86" HREF="gettext_toc.html#TOC86">12.2 Related Readings</A></H2> + +<P> +Eugene H. Dorr (<TT>`dorre@well.com'</TT>) maintains an interesting +bibliography on internationalization matters, called +<CITE>Internationalization Reference List</CITE>, which is available as: + +<PRE> +ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/i18n-books.txt +</PRE> + +<P> +Michael Gschwind (<TT>`mike@vlsivie.tuwien.ac.at'</TT>) maintains a +Frequently Asked Questions (FAQ) list, entitled <CITE>Programming for +Internationalisation</CITE>. 
This FAQ discusses writing programs which +can handle different language conventions, character sets, etc.; +and is applicable to all character set encodings, with particular +emphasis on ISO 8859-1. It is regularly published in Usenet +groups <TT>`comp.unix.questions'</TT>, <TT>`comp.std.internat'</TT>, +<TT>`comp.software.international'</TT>, <TT>`comp.lang.c'</TT>, +<TT>`comp.windows.x'</TT>, <TT>`comp.std.c'</TT>, <TT>`comp.answers'</TT> +and <TT>`news.answers'</TT>. The home location of this document is: + +<PRE> +ftp://ftp.vlsivie.tuwien.ac.at/pub/8bit/ISO-programming +</PRE> + +<P> +Patrick D'Cruze (<TT>`pdcruze@li.org'</TT>) wrote a tutorial about NLS +matters, and Jochen Hein (<TT>`Hein@student.tu-clausthal.de'</TT>) took +over the responsibility of maintaining it. It may be found as: + +<PRE> +ftp://sunsite.unc.edu/pub/Linux/utils/nls/catalogs/Incoming/... + ...locale-tutorial-0.8.txt.gz +</PRE> + +<P> +This site is mirrored in: + +<PRE> +ftp://ftp.ibp.fr/pub/linux/sunsite/ +</PRE> + +<P> +A French version of the same tutorial should be findable at: + +<PRE> +ftp://ftp.ibp.fr/pub/linux/french/docs/ +</PRE> + +<P> +together with French translations of many Linux-related documents. + +</P> +<P><HR><P> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_11.html">previous</A>, <A HREF="gettext_13.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. 
+</BODY> +</HTML> diff --git a/doc/gettext_13.html b/doc/gettext_13.html new file mode 100644 index 0000000..ee20244 --- /dev/null +++ b/doc/gettext_13.html @@ -0,0 +1,523 @@ +<HTML> +<HEAD> +<!-- This HTML file has been created by texi2html 1.51 + from gettext.texi on 19 April 2001 --> + +<TITLE>GNU gettext utilities - A Language Codes</TITLE> +</HEAD> +<BODY> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_12.html">previous</A>, <A HREF="gettext_14.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. +<P><HR><P> + + +<H1><A NAME="SEC87" HREF="gettext_toc.html#TOC87">A Language Codes</A></H1> + +<P> +The ISO 639 standard defines two character codes for many languages. +All abbreviations for languages used in the Translation Project should +come from this standard. + +</P> +<DL COMPACT> + +<DT><SAMP>`aa'</SAMP> +<DD> +Afar. +<DT><SAMP>`ab'</SAMP> +<DD> +Abkhazian. +<DT><SAMP>`ae'</SAMP> +<DD> +Avestan. +<DT><SAMP>`af'</SAMP> +<DD> +Afrikaans. +<DT><SAMP>`am'</SAMP> +<DD> +Amharic. +<DT><SAMP>`ar'</SAMP> +<DD> +Arabic. +<DT><SAMP>`as'</SAMP> +<DD> +Assamese. +<DT><SAMP>`ay'</SAMP> +<DD> +Aymara. +<DT><SAMP>`az'</SAMP> +<DD> +Azerbaijani. +<DT><SAMP>`ba'</SAMP> +<DD> +Bashkir. +<DT><SAMP>`be'</SAMP> +<DD> +Byelorussian; Belarusian. +<DT><SAMP>`bg'</SAMP> +<DD> +Bulgarian. +<DT><SAMP>`bh'</SAMP> +<DD> +Bihari. +<DT><SAMP>`bi'</SAMP> +<DD> +Bislama. +<DT><SAMP>`bn'</SAMP> +<DD> +Bengali; Bangla. +<DT><SAMP>`bo'</SAMP> +<DD> +Tibetan. +<DT><SAMP>`br'</SAMP> +<DD> +Breton. +<DT><SAMP>`bs'</SAMP> +<DD> +Bosnian. +<DT><SAMP>`ca'</SAMP> +<DD> +Catalan. +<DT><SAMP>`ce'</SAMP> +<DD> +Chechen. +<DT><SAMP>`ch'</SAMP> +<DD> +Chamorro. +<DT><SAMP>`co'</SAMP> +<DD> +Corsican. +<DT><SAMP>`cs'</SAMP> +<DD> +Czech. +<DT><SAMP>`cu'</SAMP> +<DD> +Church Slavic. +<DT><SAMP>`cv'</SAMP> +<DD> +Chuvash. +<DT><SAMP>`cy'</SAMP> +<DD> +Welsh. +<DT><SAMP>`da'</SAMP> +<DD> +Danish. +<DT><SAMP>`de'</SAMP> +<DD> +German. 
+<DT><SAMP>`dz'</SAMP> +<DD> +Dzongkha; Bhutani. +<DT><SAMP>`el'</SAMP> +<DD> +Greek. +<DT><SAMP>`en'</SAMP> +<DD> +English. +<DT><SAMP>`eo'</SAMP> +<DD> +Esperanto. +<DT><SAMP>`es'</SAMP> +<DD> +Spanish. +<DT><SAMP>`et'</SAMP> +<DD> +Estonian. +<DT><SAMP>`eu'</SAMP> +<DD> +Basque. +<DT><SAMP>`fa'</SAMP> +<DD> +Persian. +<DT><SAMP>`fi'</SAMP> +<DD> +Finnish. +<DT><SAMP>`fj'</SAMP> +<DD> +Fijian; Fiji. +<DT><SAMP>`fo'</SAMP> +<DD> +Faroese. +<DT><SAMP>`fr'</SAMP> +<DD> +French. +<DT><SAMP>`fy'</SAMP> +<DD> +Frisian. +<DT><SAMP>`ga'</SAMP> +<DD> +Irish. +<DT><SAMP>`gd'</SAMP> +<DD> +Scots; Gaelic. +<DT><SAMP>`gl'</SAMP> +<DD> +Gallegan; Galician. +<DT><SAMP>`gn'</SAMP> +<DD> +Guarani. +<DT><SAMP>`gu'</SAMP> +<DD> +Gujarati. +<DT><SAMP>`gv'</SAMP> +<DD> +Manx. +<DT><SAMP>`ha'</SAMP> +<DD> +Hausa (?). +<DT><SAMP>`he'</SAMP> +<DD> +Hebrew (formerly iw). +<DT><SAMP>`hi'</SAMP> +<DD> +Hindi. +<DT><SAMP>`ho'</SAMP> +<DD> +Hiri Motu. +<DT><SAMP>`hr'</SAMP> +<DD> +Croatian. +<DT><SAMP>`hu'</SAMP> +<DD> +Hungarian. +<DT><SAMP>`hy'</SAMP> +<DD> +Armenian. +<DT><SAMP>`hz'</SAMP> +<DD> +Herero. +<DT><SAMP>`ia'</SAMP> +<DD> +Interlingua. +<DT><SAMP>`id'</SAMP> +<DD> +Indonesian (formerly in). +<DT><SAMP>`ie'</SAMP> +<DD> +Interlingue. +<DT><SAMP>`ik'</SAMP> +<DD> +Inupiak. +<DT><SAMP>`is'</SAMP> +<DD> +Icelandic. +<DT><SAMP>`it'</SAMP> +<DD> +Italian. +<DT><SAMP>`iu'</SAMP> +<DD> +Inuktitut. +<DT><SAMP>`ja'</SAMP> +<DD> +Japanese. +<DT><SAMP>`jw'</SAMP> +<DD> +Javanese. +<DT><SAMP>`ka'</SAMP> +<DD> +Georgian. +<DT><SAMP>`ki'</SAMP> +<DD> +Kikuyu. +<DT><SAMP>`kj'</SAMP> +<DD> +Kuanyama. +<DT><SAMP>`kk'</SAMP> +<DD> +Kazakh. +<DT><SAMP>`kl'</SAMP> +<DD> +Kalaallisut; Greenlandic. +<DT><SAMP>`km'</SAMP> +<DD> +Khmer; Cambodian. +<DT><SAMP>`kn'</SAMP> +<DD> +Kannada. +<DT><SAMP>`ko'</SAMP> +<DD> +Korean. +<DT><SAMP>`ks'</SAMP> +<DD> +Kashmiri. +<DT><SAMP>`ku'</SAMP> +<DD> +Kurdish. +<DT><SAMP>`kv'</SAMP> +<DD> +Komi. +<DT><SAMP>`kw'</SAMP> +<DD> +Cornish. 
+<DT><SAMP>`ky'</SAMP> +<DD> +Kirghiz. +<DT><SAMP>`la'</SAMP> +<DD> +Latin. +<DT><SAMP>`lb'</SAMP> +<DD> +Letzeburgesch. +<DT><SAMP>`ln'</SAMP> +<DD> +Lingala. +<DT><SAMP>`lo'</SAMP> +<DD> +Lao; Laotian. +<DT><SAMP>`lt'</SAMP> +<DD> +Lithuanian. +<DT><SAMP>`lv'</SAMP> +<DD> +Latvian; Lettish. +<DT><SAMP>`mg'</SAMP> +<DD> +Malagasy. +<DT><SAMP>`mh'</SAMP> +<DD> +Marshall. +<DT><SAMP>`mi'</SAMP> +<DD> +Maori. +<DT><SAMP>`mk'</SAMP> +<DD> +Macedonian. +<DT><SAMP>`ml'</SAMP> +<DD> +Malayalam. +<DT><SAMP>`mn'</SAMP> +<DD> +Mongolian. +<DT><SAMP>`mo'</SAMP> +<DD> +Moldavian. +<DT><SAMP>`mr'</SAMP> +<DD> +Marathi. +<DT><SAMP>`ms'</SAMP> +<DD> +Malay. +<DT><SAMP>`mt'</SAMP> +<DD> +Maltese. +<DT><SAMP>`my'</SAMP> +<DD> +Burmese. +<DT><SAMP>`na'</SAMP> +<DD> +Nauru. +<DT><SAMP>`nb'</SAMP> +<DD> +Norwegian Bokmål. +<DT><SAMP>`nd'</SAMP> +<DD> +Ndebele, North. +<DT><SAMP>`ne'</SAMP> +<DD> +Nepali. +<DT><SAMP>`ng'</SAMP> +<DD> +Ndonga. +<DT><SAMP>`nl'</SAMP> +<DD> +Dutch. +<DT><SAMP>`nn'</SAMP> +<DD> +Norwegian Nynorsk. +<DT><SAMP>`no'</SAMP> +<DD> +Norwegian. +<DT><SAMP>`nr'</SAMP> +<DD> +Ndebele, South. +<DT><SAMP>`nv'</SAMP> +<DD> +Navajo. +<DT><SAMP>`ny'</SAMP> +<DD> +Chichewa; Nyanja. +<DT><SAMP>`oc'</SAMP> +<DD> +Occitan; Provençal. +<DT><SAMP>`om'</SAMP> +<DD> +(Afan) Oromo. +<DT><SAMP>`or'</SAMP> +<DD> +Oriya. +<DT><SAMP>`os'</SAMP> +<DD> +Ossetian; Ossetic. +<DT><SAMP>`pa'</SAMP> +<DD> +Panjabi; Punjabi. +<DT><SAMP>`pi'</SAMP> +<DD> +Pali. +<DT><SAMP>`pl'</SAMP> +<DD> +Polish. +<DT><SAMP>`ps'</SAMP> +<DD> +Pashto, Pushto. +<DT><SAMP>`pt'</SAMP> +<DD> +Portuguese. +<DT><SAMP>`qu'</SAMP> +<DD> +Quechua. +<DT><SAMP>`rm'</SAMP> +<DD> +Rhaeto-Romance. +<DT><SAMP>`rn'</SAMP> +<DD> +Rundi; Kirundi. +<DT><SAMP>`ro'</SAMP> +<DD> +Romanian. +<DT><SAMP>`ru'</SAMP> +<DD> +Russian. +<DT><SAMP>`rw'</SAMP> +<DD> +Kinyarwanda. +<DT><SAMP>`sa'</SAMP> +<DD> +Sanskrit. +<DT><SAMP>`sc'</SAMP> +<DD> +Sardinian. +<DT><SAMP>`sd'</SAMP> +<DD> +Sindhi. 
+<DT><SAMP>`se'</SAMP> +<DD> +Northern Sami. +<DT><SAMP>`sg'</SAMP> +<DD> +Sango; Sangro. +<DT><SAMP>`si'</SAMP> +<DD> +Sinhalese. +<DT><SAMP>`sk'</SAMP> +<DD> +Slovak. +<DT><SAMP>`sl'</SAMP> +<DD> +Slovenian. +<DT><SAMP>`sm'</SAMP> +<DD> +Samoan. +<DT><SAMP>`sn'</SAMP> +<DD> +Shona. +<DT><SAMP>`so'</SAMP> +<DD> +Somali. +<DT><SAMP>`sq'</SAMP> +<DD> +Albanian. +<DT><SAMP>`sr'</SAMP> +<DD> +Serbian. +<DT><SAMP>`ss'</SAMP> +<DD> +Swati; Siswati. +<DT><SAMP>`st'</SAMP> +<DD> +Sesotho; Sotho, Southern. +<DT><SAMP>`su'</SAMP> +<DD> +Sundanese. +<DT><SAMP>`sv'</SAMP> +<DD> +Swedish. +<DT><SAMP>`sw'</SAMP> +<DD> +Swahili. +<DT><SAMP>`ta'</SAMP> +<DD> +Tamil. +<DT><SAMP>`te'</SAMP> +<DD> +Telugu. +<DT><SAMP>`tg'</SAMP> +<DD> +Tajik. +<DT><SAMP>`th'</SAMP> +<DD> +Thai. +<DT><SAMP>`ti'</SAMP> +<DD> +Tigrinya. +<DT><SAMP>`tk'</SAMP> +<DD> +Turkmen. +<DT><SAMP>`tl'</SAMP> +<DD> +Tagalog. +<DT><SAMP>`tn'</SAMP> +<DD> +Tswana; Setswana. +<DT><SAMP>`to'</SAMP> +<DD> +Tonga (?). +<DT><SAMP>`tr'</SAMP> +<DD> +Turkish. +<DT><SAMP>`ts'</SAMP> +<DD> +Tsonga. +<DT><SAMP>`tt'</SAMP> +<DD> +Tatar. +<DT><SAMP>`tw'</SAMP> +<DD> +Twi. +<DT><SAMP>`ty'</SAMP> +<DD> +Tahitian. +<DT><SAMP>`ug'</SAMP> +<DD> +Uighur. +<DT><SAMP>`uk'</SAMP> +<DD> +Ukrainian. +<DT><SAMP>`ur'</SAMP> +<DD> +Urdu. +<DT><SAMP>`uz'</SAMP> +<DD> +Uzbek. +<DT><SAMP>`vi'</SAMP> +<DD> +Vietnamese. +<DT><SAMP>`vo'</SAMP> +<DD> +Volapük; Volapuk. +<DT><SAMP>`wo'</SAMP> +<DD> +Wolof. +<DT><SAMP>`xh'</SAMP> +<DD> +Xhosa. +<DT><SAMP>`yi'</SAMP> +<DD> +Yiddish (formerly ji). +<DT><SAMP>`yo'</SAMP> +<DD> +Yoruba. +<DT><SAMP>`za'</SAMP> +<DD> +Zhuang. +<DT><SAMP>`zh'</SAMP> +<DD> +Chinese. +<DT><SAMP>`zu'</SAMP> +<DD> +Zulu. +</DL> + +<P><HR><P> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_12.html">previous</A>, <A HREF="gettext_14.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. 
+</BODY> +</HTML> diff --git a/doc/gettext_14.html b/doc/gettext_14.html new file mode 100644 index 0000000..99ad0ca --- /dev/null +++ b/doc/gettext_14.html @@ -0,0 +1,745 @@ +<HTML> +<HEAD> +<!-- This HTML file has been created by texi2html 1.51 + from gettext.texi on 19 April 2001 --> + +<TITLE>GNU gettext utilities - B Country Codes</TITLE> +</HEAD> +<BODY> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_13.html">previous</A>, next, last section, <A HREF="gettext_toc.html">table of contents</A>. +<P><HR><P> + + +<H1><A NAME="SEC88" HREF="gettext_toc.html#TOC88">B Country Codes</A></H1> + +<P> +The ISO 3166 standard defines two character codes for many countries +and territories. All abbreviations for countries used in the Translation +Project should come from this standard. + +</P> +<DL COMPACT> + +<DT><SAMP>`AD'</SAMP> +<DD> +Andorra. +<DT><SAMP>`AE'</SAMP> +<DD> +United Arab Emirates. +<DT><SAMP>`AF'</SAMP> +<DD> +Afghanistan. +<DT><SAMP>`AG'</SAMP> +<DD> +Antigua and Barbuda. +<DT><SAMP>`AI'</SAMP> +<DD> +Anguilla. +<DT><SAMP>`AL'</SAMP> +<DD> +Albania. +<DT><SAMP>`AM'</SAMP> +<DD> +Armenia. +<DT><SAMP>`AN'</SAMP> +<DD> +Netherlands Antilles. +<DT><SAMP>`AO'</SAMP> +<DD> +Angola. +<DT><SAMP>`AQ'</SAMP> +<DD> +Antarctica. +<DT><SAMP>`AR'</SAMP> +<DD> +Argentina. +<DT><SAMP>`AS'</SAMP> +<DD> +Samoa (American). +<DT><SAMP>`AT'</SAMP> +<DD> +Austria. +<DT><SAMP>`AU'</SAMP> +<DD> +Australia. +<DT><SAMP>`AW'</SAMP> +<DD> +Aruba. +<DT><SAMP>`AZ'</SAMP> +<DD> +Azerbaijan. +<DT><SAMP>`BA'</SAMP> +<DD> +Bosnia and Herzegovina. +<DT><SAMP>`BB'</SAMP> +<DD> +Barbados. +<DT><SAMP>`BD'</SAMP> +<DD> +Bangladesh. +<DT><SAMP>`BE'</SAMP> +<DD> +Belgium. +<DT><SAMP>`BF'</SAMP> +<DD> +Burkina Faso. +<DT><SAMP>`BG'</SAMP> +<DD> +Bulgaria. +<DT><SAMP>`BH'</SAMP> +<DD> +Bahrain. +<DT><SAMP>`BI'</SAMP> +<DD> +Burundi. +<DT><SAMP>`BJ'</SAMP> +<DD> +Benin. +<DT><SAMP>`BM'</SAMP> +<DD> +Bermuda. +<DT><SAMP>`BN'</SAMP> +<DD> +Brunei. 
+<DT><SAMP>`BO'</SAMP> +<DD> +Bolivia. +<DT><SAMP>`BR'</SAMP> +<DD> +Brazil. +<DT><SAMP>`BS'</SAMP> +<DD> +Bahamas. +<DT><SAMP>`BT'</SAMP> +<DD> +Bhutan. +<DT><SAMP>`BV'</SAMP> +<DD> +Bouvet Island. +<DT><SAMP>`BW'</SAMP> +<DD> +Botswana. +<DT><SAMP>`BY'</SAMP> +<DD> +Belarus. +<DT><SAMP>`BZ'</SAMP> +<DD> +Belize. +<DT><SAMP>`CA'</SAMP> +<DD> +Canada. +<DT><SAMP>`CC'</SAMP> +<DD> +Cocos (Keeling) Islands. +<DT><SAMP>`CD'</SAMP> +<DD> +Congo (Dem. Rep.). +<DT><SAMP>`CF'</SAMP> +<DD> +Central African Rep.. +<DT><SAMP>`CG'</SAMP> +<DD> +Congo (Rep.). +<DT><SAMP>`CH'</SAMP> +<DD> +Switzerland. +<DT><SAMP>`CI'</SAMP> +<DD> +Cote d'Ivoire. +<DT><SAMP>`CK'</SAMP> +<DD> +Cook Islands. +<DT><SAMP>`CL'</SAMP> +<DD> +Chile. +<DT><SAMP>`CM'</SAMP> +<DD> +Cameroon. +<DT><SAMP>`CN'</SAMP> +<DD> +China. +<DT><SAMP>`CO'</SAMP> +<DD> +Colombia. +<DT><SAMP>`CR'</SAMP> +<DD> +Costa Rica. +<DT><SAMP>`CU'</SAMP> +<DD> +Cuba. +<DT><SAMP>`CV'</SAMP> +<DD> +Cape Verde. +<DT><SAMP>`CX'</SAMP> +<DD> +Christmas Island. +<DT><SAMP>`CY'</SAMP> +<DD> +Cyprus. +<DT><SAMP>`CZ'</SAMP> +<DD> +Czech Republic. +<DT><SAMP>`DE'</SAMP> +<DD> +Germany. +<DT><SAMP>`DJ'</SAMP> +<DD> +Djibouti. +<DT><SAMP>`DK'</SAMP> +<DD> +Denmark. +<DT><SAMP>`DM'</SAMP> +<DD> +Dominica. +<DT><SAMP>`DO'</SAMP> +<DD> +Dominican Republic. +<DT><SAMP>`DZ'</SAMP> +<DD> +Algeria. +<DT><SAMP>`EC'</SAMP> +<DD> +Ecuador. +<DT><SAMP>`EE'</SAMP> +<DD> +Estonia. +<DT><SAMP>`EG'</SAMP> +<DD> +Egypt. +<DT><SAMP>`EH'</SAMP> +<DD> +Western Sahara. +<DT><SAMP>`ER'</SAMP> +<DD> +Eritrea. +<DT><SAMP>`ES'</SAMP> +<DD> +Spain. +<DT><SAMP>`ET'</SAMP> +<DD> +Ethiopia. +<DT><SAMP>`FI'</SAMP> +<DD> +Finland. +<DT><SAMP>`FJ'</SAMP> +<DD> +Fiji. +<DT><SAMP>`FK'</SAMP> +<DD> +Falkland Islands. +<DT><SAMP>`FM'</SAMP> +<DD> +Micronesia. +<DT><SAMP>`FO'</SAMP> +<DD> +Faeroe Islands. +<DT><SAMP>`FR'</SAMP> +<DD> +France. +<DT><SAMP>`GA'</SAMP> +<DD> +Gabon. +<DT><SAMP>`GB'</SAMP> +<DD> +Britain (UK). +<DT><SAMP>`GD'</SAMP> +<DD> +Grenada. 
+<DT><SAMP>`GE'</SAMP> +<DD> +Georgia. +<DT><SAMP>`GF'</SAMP> +<DD> +French Guiana. +<DT><SAMP>`GH'</SAMP> +<DD> +Ghana. +<DT><SAMP>`GI'</SAMP> +<DD> +Gibraltar. +<DT><SAMP>`GL'</SAMP> +<DD> +Greenland. +<DT><SAMP>`GM'</SAMP> +<DD> +Gambia. +<DT><SAMP>`GN'</SAMP> +<DD> +Guinea. +<DT><SAMP>`GP'</SAMP> +<DD> +Guadeloupe. +<DT><SAMP>`GQ'</SAMP> +<DD> +Equatorial Guinea. +<DT><SAMP>`GR'</SAMP> +<DD> +Greece. +<DT><SAMP>`GS'</SAMP> +<DD> +South Georgia and the South Sandwich Islands. +<DT><SAMP>`GT'</SAMP> +<DD> +Guatemala. +<DT><SAMP>`GU'</SAMP> +<DD> +Guam. +<DT><SAMP>`GW'</SAMP> +<DD> +Guinea-Bissau. +<DT><SAMP>`GY'</SAMP> +<DD> +Guyana. +<DT><SAMP>`HK'</SAMP> +<DD> +Hong Kong. +<DT><SAMP>`HM'</SAMP> +<DD> +Heard Island and McDonald Islands. +<DT><SAMP>`HN'</SAMP> +<DD> +Honduras. +<DT><SAMP>`HR'</SAMP> +<DD> +Croatia. +<DT><SAMP>`HT'</SAMP> +<DD> +Haiti. +<DT><SAMP>`HU'</SAMP> +<DD> +Hungary. +<DT><SAMP>`ID'</SAMP> +<DD> +Indonesia. +<DT><SAMP>`IE'</SAMP> +<DD> +Ireland. +<DT><SAMP>`IL'</SAMP> +<DD> +Israel. +<DT><SAMP>`IN'</SAMP> +<DD> +India. +<DT><SAMP>`IO'</SAMP> +<DD> +British Indian Ocean Territory. +<DT><SAMP>`IQ'</SAMP> +<DD> +Iraq. +<DT><SAMP>`IR'</SAMP> +<DD> +Iran. +<DT><SAMP>`IS'</SAMP> +<DD> +Iceland. +<DT><SAMP>`IT'</SAMP> +<DD> +Italy. +<DT><SAMP>`JM'</SAMP> +<DD> +Jamaica. +<DT><SAMP>`JO'</SAMP> +<DD> +Jordan. +<DT><SAMP>`JP'</SAMP> +<DD> +Japan. +<DT><SAMP>`KE'</SAMP> +<DD> +Kenya. +<DT><SAMP>`KG'</SAMP> +<DD> +Kyrgyzstan. +<DT><SAMP>`KH'</SAMP> +<DD> +Cambodia. +<DT><SAMP>`KI'</SAMP> +<DD> +Kiribati. +<DT><SAMP>`KM'</SAMP> +<DD> +Comoros. +<DT><SAMP>`KN'</SAMP> +<DD> +St Kitts and Nevis. +<DT><SAMP>`KP'</SAMP> +<DD> +Korea (North). +<DT><SAMP>`KR'</SAMP> +<DD> +Korea (South). +<DT><SAMP>`KW'</SAMP> +<DD> +Kuwait. +<DT><SAMP>`KY'</SAMP> +<DD> +Cayman Islands. +<DT><SAMP>`KZ'</SAMP> +<DD> +Kazakhstan. +<DT><SAMP>`LA'</SAMP> +<DD> +Laos. +<DT><SAMP>`LB'</SAMP> +<DD> +Lebanon. +<DT><SAMP>`LC'</SAMP> +<DD> +St Lucia. 
+<DT><SAMP>`LI'</SAMP> +<DD> +Liechtenstein. +<DT><SAMP>`LK'</SAMP> +<DD> +Sri Lanka. +<DT><SAMP>`LR'</SAMP> +<DD> +Liberia. +<DT><SAMP>`LS'</SAMP> +<DD> +Lesotho. +<DT><SAMP>`LT'</SAMP> +<DD> +Lithuania. +<DT><SAMP>`LU'</SAMP> +<DD> +Luxembourg. +<DT><SAMP>`LV'</SAMP> +<DD> +Latvia. +<DT><SAMP>`LY'</SAMP> +<DD> +Libya. +<DT><SAMP>`MA'</SAMP> +<DD> +Morocco. +<DT><SAMP>`MC'</SAMP> +<DD> +Monaco. +<DT><SAMP>`MD'</SAMP> +<DD> +Moldova. +<DT><SAMP>`MG'</SAMP> +<DD> +Madagascar. +<DT><SAMP>`MH'</SAMP> +<DD> +Marshall Islands. +<DT><SAMP>`MK'</SAMP> +<DD> +Macedonia. +<DT><SAMP>`ML'</SAMP> +<DD> +Mali. +<DT><SAMP>`MM'</SAMP> +<DD> +Myanmar (Burma). +<DT><SAMP>`MN'</SAMP> +<DD> +Mongolia. +<DT><SAMP>`MO'</SAMP> +<DD> +Macao. +<DT><SAMP>`MP'</SAMP> +<DD> +Northern Mariana Islands. +<DT><SAMP>`MQ'</SAMP> +<DD> +Martinique. +<DT><SAMP>`MR'</SAMP> +<DD> +Mauritania. +<DT><SAMP>`MS'</SAMP> +<DD> +Montserrat. +<DT><SAMP>`MT'</SAMP> +<DD> +Malta. +<DT><SAMP>`MU'</SAMP> +<DD> +Mauritius. +<DT><SAMP>`MV'</SAMP> +<DD> +Maldives. +<DT><SAMP>`MW'</SAMP> +<DD> +Malawi. +<DT><SAMP>`MX'</SAMP> +<DD> +Mexico. +<DT><SAMP>`MY'</SAMP> +<DD> +Malaysia. +<DT><SAMP>`MZ'</SAMP> +<DD> +Mozambique. +<DT><SAMP>`NA'</SAMP> +<DD> +Namibia. +<DT><SAMP>`NC'</SAMP> +<DD> +New Caledonia. +<DT><SAMP>`NE'</SAMP> +<DD> +Niger. +<DT><SAMP>`NF'</SAMP> +<DD> +Norfolk Island. +<DT><SAMP>`NG'</SAMP> +<DD> +Nigeria. +<DT><SAMP>`NI'</SAMP> +<DD> +Nicaragua. +<DT><SAMP>`NL'</SAMP> +<DD> +Netherlands. +<DT><SAMP>`NO'</SAMP> +<DD> +Norway. +<DT><SAMP>`NP'</SAMP> +<DD> +Nepal. +<DT><SAMP>`NR'</SAMP> +<DD> +Nauru. +<DT><SAMP>`NU'</SAMP> +<DD> +Niue. +<DT><SAMP>`NZ'</SAMP> +<DD> +New Zealand. +<DT><SAMP>`OM'</SAMP> +<DD> +Oman. +<DT><SAMP>`PA'</SAMP> +<DD> +Panama. +<DT><SAMP>`PE'</SAMP> +<DD> +Peru. +<DT><SAMP>`PF'</SAMP> +<DD> +French Polynesia. +<DT><SAMP>`PG'</SAMP> +<DD> +Papua New Guinea. +<DT><SAMP>`PH'</SAMP> +<DD> +Philippines. +<DT><SAMP>`PK'</SAMP> +<DD> +Pakistan. +<DT><SAMP>`PL'</SAMP> +<DD> +Poland. 
+<DT><SAMP>`PM'</SAMP> +<DD> +St Pierre and Miquelon. +<DT><SAMP>`PN'</SAMP> +<DD> +Pitcairn. +<DT><SAMP>`PR'</SAMP> +<DD> +Puerto Rico. +<DT><SAMP>`PS'</SAMP> +<DD> +Palestine. +<DT><SAMP>`PT'</SAMP> +<DD> +Portugal. +<DT><SAMP>`PW'</SAMP> +<DD> +Palau. +<DT><SAMP>`PY'</SAMP> +<DD> +Paraguay. +<DT><SAMP>`QA'</SAMP> +<DD> +Qatar. +<DT><SAMP>`RE'</SAMP> +<DD> +Reunion. +<DT><SAMP>`RO'</SAMP> +<DD> +Romania. +<DT><SAMP>`RU'</SAMP> +<DD> +Russia. +<DT><SAMP>`RW'</SAMP> +<DD> +Rwanda. +<DT><SAMP>`SA'</SAMP> +<DD> +Saudi Arabia. +<DT><SAMP>`SB'</SAMP> +<DD> +Solomon Islands. +<DT><SAMP>`SC'</SAMP> +<DD> +Seychelles. +<DT><SAMP>`SD'</SAMP> +<DD> +Sudan. +<DT><SAMP>`SE'</SAMP> +<DD> +Sweden. +<DT><SAMP>`SG'</SAMP> +<DD> +Singapore. +<DT><SAMP>`SH'</SAMP> +<DD> +St Helena. +<DT><SAMP>`SI'</SAMP> +<DD> +Slovenia. +<DT><SAMP>`SJ'</SAMP> +<DD> +Svalbard and Jan Mayen. +<DT><SAMP>`SK'</SAMP> +<DD> +Slovakia. +<DT><SAMP>`SL'</SAMP> +<DD> +Sierra Leone. +<DT><SAMP>`SM'</SAMP> +<DD> +San Marino. +<DT><SAMP>`SN'</SAMP> +<DD> +Senegal. +<DT><SAMP>`SO'</SAMP> +<DD> +Somalia. +<DT><SAMP>`SR'</SAMP> +<DD> +Suriname. +<DT><SAMP>`ST'</SAMP> +<DD> +Sao Tome and Principe. +<DT><SAMP>`SV'</SAMP> +<DD> +El Salvador. +<DT><SAMP>`SY'</SAMP> +<DD> +Syria. +<DT><SAMP>`SZ'</SAMP> +<DD> +Swaziland. +<DT><SAMP>`TC'</SAMP> +<DD> +Turks and Caicos Is. +<DT><SAMP>`TD'</SAMP> +<DD> +Chad. +<DT><SAMP>`TF'</SAMP> +<DD> +French Southern and Antarctic Lands. +<DT><SAMP>`TG'</SAMP> +<DD> +Togo. +<DT><SAMP>`TH'</SAMP> +<DD> +Thailand. +<DT><SAMP>`TJ'</SAMP> +<DD> +Tajikistan. +<DT><SAMP>`TK'</SAMP> +<DD> +Tokelau. +<DT><SAMP>`TM'</SAMP> +<DD> +Turkmenistan. +<DT><SAMP>`TN'</SAMP> +<DD> +Tunisia. +<DT><SAMP>`TO'</SAMP> +<DD> +Tonga. +<DT><SAMP>`TP'</SAMP> +<DD> +East Timor. +<DT><SAMP>`TR'</SAMP> +<DD> +Turkey. +<DT><SAMP>`TT'</SAMP> +<DD> +Trinidad and Tobago. +<DT><SAMP>`TV'</SAMP> +<DD> +Tuvalu. +<DT><SAMP>`TW'</SAMP> +<DD> +Taiwan. +<DT><SAMP>`TZ'</SAMP> +<DD> +Tanzania. 
+<DT><SAMP>`UA'</SAMP> +<DD> +Ukraine. +<DT><SAMP>`UG'</SAMP> +<DD> +Uganda. +<DT><SAMP>`UM'</SAMP> +<DD> +US minor outlying islands. +<DT><SAMP>`US'</SAMP> +<DD> +United States. +<DT><SAMP>`UY'</SAMP> +<DD> +Uruguay. +<DT><SAMP>`UZ'</SAMP> +<DD> +Uzbekistan. +<DT><SAMP>`VA'</SAMP> +<DD> +Vatican City. +<DT><SAMP>`VC'</SAMP> +<DD> +St Vincent. +<DT><SAMP>`VE'</SAMP> +<DD> +Venezuela. +<DT><SAMP>`VG'</SAMP> +<DD> +Virgin Islands (UK). +<DT><SAMP>`VI'</SAMP> +<DD> +Virgin Islands (US). +<DT><SAMP>`VN'</SAMP> +<DD> +Vietnam. +<DT><SAMP>`VU'</SAMP> +<DD> +Vanuatu. +<DT><SAMP>`WF'</SAMP> +<DD> +Wallis and Futuna. +<DT><SAMP>`WS'</SAMP> +<DD> +Samoa (Western). +<DT><SAMP>`YE'</SAMP> +<DD> +Yemen. +<DT><SAMP>`YT'</SAMP> +<DD> +Mayotte. +<DT><SAMP>`YU'</SAMP> +<DD> +Yugoslavia. +<DT><SAMP>`ZA'</SAMP> +<DD> +South Africa. +<DT><SAMP>`ZM'</SAMP> +<DD> +Zambia. +<DT><SAMP>`ZW'</SAMP> +<DD> +Zimbabwe. +</DL> + +<P><HR><P> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_13.html">previous</A>, next, last section, <A HREF="gettext_toc.html">table of contents</A>. +</BODY> +</HTML> diff --git a/doc/gettext_2.html b/doc/gettext_2.html new file mode 100644 index 0000000..fcf1a05 --- /dev/null +++ b/doc/gettext_2.html @@ -0,0 +1,685 @@ +<HTML> +<HEAD> +<!-- This HTML file has been created by texi2html 1.51 + from gettext.texi on 19 April 2001 --> + +<TITLE>GNU gettext utilities - 2 PO Files and PO Mode Basics</TITLE> +</HEAD> +<BODY> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_1.html">previous</A>, <A HREF="gettext_3.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. +<P><HR><P> + + +<H1><A NAME="SEC7" HREF="gettext_toc.html#TOC7">2 PO Files and PO Mode Basics</A></H1> + +<P> +The GNU <CODE>gettext</CODE> toolset helps programmers and translators +at producing, updating and using translation files, mainly those +PO files which are textual, editable files. 
This chapter stresses +the format of PO files, and contains a PO mode starter. PO mode +description is spread throughout this manual instead of being concentrated +in one place. Here we present only the basics of PO mode. + +</P> + + + +<H2><A NAME="SEC8" HREF="gettext_toc.html#TOC8">2.1 Completing GNU <CODE>gettext</CODE> Installation</A></H2> + +<P> +Once you have received, unpacked, configured and compiled the GNU +<CODE>gettext</CODE> distribution, the <SAMP>`make install'</SAMP> command puts in +place the programs <CODE>xgettext</CODE>, <CODE>msgfmt</CODE>, <CODE>gettext</CODE>, and +<CODE>msgmerge</CODE>, as well as their available message catalogs. To +top off a comfortable installation, you might also want to make the +PO mode available to your Emacs users. + +</P> +<P> +During the installation of the PO mode, you might want to modify your +file <TT>`.emacs'</TT>, once and for all, so it contains a few lines looking +like: + +</P> + +<PRE> +(setq auto-mode-alist + (cons '("\\.po[tx]?\\'\\|\\.po\\." . po-mode) auto-mode-alist)) +(autoload 'po-mode "po-mode" "Major mode for translators to edit PO files" t) +</PRE> + +<P> +Later, whenever you edit some <TT>`.po'</TT>, <TT>`.pot'</TT> or <TT>`.pox'</TT> +file, or any file having the string <SAMP>`.po.'</SAMP> within its name, +Emacs loads <TT>`po-mode.elc'</TT> (or <TT>`po-mode.el'</TT>) as needed, and +automatically activates PO mode commands for the associated buffer. +The string <EM>PO</EM> appears in the mode line for any buffer for +which PO mode is active. Many PO files may be active at once in a +single Emacs session. + +</P> +<P> +If you are using Emacs version 20 or newer, and have already installed +the appropriate international fonts on your system, you may also tell +Emacs how to determine automatically the coding system of every PO file. +This will often (but not always) cause the necessary fonts to be loaded +and used for displaying the translations on your Emacs screen. 
For this +to happen, add the lines: + +</P> + +<PRE> +(modify-coding-system-alist 'file "\\.po[tx]?\\'\\|\\.po\\." + 'po-find-file-coding-system) +(autoload 'po-find-file-coding-system "po-mode") +</PRE> + +<P> +to your <TT>`.emacs'</TT> file. If, with this, you still see boxes instead +of international characters, try a different font set (via Shift Mouse +button 1). + +</P> + + +<H2><A NAME="SEC9" HREF="gettext_toc.html#TOC9">2.2 The Format of PO Files</A></H2> + +<P> +A PO file is made up of many entries, each entry holding the relation +between an original untranslated string and its corresponding +translation. All entries in a given PO file usually pertain +to a single project, and all translations are expressed in a single +target language. One PO file <STRONG>entry</STRONG> has the following schematic +structure: + +</P> + +<PRE> +<VAR>white-space</VAR> +# <VAR>translator-comments</VAR> +#. <VAR>automatic-comments</VAR> +#: <VAR>reference</VAR>... +#, <VAR>flag</VAR>... +msgid <VAR>untranslated-string</VAR> +msgstr <VAR>translated-string</VAR> +</PRE> + +<P> +The general structure of a PO file should be well understood by +the translator. When using PO mode, very little has to be known +about the format details, as PO mode takes care of them for her. + +</P> +<P> +Entries begin with some optional white space. Usually, when generated +through GNU <CODE>gettext</CODE> tools, there is exactly one blank line +between entries. Then comments follow, on lines all starting with the +character <KBD>#</KBD>. There are two kinds of comments: those which have +some white space immediately following the <KBD>#</KBD>, which comments are +created and maintained exclusively by the translator, and those which +have some non-white character just after the <KBD>#</KBD>, which comments +are created and maintained automatically by GNU <CODE>gettext</CODE> tools. +All comments, of either kind, are optional. 
+ +</P> +<P> +After white space and comments, entries show two strings, namely +first the untranslated string as it appears in the original program +sources, and then, the translation of this string. The original +string is introduced by the keyword <CODE>msgid</CODE>, and the translation, +by <CODE>msgstr</CODE>. The two strings, untranslated and translated, +are quoted in various ways in the PO file, using <KBD>"</KBD> +delimiters and <KBD>\</KBD> escapes, but the translator does not really +have to pay attention to the precise quoting format, as PO mode fully +takes care of quoting for her. + +</P> +<P> +The <CODE>msgid</CODE> strings, as well as automatic comments, are produced +and managed by other GNU <CODE>gettext</CODE> tools, and PO mode does not +provide means for the translator to alter these. The most she can +do is merely deleting them, and only by deleting the whole entry. +On the other hand, the <CODE>msgstr</CODE> string, as well as translator +comments, are really meant for the translator, and PO mode gives her +the full control she needs. + +</P> +<P> +The comment lines beginning with <KBD>#,</KBD> are special because they are +not completely ignored by the programs as comments generally are. The +comma separated list of <VAR>flag</VAR>s is used by the <CODE>msgfmt</CODE> +program to give the user some better diagnostic messages. Currently +there are two forms of flags defined: + +</P> +<DL COMPACT> + +<DT><KBD>fuzzy</KBD> +<DD> +This flag can be generated by the <CODE>msgmerge</CODE> program or it can be +inserted by the translator herself. It shows that the <CODE>msgstr</CODE> +string might not be a correct translation (anymore). Only the translator +can judge if the translation requires further modification, or is +acceptable as is. Once satisfied with the translation, she then removes +this <KBD>fuzzy</KBD> attribute. 
The <CODE>msgmerge</CODE> program inserts this flag +when it has paired a <CODE>msgid</CODE> with a <CODE>msgstr</CODE> through fuzzy +search only. See section <A HREF="gettext_6.html#SEC25">6.3 Fuzzy Entries</A>. + +<DT><KBD>c-format</KBD> +<DD> +<DT><KBD>no-c-format</KBD> +<DD> +These flags should not be added by a human. Instead only the +<CODE>xgettext</CODE> program adds them. In an automated PO file processing +system as proposed here the user changes would be thrown away again as +soon as the <CODE>xgettext</CODE> program generates a new template file. + +When the <KBD>c-format</KBD> flag is given for a string, <CODE>msgfmt</CODE> +performs additional tests to check the validity of the translation. +See section <A HREF="gettext_7.html#SEC35">7.1 Invoking the <CODE>msgfmt</CODE> Program</A>. + +</DL> + +<P> +A different kind of entry is used for translations which involve +plural forms. + +</P> + +<PRE> +<VAR>white-space</VAR> +# <VAR>translator-comments</VAR> +#. <VAR>automatic-comments</VAR> +#: <VAR>reference</VAR>... +#, <VAR>flag</VAR>... +msgid <VAR>untranslated-string-singular</VAR> +msgid_plural <VAR>untranslated-string-plural</VAR> +msgstr[0] <VAR>translated-string-case-0</VAR> +... +msgstr[N] <VAR>translated-string-case-n</VAR> +</PRE> + +<P> +Sometimes lines, usually whitespace or comments, follow the +very last entry of a PO file. Such lines are not part of any entry, +and PO mode is unable to take action on those lines. By using the +PO mode function <KBD>M-x po-normalize</KBD>, the translator may get +rid of those spurious lines. See section <A HREF="gettext_2.html#SEC12">2.5 Normalizing Strings in Entries</A>. + +</P> +<P> +The remainder of this section may be safely skipped by those using +PO mode, yet it may be interesting for everybody to have a better +idea of the precise format of a PO file. On the other hand, those +not having Emacs handy should carefully continue reading on. 
+ +</P> +<P> +Each of <VAR>untranslated-string</VAR> and <VAR>translated-string</VAR> respects +the C syntax for a character string, including the surrounding quotes +and embedded backslashed escape sequences. When the time comes +to write multi-line strings, one should not use escaped newlines. +Instead, a closing quote should follow the last character on the +line to be continued, and an opening quote should resume the string +at the beginning of the following PO file line. For example: + +</P> + +<PRE> +msgid "" +"Here is an example of how one might continue a very long string\n" +"for the common case the string represents multi-line output.\n" +</PRE> + +<P> +In this example, the empty string is used on the first line, to +allow better alignment of the <KBD>H</KBD> from the word <SAMP>`Here'</SAMP> +over the <KBD>f</KBD> from the word <SAMP>`for'</SAMP>. In this example, the +<CODE>msgid</CODE> keyword is followed by three strings, which are meant +to be concatenated. Concatenating the empty string does not change +the resulting overall string, but it is a way for us to comply with +the requirement that <CODE>msgid</CODE> be followed by a string on the same +line, while keeping the multi-line presentation left-justified, as +we find this to be a cleaner layout. The empty string could have +been omitted, but only if the string starting with <SAMP>`Here'</SAMP> was +moved up to the first line, right after <CODE>msgid</CODE>.<A NAME="DOCF2" HREF="gettext_foot.html#FOOT2">(2)</A> It was not really necessary +either to switch between the two last quoted strings immediately after +the newline <SAMP>`\n'</SAMP>; the switch could have occurred after <EM>any</EM> +other character. We just did it this way because it is neater. 
+ +</P> +<P> +One should carefully distinguish between ends of lines marked as +<SAMP>`\n'</SAMP> <EM>inside</EM> quotes, which are part of the represented +string, and ends of lines in the PO file itself, outside string quotes, +which have no effect on the represented string. + +</P> +<P> +Outside strings, blank lines and comments may be used freely. +Comments start at the beginning of a line with <SAMP>`#'</SAMP> and extend +until the end of the PO file line. Comments written by translators +should have the initial <SAMP>`#'</SAMP> immediately followed by some white +space. If the <SAMP>`#'</SAMP> is not immediately followed by white space, +this comment is most likely generated and managed by specialized GNU +tools, and might disappear or be replaced unexpectedly when the PO +file is given to <CODE>msgmerge</CODE>. + +</P> + + +<H2><A NAME="SEC10" HREF="gettext_toc.html#TOC10">2.3 Main PO mode Commands</A></H2> + +<P> +After setting up Emacs with something similar to the lines in +section <A HREF="gettext_2.html#SEC8">2.1 Completing GNU <CODE>gettext</CODE> Installation</A>, PO mode is activated for a window when Emacs finds a +PO file in that window. This makes the window read-only and establishes +po-mode-map. PO mode is a genuine Emacs major mode, and it is not derived +from text mode in any way. Functions found on <CODE>po-mode-hook</CODE>, +if any, will be executed. + +</P> +<P> +When PO mode is active in a window, the letters <SAMP>`PO'</SAMP> appear +in the mode line for that window. The mode line also displays how +many entries of each kind are held in the PO file. 
For example, +the string <SAMP>`132t+3f+10u+2o'</SAMP> would tell the translator that the +PO file contains 132 translated entries (see section <A HREF="gettext_6.html#SEC24">6.2 Translated Entries</A>), +3 fuzzy entries (see section <A HREF="gettext_6.html#SEC25">6.3 Fuzzy Entries</A>), 10 untranslated entries +(see section <A HREF="gettext_6.html#SEC26">6.4 Untranslated Entries</A>) and 2 obsolete entries (see section <A HREF="gettext_6.html#SEC27">6.5 Obsolete Entries</A>). Items with a zero count are not shown. So, in this example, if +the fuzzy entries were unfuzzied, the untranslated entries were translated +and the obsolete entries were deleted, the mode line would merely display +<SAMP>`145t'</SAMP> for the counters. + +</P> +<P> +The main PO commands are those which do not fit into the other categories of +subsequent sections. These allow for quitting PO mode or for managing windows +in special ways. + +</P> +<DL COMPACT> + +<DT><KBD>U</KBD> +<DD> +Undo last modification to the PO file. + +<DT><KBD>Q</KBD> +<DD> +Quit processing and save the PO file. + +<DT><KBD>q</KBD> +<DD> +Quit processing, possibly after confirmation. + +<DT><KBD>O</KBD> +<DD> +Temporarily leave the PO file window. + +<DT><KBD>?</KBD> +<DD> +<DT><KBD>h</KBD> +<DD> +Show help about PO mode. + +<DT><KBD>=</KBD> +<DD> +Give some PO file statistics. + +<DT><KBD>V</KBD> +<DD> +Batch validate the format of the whole PO file. + +</DL> + +<P> +The command <KBD>U</KBD> (<CODE>po-undo</CODE>) interfaces to the Emacs +<EM>undo</EM> facility. See section `Undoing Changes' in <CITE>The Emacs Editor</CITE>. Each time <KBD>U</KBD> is typed, modifications which the translator +made to the PO file are undone a little more. For the purpose of +undoing, each PO mode command is atomic. This is especially true for +the <KBD><KBD>RET</KBD></KBD> command: the whole edit made by a single +use of this command is undone at once, even if the edit itself +involved several actions. 
However, while in the editing window, one +can undo the editing work in much finer steps. + +</P> +<P> +The commands <KBD>Q</KBD> (<CODE>po-quit</CODE>) and <KBD>q</KBD> +(<CODE>po-confirm-and-quit</CODE>) are used when the translator is done with the +PO file. The former is a bit less verbose than the latter. If the file +has been modified, it is saved to disk first. In both cases, and prior to +all this, the commands check if some untranslated message remains in the +PO file and, if yes, the translator is asked if she really wants to leave +off working with this PO file. This is the preferred way of getting rid +of an Emacs PO file buffer. Merely killing it through the usual command +<KBD>C-x k</KBD> (<CODE>kill-buffer</CODE>) is not the tidiest way to proceed. + +</P> +<P> +The command <KBD>O</KBD> (<CODE>po-other-window</CODE>) is another, softer way +to leave PO mode temporarily. It just moves the cursor to some other +Emacs window, and pops one if necessary. For example, if the translator +just got PO mode to show some source context in some other window, she might +discover some apparent bug in the program source that needs correction. +This command allows the translator to change sex, become a programmer, +and put the cursor right in the window containing the program she +(or rather <EM>he</EM>) wants to modify. By later getting the cursor back +in the PO file window, or by asking Emacs to edit this file once again, +PO mode is then recovered. + +</P> +<P> +The command <KBD>h</KBD> (<CODE>po-help</CODE>) displays a summary of all available PO +mode commands. The translator should then type any character to resume +normal PO mode operations. The command <KBD>?</KBD> has the same effect +as <KBD>h</KBD>. 
+ +</P> +<P> +The command <KBD>=</KBD> (<CODE>po-statistics</CODE>) computes the total number of +entries in the PO file, the ordinal of the current entry (counted from +1), the number of untranslated entries, the number of obsolete entries, +and displays all these numbers. + +</P> +<P> +The command <KBD>V</KBD> (<CODE>po-validate</CODE>) launches <CODE>msgfmt</CODE> in verbose +mode over the current PO file. This command first offers to save the +current PO file on disk. The <CODE>msgfmt</CODE> tool, from GNU <CODE>gettext</CODE>, +has the purpose of creating a MO file out of a PO file, and PO mode uses +the features of this program for checking the overall format of a PO file, +as well as all individual entries. + +</P> +<P> +The program <CODE>msgfmt</CODE> runs asynchronously with Emacs, so the +translator regains control immediately while her PO file is being studied. +Error output is collected in the Emacs <SAMP>`*compilation*'</SAMP> buffer, +displayed in another window. The regular Emacs command <KBD>C-x`</KBD> +(<CODE>next-error</CODE>), as well as other usual compile commands, allow the +translator to reposition quickly to the offending parts of the PO file. +Once the cursor is on the line in error, the translator may decide on +any PO mode action which would help correcting the error. + +</P> + + +<H2><A NAME="SEC11" HREF="gettext_toc.html#TOC11">2.4 Entry Positioning</A></H2> + +<P> +The cursor in a PO file window is almost always part of +an entry. The only exceptions are the special case when the cursor +is after the last entry in the file, or when the PO file is +empty. The entry where the cursor is found to be is said to be the +current entry. Many PO mode commands operate on the current entry, +so moving the cursor does more than allowing the translator to browse +the PO file, this also selects on which entry commands operate. + +</P> +<P> +Some PO mode commands alter the position of the cursor in a specialized +way. 
A few of these special-purpose positioning commands are described here; +the others are described in the following sections. + +</P> +<DL COMPACT> + +<DT><KBD>.</KBD> +<DD> +Redisplay the current entry. + +<DT><KBD>n</KBD> +<DD> +Select the entry after the current one. + +<DT><KBD>p</KBD> +<DD> +Select the entry before the current one. + +<DT><KBD>&lt;</KBD> +<DD> +Select the first entry in the PO file. + +<DT><KBD>&gt;</KBD> +<DD> +Select the last entry in the PO file. + +<DT><KBD>m</KBD> +<DD> +Record the location of the current entry for later use. + +<DT><KBD>r</KBD> +<DD> +Return to a previously saved entry location. + +<DT><KBD>x</KBD> +<DD> +Exchange the current entry location with the previously saved one. + +</DL> + +<P> +Any Emacs command able to reposition the cursor may be used +to select the current entry in PO mode, including commands which +move by characters, lines, paragraphs, screens or pages, and search +commands. However, there is a kind of standard way to display the +current entry in PO mode, which usual Emacs commands moving +the cursor do not especially try to enforce. The command <KBD>.</KBD> +(<CODE>po-current-entry</CODE>) has the sole purpose of redisplaying the +current entry properly, after the current entry has been changed by +means external to PO mode, or the Emacs screen otherwise altered. + +</P> +<P> +It is yet to be decided if PO mode helps the translator, or otherwise +irritates her, by forcing a rigid window layout while she +is doing her work. We originally had quite precise ideas about +how windows should behave, but on the other hand, anyone used to +Emacs is often happy to keep full control. Maybe a fixed window +layout might be offered as a PO mode option that the translator +might activate or deactivate at will, so it could be offered on an +experimental basis. If nobody feels a real need for using it, or +a compulsion for writing it, we should drop this whole idea. 
+The incentive for doing it should come from translators rather than +programmers, as opinions from an experienced translator are surely +worth more to me than opinions from programmers <EM>thinking</EM> about +how <EM>others</EM> should do translation. + +</P> +<P> +The commands <KBD>n</KBD> (<CODE>po-next-entry</CODE>) and <KBD>p</KBD> +(<CODE>po-previous-entry</CODE>) move the cursor to the entry following, +or preceding, the current one. If <KBD>n</KBD> is given while the +cursor is on the last entry of the PO file, or if <KBD>p</KBD> +is given while the cursor is on the first entry, no move is done. + +</P> +<P> +The commands <KBD>&lt;</KBD> (<CODE>po-first-entry</CODE>) and <KBD>&gt;</KBD> +(<CODE>po-last-entry</CODE>) move the cursor to the first entry, or last +entry, of the PO file. When the cursor is located past the last +entry in a PO file, most PO mode commands will return an error saying +<SAMP>`After last entry'</SAMP>. However, the commands <KBD>&lt;</KBD> and <KBD>&gt;</KBD> +have the special property of being able to work even when the cursor +is not within any PO file entry, and one may use them for nicely +correcting this situation. But even these commands will fail on a +truly empty PO file. There are development plans for PO mode +to interactively fill an empty PO file from sources. See section <A HREF="gettext_3.html#SEC16">3.3 Marking Translatable Strings</A>. + +</P> +<P> +The translator may decide, before working on the translation of +a particular entry, that she needs to browse the remainder of the +PO file, maybe for finding the terminology or phraseology used +in related entries. She can of course use the standard Emacs idioms +for saving the current cursor location in some register, and use that +register for getting back, or else, use the location ring. + +</P> +<P> +PO mode offers another approach, by which cursor locations may be saved +onto a special stack. 
The command <KBD>m</KBD> (<CODE>po-push-location</CODE>) +merely adds the location of the current entry to the stack, pushing +the already saved locations under the new one. The command +<KBD>r</KBD> (<CODE>po-pop-location</CODE>) consumes the top stack element and +repositions the cursor to the entry associated with that top element. +This position is then lost, for the next <KBD>r</KBD> will move the cursor +to the previously saved location, and so on until no locations remain +on the stack. + +</P> +<P> +If the translator wants the position to be kept on the location stack, +maybe for taking a look at the entry associated with the top +element, then go elsewhere with the intent of getting back later, she +ought to use <KBD>m</KBD> immediately after <KBD>r</KBD>. + +</P> +<P> +The command <KBD>x</KBD> (<CODE>po-exchange-location</CODE>) simultaneously +repositions the cursor to the entry associated with the top element of +the stack of saved locations, and replaces that top element with the +location of the current entry before the move. Consequently, repeating +the <KBD>x</KBD> command alternates between two entries. +To achieve this, the translator will position the cursor on the +first entry, use <KBD>m</KBD>, then position to the second entry, and +merely use <KBD>x</KBD> for making the switch. + +</P> + + +<H2><A NAME="SEC12" HREF="gettext_toc.html#TOC12">2.5 Normalizing Strings in Entries</A></H2> + +<P> +There are many different ways for encoding a particular string into a +PO file entry, because there are so many different ways to split and +quote multi-line strings, and even, to represent special characters +by backslashed escape sequences. Some features of PO mode rely on +the ability for PO mode to scan an already existing PO file for a +particular string encoded into the <CODE>msgid</CODE> field of some entry. 
+Even if PO mode has internally all the built-in machinery for +implementing this recognition easily, doing it fast is technically +difficult. To facilitate a solution to this efficiency problem, +we decided on a canonical representation for strings. + +</P> +<P> +A conventional representation of strings in a PO file is currently +under discussion, and PO mode experiments with a canonical representation. +Having both <CODE>xgettext</CODE> and PO mode converging towards a uniform +way of representing equivalent strings would be useful, as the internal +normalization needed by PO mode could be automatically satisfied +when using <CODE>xgettext</CODE> from GNU <CODE>gettext</CODE>. An explicit +PO mode normalization should then be only necessary for PO files +imported from elsewhere, or for when the convention itself evolves. + +</P> +<P> +So, for achieving normalization of at least the strings of a given +PO file needing a canonical representation, the following PO mode +command is available: + +</P> +<DL COMPACT> + +<DT><KBD>M-x po-normalize</KBD> +<DD> +Tidy the whole PO file by making entries more uniform. + +</DL> + +<P> +The special command <KBD>M-x po-normalize</KBD>, which has no associated +keys, revises all entries, ensuring that strings of both original +and translated entries use uniform internal quoting in the PO file. +It also removes any crumb after the last entry. This command may be +useful for PO files freshly imported from elsewhere, or if we ever +improve on the canonical quoting format we use. This canonical format +is not only meant for getting cleaner PO files, but also for greatly +speeding up <CODE>msgid</CODE> string lookup for some other PO mode commands. + +</P> +<P> +<KBD>M-x po-normalize</KBD> presently makes three passes over the entries. 
+The first implements heuristics for converting PO files for GNU +<CODE>gettext</CODE> 0.6 and earlier, in which <CODE>msgid</CODE> and <CODE>msgstr</CODE> +fields were using K&R style C string syntax for multi-line strings. +These heuristics may fail for comments not related to obsolete +entries and ending with a backslash; they also depend on subsequent +passes for finalizing the proper commenting of continued lines for +obsolete entries. This first pass might disappear once all oldish PO +files would have been adjusted. The second and third pass normalize +all <CODE>msgid</CODE> and <CODE>msgstr</CODE> strings respectively. They also +clean out those trailing backslashes used by XView's <CODE>msgfmt</CODE> +for continued lines. + +</P> +<P> +Having such an explicit normalizing command allows for importing PO +files from other sources, but also eases the evolution of the current +convention, evolution driven mostly by aesthetic concerns, as of now. +It is easy to make suggested adjustments at a later time, as the +normalizing command and eventually, other GNU <CODE>gettext</CODE> tools +should greatly automate conformance. A description of the canonical +string format is given below, for the particular benefit of those not +having Emacs handy, and who would nevertheless want to handcraft +their PO files in nice ways. + +</P> +<P> +Right now, in PO mode, strings are single line or multi-line. A string +goes multi-line if and only if it has <EM>embedded</EM> newlines, that +is, if it matches <SAMP>`[^\n]\n+[^\n]'</SAMP>. So, we would have: + +</P> + +<PRE> +msgstr "\n\nHello, world!\n\n\n" +</PRE> + +<P> +but, replacing the space by a newline, this becomes: + +</P> + +<PRE> +msgstr "" +"\n" +"\n" +"Hello,\n" +"world!\n" +"\n" +"\n" +</PRE> + +<P> +We are deliberately using a caricatural example, here, to make the +point clearer. Usually, multi-lines are not that bad looking. +It is probable that we will implement the following suggestion. 
+We might lump together all initial newlines into the empty string, +and also all newlines introducing empty lines (that is, for <VAR>n</VAR> +> 1, the <VAR>n</VAR>-1'th last newlines would go together on a separate +string), so making the previous example appear: + +</P> + +<PRE> +msgstr "\n\n" +"Hello,\n" +"world!\n" +"\n\n" +</PRE> + +<P> +There are a few yet undecided little points about string normalization, +to be documented in this manual, once these questions settle. + +</P> +<P><HR><P> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_1.html">previous</A>, <A HREF="gettext_3.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. +</BODY> +</HTML> diff --git a/doc/gettext_3.html b/doc/gettext_3.html new file mode 100644 index 0000000..ee3715d --- /dev/null +++ b/doc/gettext_3.html @@ -0,0 +1,620 @@ +<HTML> +<HEAD> +<!-- This HTML file has been created by texi2html 1.51 + from gettext.texi on 19 April 2001 --> + +<TITLE>GNU gettext utilities - 3 Preparing Program Sources</TITLE> +</HEAD> +<BODY> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_2.html">previous</A>, <A HREF="gettext_4.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. +<P><HR><P> + + +<H1><A NAME="SEC13" HREF="gettext_toc.html#TOC13">3 Preparing Program Sources</A></H1> + +<P> +For the programmer, changes to the C source code fall into three +categories. First, you have to make the localization functions +known to all modules needing message translation. Second, you should +properly trigger the operation of GNU <CODE>gettext</CODE> when the program +initializes, usually from the <CODE>main</CODE> function. Last, you should +identify and especially mark all constant strings in your program +needing translation. 
+ +</P> +<P> +Presuming that your set of programs, or package, has been adjusted +so all needed GNU <CODE>gettext</CODE> files are available, and your +<TT>`Makefile'</TT> files are adjusted (see section <A HREF="gettext_11.html#SEC72">11 The Maintainer's View</A>), each C module +having translated C strings should contain the line: + +</P> + +<PRE> +#include <libintl.h> +</PRE> + +<P> +The remaining changes to your C sources are discussed in the following +sections of this chapter. + +</P> + + + +<H2><A NAME="SEC14" HREF="gettext_toc.html#TOC14">3.1 Triggering <CODE>gettext</CODE> Operations</A></H2> + +<P> +The initialization of locale data should be done with more or less +the same code in every program, as demonstrated below: + +</P> + +<PRE> +int +main (argc, argv) + int argc; + char **argv; +{ + ... + setlocale (LC_ALL, ""); + bindtextdomain (PACKAGE, LOCALEDIR); + textdomain (PACKAGE); + ... +} +</PRE> + +<P> +<VAR>PACKAGE</VAR> and <VAR>LOCALEDIR</VAR> should be provided either by +<TT>`config.h'</TT> or by the Makefile. For now consult the <CODE>gettext</CODE> +sources for more information. + +</P> +<P> +The use of <CODE>LC_ALL</CODE> might not be appropriate for you. +<CODE>LC_ALL</CODE> includes all locale categories and especially +<CODE>LC_CTYPE</CODE>. This latter category is responsible for determining +character classes with the <CODE>isalnum</CODE> etc. functions from +<TT>`ctype.h'</TT>, and the result could be wrong, especially for programs +which process some kind of input language. For example this would mean that +source code using the ç (c-cedilla character) is runnable in +France but not in the U.S. + +</P> +<P> +Some systems also have problems with parsing numbers using the +<CODE>scanf</CODE> functions if <CODE>LC_ALL</CODE> selects a locale other than <CODE>"C"</CODE>. +The standards say that formats in addition to the one known in the +<CODE>"C"</CODE> locale might be recognized. But some systems seem to reject +numbers in the <CODE>"C"</CODE> locale format. 
In some situations, it might +also be a problem with the notation itself which makes it impossible to +recognize whether the number is in the <CODE>"C"</CODE> locale or the local +format. This can happen if thousands separator characters are used. +Some locales define this character, according to the national +conventions, to be <CODE>'.'</CODE>, which is the same character used in the +<CODE>"C"</CODE> locale to denote the decimal point. + +</P> +<P> +So it is sometimes necessary to replace the <CODE>LC_ALL</CODE> line in the +code above by a sequence of <CODE>setlocale</CODE> lines: + +</P> + +<PRE> +{ + ... + setlocale (LC_CTYPE, ""); + setlocale (LC_MESSAGES, ""); + ... +} +</PRE> + +<P> +On all POSIX conformant systems the locale categories <CODE>LC_CTYPE</CODE>, +<CODE>LC_COLLATE</CODE>, <CODE>LC_MONETARY</CODE>, <CODE>LC_NUMERIC</CODE>, and +<CODE>LC_TIME</CODE> are available. On some modern systems there is also a +locale category <CODE>LC_MESSAGES</CODE>, which is called <CODE>LC_RESPONSES</CODE> +on some old, XPG2 compliant systems. + +</P> +<P> +Note that changing the <CODE>LC_CTYPE</CODE> also affects the functions +declared in the <CODE><ctype.h></CODE> standard header. If this is not +desirable in your application (for example in a compiler's parser), +you can use a set of substitute functions which hardwire the C locale, +such as found in the <CODE><c-ctype.h></CODE> and <CODE><c-ctype.c></CODE> files +in the gettext source distribution. + +</P> +<P> +It is also possible to switch the locale back and forth between the +environment dependent locale and the C locale, but this approach is +normally avoided because a <CODE>setlocale</CODE> call is expensive, +because it is tedious to determine the places where a locale switch +is needed in a large program's source, and because switching a locale +is not multithread-safe. 
+ +</P> + + +<H2><A NAME="SEC15" HREF="gettext_toc.html#TOC15">3.2 How Marks Appear in Sources</A></H2> + +<P> +All strings requiring translation should be marked in the C sources. Marking +is done in such a way that each translatable string appears to be +the sole argument of some function or preprocessor macro. There are +only a few such possible functions or macros meant for translation, +and their names are said to be marking keywords. The marking is +attached to strings themselves, rather than to what we do with them. +This approach has more uses. A blatant example is an error message +produced by formatting. The format string needs translation, as +well as some strings inserted through some <SAMP>`%s'</SAMP> specification +in the format, while the result from <CODE>sprintf</CODE> may have so many +different instances that it is impractical to list them all in some +<SAMP>`error_string_out()'</SAMP> routine, say. + +</P> +<P> +This marking operation has two goals. The first goal of marking +is to trigger the retrieval of the translation, at run time. +The keywords are possibly resolved into a routine able to dynamically +return the proper translation, as far as possible or wanted, for the +argument string. Most localizable strings are found in executable +positions, that is, attached to variables or given as parameters to +functions. But this is not universal usage, and some translatable +strings appear in structured initializations. See section <A HREF="gettext_3.html#SEC18">3.5 Special Cases of Translatable Strings</A>. + +</P> +<P> +The second goal of the marking operation is to help <CODE>xgettext</CODE> +properly extract all translatable strings when it scans a set +of program sources and produces PO file templates. + +</P> +<P> +The canonical keyword for marking translatable strings is +<SAMP>`gettext'</SAMP>, which gave its name to the whole GNU <CODE>gettext</CODE> +package. 
For packages making only light use of the <SAMP>`gettext'</SAMP> +keyword, macro or function, it is easily used <EM>as is</EM>. However, +for packages using the <CODE>gettext</CODE> interface more heavily, it +is usually more convenient to give the main keyword a shorter, less +obtrusive name. Indeed, the keyword might appear on a lot of strings +all over the package, and programmers usually neither want nor need +their program sources to remind them forcefully, all the time, that they +are internationalized. Further, a long keyword has the disadvantage +of using more horizontal space, forcing more indentation work on +sources for those trying to keep them within 79 or 80 columns. + +</P> +<P> +Many packages use <SAMP>`_'</SAMP> (a simple underline) as a keyword, +and write <SAMP>`_("Translatable string")'</SAMP> instead of <SAMP>`gettext +("Translatable string")'</SAMP>. Further, the coding rule, from GNU standards, +requiring a space between the keyword and the opening +parenthesis is relaxed, in practice, for this particular usage. +So, the textual overhead per translatable string is reduced to +only three characters: the underline and the two parentheses. +However, even if GNU <CODE>gettext</CODE> uses this convention internally, +it does not offer it officially. The real, genuine keyword is truly +<SAMP>`gettext'</SAMP> indeed. It is fairly easy for those wanting to use +<SAMP>`_'</SAMP> instead of <SAMP>`gettext'</SAMP> to declare: + +</P> + +<PRE> +#include <libintl.h> +#define _(String) gettext (String) +</PRE> + +<P> +instead of merely using <SAMP>`#include <libintl.h>'</SAMP>. + +</P> +<P> +Later on, the maintenance is relatively easy. If, as a programmer, +you add or modify a string, you will have to ask yourself if the +new or altered string requires translation, and include it within +<SAMP>`_()'</SAMP> if you think it should be translated. <SAMP>`"%s: %d"'</SAMP> is +an example of a string <EM>not</EM> requiring translation! 
+ +</P> + + +<H2><A NAME="SEC16" HREF="gettext_toc.html#TOC16">3.3 Marking Translatable Strings</A></H2> + +<P> +In PO mode, one set of features is meant more for the programmer than +for the translator, and allows him to interactively mark which strings, +in a set of program sources, are translatable, and which are not. +Even if it is a fairly easy job for a programmer to find and mark +such strings by other means, using any editor of his choice, PO mode +makes this work more comfortable. Further, this gives translators +who feel a little like programmers, or programmers who feel a little +like translators, a tool letting them work at marking translatable +strings in the program sources, while simultaneously producing a set of +translations in some language, for the package being internationalized. + +</P> +<P> +The set of program sources targeted by the PO mode commands described +here should have an Emacs tags table constructed for your project, +prior to using these PO file commands. This is easy to do. In any +shell window, change the directory to the root of your project, then +execute a command resembling: + +</P> + +<PRE> +etags src/*.[hc] lib/*.[hc] +</PRE> + +<P> +presuming here you want to process all <TT>`.h'</TT> and <TT>`.c'</TT> files +from the <TT>`src/'</TT> and <TT>`lib/'</TT> directories. This command will +explore all said files and create a <TT>`TAGS'</TT> file in your root +directory, somewhat summarizing the contents using a special file +format Emacs can understand. + +</P> +<P> +For packages following the GNU coding standards, there is +a make goal <CODE>tags</CODE> or <CODE>TAGS</CODE> which constructs the tag files in +all directories and for all files containing source code. + +</P> +<P> +Once your <TT>`TAGS'</TT> file is ready, the following commands assist +the programmer in marking translatable strings in his set of sources. 
+But these commands are necessarily driven from within a PO file +window, and it is likely that you do not even have such a PO file yet. +This is not a problem at all, as you may safely open a new, empty PO +file, mainly for using these commands. This empty PO file will slowly +fill in while you mark strings as translatable in your program sources. + +</P> +<DL COMPACT> + +<DT><KBD>,</KBD> +<DD> +Search through program sources for a string which looks like a +candidate for translation. + +<DT><KBD>M-,</KBD> +<DD> +Mark the last string found with <SAMP>`_()'</SAMP>. + +<DT><KBD>M-.</KBD> +<DD> +Mark the last string found with a keyword taken from a set of possible +keywords. This command with a prefix allows some management of these +keywords. + +</DL> + +<P> +The <KBD>,</KBD> (<CODE>po-tags-search</CODE>) command searches for the next +occurrence of a string which looks like a possible candidate for +translation, and displays the program source in another Emacs window, +positioned in such a way that the string is near the top of this other +window. If the string is too big to fit whole in this window, it is +positioned so only its end is shown. In any case, the cursor +is left in the PO file window. If the shown string would be better +presented differently in different native languages, you may mark it +using <KBD>M-,</KBD> or <KBD>M-.</KBD>. Otherwise, you might rather ignore it +and skip to the next string by merely repeating the <KBD>,</KBD> command. + +</P> +<P> +A string is a good candidate for translation if it contains a sequence +of three or more letters. A string containing at most two letters in +a row will be considered as a candidate if it has more letters than +non-letters. The command disregards strings containing no letters, +or isolated letters only. It also disregards strings within comments, +or strings already marked with some keyword PO mode knows (see below). 
+ +</P> +<P> +If you have never told Emacs about some <TT>`TAGS'</TT> file to use, the +command will request that you specify one from the minibuffer, the +first time you use the command. You may later change your <TT>`TAGS'</TT> +file by using the regular Emacs command <KBD>M-x visit-tags-table</KBD>, +which will ask you to name the precise <TT>`TAGS'</TT> file you want +to use. See section `Tag Tables' in <CITE>The Emacs Editor</CITE>. + +</P> +<P> +Each time you use the <KBD>,</KBD> command, the search resumes from where it was +left by the previous search, and goes through all program sources, +obeying the <TT>`TAGS'</TT> file, until all sources have been processed. +However, by giving a prefix argument to the command (<KBD>C-u +,</KBD>), you may request that the search be restarted all over again +from the first program source; but in this case, strings that you +recently marked as translatable will be automatically skipped. + +</P> +<P> +Using this <KBD>,</KBD> command does not prevent the use of other regular +Emacs tags commands. For example, regular <CODE>tags-search</CODE> or +<CODE>tags-query-replace</CODE> commands may be used without disrupting the +independent <KBD>,</KBD> search sequence. However, as implemented, the +<EM>initial</EM> <KBD>,</KBD> command (or the <KBD>,</KBD> command used with a +prefix) might also reinitialize the regular Emacs tags searching to the +first tags file; this reinitialization might be considered spurious. + +</P> +<P> +The <KBD>M-,</KBD> (<CODE>po-mark-translatable</CODE>) command will mark the +recently found string with the <SAMP>`_'</SAMP> keyword. The <KBD>M-.</KBD> +(<CODE>po-select-mark-and-mark</CODE>) command will request that you type +one keyword from the minibuffer and use that keyword for marking +the string. 
Both commands will automatically create a new PO file +untranslated entry for the string being marked, and make it the +current entry (making it easy for you to immediately proceed to its +translation, if you feel like doing it right away). It is possible +that the modifications made to the program source by <KBD>M-,</KBD> or +<KBD>M-.</KBD> render some source line longer than 80 columns, forcing you +to break and re-indent this line differently. You may use the <KBD>O</KBD> +command from PO mode, or any other window changing command from +Emacs, to break out into the program source window, and do any +needed adjustments. You will have to use some regular Emacs command +to return the cursor to the PO file window, if you want command +<KBD>,</KBD> for the next string, say. + +</P> +<P> +The <KBD>M-.</KBD> command has a few built-in speedups, so you do not +have to explicitly type all keywords all the time. The first such +speedup is that you are presented with a <EM>preferred</EM> keyword, +which you may accept by merely typing <KBD><KBD>RET</KBD></KBD> at the prompt. +The second speedup is that you may type any non-ambiguous prefix of the +keyword you really mean, and the command will complete it automatically +for you. This also means that PO mode has to <EM>know</EM> all +your possible keywords, and that it will not accept mistyped keywords. + +</P> +<P> +If you reply <KBD>?</KBD> to the keyword request, the command gives a +list of all known keywords, from which you may choose. When the +command is prefixed by an argument (<KBD>C-u M-.</KBD>), it inhibits +updating any program source or PO file buffer, and does some simple +keyword management instead. In this case, the command asks for a +keyword, written in full, which becomes a new allowed keyword for +later <KBD>M-.</KBD> commands. Moreover, this new keyword automatically +becomes the <EM>preferred</EM> keyword for later commands. 
By typing +an already known keyword in response to <KBD>C-u M-.</KBD>, one merely +changes the <EM>preferred</EM> keyword and does nothing more. + +</P> +<P> +All keywords known for <KBD>M-.</KBD> are recognized by the <KBD>,</KBD> command +when scanning for strings, and strings already marked by any of those +known keywords are automatically skipped. If many PO files are opened +simultaneously, each one has its own independent set of known keywords. +There is no provision in PO mode, currently, for deleting a known +keyword; you have to quit the file (maybe using <KBD>q</KBD>) and reopen +it afresh. When a PO file is newly brought up in an Emacs window, only +<SAMP>`gettext'</SAMP> and <SAMP>`_'</SAMP> are known as keywords, and <SAMP>`gettext'</SAMP> +is preferred for the <KBD>M-.</KBD> command. In fact, it is not useful to +prefer <SAMP>`_'</SAMP>, as this one is already built into the <KBD>M-,</KBD> command. + +</P> + + +<H2><A NAME="SEC17" HREF="gettext_toc.html#TOC17">3.4 Special Comments preceding Keywords</A></H2> + +<P> +In C programs strings are often used within calls of functions from the +<CODE>printf</CODE> family. The special thing about these format strings is +that they can contain format specifiers introduced with <KBD>%</KBD>. Assume +we have the code + +</P> + +<PRE> +printf (gettext ("String `%s' has %d characters\n"), s, strlen (s)); +</PRE> + +<P> +A possible German translation for the above string might be: + +</P> + +<PRE> +"%d Zeichen lang ist die Zeichenkette `%s'" +</PRE> + +<P> +A C programmer, even if he cannot speak German, will recognize that +there is something wrong here. The order of the two format specifiers +has changed, but of course the arguments in the <CODE>printf</CODE> call have not. +This will most probably lead to problems because now the length of the +string is regarded as the address. 
+ +</P> +<P> +To prevent errors at runtime caused by translations the <CODE>msgfmt</CODE> +tool can check statically whether the arguments in the original and the +translation string match in type and number. If this is not the case a +warning will be given and the error cannot cause problems at runtime. + +</P> +<P> +For the word order in the above German translation to be correct, one +would have to write + +</P> + +<PRE> +"%2$d Zeichen lang ist die Zeichenkette `%1$s'" +</PRE> + +<P> +The routines in <CODE>msgfmt</CODE> know about this special notation. + +</P> +<P> +Because not all strings in a program must be format strings it is not +useful for <CODE>msgfmt</CODE> to test all the strings in the <TT>`.po'</TT> file. +This might cause problems because the string might contain what looks +like a format specifier, but the string is not used in <CODE>printf</CODE>. + +</P> +<P> +Therefore <CODE>xgettext</CODE> adds a special tag to those messages it +thinks might be a format string. There is no absolute rule for this, +only a heuristic. In the <TT>`.po'</TT> file the entry is marked using the +<CODE>c-format</CODE> flag in the <KBD>#,</KBD> comment line (see section <A HREF="gettext_2.html#SEC9">2.2 The Format of PO Files</A>). + +</P> +<P> +The careful reader now might say that this again can cause problems. +The heuristic might guess it wrong. This is true and therefore +<CODE>xgettext</CODE> knows about a special kind of comment which lets +the programmer take over the decision. If in the same line or +the immediately preceding line of the <CODE>gettext</CODE> keyword +the <CODE>xgettext</CODE> program finds a comment containing the words +<KBD>xgettext:c-format</KBD> it will mark the string in any case with +the <KBD>c-format</KBD> flag. This kind of comment should be used when +<CODE>xgettext</CODE> does not recognize the string as a format string but +it really is one and it should be tested. 
Please note that when the +comment is in the same line as the <CODE>gettext</CODE> keyword, it must be +before the string to be translated. + +</P> +<P> +This situation happens quite often. The <CODE>printf</CODE> function is often +called with strings which do not contain a format specifier. Of course +one would normally use <CODE>fputs</CODE> but it does happen. In this case +<CODE>xgettext</CODE> does not recognize this as a format string but what +happens if the translation introduces a valid format specifier? The +<CODE>printf</CODE> function will try to access one of the parameters but none +exists because the original code does not refer to any parameter. + +</P> +<P> +<CODE>xgettext</CODE> of course could make a wrong decision the other way +round, i.e. a string marked as a format string actually is not a format +string. In this case <CODE>msgfmt</CODE> might give too many warnings and +would prevent translating the <TT>`.po'</TT> file. The method to prevent +this wrong decision is similar to the one used above, only the comment +to use must contain the string <KBD>xgettext:no-c-format</KBD>. + +</P> +<P> +If a string is marked with <KBD>c-format</KBD> and this is not correct the +user can find out who is responsible for the decision. See +section <A HREF="gettext_4.html#SEC20">4.1 Invoking the <CODE>xgettext</CODE> Program</A> to see how the <KBD>--debug</KBD> option can be +used for solving this problem. + +</P> + + +<H2><A NAME="SEC18" HREF="gettext_toc.html#TOC18">3.5 Special Cases of Translatable Strings</A></H2> + +<P> +The attentive reader might now point out that it is not always possible +to mark translatable strings with <CODE>gettext</CODE> or something like this. +Consider the following case: + +</P> + +<PRE> +{ + static const char *messages[] = { + "some very meaningful message", + "and another one" + }; + const char *string; + ... + string + = index > 1 ? "a default message" : messages[index]; + + fputs (string, stdout); + ... 
+} +</PRE> + +<P> +While it is no problem to mark the string <CODE>"a default message"</CODE> it +is not possible to mark the string initializers for <CODE>messages</CODE>. +What is to be done? We have to fulfill two tasks. First we have to mark the +strings so that the <CODE>xgettext</CODE> program (see section <A HREF="gettext_4.html#SEC20">4.1 Invoking the <CODE>xgettext</CODE> Program</A>) +can find them, and second we have to translate the strings at runtime +before printing them. + +</P> +<P> +The first task can be fulfilled by creating a new keyword, which names a +no-op. For the second we have to mark all access points to a string +from the array. So one solution can look like this: + +</P> + +<PRE> +#define gettext_noop(String) (String) + +{ + static const char *messages[] = { + gettext_noop ("some very meaningful message"), + gettext_noop ("and another one") + }; + const char *string; + ... + string + = index > 1 ? gettext ("a default message") : gettext (messages[index]); + + fputs (string, stdout); + ... +} +</PRE> + +<P> +Please convince yourself that the string which is written by +<CODE>fputs</CODE> is translated in any case. How to make <CODE>xgettext</CODE> aware of +the additional keyword <CODE>gettext_noop</CODE> is explained in section <A HREF="gettext_4.html#SEC20">4.1 Invoking the <CODE>xgettext</CODE> Program</A>. + +</P> +<P> +The above is of course not the only solution. You could also come up +with the following one: + +</P> + +<PRE> +#define gettext_noop(String) (String) + +{ + static const char *messages[] = { + gettext_noop ("some very meaningful message"), + gettext_noop ("and another one") + }; + const char *string; + ... + string + = index > 1 ? gettext_noop ("a default message") : messages[index]; + + fputs (gettext (string), stdout); + ... +} +</PRE> + +<P> +But this has some drawbacks. First the programmer has to take care that +he uses <CODE>gettext_noop</CODE> for the string <CODE>"a default message"</CODE>. 
+A use of <CODE>gettext</CODE> could have in rare cases unpredictable results. +The second reason is found in the internals of the GNU <CODE>gettext</CODE> +Library which will make this solution less efficient. + +</P> +<P> +One advantage is that you need not perform control flow analysis to make +sure the output is really translated in any case. But this analysis is +generally not very difficult. If it should prove difficult in some situation, you can +use this second method there. + +</P> +<P><HR><P> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_2.html">previous</A>, <A HREF="gettext_4.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. +</BODY> +</HTML> diff --git a/doc/gettext_4.html b/doc/gettext_4.html new file mode 100644 index 0000000..daf1fc2 --- /dev/null +++ b/doc/gettext_4.html @@ -0,0 +1,208 @@ +<HTML> +<HEAD> +<!-- This HTML file has been created by texi2html 1.51 + from gettext.texi on 19 April 2001 --> + +<TITLE>GNU gettext utilities - 4 Making the PO Template File</TITLE> +</HEAD> +<BODY> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_3.html">previous</A>, <A HREF="gettext_5.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. +<P><HR><P> + + +<H1><A NAME="SEC19" HREF="gettext_toc.html#TOC19">4 Making the PO Template File</A></H1> + +<P> +After preparing the sources, the programmer creates a PO template file. +This section explains how to use <CODE>xgettext</CODE> for this purpose. + +</P> + + + +<H2><A NAME="SEC20" HREF="gettext_toc.html#TOC20">4.1 Invoking the <CODE>xgettext</CODE> Program</A></H2> + + +<PRE> +xgettext [<VAR>option</VAR>] <VAR>inputfile</VAR> ... +</PRE> + +<DL COMPACT> + +<DT><SAMP>`-a'</SAMP> +<DD> +<DT><SAMP>`--extract-all'</SAMP> +<DD> +Extract all strings. 
+ +<DT><SAMP>`-c [<VAR>tag</VAR>]'</SAMP> +<DD> +<DT><SAMP>`--add-comments[=<VAR>tag</VAR>]'</SAMP> +<DD> +Place comment block with <VAR>tag</VAR> (or those preceding keyword lines) +in output file. + +<DT><SAMP>`-C'</SAMP> +<DD> +<DT><SAMP>`--c++'</SAMP> +<DD> +Recognize C++ style comments. + +<DT><SAMP>`--debug'</SAMP> +<DD> +Use the flags <KBD>c-format</KBD> and <KBD>possible-c-format</KBD> to show who was +responsible for marking a message as a format string. The latter form is +used if the <CODE>xgettext</CODE> program decided, the former form is used if +the programmer prescribed it. + +By default only the <KBD>c-format</KBD> form is used. The translator should +not have to care about these details. + +<DT><SAMP>`-d <VAR>name</VAR>'</SAMP> +<DD> +<DT><SAMP>`--default-domain=<VAR>name</VAR>'</SAMP> +<DD> +Use <TT>`<VAR>name</VAR>.po'</TT> for output (instead of <TT>`messages.po'</TT>). + +The special domain name <TT>`-'</TT> or <TT>`/dev/stdout'</TT> means to write +the output to <TT>`stdout'</TT>. + +<DT><SAMP>`-D <VAR>directory</VAR>'</SAMP> +<DD> +<DT><SAMP>`--directory=<VAR>directory</VAR>'</SAMP> +<DD> +Change to <VAR>directory</VAR> before beginning to search and scan source +files. The resulting <TT>`.po'</TT> file will be written relative to the +original directory, though. + +<DT><SAMP>`-f <VAR>file</VAR>'</SAMP> +<DD> +<DT><SAMP>`--files-from=<VAR>file</VAR>'</SAMP> +<DD> +Read the names of the input files from <VAR>file</VAR> instead of getting +them from the command line. + +<DT><SAMP>`--force'</SAMP> +<DD> +Always write an output file even if no message is defined. + +<DT><SAMP>`-h'</SAMP> +<DD> +<DT><SAMP>`--help'</SAMP> +<DD> +Display this help and exit. + +<DT><SAMP>`-I <VAR>list</VAR>'</SAMP> +<DD> +<DT><SAMP>`--input-path=<VAR>list</VAR>'</SAMP> +<DD> +List of directories searched for input files. + +<DT><SAMP>`-j'</SAMP> +<DD> +<DT><SAMP>`--join-existing'</SAMP> +<DD> +Join messages with existing file. 
+ +<DT><SAMP>`-k <VAR>word</VAR>'</SAMP> +<DD> +<DT><SAMP>`--keyword[=<VAR>keywordspec</VAR>]'</SAMP> +<DD> +Additional keyword to be looked for (without <VAR>keywordspec</VAR> means not to +use default keywords). + +If <VAR>keywordspec</VAR> is a C identifier <VAR>id</VAR>, <CODE>xgettext</CODE> looks +for strings in the first argument of each call to the function or macro +<VAR>id</VAR>. If <VAR>keywordspec</VAR> is of the form +<SAMP>`<VAR>id</VAR>:<VAR>argnum</VAR>'</SAMP>, <CODE>xgettext</CODE> looks for strings in the +<VAR>argnum</VAR>th argument of the call. If <VAR>keywordspec</VAR> is of the form +<SAMP>`<VAR>id</VAR>:<VAR>argnum1</VAR>,<VAR>argnum2</VAR>'</SAMP>, <CODE>xgettext</CODE> looks for +strings in the <VAR>argnum1</VAR>st argument and in the <VAR>argnum2</VAR>nd argument +of the call, and treats them as singular/plural variants for a message +with plural handling. + +The default keyword specifications, which are always looked for if not +explicitly disabled, are <CODE>gettext</CODE>, <CODE>dgettext:2</CODE>, +<CODE>dcgettext:2</CODE>, <CODE>ngettext:1,2</CODE>, <CODE>dngettext:2,3</CODE>, +<CODE>dcngettext:2,3</CODE>, and <CODE>gettext_noop</CODE>. + +<DT><SAMP>`-m [<VAR>string</VAR>]'</SAMP> +<DD> +<DT><SAMP>`--msgstr-prefix[=<VAR>string</VAR>]'</SAMP> +<DD> +Use <VAR>string</VAR> or "" as prefix for msgstr entries. + +<DT><SAMP>`-M [<VAR>string</VAR>]'</SAMP> +<DD> +<DT><SAMP>`--msgstr-suffix[=<VAR>string</VAR>]'</SAMP> +<DD> +Use <VAR>string</VAR> or "" as suffix for msgstr entries. + +<DT><SAMP>`--no-location'</SAMP> +<DD> +Do not write <SAMP>`#: <VAR>filename</VAR>:<VAR>line</VAR>'</SAMP> lines. + +<DT><SAMP>`-n'</SAMP> +<DD> +<DT><SAMP>`--add-location'</SAMP> +<DD> +Generate <SAMP>`#: <VAR>filename</VAR>:<VAR>line</VAR>'</SAMP> lines (default). + +<DT><SAMP>`--omit-header'</SAMP> +<DD> +Don't write header with <SAMP>`msgid ""'</SAMP> entry. 
+ +This is useful for testing purposes because it eliminates a source +of variance for generated <CODE>.gmo</CODE> files. We can ship some of +these files in the GNU <CODE>gettext</CODE> package, and the result of +regenerating them through <CODE>msgfmt</CODE> should yield the same values. + +<DT><SAMP>`-p <VAR>dir</VAR>'</SAMP> +<DD> +<DT><SAMP>`--output-dir=<VAR>dir</VAR>'</SAMP> +<DD> +Output files will be placed in directory <VAR>dir</VAR>. + +<DT><SAMP>`-s'</SAMP> +<DD> +<DT><SAMP>`--sort-output'</SAMP> +<DD> +Generate sorted output and remove duplicates. + +<DT><SAMP>`--strict'</SAMP> +<DD> +Write out a strict Uniforum conforming PO file. + +<DT><SAMP>`-v'</SAMP> +<DD> +<DT><SAMP>`--version'</SAMP> +<DD> +Output version information and exit. + +<DT><SAMP>`-x <VAR>file</VAR>'</SAMP> +<DD> +<DT><SAMP>`--exclude-file=<VAR>file</VAR>'</SAMP> +<DD> +Entries from <VAR>file</VAR> are not extracted. + +</DL> + +<P> +Search path for supplementary PO files is: +<TT>`/usr/local/share/nls/src/'</TT>. + +</P> +<P> +If <VAR>inputfile</VAR> is <SAMP>`-'</SAMP>, standard input is read. + +</P> +<P> +This implementation of <CODE>xgettext</CODE> is able to process a few awkward +cases, like strings in preprocessor macros, ANSI concatenation of +adjacent strings, and escaped end of lines for continued strings. + +</P> +<P><HR><P> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_3.html">previous</A>, <A HREF="gettext_5.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. 
+</BODY> +</HTML> diff --git a/doc/gettext_5.html b/doc/gettext_5.html new file mode 100644 index 0000000..3723f1c --- /dev/null +++ b/doc/gettext_5.html @@ -0,0 +1,188 @@ +<HTML> +<HEAD> +<!-- This HTML file has been created by texi2html 1.51 + from gettext.texi on 19 April 2001 --> + +<TITLE>GNU gettext utilities - 5 Creating a New PO File</TITLE> +</HEAD> +<BODY> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_4.html">previous</A>, <A HREF="gettext_6.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. +<P><HR><P> + + +<H1><A NAME="SEC21" HREF="gettext_toc.html#TOC21">5 Creating a New PO File</A></H1> + +<P> +When starting a new translation, the translator copies the +<TT>`<VAR>package</VAR>.pot'</TT> template file to a file called +<TT>`<VAR>LANG</VAR>.po'</TT>. Then she modifies the initial comments and +the header entry of this file. + +</P> +<P> +The initial comments "SOME DESCRIPTIVE TITLE", "YEAR" and +"FIRST AUTHOR <EMAIL@ADDRESS>, YEAR" ought to be replaced by sensible +information. This can be done in any text editor; if Emacs is used +and it switched to PO mode automatically (because it has recognized +the file's suffix), you can disable it by typing <KBD>M-x fundamental-mode</KBD>. + +</P> +<P> +Modifying the header entry can already be done using PO mode: in Emacs, +type <KBD>M-x po-mode RET</KBD> and then <KBD>RET</KBD> again to start editing the +entry. You should fill in the following fields. + +</P> +<DL COMPACT> + +<DT>Project-Id-Version +<DD> +This is the name and version of the package. + +<DT>POT-Creation-Date +<DD> +This has already been filled in by <CODE>xgettext</CODE>. + +<DT>PO-Revision-Date +<DD> +You don't need to fill this in. It will be filled by the Emacs PO mode +when you save the file. + +<DT>Last-Translator +<DD> +Fill in your name and email address (without double quotes). 
+ +<DT>Language-Team +<DD> +Fill in the English name of the language, and the email address of the +language team you are part of. + +Before starting a translation, it is a good idea to get in touch with +your translation team, not only to make sure you don't do duplicated work, +but also to coordinate difficult linguistic issues. + +In the Free Translation Project, each translation team has its own mailing +list. The up-to-date list of teams can be found at the Free Translation +Project's homepage, <TT>`http://www.iro.umontreal.ca/contrib/po/HTML/'</TT>, +in the "National teams" area. + +<DT>Content-Type +<DD> +Replace <SAMP>`CHARSET'</SAMP> with the character encoding used for your language, +in your locale, or UTF-8. This field is needed for correct operation of the +<CODE>msgmerge</CODE> and <CODE>msgfmt</CODE> programs, as well as for users whose +locale's character encoding differs from yours (see section <A HREF="gettext_9.html#SEC49">9.2.4 How to specify the output character set <CODE>gettext</CODE> uses</A>). + +You get the character encoding of your locale by running the shell command +<SAMP>`locale charmap'</SAMP>. If the result is <SAMP>`C'</SAMP> or <SAMP>`ANSI_X3.4-1968'</SAMP>, +which is equivalent to <SAMP>`ASCII'</SAMP> (= <SAMP>`US-ASCII'</SAMP>), it means that your +locale is not correctly configured. In this case, ask your translation +team which charset to use. <SAMP>`ASCII'</SAMP> is not usable for any language +except Latin. + +Because the PO files must be portable to operating systems with less advanced +internationalization facilities, the character encodings that can be used +are limited to those supported by both GNU <CODE>libc</CODE> and GNU +<CODE>libiconv</CODE>. 
These are: +<CODE>ASCII</CODE>, <CODE>ISO-8859-1</CODE>, <CODE>ISO-8859-2</CODE>, <CODE>ISO-8859-3</CODE>, +<CODE>ISO-8859-4</CODE>, <CODE>ISO-8859-5</CODE>, <CODE>ISO-8859-6</CODE>, <CODE>ISO-8859-7</CODE>, +<CODE>ISO-8859-8</CODE>, <CODE>ISO-8859-9</CODE>, <CODE>ISO-8859-13</CODE>, <CODE>ISO-8859-15</CODE>, +<CODE>KOI8-R</CODE>, <CODE>KOI8-U</CODE>, <CODE>CP850</CODE>, <CODE>CP866</CODE>, <CODE>CP874</CODE>, +<CODE>CP932</CODE>, <CODE>CP949</CODE>, <CODE>CP950</CODE>, <CODE>CP1250</CODE>, <CODE>CP1251</CODE>, +<CODE>CP1252</CODE>, <CODE>CP1253</CODE>, <CODE>CP1254</CODE>, <CODE>CP1255</CODE>, <CODE>CP1256</CODE>, +<CODE>CP1257</CODE>, <CODE>GB2312</CODE>, <CODE>EUC-JP</CODE>, <CODE>EUC-KR</CODE>, <CODE>EUC-TW</CODE>, +<CODE>BIG5</CODE>, <CODE>BIG5HKSCS</CODE>, <CODE>GBK</CODE>, <CODE>GB18030</CODE>, <CODE>SJIS</CODE>, +<CODE>JOHAB</CODE>, <CODE>TIS-620</CODE>, <CODE>VISCII</CODE>, <CODE>UTF-8</CODE>. + +In the GNU system, the following encodings are frequently used for the +corresponding languages. 
+ + +<UL> +<LI><CODE>ISO-8859-1</CODE> for + + Afrikaans, Albanian, Basque, Catalan, Dutch, English, Estonian, Faroese, + Finnish, French, Galician, German, Greenlandic, Icelandic, Indonesian, + Irish, Italian, Malay, Norwegian, Portuguese, Spanish, Swedish, +<LI><CODE>ISO-8859-2</CODE> for + + Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak, Slovenian, +<LI><CODE>ISO-8859-3</CODE> for Maltese, + +<LI><CODE>ISO-8859-5</CODE> for Macedonian, Serbian, + +<LI><CODE>ISO-8859-6</CODE> for Arabic, + +<LI><CODE>ISO-8859-7</CODE> for Greek, + +<LI><CODE>ISO-8859-8</CODE> for Hebrew, + +<LI><CODE>ISO-8859-9</CODE> for Turkish, + +<LI><CODE>ISO-8859-13</CODE> for Latvian, Lithuanian, + +<LI><CODE>ISO-8859-15</CODE> for + + Basque, Catalan, Dutch, English, Finnish, French, Galician, German, Irish, + Italian, Portuguese, Spanish, Swedish, +<LI><CODE>KOI8-R</CODE> for Russian, + +<LI><CODE>KOI8-U</CODE> for Ukrainian, + +<LI><CODE>CP1251</CODE> for Bulgarian, Byelorussian, + +<LI><CODE>GB2312</CODE>, <CODE>GBK</CODE>, <CODE>GB18030</CODE> + + for simplified writing of Chinese, +<LI><CODE>BIG5</CODE>, <CODE>BIG5HKSCS</CODE> + + for traditional writing of Chinese, +<LI><CODE>EUC-JP</CODE> for Japanese, + +<LI><CODE>EUC-KR</CODE> for Korean, + +<LI><CODE>TIS-620</CODE> for Thai, + +<LI><CODE>UTF-8</CODE> for any language, including those listed above. + +</UL> + +When single quote characters or double quote characters are used in +translations for your language, and your locale's encoding is one of the +ISO-8859-* charsets, it is best if you create your PO files in UTF-8 +encoding, instead of your locale's encoding. This is because in UTF-8 +the real quote characters can be represented (single quote characters: +U+2018, U+2019, double quote characters: U+201C, U+201D), whereas none of +ISO-8859-* charsets has them all. 
Users in UTF-8 locales will see the +real quote characters, whereas users in ISO-8859-* locales will see the +vertical apostrophe and the vertical double quote instead (because that's +what the character set conversion will transliterate them to). + +To enter such quote characters under X11, you can change your keyboard +mapping using the <CODE>xmodmap</CODE> program. The X11 names of the quote +characters are "leftsinglequotemark", "rightsinglequotemark", +"leftdoublequotemark", "rightdoublequotemark", "singlelowquotemark", +"doublelowquotemark". + +Note that only recent versions of GNU Emacs support the UTF-8 encoding: +Emacs 20 with Mule-UCS, and Emacs 21. As of January 2001, XEmacs doesn't +support the UTF-8 encoding. + +The character encoding name can be written in either upper or lower case. +Usually upper case is preferred. + +<DT>Content-Transfer-Encoding +<DD> +Set this to <CODE>8bit</CODE>. + +<DT>Plural-Forms +<DD> +This field is optional. It is only needed if the PO file has plural forms. +You can find them by searching for the <SAMP>`msgid_plural'</SAMP> keyword. The +format of the plural forms field is described in section <A HREF="gettext_9.html#SEC50">9.2.5 Additional functions for plural forms</A>. +</DL> + +<P><HR><P> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_4.html">previous</A>, <A HREF="gettext_6.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. 
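+<P>
+As a rough illustration of the header fields described in this chapter, a
+filled-in header entry might look like the sketch below. It assumes a
+hypothetical package <SAMP>`hello'</SAMP> 1.0 translated into French; every
+name, address and date is invented for the example, and only the fields
+discussed above are shown. The transfer encoding is written as the MIME
+token <CODE>8bit</CODE>, without a hyphen.
+</P>
+<PRE>
+# Hypothetical values throughout; substitute your own.
+msgid ""
+msgstr ""
+"Project-Id-Version: hello 1.0\n"
+"POT-Creation-Date: 2001-04-19 12:00+0200\n"
+"PO-Revision-Date: 2001-04-20 09:30+0200\n"
+"Last-Translator: Jane Translator &lt;jane@example.org&gt;\n"
+"Language-Team: French &lt;fr@example.org&gt;\n"
+"Content-Type: text/plain; charset=UTF-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Plural-Forms: nplurals=2; plural=n&gt;1;\n"
+</PRE>
+<P>
+The <SAMP>`Plural-Forms'</SAMP> line is only needed when the PO file contains
+<SAMP>`msgid_plural'</SAMP> entries; the expression shown is the one commonly
+used for French and will differ for other languages.
+</P>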
+</BODY> +</HTML> diff --git a/doc/gettext_6.html b/doc/gettext_6.html new file mode 100644 index 0000000..363d567 --- /dev/null +++ b/doc/gettext_6.html @@ -0,0 +1,930 @@ +<HTML> +<HEAD> +<!-- This HTML file has been created by texi2html 1.51 + from gettext.texi on 19 April 2001 --> + +<TITLE>GNU gettext utilities - 6 Updating Existing PO Files</TITLE> +</HEAD> +<BODY> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_5.html">previous</A>, <A HREF="gettext_7.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. +<P><HR><P> + + +<H1><A NAME="SEC22" HREF="gettext_toc.html#TOC22">6 Updating Existing PO Files</A></H1> + + + +<H2><A NAME="SEC23" HREF="gettext_toc.html#TOC23">6.1 Invoking the <CODE>msgmerge</CODE> Program</A></H2> + + + +<H2><A NAME="SEC24" HREF="gettext_toc.html#TOC24">6.2 Translated Entries</A></H2> + +<P> +Each PO file entry for which the <CODE>msgstr</CODE> field has been filled with +a translation, and which is not marked as fuzzy (see section <A HREF="gettext_6.html#SEC25">6.3 Fuzzy Entries</A>), +is said to be a <STRONG>translated</STRONG> entry. Only translated entries will +later be compiled by GNU <CODE>msgfmt</CODE> and become usable in programs. +Other entry types will be excluded; translation will not occur for them. + +</P> +<P> +Some commands are more specifically related to translated entry processing. + +</P> +<DL COMPACT> + +<DT><KBD>t</KBD> +<DD> +Find the next translated entry. + +<DT><KBD>M-t</KBD> +<DD> +Find the previous translated entry. + +</DL> + +<P> +The commands <KBD>t</KBD> (<CODE>po-next-translated-entry</CODE>) and <KBD>M-t</KBD> +(<CODE>po-previous-translated-entry</CODE>) move forwards or backwards, chasing +for a translated entry. If none is found, the search is extended and +wraps around in the PO file buffer. 
+ +</P> +<P> +Translated entries usually result from the translator having entered +a translation for them (see section <A HREF="gettext_6.html#SEC28">6.6 Modifying Translations</A>). However, if the +variable <CODE>po-auto-fuzzy-on-edit</CODE> is not <CODE>nil</CODE>, the entry having +received a new translation first becomes a fuzzy entry, which ought to +be later unfuzzied before becoming an official, genuine translated entry. +See section <A HREF="gettext_6.html#SEC25">6.3 Fuzzy Entries</A>. + +</P> + + +<H2><A NAME="SEC25" HREF="gettext_toc.html#TOC25">6.3 Fuzzy Entries</A></H2> + +<P> +Each PO file entry may have a set of <STRONG>attributes</STRONG>, which are +qualities given a name and explicitly associated with the translation, +using a special system comment. One of these attributes +has the name <CODE>fuzzy</CODE>, and entries having this attribute are said +to have a fuzzy translation. They are called fuzzy entries, for short. + +</P> +<P> +Fuzzy entries, even if they count as translated entries for +most other purposes, usually call for revision by the translator. +Those may be produced by applying the program <CODE>msgmerge</CODE> to +update an older translated PO file according to a new PO template +file, when this tool hypothesises that some new <CODE>msgid</CODE> has +been modified only slightly out of an older one, and chooses to pair +what it thinks to be the old translation with the new modified entry. +The slight alteration in the original string (the <CODE>msgid</CODE> string) +should often be reflected in the translated string, and this requires +the intervention of the translator. For this reason, <CODE>msgmerge</CODE> +might mark some entries as being fuzzy. + +</P> +<P> +Also, the translator may decide herself to mark an entry as fuzzy +for her own convenience, when she wants to remember that the entry +has to be later revisited. So, some commands are more specifically +related to fuzzy entry processing. 
+ +</P> +<DL COMPACT> + +<DT><KBD>f</KBD> +<DD> +Find the next fuzzy entry. + +<DT><KBD>M-f</KBD> +<DD> +Find the previous fuzzy entry. + +<DT><KBD><KBD>TAB</KBD></KBD> +<DD> +Remove the fuzzy attribute of the current entry. + +</DL> + +<P> +The commands <KBD>f</KBD> (<CODE>po-next-fuzzy</CODE>) and <KBD>M-f</KBD> +(<CODE>po-previous-fuzzy</CODE>) move forwards or backwards, chasing for +a fuzzy entry. If none is found, the search is extended and wraps +around in the PO file buffer. + +</P> +<P> +The command <KBD><KBD>TAB</KBD></KBD> (<CODE>po-unfuzzy</CODE>) removes the fuzzy +attribute associated with an entry, usually leaving it translated. +Further, if the variable <CODE>po-auto-select-on-unfuzzy</CODE> is not +<CODE>nil</CODE>, the <KBD><KBD>TAB</KBD></KBD> command will automatically chase +for another interesting entry to work on. The initial value of +<CODE>po-auto-select-on-unfuzzy</CODE> is <CODE>nil</CODE>. + +</P> +<P> +The initial value of <CODE>po-auto-fuzzy-on-edit</CODE> is <CODE>nil</CODE>. However, +if the variable <CODE>po-auto-fuzzy-on-edit</CODE> is set to <CODE>t</CODE>, any entry +edited through the <KBD><KBD>RET</KBD></KBD> command is marked fuzzy, as a way to +ensure some kind of double check, later. In this case, the usual paradigm +is that an entry becomes fuzzy (if not already) whenever the translator +modifies it. If she is satisfied with the translation, she then uses +<KBD><KBD>TAB</KBD></KBD> to pick another entry to work on, clearing the fuzzy attribute +at the same time. If she is not satisfied yet, she merely uses <KBD><KBD>SPC</KBD></KBD> +to chase another entry, leaving the entry fuzzy. + +</P> +<P> +The translator may also use the <KBD><KBD>DEL</KBD></KBD> command +(<CODE>po-fade-out-entry</CODE>) over any translated entry to mark it as being +fuzzy, as an easy reminder that she wants to return to this entry +later. 
+ +</P> +<P> +Also, when the time comes to quit working on a PO file buffer with the <KBD>q</KBD> +command, the translator is asked for confirmation, if some fuzzy string +still exists. + +</P> + + +<H2><A NAME="SEC26" HREF="gettext_toc.html#TOC26">6.4 Untranslated Entries</A></H2> + +<P> +When <CODE>xgettext</CODE> originally creates a PO file, unless told +otherwise, it initializes the <CODE>msgid</CODE> field with the untranslated +string, and leaves the <CODE>msgstr</CODE> string empty. Such entries, +having an empty translation, are said to be <STRONG>untranslated</STRONG> entries. +Later, when the programmer slightly modifies some string right in +the program, this change is reflected in the PO file +by the appearance of a new untranslated entry for the modified string. + +</P> +<P> +The usual commands moving from entry to entry consider untranslated +entries on the same level as active entries. Untranslated entries +are easily recognizable by the fact they end with <SAMP>`msgstr ""'</SAMP>. + +</P> +<P> +The work of the translator might be (quite naively) seen as the process +of seeking an untranslated entry, editing a translation for +it, and repeating these actions until no untranslated entries remain. +Some commands are more specifically related to untranslated entry +processing. + +</P> +<DL COMPACT> + +<DT><KBD>u</KBD> +<DD> +Find the next untranslated entry. + +<DT><KBD>M-u</KBD> +<DD> +Find the previous untranslated entry. + +<DT><KBD>k</KBD> +<DD> +Turn the current entry into an untranslated one. + +</DL> + +<P> +The commands <KBD>u</KBD> (<CODE>po-next-untranslated-entry</CODE>) and <KBD>M-u</KBD> +(<CODE>po-previous-untranslated-entry</CODE>) move forwards or backwards, +chasing for an untranslated entry. If none is found, the search is +extended and wraps around in the PO file buffer. 
+ +</P> +<P> +An entry can be turned back into an untranslated entry by +merely emptying its translation, using the command <KBD>k</KBD> +(<CODE>po-kill-msgstr</CODE>). See section <A HREF="gettext_6.html#SEC28">6.6 Modifying Translations</A>. + +</P> +<P> +Also, when time comes to quit working on a PO file buffer +with the <KBD>q</KBD> command, the translator is asked for confirmation, +if some untranslated string still exists. + +</P> + + +<H2><A NAME="SEC27" HREF="gettext_toc.html#TOC27">6.5 Obsolete Entries</A></H2> + +<P> +By <STRONG>obsolete</STRONG> PO file entries, we mean those entries which are +commented out, usually by <CODE>msgmerge</CODE> when it found that the +translation is not needed anymore by the package being localized. + +</P> +<P> +The usual commands moving from entry to entry consider obsolete +entries on the same level as active entries. Obsolete entries are +easily recognizable by the fact that all their lines start with +<KBD>#</KBD>, even those lines containing <CODE>msgid</CODE> or <CODE>msgstr</CODE>. + +</P> +<P> +Commands exist for emptying the translation or reinitializing it +to the original untranslated string. Commands interfacing with the +kill ring may force some previously saved text into the translation. +The user may interactively edit the translation. All these commands +may apply to obsolete entries, carefully leaving the entry obsolete +after the fact. + +</P> +<P> +Moreover, some commands are more specifically related to obsolete +entry processing. + +</P> +<DL COMPACT> + +<DT><KBD>o</KBD> +<DD> +Find the next obsolete entry. + +<DT><KBD>M-o</KBD> +<DD> +Find the previous obsolete entry. + +<DT><KBD><KBD>DEL</KBD></KBD> +<DD> +Make an active entry obsolete, or zap out an obsolete entry. + +</DL> + +<P> +The commands <KBD>o</KBD> (<CODE>po-next-obsolete-entry</CODE>) and <KBD>M-o</KBD> +(<CODE>po-previous-obsolete-entry</CODE>) move forwards or backwards, +chasing for an obsolete entry. 
If none is found, the search is +extended and wraps around in the PO file buffer. + +</P> +<P> +PO mode does not provide ways for un-commenting an obsolete entry +and making it active, because this would reintroduce an original +untranslated string which does not correspond to any marked string +in the program sources. This goes with the philosophy of never +introducing useless <CODE>msgid</CODE> values. + +</P> +<P> +However, it is possible to comment out an active entry, so making +it obsolete. GNU <CODE>gettext</CODE> utilities will later react to the +disappearance of a translation by using the untranslated string. +The command <KBD><KBD>DEL</KBD></KBD> (<CODE>po-fade-out-entry</CODE>) pushes the current entry +a little further towards annihilation. If the entry is active (it is a +translated entry), then it is first made fuzzy. If it is already fuzzy, +then the entry is merely commented out, with confirmation. If the entry +is already obsolete, then it is completely deleted from the PO file. +It is easy to recycle the translation so deleted into some other PO file +entry, usually one which is untranslated. See section <A HREF="gettext_6.html#SEC28">6.6 Modifying Translations</A>. + +</P> +<P> +Here is a quite interesting problem to solve for later development of +PO mode, for those nights you are not sleepy. The idea would be that +PO mode might become bright enough, one of these days, to make good +guesses at retrieving the most probable candidate, among all obsolete +entries, for initializing the translation of a newly appeared string. +I think it might be a quite hard problem to do this algorithmically, as +we have to develop good and efficient measures of string similarity. +Right now, PO mode leaves the decision entirely to the translator +when the time comes to find the adequate obsolete translation; it +merely tries to provide handy tools to help her do so. 
+ +</P> + + +<H2><A NAME="SEC28" HREF="gettext_toc.html#TOC28">6.6 Modifying Translations</A></H2> + +<P> +PO mode prevents direct edition of the PO file by the usual +means Emacs gives for altering a buffer's contents. By doing so, +it aims to help the translator avoid little clerical errors +about the overall file format, or the proper quoting of strings, +as those errors would be easily made. Other kinds of errors are +still possible, but some may be caught and diagnosed by the batch +validation process, which the translator may always trigger by the +<KBD>V</KBD> command. For all other errors, the translator has to rely on +her own judgment, and also on the linguistic reports submitted to her +by users of the translated package who share her mother tongue. + +</P> +<P> +When the time comes to create a translation, or to correct an error diagnosed +mechanically or reported by a user, the translators have to resort to +using the following commands for modifying the translations. + +</P> +<DL COMPACT> + +<DT><KBD><KBD>RET</KBD></KBD> +<DD> +Interactively edit the translation. + +<DT><KBD><KBD>LFD</KBD></KBD> +<DD> +Reinitialize the translation with the original, untranslated string. + +<DT><KBD>k</KBD> +<DD> +Save the translation on the kill ring, and delete it. + +<DT><KBD>w</KBD> +<DD> +Save the translation on the kill ring, without deleting it. + +<DT><KBD>y</KBD> +<DD> +Replace the translation, taking the new one from the kill ring. + +</DL> + +<P> +The command <KBD><KBD>RET</KBD></KBD> (<CODE>po-edit-msgstr</CODE>) opens a new Emacs +window meant to edit in a new translation, or to modify an already existing +translation. The new window contains a copy of the translation taken from +the current PO file entry, all ready for edition, expunged of all quoting +marks, fully modifiable and with the complete extent of Emacs modifying +commands. 
When the translator is done with her modifications, she may use +<KBD>C-c C-c</KBD> to close the subedit window with the automatically requoted +results, or <KBD>C-c C-k</KBD> to abort her modifications. See section <A HREF="gettext_6.html#SEC30">6.8 Details of Sub Edition</A>, +for more information. + +</P> +<P> +The command <KBD><KBD>LFD</KBD></KBD> (<CODE>po-msgid-to-msgstr</CODE>) initializes, or +reinitializes, the translation with the original string. This command is +normally used when the translator wants to redo a fresh translation of +the original string, disregarding any previous work. + +</P> +<P> +It is possible to arrange that, whenever an untranslated +entry is edited, the <KBD><KBD>LFD</KBD></KBD> command is automatically executed. If you set +<CODE>po-auto-edit-with-msgid</CODE> to <CODE>t</CODE>, the translation gets +initialized with the original string, in case none exists already. +The default value for <CODE>po-auto-edit-with-msgid</CODE> is <CODE>nil</CODE>. + +</P> +<P> +In fact, whether it is best to start a translation with an empty +string, or rather with a copy of the original string, is a matter of +taste or habit. Sometimes, the source language and the +target language are so different that it is simply best to start writing +on an empty page. At other times, the source and target languages +are so close that it would be a waste to retype a number of words +already written in the original string. A translator may also +like having the original string right under her eyes, as she will +progressively overwrite the original text with the translation, even +if this requires some extra editing work to get rid of the original. + +</P> +<P> +The command <KBD>k</KBD> (<CODE>po-kill-msgstr</CODE>) merely empties the +translation string, so turning the entry into an untranslated +one. But while doing so, its previous contents are set aside in +a special place, known as the kill ring. 
The command <KBD>w</KBD> +(<CODE>po-kill-ring-save-msgstr</CODE>) has also the effect of taking a +copy of the translation onto the kill ring, but it otherwise leaves +the entry alone, and does <EM>not</EM> remove the translation from the +entry. Both commands use exactly the Emacs kill ring, which is shared +between buffers, and which is well known already to Emacs lovers. + +</P> +<P> +The translator may use <KBD>k</KBD> or <KBD>w</KBD> many times in the course +of her work, as the kill ring may hold several saved translations. +From the kill ring, strings may later be reinserted in various +Emacs buffers. In particular, the kill ring may be used for moving +translation strings between different entries of a single PO file +buffer, or if the translator is handling many such buffers at once, +even between PO files. + +</P> +<P> +To facilitate exchanges with buffers which are not in PO mode, the +translation string put on the kill ring by the <KBD>k</KBD> command is fully +unquoted before being saved: external quotes are removed, multi-line +strings are concatenated, and backslash escaped sequences are turned +into their corresponding characters. In the special case of obsolete +entries, the translation is also uncommented prior to saving. + +</P> +<P> +The command <KBD>y</KBD> (<CODE>po-yank-msgstr</CODE>) completely replaces the +translation of the current entry by a string taken from the kill ring. +Following Emacs terminology, we then say that the replacement +string is <STRONG>yanked</STRONG> into the PO file buffer. +See section `Yanking' in <CITE>The Emacs Editor</CITE>. +The first time <KBD>y</KBD> is used, the translation receives the value of +the most recent addition to the kill ring. If <KBD>y</KBD> is typed once +again, immediately, without intervening keystrokes, the translation +just inserted is taken away and replaced by the second most recent +addition to the kill ring. 
By repeating <KBD>y</KBD> many times in a row, +the translator may travel along the kill ring for saved strings, +until she finds the string she really wanted. + +</P> +<P> +When a string is yanked into a PO file entry, it is fully and +automatically requoted for complying with the format PO files should +have. Further, if the entry is obsolete, PO mode then appropriately +pushes the inserted string inside comments. Once again, translators +should not burden themselves with quoting considerations besides, of +course, the necessity of the translated string itself respective to +the program using it. + +</P> +<P> +Note that <KBD>k</KBD> and <KBD>w</KBD> are not the only commands pushing strings +on the kill ring, as almost any PO mode command replacing translation +strings (or the translator comments) automatically saves the old string +on the kill ring. The main exceptions to this general rule are the +yanking commands themselves. + +</P> +<P> +To better illustrate the operation of killing and yanking, let's +use an actual example, taken from a common situation. When the +programmer slightly modifies some string right in the program, his +change is later reflected in the PO file by the appearance +of a new untranslated entry for the modified string, and the fact +that the entry translating the original or unmodified string becomes +obsolete. In many cases, the translator might spare herself some work +by retrieving the unmodified translation from the obsolete entry, +then initializing the untranslated entry <CODE>msgstr</CODE> field with +this retrieved translation. Once this is done, the obsolete entry is +not wanted anymore, and may be safely deleted. 
+ +</P> +<P> +When the translator finds an untranslated entry and suspects that a +slight variant of the translation exists, she immediately uses <KBD>m</KBD> +to mark the current entry location, then starts chasing obsolete +entries with <KBD>o</KBD>, hoping to find some translation corresponding +to the unmodified string. Once found, she uses the <KBD><KBD>DEL</KBD></KBD> command +for deleting the obsolete entry, knowing that <KBD><KBD>DEL</KBD></KBD> also <EM>kills</EM> +the translation, that is, pushes the translation on the kill ring. +Then, <KBD>r</KBD> returns to the initial untranslated entry, and <KBD>y</KBD> +then <EM>yanks</EM> the saved translation right into the <CODE>msgstr</CODE> +field. The translator is then free to use <KBD><KBD>RET</KBD></KBD> for fine +tuning the translation contents, and maybe to later use <KBD>u</KBD>, +then <KBD>m</KBD> again, for going on with the next untranslated string. + +</P> +<P> +When some sequence of keys has to be typed over and over again, the +translator may find it useful to become better acquainted with the Emacs +capability of learning these sequences and playing them back under request. +See section `Keyboard Macros' in <CITE>The Emacs Editor</CITE>. + +</P> + + +<H2><A NAME="SEC29" HREF="gettext_toc.html#TOC29">6.7 Modifying Comments</A></H2> + +<P> +Any translation work done seriously will raise many linguistic +difficulties, for which decisions have to be made, and the choices +further documented. These documents may be saved within the +PO file in form of translator comments, which the translator +is free to create, delete, or modify at will. These comments may +be useful to herself when she returns to this PO file after a while. + +</P> +<P> +Comments not having whitespace after the initial <SAMP>`#'</SAMP>, for example, +those beginning with <SAMP>`#.'</SAMP> or <SAMP>`#:'</SAMP>, are <EM>not</EM> translator +comments, they are exclusively created by other <CODE>gettext</CODE> tools. 
+So, the commands below will never alter such system added comments, +they are not meant for the translator to modify. See section <A HREF="gettext_2.html#SEC9">2.2 The Format of PO Files</A>. + +</P> +<P> +The following commands are somewhat similar to those modifying translations, +so the general indications given for those apply here. See section <A HREF="gettext_6.html#SEC28">6.6 Modifying Translations</A>. + +</P> +<DL COMPACT> + +<DT><KBD>#</KBD> +<DD> +Interactively edit the translator comments. + +<DT><KBD>K</KBD> +<DD> +Save the translator comments on the kill ring, and delete it. + +<DT><KBD>W</KBD> +<DD> +Save the translator comments on the kill ring, without deleting it. + +<DT><KBD>Y</KBD> +<DD> +Replace the translator comments, taking the new from the kill ring. + +</DL> + +<P> +These commands parallel PO mode commands for modifying the translation +strings, and behave much the same way as they do, except that they handle +this part of PO file comments meant for translator usage, rather +than the translation strings. So, if the descriptions given below are +slightly succinct, it is because the full details have already been given. +See section <A HREF="gettext_6.html#SEC28">6.6 Modifying Translations</A>. + +</P> +<P> +The command <KBD>#</KBD> (<CODE>po-edit-comment</CODE>) opens a new Emacs window +containing a copy of the translator comments on the current PO file entry. +If there are no such comments, PO mode understands that the translator wants +to add a comment to the entry, and she is presented with an empty screen. +Comment marks (<KBD>#</KBD>) and the space following them are automatically +removed before edition, and reinstated after. For translator comments +pertaining to obsolete entries, the uncommenting and recommenting operations +are done twice. Once in the editing window, the keys <KBD>C-c C-c</KBD> +allow the translator to tell she is finished with editing the comment. 
+See section <A HREF="gettext_6.html#SEC30">6.8 Details of Sub Edition</A>, for further details. + +</P> +<P> +Functions found on <CODE>po-subedit-mode-hook</CODE>, if any, are executed after +the string has been inserted in the edit buffer. + +</P> +<P> +The command <KBD>K</KBD> (<CODE>po-kill-comment</CODE>) gets rid of all +translator comments, while saving those comments on the kill ring. +The command <KBD>W</KBD> (<CODE>po-kill-ring-save-comment</CODE>) takes +a copy of the translator comments on the kill ring, but leaves +them undisturbed in the current entry. The command <KBD>Y</KBD> +(<CODE>po-yank-comment</CODE>) completely replaces the translator comments +by a string taken at the front of the kill ring. When this command +is immediately repeated, the comments just inserted are withdrawn, +and replaced by other strings taken along the kill ring. + +</P> +<P> +On the kill ring, all strings have the same nature. There is no +distinction between <EM>translation</EM> strings and <EM>translator +comments</EM> strings. So, for example, let's presume the translator +has just finished editing a translation, and wants to create a new +translator comment to document why the previous translation was +not good, just to remember what was the problem. Foreseeing that she +will do that in her documentation, the translator may want to quote +the previous translation in her translator comments. To do so, she +may initialize the translator comments with the previous translation, +still at the head of the kill ring. Because editing already pushed the +previous translation on the kill ring, she merely has to type <KBD>M-w</KBD> +prior to <KBD>#</KBD>, and the previous translation will be right there, +all ready for being introduced by some explanatory text. + +</P> +<P> +On the other hand, presume there are some translator comments already +and that the translator wants to add to those comments, instead +of wholly replacing them. 
Then, she should edit the comment right +away with <KBD>#</KBD>. Once inside the editing window, she can use the +regular Emacs commands <KBD>C-y</KBD> (<CODE>yank</CODE>) and <KBD>M-y</KBD> +(<CODE>yank-pop</CODE>) to get the previous translation where she likes. + +</P> + + +<H2><A NAME="SEC30" HREF="gettext_toc.html#TOC30">6.8 Details of Sub Edition</A></H2> + +<P> +The PO subedit minor mode has a few peculiarities worth being described +in fuller detail. It installs a few commands over the usual editing set +of Emacs, which are described below. + +</P> +<DL COMPACT> + +<DT><KBD>C-c C-c</KBD> +<DD> +Complete edition. + +<DT><KBD>C-c C-k</KBD> +<DD> +Abort edition. + +<DT><KBD>C-c C-a</KBD> +<DD> +Consult auxiliary PO files. + +</DL> + +<P> +The window's contents represent a translation for a given message, +or a translator comment. The translator may modify this window to +her heart's content. Once this is done, the command <KBD>C-c C-c</KBD> +(<CODE>po-subedit-exit</CODE>) may be used to return the edited translation to +the PO file, replacing the original translation, even if it moved out of +sight or if buffers were switched. + +</P> +<P> +If the translator becomes unsatisfied with her translation or comment, +to the extent she prefers keeping what existed prior to the +<KBD><KBD>RET</KBD></KBD> or <KBD>#</KBD> command, she may use the command <KBD>C-c C-k</KBD> +(<CODE>po-subedit-abort</CODE>) to simply discard the edit, while preserving +the original translation or comment. Another way would be for her to exit +normally with <KBD>C-c C-c</KBD>, then type <CODE>U</CODE> once for undoing the +whole effect of the last edition. + +</P> +<P> +The command <KBD>C-c C-a</KBD> allows for glancing through translations +already achieved in other languages, directly while editing the current +translation. 
This may be quite convenient when the translator is fluent +in many languages, but of course, only makes sense when such completed +auxiliary PO files are already available to her (see section <A HREF="gettext_6.html#SEC32">6.10 Consulting Auxiliary PO Files</A>). + +</P> +<P> +Functions found on <CODE>po-subedit-mode-hook</CODE>, if any, are executed after +the string has been inserted in the edit buffer. + +</P> +<P> +While editing her translation, the translator should pay attention to not +inserting unwanted <KBD><KBD>RET</KBD></KBD> (newline) characters at the end of +the translated string if those are not meant to be there, and to not removing +such characters when they are required. Since these characters are not +visible in the editing buffer, they are easily introduced by mistake. +To help her, <KBD><KBD>RET</KBD></KBD> automatically puts the character <KBD><</KBD> +at the end of the string being edited, but this <KBD><</KBD> is not really +part of the string. On exiting the editing window with <KBD>C-c C-c</KBD>, +PO mode automatically removes this <KBD><</KBD> and all whitespace added after +it. If the translator adds characters after the terminating <KBD><</KBD>, it +loses its delimiting property and becomes an integral part of the string. +If she removes the delimiting <KBD><</KBD>, then the edited string is taken +<EM>as is</EM>, with all trailing newlines, even if invisible. Also, if +the translated string itself ought to end with a genuine <KBD><</KBD>, then +the delimiting <KBD><</KBD> may not be removed; so the string should appear, +in the editing window, as ending with two <KBD><</KBD> in a row. + +</P> +<P> +When a translation (or a comment) is being edited, the translator may move +the cursor back into the PO file buffer and freely move to other entries, +browsing at will. If, with an edit pending, the translator wanders in the +PO file buffer, she may decide to start modifying another entry. Each entry +being edited has its own subedit buffer.
It is possible to simultaneously +edit the translation <EM>and</EM> the comment of a single entry, or to +edit entries in different PO files, all at once. Typing <KBD><KBD>RET</KBD></KBD> +on a field already being edited merely resumes that particular edit. Yet, +the translator had better be comfortable handling many Emacs windows! + +</P> +<P> +Pending subedits may be completed or aborted in any order, regardless +of how or when they were started. When many subedits are pending and the +translator asks to quit the PO file (with the <KBD>q</KBD> command), subedits +are automatically resumed one at a time, so she may decide for each of them. + +</P> + + +<H2><A NAME="SEC31" HREF="gettext_toc.html#TOC31">6.9 C Sources Context</A></H2> + +<P> +PO mode is particularly powerful when used with PO files +created through GNU <CODE>gettext</CODE> utilities, as those utilities +insert special comments in the PO files they generate. +Some of these special comments relate the PO file entry to +exactly where the untranslated string appears in the program sources. + +</P> +<P> +When the translator gets to an untranslated entry, she is fairly +often faced with an original string which is not as informative as +it normally should be, being succinct, cryptic, or otherwise ambiguous. +Before choosing how to translate the string, she needs to understand +better what the string really means and how tight the translation has +to be. Most of the time, when problems arise, the only way left to make +her judgment is looking at the true program sources from where this +string originated, searching for surrounding comments the programmer +might have put in there, and looking around for helpful clues of +<EM>any</EM> kind. + +</P> +<P> +Surely, when looking at program sources, the translator will receive +more help if she is a fluent programmer.
However, even if she is +not versed in programming and feels a little lost in C code, the +translator should not be shy about taking a look, once in a while. +It is most probable that she will still be able to find some of the +hints she needs. She will learn quickly to not feel uncomfortable +in program code, paying more attention to programmer's comments, +variable and function names (if he took care to choose them well), and +overall organization, than to the programming itself. + +</P> +<P> +The following commands are meant to help the translator get +program source context for a PO file entry. + +</P> +<DL COMPACT> + +<DT><KBD>s</KBD> +<DD> +Resume the display of a program source context, or cycle through them. + +<DT><KBD>M-s</KBD> +<DD> +Display a program source context selected by menu. + +<DT><KBD>S</KBD> +<DD> +Add a directory to the search path for source files. + +<DT><KBD>M-S</KBD> +<DD> +Delete a directory from the search path for source files. + +</DL> + +<P> +The commands <KBD>s</KBD> (<CODE>po-cycle-reference</CODE>) and <KBD>M-s</KBD> +(<CODE>po-select-source-reference</CODE>) both open another window displaying +some source program file, and already positioned in such a way that +it shows an actual use of the string to be translated. By doing +so, the command gives source program context for the string. But if +the entry has no source context references, or if all references +are unresolved along the search path for program sources, then the +command diagnoses this as an error. + +</P> +<P> +Even if <KBD>s</KBD> (or <KBD>M-s</KBD>) opens a new window, the cursor stays +in the PO file window. If the translator really wants to +get into the program source window, she ought to do it explicitly, +maybe by using command <KBD>O</KBD>.
+ +</P> +<P> +When <KBD>s</KBD> is typed for the first time, or for a PO file entry which +is different from the last one used for getting source context, then the +command reacts by giving the first context available for this entry, +if any. If some context has already been recently displayed for the +current PO file entry, and the translator wandered off to do other +things, typing <KBD>s</KBD> again will merely resume, in another window, +the context last displayed. In particular, if the translator moved +the cursor away from the context in the source file, the command will +bring the cursor back to the context. When <KBD>s</KBD> is used many times +in a row, with no other commands intervening, PO mode cycles through +the next available contexts for this particular entry, getting back +to the first context once the last has been shown. + +</P> +<P> +The command <KBD>M-s</KBD> behaves differently. Instead of cycling through +references, it lets the translator choose a particular reference among +many, and displays that reference. It is best used with completion: +if the translator types <KBD><KBD>TAB</KBD></KBD> immediately after <KBD>M-s</KBD>, in +response to the question, she will be offered a menu of all possible +references, as a reminder of which answers are acceptable. +This command is useful only where there are really many contexts +available for a single string to translate. + +</P> +<P> +Program source files are usually found relative to where the PO +file stands. As a special provision, when this fails, the file is +also looked for, but relative to the directory immediately above it. +Those two cases take proper care of most PO files. However, it might +happen that a PO file has been moved, or is edited in a different +place than its normal location. When this happens, the translator +should tell PO mode in which directory the genuine PO file normally +sits.
Many such directories may be specified, and all together, they +constitute what is called the <STRONG>search path</STRONG> for program sources. +The command <KBD>S</KBD> (<CODE>po-consider-source-path</CODE>) is used to interactively +enter a new directory at the front of the search path, and the command +<KBD>M-S</KBD> (<CODE>po-ignore-source-path</CODE>) is used to select, with completion, +one of the directories the translator no longer wants on the search path. + +</P> + + +<H2><A NAME="SEC32" HREF="gettext_toc.html#TOC32">6.10 Consulting Auxiliary PO Files</A></H2> + +<P> +PO mode can help a knowledgeable translator who is fluent in +many languages take advantage of translations already achieved +in other languages she happens to know. It provides these other +language translations as additional context for her own work. Moreover, +it has features to ease the production of translations for many languages +at once, for translators preferring to work in this way. + +</P> +<P> +An <STRONG>auxiliary</STRONG> PO file is an existing PO file meant for the same +package the translator is working on, but targeted to a different mother +tongue. Commands exist for declaring and handling auxiliary +PO files, and also for showing contexts for the entry under work. + +</P> +<P> +Here are the auxiliary file commands available in PO mode. + +</P> +<DL COMPACT> + +<DT><KBD>a</KBD> +<DD> +Seek auxiliary files for another translation for the same entry. + +<DT><KBD>M-a</KBD> +<DD> +Switch to a particular auxiliary file. + +<DT><KBD>A</KBD> +<DD> +Declare this PO file as an auxiliary file. + +<DT><KBD>M-A</KBD> +<DD> +Remove this PO file from the list of auxiliary files. + +</DL> + +<P> +Command <KBD>A</KBD> (<CODE>po-consider-as-auxiliary</CODE>) adds the current +PO file to the list of auxiliary files, while command <KBD>M-A</KBD> +(<CODE>po-ignore-as-auxiliary</CODE>) just removes it.
+ +</P> +<P> +The command <KBD>a</KBD> (<CODE>po-cycle-auxiliary</CODE>) seeks all auxiliary PO +files, round-robin, searching for a translated entry in some other language +having an <CODE>msgid</CODE> field identical to the one for the current entry. +The found PO file, if any, takes the place of the current PO file in +the display (its window gets on top). Before doing so, the current PO +file is also made into an auxiliary file, if not already. So, <KBD>a</KBD> +in this newly displayed PO file will seek another PO file, and so on, +so repeating <KBD>a</KBD> will eventually yield back the original PO file. + +</P> +<P> +The command <KBD>M-a</KBD> (<CODE>po-select-auxiliary</CODE>) asks the translator +for her choice of a particular auxiliary file, with completion, and +then switches to that selected PO file. The command also checks if +the selected file has an <CODE>msgid</CODE> field identical to the one for +the current entry, and if yes, this entry becomes current. Otherwise, +the cursor of the selected file is left undisturbed. + +</P> +<P> +For all this to work fully, auxiliary PO files will have to be normalized, +in such a way that <CODE>msgid</CODE> fields are written <EM>exactly</EM> +the same way. It is possible to write <CODE>msgid</CODE> fields in various +ways for representing the same string, and differing spellings would break the +proper behaviour of the auxiliary file commands of PO mode. This is not +expected to be much of a problem in practice, as most existing PO files have +their <CODE>msgid</CODE> entries written by the same GNU <CODE>gettext</CODE> tools. + +</P> +<P> +However, PO files initially created by PO mode itself, while marking +strings in source files, are normalised differently. So are PO +files resulting from the <SAMP>`M-x normalize'</SAMP> command. Until these +discrepancies between PO mode and other GNU <CODE>gettext</CODE> tools get +fully resolved, the translator should stay aware of normalisation issues.
+ +</P> + + +<H2><A NAME="SEC33" HREF="gettext_toc.html#TOC33">6.11 Using Translation Compendiums</A></H2> + +<P> +Compendiums are yet to be implemented. + +</P> +<P> +An incoming PO mode feature will let the translator maintain a +compendium of already achieved translations. A <STRONG>compendium</STRONG> +is a special PO file containing a set of translations recurring in +many different packages. The translator will be given commands for +adding entries to her compendium, and later initializing untranslated +entries, or updating already translated entries, from translations +kept in the compendium. For this to work, however, the compendium +would have to be normalized. See section <A HREF="gettext_2.html#SEC12">2.5 Normalizing Strings in Entries</A>. + +</P> + +<P><HR><P> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_5.html">previous</A>, <A HREF="gettext_7.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. +</BODY> +</HTML> diff --git a/doc/gettext_7.html b/doc/gettext_7.html new file mode 100644 index 0000000..74b8829 --- /dev/null +++ b/doc/gettext_7.html @@ -0,0 +1,268 @@ +<HTML> +<HEAD> +<!-- This HTML file has been created by texi2html 1.51 + from gettext.texi on 19 April 2001 --> + +<TITLE>GNU gettext utilities - 7 Producing Binary MO Files</TITLE> +</HEAD> +<BODY> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_6.html">previous</A>, <A HREF="gettext_8.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. +<P><HR><P> + + +<H1><A NAME="SEC34" HREF="gettext_toc.html#TOC34">7 Producing Binary MO Files</A></H1> + + + +<H2><A NAME="SEC35" HREF="gettext_toc.html#TOC35">7.1 Invoking the <CODE>msgfmt</CODE> Program</A></H2> + + +<PRE> +Usage: msgfmt [<VAR>option</VAR>] <VAR>filename</VAR>.po ... 
+</PRE> + +<DL COMPACT> + +<DT><SAMP>`-a <VAR>number</VAR>'</SAMP> +<DD> +<DT><SAMP>`--alignment=<VAR>number</VAR>'</SAMP> +<DD> +Align strings to <VAR>number</VAR> bytes (default: 1). + +<DT><SAMP>`-h'</SAMP> +<DD> +<DT><SAMP>`--help'</SAMP> +<DD> +Display this help and exit. + +<DT><SAMP>`--no-hash'</SAMP> +<DD> +Binary file will not include the hash table. + +<DT><SAMP>`-o <VAR>file</VAR>'</SAMP> +<DD> +<DT><SAMP>`--output-file=<VAR>file</VAR>'</SAMP> +<DD> +Specify output file name as <VAR>file</VAR>. + +<DT><SAMP>`--strict'</SAMP> +<DD> +Direct the program to work strictly following the Uniforum/Sun +implementation. Currently this only affects the naming of the output +file. If this option is not given the name of the output file is the +same as the domain name. If the strict Uniforum mode is enabled the +suffix <TT>`.mo'</TT> is added to the file name if it is not already +present. + +We find this behaviour of Sun's implementation rather silly and so by +default this mode is <EM>not</EM> selected. + +<DT><SAMP>`-v'</SAMP> +<DD> +<DT><SAMP>`--verbose'</SAMP> +<DD> +Detect and diagnose input file anomalies which might represent +translation errors. The <CODE>msgid</CODE> and <CODE>msgstr</CODE> strings are +studied and compared. It is considered abnormal that one string +starts or ends with a newline while the other does not. + +Also, if the string represents a format string used in a +<CODE>printf</CODE>-like function both strings should have the same number of +<SAMP>`%'</SAMP> format specifiers, with matching types. If the flag +<CODE>c-format</CODE> or <CODE>possible-c-format</CODE> appears in the special +comment <KBD>#,</KBD> for this entry a check is performed. For example, the +check will diagnose using <SAMP>`%.*s'</SAMP> against <SAMP>`%s'</SAMP>, or <SAMP>`%d'</SAMP> +against <SAMP>`%s'</SAMP>, or <SAMP>`%d'</SAMP> against <SAMP>`%x'</SAMP>. It can even handle +positional parameters. 
+ +Normally the <CODE>xgettext</CODE> program automatically decides whether a +string is a format string or not. This algorithm is not perfect, +though. It might regard a string as a format string though it is not +used in a <CODE>printf</CODE>-like function and so <CODE>msgfmt</CODE> might report +errors where there are none. Or the other way round: a string is not +regarded as a format string but it is used in a <CODE>printf</CODE>-like +function. + +To solve this problem the programmer can dictate the decision to the +<CODE>xgettext</CODE> program (see section <A HREF="gettext_3.html#SEC17">3.4 Special Comments preceding Keywords</A>). The translator should not +consider removing the flag from the <KBD>#,</KBD> line. This "fix" would be +undone the next time <CODE>msgmerge</CODE> is run. + +<DT><SAMP>`-V'</SAMP> +<DD> +<DT><SAMP>`--version'</SAMP> +<DD> +Output version information and exit. + +</DL> + +<P> +If the input file is <SAMP>`-'</SAMP>, standard input is read. If the output file +is <SAMP>`-'</SAMP>, output is written to standard output. + +</P> + + +<H2><A NAME="SEC36" HREF="gettext_toc.html#TOC36">7.2 The Format of GNU MO Files</A></H2> + +<P> +The format of the generated MO files is best described by a picture, +which appears below. + +</P> +<P> +The first two words serve to identify the file. The magic +number will always signal GNU MO files. The number is stored in the +byte order of the generating machine, so the magic number really is +two numbers: <CODE>0x950412de</CODE> and <CODE>0xde120495</CODE>. The second +word describes the current revision of the file format. For now the +revision is 0. This might change in future versions, and ensures +that the readers of MO files can distinguish new formats from old +ones, so that both can be handled correctly.
The version is kept +separate from the magic number, instead of using different magic +numbers for different formats, mainly because <TT>`/etc/magic'</TT> is +not updated often. It might be better to have magic separated from +internal format version identification. + +</P> +<P> +A number of pointers to later tables in the file follow, allowing +for the extension of the prefix part of MO files without having to +recompile programs reading them. This might become useful for later +inserting a few flag bits, an indication of the charset used, new +tables, or other things. + +</P> +<P> +Then, at offset <VAR>O</VAR> and offset <VAR>T</VAR> in the picture, two tables +of string descriptors can be found. In both tables, each string +descriptor uses two 32-bit integers, one for the string length, +another for the offset of the string in the MO file, counting in bytes +from the start of the file. The first table contains descriptors +for the original strings, and is sorted so the original strings +are in increasing lexicographical order. The second table contains +descriptors for the translated strings, and is parallel to the first +table: to find the corresponding translation one has to access the +slot in the second table with the same index. + +</P> +<P> +Having the original strings sorted enables the use of simple binary +search, for when the MO file does not contain a hashing table, or +for when it is not practical to use the hashing table provided in +the MO file. This also has another advantage, as the empty string +in a GNU <CODE>gettext</CODE> PO file is usually <EM>translated</EM> into +some system information attached to that particular MO file, and the +empty string necessarily becomes the first in both the original and +translated tables, making the system information very easy to find. + +</P> +<P> +The size <VAR>S</VAR> of the hash table can be zero. In this case, the +hash table itself is not contained in the MO file.
Some people might +prefer this because a precomputed hashing table takes disk space, and +does not win <EM>that</EM> much speed. The hash table contains indices +to the sorted array of strings in the MO file. Conflict resolution is +done by double hashing. The precise hashing algorithm used is fairly +dependent on GNU <CODE>gettext</CODE> code, and is not documented here. + +</P> +<P> +As for the strings themselves, they follow the hash table, and each +is terminated with a <KBD>NUL</KBD>, and this <KBD>NUL</KBD> is not counted in +the length which appears in the string descriptor. The <CODE>msgfmt</CODE> +program has an option selecting the alignment for MO file strings. +With this option, each string is separately aligned so it starts at +an offset which is a multiple of the alignment value. On some RISC +machines, a correct alignment will speed things up. + +</P> +<P> +Plural forms are stored by letting the plural of the original string +follow the singular of the original string, separated by a +<KBD>NUL</KBD> byte. The length which appears in the string descriptor +includes both. However, only the singular of the original string +takes part in the hash table lookup. The plural variants of the +translation are all stored consecutively, separated by a +<KBD>NUL</KBD> byte. Here also, the length in the string descriptor +includes all of them. + +</P> +<P> +Nothing prevents a MO file from having embedded <KBD>NUL</KBD>s in strings. +However, the program interface currently used already presumes +that strings are <KBD>NUL</KBD> terminated, so embedded <KBD>NUL</KBD>s are +somewhat useless. But the MO file format is general enough that other +interfaces would be possible later, if for example, we ever want to +implement wide characters right in MO files, where <KBD>NUL</KBD> bytes may +accidentally appear. (No, we don't want to have wide characters in MO +files.
They would make the file unnecessarily large, and the +<SAMP>`wchar_t'</SAMP> type being platform dependent, MO files would be +platform dependent as well.) + +</P> +<P> +This particular issue has been strongly debated in the GNU +<CODE>gettext</CODE> development forum, and it is expectable that MO file +format will evolve or change over time. It is even possible that many +formats may later be supported concurrently. But surely, we have to +start somewhere, and the MO file format described here is a good start. +Nothing is cast in concrete, and the format may later evolve fairly +easily, so we should feel comfortable with the current approach. + +</P> + +<PRE> + byte + +------------------------------------------+ + 0 | magic number = 0x950412de | + | | + 4 | file format revision = 0 | + | | + 8 | number of strings | == N + | | + 12 | offset of table with original strings | == O + | | + 16 | offset of table with translation strings | == T + | | + 20 | size of hashing table | == S + | | + 24 | offset of hashing table | == H + | | + . . + . (possibly more entries later) . + . . + | | + O | length & offset 0th string ----------------. + O + 8 | length & offset 1st string ------------------. + ... ... | | +O + ((N-1)*8)| length & offset (N-1)th string | | | + | | | | + T | length & offset 0th translation ---------------. + T + 8 | length & offset 1st translation -----------------. + ... ... | | | | +T + ((N-1)*8)| length & offset (N-1)th translation | | | | | + | | | | | | + H | start hash table | | | | | + ... ... | | | | + H + S * 4 | end hash table | | | | | + | | | | | | + | NUL terminated 0th string <----------------' | | | + | | | | | + | NUL terminated 1st string <------------------' | | + | | | | + ... ... | | + | | | | + | NUL terminated 0th translation <---------------' | + | | | + | NUL terminated 1st translation <-----------------' + | | + ... ... 
+ | | + +------------------------------------------+ +</PRE> + +<P><HR><P> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_6.html">previous</A>, <A HREF="gettext_8.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. +</BODY> +</HTML> diff --git a/doc/gettext_8.html b/doc/gettext_8.html new file mode 100644 index 0000000..4a18aa4 --- /dev/null +++ b/doc/gettext_8.html @@ -0,0 +1,119 @@ +<HTML> +<HEAD> +<!-- This HTML file has been created by texi2html 1.51 + from gettext.texi on 19 April 2001 --> + +<TITLE>GNU gettext utilities - 8 The User's View</TITLE> +</HEAD> +<BODY> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_7.html">previous</A>, <A HREF="gettext_9.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. +<P><HR><P> + + +<H1><A NAME="SEC37" HREF="gettext_toc.html#TOC37">8 The User's View</A></H1> + +<P> +When GNU <CODE>gettext</CODE> will truly have reached its goal, average users +should feel some kind of astonished pleasure, seeing the effect of +that strange kind of magic that just makes their own native language +appear everywhere on their screens. As for naive users, they would +ideally have no special pleasure about it, merely taking their own +language for <EM>granted</EM>, and becoming rather unhappy otherwise. + +</P> +<P> +So, let's try to describe here how we would like the magic to operate, +as we want the users' view to be the simplest, among all ways one +could look at GNU <CODE>gettext</CODE>. All other software engineers: +programmers, translators, maintainers, should work together in such a +way that the magic becomes possible. This is a long and progressive +undertaking, and information is available about the progress of the +Translation Project. 
+ +</P> +<P> +When a package is distributed, there are two kinds of users: +<STRONG>installers</STRONG> who fetch the distribution, unpack it, configure +it, compile it and install it for themselves or others to use; and +<STRONG>end users</STRONG> who run programs of the package, once these have +been installed at their site. GNU <CODE>gettext</CODE> offers magic +for both installers and end users. + +</P> + + + +<H2><A NAME="SEC38" HREF="gettext_toc.html#TOC38">8.1 The Current <TT>`ABOUT-NLS'</TT> Matrix</A></H2> + +<P> +Languages are not equally supported in all packages using GNU +<CODE>gettext</CODE>. To know if some package uses GNU <CODE>gettext</CODE>, one +may check the distribution for the <TT>`ABOUT-NLS'</TT> information file, for +some <TT>`<VAR>ll</VAR>.po'</TT> files, often kept together in a <TT>`po/'</TT> +directory, or for an <TT>`intl/'</TT> directory. Internationalized packages +usually have many <TT>`<VAR>ll</VAR>.po'</TT> files, where <VAR>ll</VAR> represents +the language. See section <A HREF="gettext_8.html#SEC40">8.3 Magic for End Users</A> for a complete description of the format +for <VAR>ll</VAR>. + +</P> +<P> +More generally, a matrix is available for showing the current state +of the Translation Project, listing which packages are prepared for +multi-lingual messages, and which languages are supported by each. +Because this information changes often, this matrix is not kept within +this GNU <CODE>gettext</CODE> manual. This information is often found in +file <TT>`ABOUT-NLS'</TT> from various distributions, but is then only as recent as +the distribution itself. A recent copy of this <TT>`ABOUT-NLS'</TT> file, +containing up-to-date information, should generally be found on the +Translation Project sites, and also on most GNU archive sites.
+ +</P> + + +<H2><A NAME="SEC39" HREF="gettext_toc.html#TOC39">8.2 Magic for Installers</A></H2> + +<P> +By default, packages fully using GNU <CODE>gettext</CODE>, internally, +are installed in such a way that they allow translation of +messages. At <EM>configuration</EM> time, those packages should +automatically detect whether the underlying host system already provides +the GNU <CODE>gettext</CODE> functions. If not, +the GNU <CODE>gettext</CODE> library should be automatically prepared +and used. Installers may use special options at configuration +time for changing this behavior. The command <SAMP>`./configure +--with-included-gettext'</SAMP> bypasses the system's <CODE>gettext</CODE> to +use the included GNU <CODE>gettext</CODE> instead, +while <SAMP>`./configure --disable-nls'</SAMP> +produces programs totally unable to translate messages. + +</P> +<P> +Internationalized packages usually have many <TT>`<VAR>ll</VAR>.po'</TT> +files. Unless +translations are disabled, all those available are installed together +with the package. However, the environment variable <CODE>LINGUAS</CODE> +may be set, prior to configuration, to limit the installed set. +<CODE>LINGUAS</CODE> should then contain a space-separated list of two-letter +codes, stating which languages are allowed. + +</P> + + +<H2><A NAME="SEC40" HREF="gettext_toc.html#TOC40">8.3 Magic for End Users</A></H2> + +<P> +We consider here those packages using GNU <CODE>gettext</CODE> internally, +and for which the installers did not disable translation at +<EM>configure</EM> time. Then, users only have to set the <CODE>LANG</CODE> +environment variable to the appropriate <SAMP>`<VAR>ll</VAR>_<VAR>CC</VAR>'</SAMP> +combination prior to using the programs in the package. See section <A HREF="gettext_8.html#SEC38">8.1 The Current <TT>`ABOUT-NLS'</TT> Matrix</A>. +For example, let's presume a German site.
At the shell prompt, users +merely have to execute <SAMP>`setenv LANG de_DE'</SAMP> (in <CODE>csh</CODE>) or +<SAMP>`export LANG; LANG=de_DE'</SAMP> (in <CODE>sh</CODE>). They could even do +this from their <TT>`.login'</TT> or <TT>`.profile'</TT> file. + +</P> +<P><HR><P> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_7.html">previous</A>, <A HREF="gettext_9.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. +</BODY> +</HTML> diff --git a/doc/gettext_9.html b/doc/gettext_9.html new file mode 100644 index 0000000..00592a1 --- /dev/null +++ b/doc/gettext_9.html @@ -0,0 +1,1410 @@ +<HTML> +<HEAD> +<!-- This HTML file has been created by texi2html 1.51 + from gettext.texi on 19 April 2001 --> + +<TITLE>GNU gettext utilities - 9 The Programmer's View</TITLE> +</HEAD> +<BODY> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_8.html">previous</A>, <A HREF="gettext_10.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. +<P><HR><P> + + +<H1><A NAME="SEC41" HREF="gettext_toc.html#TOC41">9 The Programmer's View</A></H1> + +<P> +One aim of the current message catalog implementation provided by +GNU <CODE>gettext</CODE> was to use the systems message catalog handling, if the +installer wishes to do so. So we perhaps should first take a look at +the solutions we know about. The people in the POSIX committee did not +manage to agree on one of the semi-official standards which we'll +describe below. In fact they couldn't agree on anything, so they decided +only to include an example of an interface. The major Unix vendors +are split in the usage of the two most important specifications: X/Open's +catgets vs. Uniforum's gettext interface. We'll describe them both and +later explain our solution of this dilemma. 
+ +</P> + + + +<H2><A NAME="SEC42" HREF="gettext_toc.html#TOC42">9.1 About <CODE>catgets</CODE></A></H2> + +<P> +The <CODE>catgets</CODE> implementation is defined in the X/Open Portability +Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the +process of creating this standard seemed to be too slow for some of +the Unix vendors so they based their implementations on preliminary +versions of the standard. Of course this leads again to problems while +writing platform independent programs: even the usage of <CODE>catgets</CODE> +does not guarantee a uniform interface. + +</P> +<P> +Another, personal comment on this is that only a bunch of committee members +could have made this interface. They never really tried to program +using this interface. It is a fast, memory-saving implementation, and a +user can happily live with it. But programmers hate it (at least I and +some others do...) + +</P> +<P> +But we must not forget one point: after all the trouble with transferring +the rights on Unix(tm) they at last came to X/Open, the very same who +published this specification. This leads me to predict +that this interface will be in future Unix standards (e.g. Spec1170) and +therefore part of all Unix implementations (that is, implementations which are +<EM>allowed</EM> to bear this name). + +</P> + + + +<H3><A NAME="SEC43" HREF="gettext_toc.html#TOC43">9.1.1 The Interface</A></H3> + +<P> +The interface to the <CODE>catgets</CODE> implementation consists of three +functions which correspond to those used in file access: <CODE>catopen</CODE> +to open the catalog for use, <CODE>catgets</CODE> for accessing the message +tables, and <CODE>catclose</CODE> for closing after work is done. Prototypes +for the functions and the needed definitions are in the +<CODE><nl_types.h></CODE> header file.
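</P>
<P>
Put together, the three calls follow the open-use-close pattern just
described. The following is only a rough sketch under stated assumptions:
<CODE>lookup_message</CODE> is a hypothetical helper (it is not part of the
<CODE>catgets</CODE> interface), and the catalog name, set number and message
ID passed to it are placeholders. It relies on <CODE>catopen</CODE> returning
<CODE>(nl_catd) -1</CODE> on failure, as POSIX specifies.
</P>

```c
#include <assert.h>
#include <nl_types.h>
#include <string.h>

/* Hypothetical helper, not part of the catgets interface: look up one
   message and copy it into BUF (BUFSIZE bytes, including the NUL).
   DEFAULT_TEXT is used when the catalog cannot be opened or when the
   set/message pair is not found.  Returns BUF.  */
char *
lookup_message (const char *catalog_name, int set_no, int msg_id,
                const char *default_text, char *buf, size_t bufsize)
{
  const char *text = default_text;
  nl_catd catd;

  assert (buf != NULL && bufsize > 0);

  /* Open: the second argument is left at 0, as commonly advised.  */
  catd = catopen (catalog_name, 0);

  /* Use: on a failed lookup catgets hands back its fourth argument.  */
  if (catd != (nl_catd) -1)
    text = catgets (catd, set_no, msg_id, default_text);

  /* The string returned by catgets must not be modified, so copy it,
     and do so before the catalog is closed.  */
  strncpy (buf, text, bufsize - 1);
  buf[bufsize - 1] = '\0';

  /* Close: only a descriptor that was really opened.  */
  if (catd != (nl_catd) -1)
    catclose (catd);

  return buf;
}
```

<P>
A caller might write <CODE>lookup_message ("hello", 1, 1, "Hello, world!", buf, sizeof buf)</CODE>.
Opening and closing the catalog for every message is wasteful; a real
program would normally keep the descriptor open for as long as it needs
messages.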
+ +</P> +<P> +<CODE>catopen</CODE> is used like this: + +</P> + +<PRE> +nl_catd catd = catopen ("catalog_name", 0); +</PRE> + +<P> +The function takes the name of the catalog as its first argument. This usually +refers to the name of the program or the package. The second parameter +is not further specified in the standard. I don't even know whether it +is implemented consistently among various systems. So the common advice +is to use <CODE>0</CODE> as the value. The return value is a handle to the +message catalog, equivalent to the file handles returned by <CODE>open</CODE>. + +</P> +<P> +This handle is of course used in the <CODE>catgets</CODE> function which can +be used like this: + +</P> + +<PRE> +char *translation = catgets (catd, set_no, msg_id, "original string"); +</PRE> + +<P> +The first parameter is this catalog descriptor. The second parameter +specifies the set of messages in this catalog from which the message +described by <CODE>msg_id</CODE> is obtained. <CODE>catgets</CODE> therefore uses +three-stage addressing: + +</P> + +<PRE> +catalog name => set number => message ID => translation +</PRE> + +<P> +The fourth argument is not used to address the translation. It is given +as a default value in case one of the addressing stages fails. One +important thing to remember is that although the return type of <CODE>catgets</CODE> +is <CODE>char *</CODE> the resulting string <EM>must not</EM> be changed. It +should rather be <CODE>const char *</CODE>, but the standard was published in +1988, one year before ANSI C. + +</P> +<P> +The last of these functions is used and behaves as expected: + +</P> + +<PRE> +catclose (catd); +</PRE> + +<P> +After this no <CODE>catgets</CODE> call using the descriptor is legal anymore. + +</P> + + +<H3><A NAME="SEC44" HREF="gettext_toc.html#TOC44">9.1.2 Problems with the <CODE>catgets</CODE> Interface?!</A></H3> + +<P> +Now that this description seemed to be really easy -- where are the +problems we speak of? 
In fact the interface could be used in a +reasonable way, but constructing the message catalogs is a pain. The +reason for this lies in the third argument of <CODE>catgets</CODE>: the unique +message ID. This has to be a numeric value, unique among all messages in a single +set. Perhaps you can imagine the problems of keeping such a list while +changing the source code. Add a new message here, remove one there. Of +course a lot of tools have been developed to help organize this +chaos but each of them fails in one aspect or another. We don't +want to say that the other approach has no problems but they are far +easier to manage. + +</P> + + +<H2><A NAME="SEC45" HREF="gettext_toc.html#TOC45">9.2 About <CODE>gettext</CODE></A></H2> + +<P> +The definition of the <CODE>gettext</CODE> interface comes from a Uniforum +proposal and it is followed by at least one major Unix vendor +(Sun) in its latest developments. It is not specified in any official +standard, though. + +</P> +<P> +The main points about this solution are that it does not follow the +method of normal file handling (open-use-close) and that it does not +burden the programmer with so many tasks, especially the unique key handling. +Of course a unique key is needed here as well, but this key is the message +itself (however long or short it is). See section <A HREF="gettext_9.html#SEC53">9.3 Comparing the Two Interfaces</A> for a more +detailed comparison of the two methods. + +</P> +<P> +The following section contains a rather detailed description of the +interface. We make it that detailed because this is the interface +we chose for the GNU <CODE>gettext</CODE> Library. Programmers interested +in using this library will be interested in this description. 
+ +</P> + + + +<H3><A NAME="SEC46" HREF="gettext_toc.html#TOC46">9.2.1 The Interface</A></H3> + +<P> +The minimal functionality an interface must have is a) to select a +domain the strings are coming from (a single domain for all programs is +not reasonable because its construction and maintenance is difficult, +perhaps impossible) and b) to access a string in a selected domain. + +</P> +<P> +This is essentially the description of the <CODE>gettext</CODE> interface. It +has a global domain which unqualified usages reference. Of course this +domain is selectable by the user. + +</P> + +<PRE> +char *textdomain (const char *domain_name); +</PRE> + +<P> +This provides the possibility to change or query the current status of +the current global domain of the <CODE>LC_MESSAGES</CODE> category. The +argument is a null-terminated string, whose characters must be legal +for use in filenames. If the <VAR>domain_name</VAR> argument is <CODE>NULL</CODE>, +the function returns the current value. If no value has been set +before, the name of the default domain is returned: <EM>messages</EM>. +Please note that although the return value of <CODE>textdomain</CODE> is of +type <CODE>char *</CODE> it must not be changed. It is also important to know +that no checks of the availability are made. If the name is not +available you will see this by the fact that no translations are provided. + +</P> +<P> +To use a domain set by <CODE>textdomain</CODE> the function + +</P> + +<PRE> +char *gettext (const char *msgid); +</PRE> + +<P> +is to be used. This is the simplest reasonable form one can imagine. +The translation of the string <VAR>msgid</VAR> is returned if it is available +in the current domain. If not available the argument itself is +returned. If the argument is <CODE>NULL</CODE> the result is undefined. + +</P> +<P> +One thing which should come to mind is that no explicit dependency on +the used domain is given. 
The current value of the domain for the +<CODE>LC_MESSAGES</CODE> locale is used. If this changes between two +executions of the same <CODE>gettext</CODE> call in the program, the two calls +reference different message catalogs. + +</P> +<P> +For the easiest case, which is normally used in internationalized +packages, once at the beginning of execution a call to <CODE>textdomain</CODE> +is issued, setting the domain to a unique name, normally the package +name. In the following code all strings which have to be translated are +filtered through the <CODE>gettext</CODE> function. That's all, the package speaks +your language. + +</P> + + +<H3><A NAME="SEC47" HREF="gettext_toc.html#TOC47">9.2.2 Solving Ambiguities</A></H3> + +<P> +While this single name domain works well for most applications there +might be the need to get translations from more than one domain. Of +course one could switch between different domains with calls to +<CODE>textdomain</CODE>, but this is neither convenient nor fast. One +such situation, under discussion at the time of this +writing, is this: all +error messages of the commonly used functions should +go into a separate domain <CODE>error</CODE>. By this means we would only need +to translate them once. +Another case is messages from a library, as these <EM>have</EM> to be +independent of the current domain set by the application. + +</P> +<P> +For these reasons there are two more functions to retrieve strings: + +</P> + +<PRE> +char *dgettext (const char *domain_name, const char *msgid); +char *dcgettext (const char *domain_name, const char *msgid, + int category); +</PRE> + +<P> +Both take an additional argument in the first place, which corresponds +to the argument of <CODE>textdomain</CODE>. The third argument of +<CODE>dcgettext</CODE> allows the use of a locale category other than <CODE>LC_MESSAGES</CODE>. +But I really don't know where this can be useful. 
If the +<VAR>domain_name</VAR> is <CODE>NULL</CODE> or <VAR>category</VAR> has a value other than +the known ones, the result is undefined. It should also be noted that +<CODE>dcgettext</CODE> is not part of the second known implementation of this +function family, the one found in Solaris. + +</P> +<P> +A second ambiguity can arise from the fact that perhaps more than one +domain has the same name. This can be solved by specifying where the +needed message catalog files can be found. + +</P> + +<PRE> +char *bindtextdomain (const char *domain_name, + const char *dir_name); +</PRE> + +<P> +Calling this function binds the given domain to a file in the specified +directory (how this file is determined follows below). In particular, a +file in the system's default place is no longer preferred over the specified +file (as it would be by solely using <CODE>textdomain</CODE>). A +<CODE>NULL</CODE> pointer for the <VAR>dir_name</VAR> parameter returns the binding +associated with <VAR>domain_name</VAR>. If <VAR>domain_name</VAR> itself is +<CODE>NULL</CODE> nothing happens and a <CODE>NULL</CODE> pointer is returned. Here +again, as for all the other functions, none of the return +values may be changed! + +</P> +<P> +It is important to remember that relative path names for the +<VAR>dir_name</VAR> parameter can be trouble. Since the path is always +computed relative to the current directory different results will be +achieved when the program executes a <CODE>chdir</CODE> command. Relative +paths should always be avoided to avoid dependencies and +unreliabilities. + +</P> + + +<H3><A NAME="SEC48" HREF="gettext_toc.html#TOC48">9.2.3 Locating Message Catalog Files</A></H3> + +<P> +Because many different languages for many different packages have to be +stored we need some way to attach this information to the message catalog +files. The way usually used in Unix environments is to have this encoding +in the file name. This is also done here. 
The directory name given in +<CODE>bindtextdomain</CODE>s second argument (or the default directory), +followed by the value and name of the locale and the domain name are +concatenated: + +</P> + +<PRE> +<VAR>dir_name</VAR>/<VAR>locale</VAR>/LC_<VAR>category</VAR>/<VAR>domain_name</VAR>.mo +</PRE> + +<P> +The default value for <VAR>dir_name</VAR> is system specific. For the GNU +library, and for packages adhering to its conventions, it's: + +<PRE> +/usr/local/share/locale +</PRE> + +<P> +<VAR>locale</VAR> is the value of the locale whose name is this +<CODE>LC_<VAR>category</VAR></CODE>. For <CODE>gettext</CODE> and <CODE>dgettext</CODE> this +<CODE>LC_<VAR>category</VAR></CODE> is always <CODE>LC_MESSAGES</CODE>.<A NAME="DOCF3" HREF="gettext_foot.html#FOOT3">(3)</A> +The value of the locale is determined through +<CODE>setlocale (LC_<VAR>category</VAR>, NULL)</CODE>. +<A NAME="DOCF4" HREF="gettext_foot.html#FOOT4">(4)</A> +<CODE>dcgettext</CODE> specifies the locale category by the third argument. + +</P> + + +<H3><A NAME="SEC49" HREF="gettext_toc.html#TOC49">9.2.4 How to specify the output character set <CODE>gettext</CODE> uses</A></H3> + +<P> +<CODE>gettext</CODE> not only looks up a translation in a message catalog. It +also converts the translation on the fly to the desired output character +set. This is useful if the user is working in a different character set +than the translator who created the message catalog, because it avoids +distributing variants of message catalogs which differ only in the +character set. + +</P> +<P> +The output character set is, by default, the value of <CODE>nl_langinfo +(CODESET)</CODE>, which depends on the <CODE>LC_CTYPE</CODE> part of the current +locale. But programs which store strings in a locale independent way +(e.g. UTF-8) can request that <CODE>gettext</CODE> and related functions +return the translations in that encoding, by use of the +<CODE>bind_textdomain_codeset</CODE> function. 
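+</P>
+<P>
+The calls described so far could be combined at program start-up roughly as in the sketch below. The helper name <CODE>init_translations</CODE> and the decision to treat a <CODE>NULL</CODE> return as failure are assumptions of this illustration, not something the manual prescribes.
+</P>

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical start-up helper for this sketch.  DOMAIN is normally
   the package name; LOCALE_DIR should be an absolute path, since a
   relative one changes meaning after chdir.  Returns 0 on success,
   -1 if one of the calls reported failure.  */
static int
init_translations (const char *domain, const char *locale_dir)
{
  /* Adopt the user's locale from the environment (LANG, LC_*).  */
  setlocale (LC_ALL, "");

  /* Tell the library where the catalogs for DOMAIN are installed.  */
  if (bindtextdomain (domain, locale_dir) == NULL)
    return -1;

  /* Request UTF-8 output regardless of the locale's own codeset.  */
  if (bind_textdomain_codeset (domain, "UTF-8") == NULL)
    return -1;

  /* Make DOMAIN the default for unqualified gettext calls.  */
  if (textdomain (domain) == NULL)
    return -1;

  return 0;
}
```

+<P>
+A program would call such a helper once, for instance as <CODE>init_translations ("mypackage", "/usr/local/share/locale")</CODE> with a package name of its own, before the first call to <CODE>gettext</CODE>.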
+ +</P> +<P> +Note that the <VAR>msgid</VAR> argument to <CODE>gettext</CODE> is not subject to +character set conversion. Also, when <CODE>gettext</CODE> does not find a +translation for <VAR>msgid</VAR>, it returns <VAR>msgid</VAR> unchanged -- +independently of the current output character set. It is therefore +recommended that all <VAR>msgid</VAR>s be US-ASCII strings. + +</P> +<P> +<DL> +<DT><U>Function:</U> char * <B>bind_textdomain_codeset</B> <I>(const char *<VAR>domainname</VAR>, const char *<VAR>codeset</VAR>)</I> +<DD><A NAME="IDX1"></A> +The <CODE>bind_textdomain_codeset</CODE> function can be used to specify the +output character set for message catalogs for domain <VAR>domainname</VAR>. +The <VAR>codeset</VAR> argument must be a valid codeset name which can be used +for the <CODE>iconv_open</CODE> function, or a null pointer. + +</P> +<P> +If the <VAR>codeset</VAR> parameter is the null pointer, +<CODE>bind_textdomain_codeset</CODE> returns the currently selected codeset +for the domain with the name <VAR>domainname</VAR>. It returns <CODE>NULL</CODE> if +no codeset has yet been selected. + +</P> +<P> +The <CODE>bind_textdomain_codeset</CODE> function can be used several times. +If used multiple times with the same <VAR>domainname</VAR> argument, the +later call overrides the settings made by the earlier one. + +</P> +<P> +The <CODE>bind_textdomain_codeset</CODE> function returns a pointer to a +string containing the name of the selected codeset. The string is +allocated internally in the function and must not be changed by the +user. If the system went out of core during the execution of +<CODE>bind_textdomain_codeset</CODE>, the return value is <CODE>NULL</CODE> and the +global variable <VAR>errno</VAR> is set accordingly. 
+</DL> + +</P> + + +<H3><A NAME="SEC50" HREF="gettext_toc.html#TOC50">9.2.5 Additional functions for plural forms</A></H3> + +<P> +The functions of the <CODE>gettext</CODE> family described so far (and all the +<CODE>catgets</CODE> functions as well) have one problem in the real world +which has been neglected completely in all existing approaches. What +is meant here is the handling of plural forms. + +</P> +<P> +Looking through Unix source code from before the time anybody thought about +internationalization (and, sadly, even afterwards) one can often find +code similar to the following: + +</P> + +<PRE> + printf ("%d file%s deleted", n, n == 1 ? "" : "s"); +</PRE> + +<P> +After the first complaints from people internationalizing the code, people +either completely avoided formulations like this or used strings like +<CODE>"file(s)"</CODE>. Both look unnatural and should be avoided. First +attempts to solve the problem correctly looked like this: + +</P> + +<PRE> + if (n == 1) + printf ("%d file deleted", n); + else + printf ("%d files deleted", n); +</PRE> + +<P> +But this does not solve the problem. It helps languages where the +plural form of a noun is not simply constructed by adding an `s' but +that is all. Once again people fell into the trap of believing the +rules their language is using are universal. But the handling of plural +forms differs widely between the language families. For example, +Rafal Maszkowski <CODE><rzm@mat.uni.torun.pl></CODE> reports: + +</P> + +<BLOCKQUOTE> +<P> +In Polish we use e.g. plik (file) this way: + +<PRE> +1 plik +2,3,4 pliki +5-21 pliko'w +22-24 pliki +25-31 pliko'w +</PRE> + +<P> +and so on (o' means 8859-2 oacute which should be rather okreska, +similar to aogonek). +</BLOCKQUOTE> + +<P> +There are two things which can differ between languages (and even inside +language families): + +</P> + +<UL> +<LI> + +The way plural forms are built differs. This is a problem with +languages which have many irregularities. 
German, for instance, is a +drastic case. Though English and German are part of the same language +family (Germanic), the almost regular forming of plural noun forms +(appending an `s') is hardly found in German. + +<LI> + +The number of plural forms differs. This is somewhat surprising for +those who only have experience with Romanic and Germanic languages +since here the number is the same (there are two). + +But other language families have only one form or many forms. More +information on this is given later in this section. +</UL> + +<P> +The consequence of this is that application writers should not try to +solve the problem in their code. This would be localization since it is +only usable for certain, hardcoded language environments. Instead the +extended <CODE>gettext</CODE> interface should be used. + +</P> +<P> +Instead of the one key string, these extra functions take two +strings and a numerical argument. The idea behind this is that using +the numerical argument and the first string as a key, the implementation +can select the right plural form using rules specified by the translator. +The two string arguments then will be used to provide a return +value in case no message catalog is found (similar to the normal +<CODE>gettext</CODE> behavior). In this case the rules for Germanic languages +are used and it is assumed that the first string argument is the singular +form, the second the plural form. + +</P> +<P> +This has the consequence that programs without language catalogs can +display the correct strings only if the program itself is written using +a Germanic language. This is a limitation but since the GNU C library +(as well as the GNU <CODE>gettext</CODE> package) is written as part of the +GNU package and the coding standards for the GNU project require programs +to be written in English, this solution nevertheless fulfills its +purpose. 
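+</P>
+<P>
+The fallback behavior described above can be sketched with <CODE>ngettext</CODE>, one of the extended functions. The helper name <CODE>format_removed</CODE> is made up for this illustration; with no message catalog available, the English strings given in the call are selected by the Germanic rule.
+</P>

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper for this sketch: format a "files removed"
   message for N files into BUF.  */
static void
format_removed (char *buf, size_t size, unsigned long n)
{
  /* ngettext selects the format string; N must be passed a second
     time so that snprintf can fill in the number.  */
  snprintf (buf, size,
            ngettext ("%lu file removed", "%lu files removed", n), n);
}
```

+<P>
+Without a catalog the first string is used only when the count is exactly one; every other count, including zero, selects the second string.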
+ +</P> +<P> +<DL> +<DT><U>Function:</U> char * <B>ngettext</B> <I>(const char *<VAR>msgid1</VAR>, const char *<VAR>msgid2</VAR>, unsigned long int <VAR>n</VAR>)</I> +<DD><A NAME="IDX2"></A> +The <CODE>ngettext</CODE> function is similar to the <CODE>gettext</CODE> function +as it finds the message catalogs in the same way. But it takes two +extra arguments. The <VAR>msgid1</VAR> parameter must contain the singular +form of the string to be translated. It is also used as the key for the +search in the catalog. The <VAR>msgid2</VAR> parameter is the plural form. +The parameter <VAR>n</VAR> is used to determine the plural form. If no +message catalog is found <VAR>msgid1</VAR> is returned if <CODE>n == 1</CODE>, +otherwise <VAR>msgid2</VAR>. + +</P> +<P> +An example of the use of this function is: + +</P> + +<PRE> +printf (ngettext ("%d file removed", "%d files removed", n), n); +</PRE> + +<P> +Please note that the numeric value <VAR>n</VAR> has to be passed to the +<CODE>printf</CODE> function as well. It is not sufficient to pass it only to +<CODE>ngettext</CODE>. +</DL> + +</P> +<P> +<DL> +<DT><U>Function:</U> char * <B>dngettext</B> <I>(const char *<VAR>domain</VAR>, const char *<VAR>msgid1</VAR>, const char *<VAR>msgid2</VAR>, unsigned long int <VAR>n</VAR>)</I> +<DD><A NAME="IDX3"></A> +The <CODE>dngettext</CODE> function is similar to the <CODE>dgettext</CODE> function in the +way the message catalog is selected. The difference is that it takes +two extra parameters to provide the correct plural form. These two +parameters are handled in the same way <CODE>ngettext</CODE> handles them. +</DL> + +</P> +<P> +<DL> +<DT><U>Function:</U> char * <B>dcngettext</B> <I>(const char *<VAR>domain</VAR>, const char *<VAR>msgid1</VAR>, const char *<VAR>msgid2</VAR>, unsigned long int <VAR>n</VAR>, int <VAR>category</VAR>)</I> +<DD><A NAME="IDX4"></A> +The <CODE>dcngettext</CODE> function is similar to the <CODE>dcgettext</CODE> function in the +way the message catalog is selected. 
The difference is that it takes +two extra parameters to provide the correct plural form. These two +parameters are handled in the same way <CODE>ngettext</CODE> handles them. +</DL> + +</P> +<P> +Now, how do these functions solve the problem of the plural forms? +Without the input of linguists (which was not available) it was not +possible to determine whether there are only a few different ways in +which plural forms are formed or whether the number can increase with +every new supported language. + +</P> +<P> +Therefore the solution implemented is to allow the translator to specify +the rules of how to select the plural form. Since the formula varies +with every language this is the only viable solution except for +hardcoding the information in the code (which still would require the +possibility of extensions so as not to prevent the use of new languages). + +</P> +<P> +The information about the plural form selection has to be stored in the +header entry of the PO file (the one with the empty <CODE>msgid</CODE> string). +The plural form information looks like this: + +</P> + +<PRE> +Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1; +</PRE> + +<P> +The <CODE>nplurals</CODE> value must be a decimal number which specifies how +many different plural forms exist for this language. The string +following <CODE>plural</CODE> is an expression which uses the C language +syntax. Exceptions are that no negative numbers are allowed, numbers +must be decimal, and the only variable allowed is <CODE>n</CODE>. This +expression will be evaluated whenever one of the functions +<CODE>ngettext</CODE>, <CODE>dngettext</CODE>, or <CODE>dcngettext</CODE> is called. The +numeric value passed to these functions is then substituted for all uses +of the variable <CODE>n</CODE> in the expression. The resulting value then +must be greater than or equal to zero and smaller than the +value of <CODE>nplurals</CODE>. + +</P> +<P> +The following rules are known at this point. 
The languages are listed together with their +families. But this does not necessarily mean the information can be +generalized for the whole family (as can be easily seen in the table +below).<A NAME="DOCF5" HREF="gettext_foot.html#FOOT5">(5)</A> + +</P> +<DL COMPACT> + +<DT>Only one form: +<DD> +Some languages only require one single form. There is no distinction +between the singular and plural form. An appropriate header entry +would look like this: + + +<PRE> +Plural-Forms: nplurals=1; plural=0; +</PRE> + +Languages with this property include: + +<DL COMPACT> + +<DT>Finno-Ugric family +<DD> +Hungarian +<DT>Asian family +<DD> +Japanese +<DT>Turkic/Altaic family +<DD> +Turkish +</DL> + +<DT>Two forms, singular used for one only +<DD> +This is the form used in most existing programs since it is what English +is using. A header entry would look like this: + + +<PRE> +Plural-Forms: nplurals=2; plural=n != 1; +</PRE> + +(Note: this uses the feature of C expressions that boolean expressions +have the value zero or one.) + +Languages with this property include: + +<DL COMPACT> + +<DT>Germanic family +<DD> +Danish, Dutch, English, German, Norwegian, Swedish +<DT>Finno-Ugric family +<DD> +Estonian, Finnish +<DT>Latin/Greek family +<DD> +Greek +<DT>Semitic family +<DD> +Hebrew +<DT>Romanic family +<DD> +Italian, Spanish +<DT>Artificial +<DD> +Esperanto +</DL> + +<DT>Two forms, singular used for zero and one +<DD> +Exceptional case in the language family. The header entry would be: + + +<PRE> +Plural-Forms: nplurals=2; plural=n>1; +</PRE> + +Languages with this property include: + +<DL COMPACT> + +<DT>Romanic family +<DD> +French +</DL> + +<DT>Three forms, special cases for one and two +<DD> +The header entry would look like this: + + +<PRE> +Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 
1 : 2; +</PRE> + +Languages with this property include: + +<DL COMPACT> + +<DT>Celtic +<DD> +Gaeilge +</DL> + +<DT>Three forms, special case for numbers ending in 1[2-9] +<DD> +The header entry would look like this: + + +<PRE> +Plural-Forms: nplurals=3; \ + plural=n%10==1 && n%100!=11 ? 0 : \ + n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2; +</PRE> + +Languages with this property include: + +<DL COMPACT> + +<DT>Baltic family +<DD> +Lithuanian +</DL> + +<DT>Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4] +<DD> +The header entry would look like this: + + +<PRE> +Plural-Forms: nplurals=3; \ + plural=n%10==1 && n%100!=11 ? 0 : \ + n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2; +</PRE> + +Languages with this property include: + +<DL COMPACT> + +<DT>Slavic family +<DD> +Czech, Russian, Slovak, Ukrainian +</DL> + +<DT>Three forms, special case for one and some numbers ending in 2, 3, or 4 +<DD> +The header entry would look like this: + + +<PRE> +Plural-Forms: nplurals=3; \ + plural=n==1 ? 0 : \ + n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2; +</PRE> + +(Continuation in the next line is possible.) + +Languages with this property include: + +<DL COMPACT> + +<DT>Slavic family +<DD> +Polish +</DL> + +<DT>Four forms, special case for one and all numbers ending in 02, 03, or 04 +<DD> +The header entry would look like this: + + +<PRE> +Plural-Forms: nplurals=4; \ + plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3; +</PRE> + +Languages with this property include: + +<DL COMPACT> + +<DT>Slavic family +<DD> +Slovenian +</DL> +</DL> + + + +<H3><A NAME="SEC51" HREF="gettext_toc.html#TOC51">9.2.6 How to use <CODE>gettext</CODE> in GUI programs</A></H3> + +<P> +One place where the <CODE>gettext</CODE> functions, if used normally, have big +problems is within programs with graphical user interfaces (GUIs). The +problem is that many of the strings which have to be translated are very +short. 
They have to appear in pull-down menus which restricts the +length. But strings which do not contain entire sentences or at +least large fragments of a sentence may appear in more than one +situation in the program but might have different translations. This is +especially true for the one-word strings which are frequently used in +GUI programs. + +</P> +<P> +As a consequence many people say that the <CODE>gettext</CODE> approach is +wrong and instead <CODE>catgets</CODE> should be used which indeed does not +have this problem. But there is a very simple and powerful method to +handle this kind of problem with the <CODE>gettext</CODE> functions. + +</P> +<P> +As an example consider the following fictional situation. A GUI program +has a menu bar with the following entries: + +</P> + +<PRE> ++------------+------------+--------------------------------------+ +| File | Printer | | ++------------+------------+--------------------------------------+ +| Open | | Select | +| New | | Open | ++----------+ | Connect | + +----------+ +</PRE> + +<P> +To have the strings <CODE>File</CODE>, <CODE>Printer</CODE>, <CODE>Open</CODE>, +<CODE>New</CODE>, <CODE>Select</CODE>, and <CODE>Connect</CODE> translated there has to be +at some point in the code a call to a function of the <CODE>gettext</CODE> +family. But in two places the string passed into the function would be +<CODE>Open</CODE>. The translations might not be the same and therefore we +are in the dilemma described above. + +</P> +<P> +One solution to this problem is to artificially enlengthen the strings +to make them unambiguous. But what would the program do if no +translation is available? The enlengthened string is not what should be +printed. So we should use a slightly modified version of the functions. + +</P> +<P> +To enlengthen the strings a uniform method should be used. 
E.g., in the +example above the strings could be chosen as + +</P> + +<PRE> +Menu|File +Menu|Printer +Menu|File|Open +Menu|File|New +Menu|Printer|Select +Menu|Printer|Open +Menu|Printer|Connect +</PRE> + +<P> +Now all the strings are different and if now instead of <CODE>gettext</CODE> +the following little wrapper function is used, everything works just +fine: + +</P> +<P> +<A NAME="IDX5"></A> + +<PRE> + char * + sgettext (const char *msgid) + { + char *msgval = gettext (msgid); + if (msgval == msgid) + msgval = strrchr (msgid, '|') + 1; + return msgval; + } +</PRE> + +<P> +What this little function does is to recognize the case when no +translation is available. This can be done very efficiently by a +pointer comparison since the return value is the input value. If there +is no translation we know that the input string is in the format we used +for the Menu entries and therefore contains a <CODE>|</CODE> character. We +simply search for the last occurrence of this character and return a +pointer to the character following it. That's it! + +</P> +<P> +If one now consistently uses the enlengthened string form and replaces +the <CODE>gettext</CODE> calls with calls to <CODE>sgettext</CODE> (this is normally +limited to very few places in the GUI implementation) then it is +possible to produce a program which can be internationalized. + +</P> +<P> +The other <CODE>gettext</CODE> functions (<CODE>dgettext</CODE>, <CODE>dcgettext</CODE> +and the <CODE>ngettext</CODE> equivalents) can and should have corresponding +functions as well which look almost identical, except for the parameters +and the call to the underlying function. + +</P> +<P> +Now there is of course the question why such functions do not exist in +the GNU gettext package? There are two parts of the answer to this question. + +</P> + +<UL> +<LI> + +They are easy to write and therefore can be provided by the project they +are used in. 
This is not an answer by itself and must be seen together +with the second part which is: + +<LI> + +There is no way the gettext package can contain a version which can work +everywhere. The problem is the selection of the character to separate +the prefix from the actual string in the enlengthened string. The +examples above used <CODE>|</CODE> which is quite a good choice because it +resembles a notation frequently used in this context and it also is a +character not often used in message strings. + +But what if the character is used in message strings? Or if the chosen +character is not available in the character set on the machine one +compiles on (e.g., <CODE>|</CODE> is not required to exist for ISO C; this is +why the <TT>`iso646.h'</TT> file exists in ISO C programming environments). +</UL> + +<P> +There is only one more comment to be made. The wrapper function above +requires that the translation strings are not enlengthened themselves. +This is only logical. There is no need to disambiguate the strings +(since they are never used as keys for a search) and one also saves +quite some memory and disk space by doing this. + +</P> + + +<H3><A NAME="SEC52" HREF="gettext_toc.html#TOC52">9.2.7 Optimization of the *gettext functions</A></H3> + +<P> +At this point of the discussion we should talk about an advantage of the +GNU <CODE>gettext</CODE> implementation. Some readers might have pointed out +that an internationalized program might have poor performance if some +string has to be translated in an inner loop. While this is unavoidable +when the string varies from one run of the loop to the other it is +simply a waste of time when the string is always the same. Take the +following example: + +</P> + +<PRE> +{ + while (...) + { + puts (gettext ("Hello world")); + } +} +</PRE> + +<P> +When the locale selection does not change between two runs the resulting +string is always the same. 
One way to use this is: + +</P> + +<PRE> +{ + const char *str = gettext ("Hello world"); + while (...) + { + puts (str); + } +} +</PRE> + +<P> +But this solution is not usable in all situations (e.g. when the locale +selection changes) nor does it lead to legible code. + +</P> +<P> +For this reason, GNU <CODE>gettext</CODE> caches previous translation results. +When the same translation is requested twice, with no new message +catalogs being loaded in between, <CODE>gettext</CODE> will, the second time, +find the result through a single cache lookup. + +</P> + + +<H2><A NAME="SEC53" HREF="gettext_toc.html#TOC53">9.3 Comparing the Two Interfaces</A></H2> + +<P> +The following discussion is perhaps a little bit colored. As said +above we implemented GNU <CODE>gettext</CODE> following the Uniforum +proposal and this surely has its reasons. But it should show how we +came to this decision. + +</P> +<P> +First we take a look at the development process. When we write an +application using NLS provided by <CODE>gettext</CODE> we proceed as always. +Only when we come to a string which might be seen by the users and thus +has to be translated we use <CODE>gettext("...")</CODE> instead of +<CODE>"..."</CODE>. At the beginning of each source file (or in a central +header file) we define + +</P> + +<PRE> +#define gettext(String) (String) +</PRE> + +<P> +Even this definition can be avoided when the system supports the +<CODE>gettext</CODE> function in its C library. When we compile this code the +result is the same as if no NLS code is used. When you take a look at +the GNU <CODE>gettext</CODE> code you will see that we use <CODE>_("...")</CODE> +instead of <CODE>gettext("...")</CODE>. This reduces the number of +additional characters per translatable string to <EM>3</EM> (in words: +three). 
+ +</P> +<P> +When a production version of the program is needed we simply replace +the definition + +</P> + +<PRE> +#define _(String) (String) +</PRE> + +<P> +by + +</P> + +<PRE> +#include <libintl.h> +#define _(String) gettext (String) +</PRE> + +<P> +Additionally we run the program <TT>`xgettext'</TT> on all source code files +that contain translatable strings and that's it: we have a running +program which does not depend on translations being available, but which +can use any that become available. + +</P> +<P> +The same procedure can be followed for the <CODE>gettext_noop</CODE> invocations +(see section <A HREF="gettext_3.html#SEC18">3.5 Special Cases of Translatable Strings</A>). One usually defines <CODE>gettext_noop</CODE> as a +no-op macro. So you should consider the following code for your project: + +</P> + +<PRE> +#define gettext_noop(String) (String) +#define N_(String) gettext_noop (String) +</PRE> + +<P> +<CODE>N_</CODE> is a short form similar to <CODE>_</CODE>. The <TT>`Makefile'</TT> in +the <TT>`po/'</TT> directory of GNU <CODE>gettext</CODE> knows both of the +mentioned short forms by default so you are invited to follow this proposal for +your own ease. + +</P> +<P> +Now to <CODE>catgets</CODE>. The main problem is the work for the +programmer. Every time he comes to a translatable string he has to +define a number (or a symbolic constant) which also has to be defined in +the message catalog file. He also has to take care of duplicate +entries, duplicate message IDs etc. If he wants to have the same +quality in the message catalog as the GNU <CODE>gettext</CODE> program +provides he also has to put the descriptive comments for the strings and +the location in all source code files in the message catalog. This is +nearly a Mission: Impossible. + +</P> +<P> +But there are also some points people might call advantages speaking for +<CODE>catgets</CODE>. 
If you have a single word in a string and this string +is used in different contexts it is likely that in one or the other +language the word has different translations. Example: + +</P> + +<PRE> +printf ("%s: %d", gettext ("number"), number_of_errors); + +printf ("you should see %d %s", number_count, + number_count == 1 ? gettext ("number") : gettext ("numbers")); +</PRE> + +<P> +Here we have to translate the string <CODE>"number"</CODE> twice. Even +if you do not speak a language besides English it might be possible to +recognize that the two words have a different meaning. In German the +first appearance has to be translated to <CODE>"Anzahl"</CODE> and the second +to <CODE>"Zahl"</CODE>. + +</P> +<P> +Now you can say that this example is really esoteric. And you are +right! This is exactly how we felt about this problem and decided that +it does not weigh that much. The solution for the above problem could +be very easy: + +</P> + +<PRE> +printf ("%s %d", gettext ("number:"), number_of_errors); + +printf (number_count == 1 ? gettext ("you should see %d number") + : gettext ("you should see %d numbers"), + number_count); +</PRE> + +<P> +We believe that we can solve all conflicts with this method. If it is +difficult one can also consider changing one of the conflicting strings a +little bit. But it is not impossible to overcome. + +</P> +<P> +<CODE>catgets</CODE> allows the same original entry to have different translations, +but <CODE>gettext</CODE> has another, scalable approach for solving ambiguities +of this kind: See section <A HREF="gettext_9.html#SEC47">9.2.2 Solving Ambiguities</A>. + +</P> + + +<H2><A NAME="SEC54" HREF="gettext_toc.html#TOC54">9.4 Using libintl.a in own programs</A></H2> + +<P> +Starting with version 0.9.4 the library <CODE>libintl.a</CODE> should be +self-contained. I.e., you can use it in your own programs without +providing additional functions. 
The <TT>`Makefile'</TT> will put the header +and the library in directories selected using the <CODE>$(prefix)</CODE>. + +</P> +<P> +One exception to the above is found on HP-UX 10.01 systems. Here the C +library does not contain the <CODE>alloca</CODE> function (and the HP compiler +does not generate it inlined). But we do not intend to rewrite the whole +library just because of this dumb system. Instead include the +<CODE>alloca</CODE> function in every package in which you use <CODE>libintl.a</CODE>. + +</P> + + +<H2><A NAME="SEC55" HREF="gettext_toc.html#TOC55">9.5 Being a <CODE>gettext</CODE> grok</A></H2> + +<P> +To fully exploit the functionality of the GNU <CODE>gettext</CODE> library it +is surely helpful to read the source code. But for those who don't want +to spend that much time reading the (sometimes complicated) code here +is a list of comments: + +</P> + +<UL> +<LI>Changing the language at runtime + +For interactive programs it might be useful to offer a selection of the +used language at runtime. To understand how to do this one needs to know +how the used language is determined while executing the <CODE>gettext</CODE> +function. The method which is presented here only works correctly +with the GNU implementation of the <CODE>gettext</CODE> functions. + +In the function <CODE>dcgettext</CODE> at every call the current setting of +the highest priority environment variable is determined and used. +Highest priority means here the following list with decreasing +priority: + + +<OL> +<LI><CODE>LANGUAGE</CODE> + +<LI><CODE>LC_ALL</CODE> + +<LI><CODE>LC_xxx</CODE>, according to the selected locale category + +<LI><CODE>LANG</CODE> + +</OL> + +Afterwards the path is constructed using the found value and the +translation file is loaded if available. + +What happens when the value of, say, <CODE>LANGUAGE</CODE> changes? According +to the process explained above the new value of this variable is found +as soon as the <CODE>dcgettext</CODE> function is called. 
But this also means +that a (perhaps) different message catalog file is loaded. In other +words: the used language is changed. + +But there is one little catch. The code for gcc-2.7.0 and up provides +some optimization. This optimization normally prevents the calling of +the <CODE>dcgettext</CODE> function as long as no new catalog is loaded. But +if <CODE>dcgettext</CODE> is not called the program also cannot notice that the +<CODE>LANGUAGE</CODE> variable has changed (see section <A HREF="gettext_9.html#SEC52">9.2.7 Optimization of the *gettext functions</A>). A +solution for this is very easy. Include the following code in the +language switching function. + + +<PRE> + /* Change language. */ + setenv ("LANGUAGE", "fr", 1); + + /* Make change known. */ + { + extern int _nl_msg_cat_cntr; + ++_nl_msg_cat_cntr; + } +</PRE> + +The variable <CODE>_nl_msg_cat_cntr</CODE> is defined in <TT>`loadmsgcat.c'</TT>. +The programmer will find himself in need of a construct like this only +when developing programs which run for a long time and allow the user to +select the language at runtime. Non-interactive programs (like all +these little Unix tools) should never need this. + +</UL> + + + +<H2><A NAME="SEC56" HREF="gettext_toc.html#TOC56">9.6 Temporary Notes for the Programmers Chapter</A></H2> + + + +<H3><A NAME="SEC57" HREF="gettext_toc.html#TOC57">9.6.1 Temporary - Two Possible Implementations</A></H3> + +<P> +There are two competing methods for language independent messages: +the X/Open <CODE>catgets</CODE> method, and the Uniforum <CODE>gettext</CODE> +method. The <CODE>catgets</CODE> method indexes messages by integers; the +<CODE>gettext</CODE> method indexes them by their English translations. +The <CODE>catgets</CODE> method has been around longer and is supported +by more vendors. The <CODE>gettext</CODE> method is supported by Sun, +and it has been heard that the COSE multi-vendor initiative is +supporting it. 
Neither method is a POSIX standard; the POSIX.1 +committee had a lot of disagreement in this area. + +</P> +<P> +Neither one is in the POSIX standard. There was much disagreement +in the POSIX.1 committee about using the <CODE>gettext</CODE> routines +vs. <CODE>catgets</CODE> (XPG). In the end the committee couldn't +agree on anything, so no messaging system was included as part +of the standard. I believe the informative annex of the standard +includes the XPG3 messaging interfaces, "...as an example of +a messaging system that has been implemented..." + +</P> +<P> +They were very careful not to say anywhere that you should use one +set of interfaces over the other. For more on this topic please +see the Programming for Internationalization FAQ. + +</P> + + +<H3><A NAME="SEC58" HREF="gettext_toc.html#TOC58">9.6.2 Temporary - About <CODE>catgets</CODE></A></H3> + +<P> +There have been a few discussions of late on the use of +<CODE>catgets</CODE> as a base. I think it important to present both +sides of the argument and hence am opting to play devil's advocate +for a little bit. + +</P> +<P> +I'll not deny the fact that <CODE>catgets</CODE> could have been designed +a lot better. It currently has quite a number of limitations and +these have already been pointed out. + +</P> +<P> +However there is a great deal to be said for consistency and +standardization. A common recurring problem when writing Unix +software is the myriad portability problems across Unix platforms. +It seems as if every Unix vendor had a look at the operating system +and found parts they could improve upon. Undoubtedly, these +modifications are probably innovative and solve real problems. +However, software developers have a hard time keeping up with all +these changes across so many platforms. + +</P> +<P> +And this has prompted the Unix vendors to begin to standardize their +systems. Hence the impetus for Spec1170. 
Every major Unix vendor +has committed to supporting this standard and every Unix software +developer awaits with glee the day they can write software to this +standard and simply recompile (without having to use autoconf) +across different platforms. + +</P> +<P> +As I understand it, Spec1170 is roughly based upon version 4 of the +X/Open Portability Guide (XPG4). Because <CODE>catgets</CODE> and +friends are defined in XPG4, I'm led to believe that <CODE>catgets</CODE> +is a part of Spec1170 and hence will become a standardized component +of all Unix systems. + +</P> + + +<H3><A NAME="SEC59" HREF="gettext_toc.html#TOC59">9.6.3 Temporary - Why a single implementation</A></H3> + +<P> +Now it seems kind of wasteful to me to have two different systems +installed for accessing message catalogs. If we do want to remedy +<CODE>catgets</CODE> deficiencies why don't we try to expand <CODE>catgets</CODE> +(in a compatible manner) rather than implement an entirely new system? +Otherwise, we'll end up with two message catalog access systems installed +with an operating system - one set of routines for packages using GNU +<CODE>gettext</CODE> for their internationalization, and another set of routines +(catgets) for all other software. Bloated? + +</P> +<P> +Suppose another catalog access system is implemented. Which do +we recommend? At least for Linux, we need to attract as many +software developers as possible. Hence we need to make it as easy +for them to port their software as possible. Which means supporting +<CODE>catgets</CODE>. We will be implementing the <CODE>libintl</CODE> code +within our <CODE>libc</CODE>, but does this mean we also have to incorporate +another message catalog access scheme within our <CODE>libc</CODE> as well? +And what about people who are going to be using the <CODE>libintl</CODE> ++ non-<CODE>catgets</CODE> routines? 
When they port their software to +other platforms, they're now going to have to include the front-end +(<CODE>libintl</CODE>) code plus the back-end code (the non-<CODE>catgets</CODE> +access routines) with their software instead of just including the +<CODE>libintl</CODE> code with their software. + +</P> +<P> +Message catalog support is however only the tip of the iceberg. +What about the data for the other locale categories? They also have +a number of deficiencies. Are we going to abandon them as well and +develop another duplicate set of routines (should <CODE>libintl</CODE> +expand beyond message catalog support)? + +</P> +<P> +Like many parts of Unix that can be improved upon, we're stuck with balancing +compatibility with the past against useful improvements and innovations for +the future. + +</P> + + +<H3><A NAME="SEC60" HREF="gettext_toc.html#TOC60">9.6.4 Temporary - Notes</A></H3> + +<P> +X/Open agreed very late on the standard form so that many +implementations differ from the final form. Both of my systems (old +Linux catgets and Ultrix-4) have a strange variation. + +</P> +<P> +OK. After incorporating the last changes I have to spend some time on +making the GNU/Linux <CODE>libc</CODE> <CODE>gettext</CODE> functions. So in the future +Solaris will not be the only system having <CODE>gettext</CODE>. + +</P> +<P><HR><P> +Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_8.html">previous</A>, <A HREF="gettext_10.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. 
+</BODY> +</HTML> diff --git a/doc/gettext_foot.html b/doc/gettext_foot.html new file mode 100644 index 0000000..2bebe6d --- /dev/null +++ b/doc/gettext_foot.html @@ -0,0 +1,42 @@ +<HTML> +<HEAD> +<!-- This HTML file has been created by texi2html 1.51 + from gettext.texi on 19 April 2001 --> + +<TITLE>GNU gettext utilities - Footnotes</TITLE> +</HEAD> +<BODY> +<H1>GNU gettext tools, version 0.10.37</H1> +<H2>Native Language Support Library and Tools</H2> +<H2>Edition 0.10.37, 19 April 2001</H2> +<ADDRESS>Ulrich Drepper</ADDRESS> +<ADDRESS>Jim Meyering</ADDRESS> +<ADDRESS>François Pinard</ADDRESS> +<P> +<P><HR><P> +<H3><A NAME="FOOT1" HREF="gettext_1.html#DOCF1">(1)</A></H3> +<P>In this manual, all mentions of Emacs +refer to either GNU Emacs or to XEmacs, which people sometimes call FSF +Emacs and Lucid Emacs, respectively. +<H3><A NAME="FOOT2" HREF="gettext_2.html#DOCF2">(2)</A></H3> +<P>This +limitation is not imposed by GNU <CODE>gettext</CODE>, but is for compatibility +with the <CODE>msgfmt</CODE> implementation on Solaris. +<H3><A NAME="FOOT3" HREF="gettext_9.html#DOCF3">(3)</A></H3> +<P>Some +systems, e.g. Ultrix, don't have <CODE>LC_MESSAGES</CODE>. Here we use a more or +less arbitrary value for it, namely 1729, the smallest positive integer +which can be represented in two different ways as the sum of two cubes. +<H3><A NAME="FOOT4" HREF="gettext_9.html#DOCF4">(4)</A></H3> +<P>When the system does not support <CODE>setlocale</CODE> its behavior +in setting the locale values is simulated by looking at the environment +variables. +<H3><A NAME="FOOT5" HREF="gettext_9.html#DOCF5">(5)</A></H3> +<P>Additions are welcome. 
Send appropriate information to +bug-glibc-manual@gnu.org. +<P><HR><P> +This document was generated on 19 April 2001 using the +<A HREF="http://wwwcn.cern.ch/dci/texi2html/">texi2html</A> +translator version 1.51.</P> +</BODY> +</HTML> diff --git a/doc/gettext_toc.html b/doc/gettext_toc.html new file mode 100644 index 0000000..c635c16 --- /dev/null +++ b/doc/gettext_toc.html @@ -0,0 +1,146 @@ +<HTML> +<HEAD> +<!-- This HTML file has been created by texi2html 1.51 + from gettext.texi on 19 April 2001 --> + +<TITLE>GNU gettext utilities - Table of Contents</TITLE> +</HEAD> +<BODY> +<H1>GNU gettext tools, version 0.10.37</H1> +<H2>Native Language Support Library and Tools</H2> +<H2>Edition 0.10.37, 19 April 2001</H2> +<ADDRESS>Ulrich Drepper</ADDRESS> +<ADDRESS>Jim Meyering</ADDRESS> +<ADDRESS>François Pinard</ADDRESS> +<P> +<P><HR><P> +<UL> +<LI><A NAME="TOC1" HREF="gettext_1.html#SEC1">1 Introduction</A> +<UL> +<LI><A NAME="TOC2" HREF="gettext_1.html#SEC2">1.1 The Purpose of GNU <CODE>gettext</CODE></A> +<LI><A NAME="TOC3" HREF="gettext_1.html#SEC3">1.2 I18n, L10n, and Such</A> +<LI><A NAME="TOC4" HREF="gettext_1.html#SEC4">1.3 Aspects in Native Language Support</A> +<LI><A NAME="TOC5" HREF="gettext_1.html#SEC5">1.4 Files Conveying Translations</A> +<LI><A NAME="TOC6" HREF="gettext_1.html#SEC6">1.5 Overview of GNU <CODE>gettext</CODE></A> +</UL> +<LI><A NAME="TOC7" HREF="gettext_2.html#SEC7">2 PO Files and PO Mode Basics</A> +<UL> +<LI><A NAME="TOC8" HREF="gettext_2.html#SEC8">2.1 Completing GNU <CODE>gettext</CODE> Installation</A> +<LI><A NAME="TOC9" HREF="gettext_2.html#SEC9">2.2 The Format of PO Files</A> +<LI><A NAME="TOC10" HREF="gettext_2.html#SEC10">2.3 Main PO mode Commands</A> +<LI><A NAME="TOC11" HREF="gettext_2.html#SEC11">2.4 Entry Positioning</A> +<LI><A NAME="TOC12" HREF="gettext_2.html#SEC12">2.5 Normalizing Strings in Entries</A> +</UL> +<LI><A NAME="TOC13" HREF="gettext_3.html#SEC13">3 Preparing Program Sources</A> +<UL> +<LI><A 
NAME="TOC14" HREF="gettext_3.html#SEC14">3.1 Triggering <CODE>gettext</CODE> Operations</A> +<LI><A NAME="TOC15" HREF="gettext_3.html#SEC15">3.2 How Marks Appear in Sources</A> +<LI><A NAME="TOC16" HREF="gettext_3.html#SEC16">3.3 Marking Translatable Strings</A> +<LI><A NAME="TOC17" HREF="gettext_3.html#SEC17">3.4 Special Comments preceding Keywords</A> +<LI><A NAME="TOC18" HREF="gettext_3.html#SEC18">3.5 Special Cases of Translatable Strings</A> +</UL> +<LI><A NAME="TOC19" HREF="gettext_4.html#SEC19">4 Making the PO Template File</A> +<UL> +<LI><A NAME="TOC20" HREF="gettext_4.html#SEC20">4.1 Invoking the <CODE>xgettext</CODE> Program</A> +</UL> +<LI><A NAME="TOC21" HREF="gettext_5.html#SEC21">5 Creating a New PO File</A> +<LI><A NAME="TOC22" HREF="gettext_6.html#SEC22">6 Updating Existing PO Files</A> +<UL> +<LI><A NAME="TOC23" HREF="gettext_6.html#SEC23">6.1 Invoking the <CODE>msgmerge</CODE> Program</A> +<LI><A NAME="TOC24" HREF="gettext_6.html#SEC24">6.2 Translated Entries</A> +<LI><A NAME="TOC25" HREF="gettext_6.html#SEC25">6.3 Fuzzy Entries</A> +<LI><A NAME="TOC26" HREF="gettext_6.html#SEC26">6.4 Untranslated Entries</A> +<LI><A NAME="TOC27" HREF="gettext_6.html#SEC27">6.5 Obsolete Entries</A> +<LI><A NAME="TOC28" HREF="gettext_6.html#SEC28">6.6 Modifying Translations</A> +<LI><A NAME="TOC29" HREF="gettext_6.html#SEC29">6.7 Modifying Comments</A> +<LI><A NAME="TOC30" HREF="gettext_6.html#SEC30">6.8 Details of Sub Edition</A> +<LI><A NAME="TOC31" HREF="gettext_6.html#SEC31">6.9 C Sources Context</A> +<LI><A NAME="TOC32" HREF="gettext_6.html#SEC32">6.10 Consulting Auxiliary PO Files</A> +<LI><A NAME="TOC33" HREF="gettext_6.html#SEC33">6.11 Using Translation Compendiums</A> +</UL> +<LI><A NAME="TOC34" HREF="gettext_7.html#SEC34">7 Producing Binary MO Files</A> +<UL> +<LI><A NAME="TOC35" HREF="gettext_7.html#SEC35">7.1 Invoking the <CODE>msgfmt</CODE> Program</A> +<LI><A NAME="TOC36" HREF="gettext_7.html#SEC36">7.2 The Format of GNU MO Files</A> +</UL> +<LI><A 
NAME="TOC37" HREF="gettext_8.html#SEC37">8 The User's View</A> +<UL> +<LI><A NAME="TOC38" HREF="gettext_8.html#SEC38">8.1 The Current <TT>`ABOUT-NLS'</TT> Matrix</A> +<LI><A NAME="TOC39" HREF="gettext_8.html#SEC39">8.2 Magic for Installers</A> +<LI><A NAME="TOC40" HREF="gettext_8.html#SEC40">8.3 Magic for End Users</A> +</UL> +<LI><A NAME="TOC41" HREF="gettext_9.html#SEC41">9 The Programmer's View</A> +<UL> +<LI><A NAME="TOC42" HREF="gettext_9.html#SEC42">9.1 About <CODE>catgets</CODE></A> +<UL> +<LI><A NAME="TOC43" HREF="gettext_9.html#SEC43">9.1.1 The Interface</A> +<LI><A NAME="TOC44" HREF="gettext_9.html#SEC44">9.1.2 Problems with the <CODE>catgets</CODE> Interface?!</A> +</UL> +<LI><A NAME="TOC45" HREF="gettext_9.html#SEC45">9.2 About <CODE>gettext</CODE></A> +<UL> +<LI><A NAME="TOC46" HREF="gettext_9.html#SEC46">9.2.1 The Interface</A> +<LI><A NAME="TOC47" HREF="gettext_9.html#SEC47">9.2.2 Solving Ambiguities</A> +<LI><A NAME="TOC48" HREF="gettext_9.html#SEC48">9.2.3 Locating Message Catalog Files</A> +<LI><A NAME="TOC49" HREF="gettext_9.html#SEC49">9.2.4 How to specify the output character set <CODE>gettext</CODE> uses</A> +<LI><A NAME="TOC50" HREF="gettext_9.html#SEC50">9.2.5 Additional functions for plural forms</A> +<LI><A NAME="TOC51" HREF="gettext_9.html#SEC51">9.2.6 How to use <CODE>gettext</CODE> in GUI programs</A> +<LI><A NAME="TOC52" HREF="gettext_9.html#SEC52">9.2.7 Optimization of the *gettext functions</A> +</UL> +<LI><A NAME="TOC53" HREF="gettext_9.html#SEC53">9.3 Comparing the Two Interfaces</A> +<LI><A NAME="TOC54" HREF="gettext_9.html#SEC54">9.4 Using libintl.a in own programs</A> +<LI><A NAME="TOC55" HREF="gettext_9.html#SEC55">9.5 Being a <CODE>gettext</CODE> grok</A> +<LI><A NAME="TOC56" HREF="gettext_9.html#SEC56">9.6 Temporary Notes for the Programmers Chapter</A> +<UL> +<LI><A NAME="TOC57" HREF="gettext_9.html#SEC57">9.6.1 Temporary - Two Possible Implementations</A> +<LI><A NAME="TOC58" HREF="gettext_9.html#SEC58">9.6.2 Temporary - 
About <CODE>catgets</CODE></A> +<LI><A NAME="TOC59" HREF="gettext_9.html#SEC59">9.6.3 Temporary - Why a single implementation</A> +<LI><A NAME="TOC60" HREF="gettext_9.html#SEC60">9.6.4 Temporary - Notes</A> +</UL> +</UL> +<LI><A NAME="TOC61" HREF="gettext_10.html#SEC61">10 The Translator's View</A> +<UL> +<LI><A NAME="TOC62" HREF="gettext_10.html#SEC62">10.1 Introduction 0</A> +<LI><A NAME="TOC63" HREF="gettext_10.html#SEC63">10.2 Introduction 1</A> +<LI><A NAME="TOC64" HREF="gettext_10.html#SEC64">10.3 Discussions</A> +<LI><A NAME="TOC65" HREF="gettext_10.html#SEC65">10.4 Organization</A> +<UL> +<LI><A NAME="TOC66" HREF="gettext_10.html#SEC66">10.4.1 Central Coordination</A> +<LI><A NAME="TOC67" HREF="gettext_10.html#SEC67">10.4.2 National Teams</A> +<UL> +<LI><A NAME="TOC68" HREF="gettext_10.html#SEC68">10.4.2.1 Sub-Cultures</A> +<LI><A NAME="TOC69" HREF="gettext_10.html#SEC69">10.4.2.2 Organizational Ideas</A> +</UL> +<LI><A NAME="TOC70" HREF="gettext_10.html#SEC70">10.4.3 Mailing Lists</A> +</UL> +<LI><A NAME="TOC71" HREF="gettext_10.html#SEC71">10.5 Information Flow</A> +</UL> +<LI><A NAME="TOC72" HREF="gettext_11.html#SEC72">11 The Maintainer's View</A> +<UL> +<LI><A NAME="TOC73" HREF="gettext_11.html#SEC73">11.1 Flat or Non-Flat Directory Structures</A> +<LI><A NAME="TOC74" HREF="gettext_11.html#SEC74">11.2 Prerequisite Works</A> +<LI><A NAME="TOC75" HREF="gettext_11.html#SEC75">11.3 Invoking the <CODE>gettextize</CODE> Program</A> +<LI><A NAME="TOC76" HREF="gettext_11.html#SEC76">11.4 Files You Must Create or Alter</A> +<UL> +<LI><A NAME="TOC77" HREF="gettext_11.html#SEC77">11.4.1 <TT>`POTFILES.in'</TT> in <TT>`po/'</TT></A> +<LI><A NAME="TOC78" HREF="gettext_11.html#SEC78">11.4.2 <TT>`configure.in'</TT> at top level</A> +<LI><A NAME="TOC79" HREF="gettext_11.html#SEC79">11.4.3 <TT>`config.guess'</TT>, <TT>`config.sub'</TT> at top level</A> +<LI><A NAME="TOC80" HREF="gettext_11.html#SEC80">11.4.4 <TT>`aclocal.m4'</TT> at top level</A> +<LI><A NAME="TOC81" 
HREF="gettext_11.html#SEC81">11.4.5 <TT>`acconfig.h'</TT> at top level</A> +<LI><A NAME="TOC82" HREF="gettext_11.html#SEC82">11.4.6 <TT>`Makefile.in'</TT> at top level</A> +<LI><A NAME="TOC83" HREF="gettext_11.html#SEC83">11.4.7 <TT>`Makefile.in'</TT> in <TT>`src/'</TT></A> +</UL> +</UL> +<LI><A NAME="TOC84" HREF="gettext_12.html#SEC84">12 Concluding Remarks</A> +<UL> +<LI><A NAME="TOC85" HREF="gettext_12.html#SEC85">12.1 History of GNU <CODE>gettext</CODE></A> +<LI><A NAME="TOC86" HREF="gettext_12.html#SEC86">12.2 Related Readings</A> +</UL> +<LI><A NAME="TOC87" HREF="gettext_13.html#SEC87">A Language Codes</A> +<LI><A NAME="TOC88" HREF="gettext_14.html#SEC88">B Country Codes</A> +</UL> +<P><HR><P> +This document was generated on 19 April 2001 using the +<A HREF="http://wwwcn.cern.ch/dci/texi2html/">texi2html</A> +translator version 1.51.</P> +</BODY> +</HTML> |