summaryrefslogtreecommitdiffstats
path: root/doc/gettext_3.html
diff options
context:
space:
mode:
Diffstat (limited to 'doc/gettext_3.html')
-rw-r--r--doc/gettext_3.html620
1 files changed, 620 insertions, 0 deletions
diff --git a/doc/gettext_3.html b/doc/gettext_3.html
new file mode 100644
index 0000000..ee3715d
--- /dev/null
+++ b/doc/gettext_3.html
@@ -0,0 +1,620 @@
+<HTML>
+<HEAD>
+<!-- This HTML file has been created by texi2html 1.51
+ from gettext.texi on 19 April 2001 -->
+
+<TITLE>GNU gettext utilities - 3 Preparing Program Sources</TITLE>
+</HEAD>
+<BODY>
+Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_2.html">previous</A>, <A HREF="gettext_4.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
+<P><HR><P>
+
+
+<H1><A NAME="SEC13" HREF="gettext_toc.html#TOC13">3 Preparing Program Sources</A></H1>
+
+<P>
+For the programmer, changes to the C source code fall into three
+categories. First, you have to make the localization functions
+known to all modules needing message translation. Second, you should
+properly trigger the operation of GNU <CODE>gettext</CODE> when the program
+initializes, usually from the <CODE>main</CODE> function. Last, you should
+identify and especially mark all constant strings in your program
+needing translation.
+
+</P>
+<P>
+Presuming that your set of programs, or package, has been adjusted
+so all needed GNU <CODE>gettext</CODE> files are available, and your
+<TT>`Makefile'</TT> files are adjusted (see section <A HREF="gettext_11.html#SEC72">11 The Maintainer's View</A>), each C module
+having translated C strings should contain the line:
+
+</P>
+
+<PRE>
+#include &#60;libintl.h&#62;
+</PRE>
+
+<P>
+The remaining changes to your C sources are discussed in the further
+sections of this chapter.
+
+</P>
+
+
+
+<H2><A NAME="SEC14" HREF="gettext_toc.html#TOC14">3.1 Triggering <CODE>gettext</CODE> Operations</A></H2>
+
+<P>
+The initialization of locale data should be done with more or less
+the same code in every program, as demonstrated below:
+
+</P>
+
+<PRE>
+int
+main (argc, argv)
+ int argc;
+ char argv;
+{
+ ...
+ setlocale (LC_ALL, "");
+ bindtextdomain (PACKAGE, LOCALEDIR);
+ textdomain (PACKAGE);
+ ...
+}
+</PRE>
+
+<P>
+<VAR>PACKAGE</VAR> and <VAR>LOCALEDIR</VAR> should be provided either by
+<TT>`config.h'</TT> or by the Makefile. For now consult the <CODE>gettext</CODE>
+sources for more information.
+
+</P>
+<P>
+The use of <CODE>LC_ALL</CODE> might not be appropriate for you.
+<CODE>LC_ALL</CODE> includes all locale categories and especially
+<CODE>LC_CTYPE</CODE>. This later category is responsible for determining
+character classes with the <CODE>isalnum</CODE> etc. functions from
+<TT>`ctype.h'</TT> which could especially for programs, which process some
+kind of input language, be wrong. For example this would mean that a
+source code using the @,{c} (c-cedilla character) is runnable in
+France but not in the U.S.
+
+</P>
+<P>
+Some systems also have problems with parsing numbers using the
+<CODE>scanf</CODE> functions if an other but the <CODE>LC_ALL</CODE> locale is used.
+The standards say that additional formats but the one known in the
+<CODE>"C"</CODE> locale might be recognized. But some systems seem to reject
+numbers in the <CODE>"C"</CODE> locale format. In some situation, it might
+also be a problem with the notation itself which makes it impossible to
+recognize whether the number is in the <CODE>"C"</CODE> locale or the local
+format. This can happen if thousands separator characters are used.
+Some locales define this character accordfing to the national
+conventions to <CODE>'.'</CODE> which is the same character used in the
+<CODE>"C"</CODE> locale to denote the decimal point.
+
+</P>
+<P>
+So it is sometimes necessary to replace the <CODE>LC_ALL</CODE> line in the
+code above by a sequence of <CODE>setlocale</CODE> lines
+
+</P>
+
+<PRE>
+{
+ ...
+ setlocale (LC_CTYPE, "");
+ setlocale (LC_MESSAGES, "");
+ ...
+}
+</PRE>
+
+<P>
+On all POSIX conformant systems the locale categories <CODE>LC_CTYPE</CODE>,
+<CODE>LC_COLLATE</CODE>, <CODE>LC_MONETARY</CODE>, <CODE>LC_NUMERIC</CODE>, and
+<CODE>LC_TIME</CODE> are available. On some modern systems there is also a
+locale <CODE>LC_MESSAGES</CODE> which is called on some old, XPG2 compliant
+systems <CODE>LC_RESPONSES</CODE>.
+
+</P>
+<P>
+Note that changing the <CODE>LC_CTYPE</CODE> also affects the functions
+declared in the <CODE>&#60;ctype.h&#62;</CODE> standard header. If this is not
+desirable in your application (for example in a compiler's parser),
+you can use a set of substitute functions which hardwire the C locale,
+such as found in the <CODE>&#60;c-ctype.h&#62;</CODE> and <CODE>&#60;c-ctype.c&#62;</CODE> files
+in the gettext source distribution.
+
+</P>
+<P>
+It is also possible to switch the locale forth and back between the
+environment dependent locale and the C locale, but this approach is
+normally avoided because a <CODE>setlocale</CODE> call is expensive,
+because it is tedious to determine the places where a locale switch
+is needed in a large program's source, and because switching a locale
+is not multithread-safe.
+
+</P>
+
+
+<H2><A NAME="SEC15" HREF="gettext_toc.html#TOC15">3.2 How Marks Appear in Sources</A></H2>
+
+<P>
+All strings requiring translation should be marked in the C sources. Marking
+is done in such a way that each translatable string appears to be
+the sole argument of some function or preprocessor macro. There are
+only a few such possible functions or macros meant for translation,
+and their names are said to be marking keywords. The marking is
+attached to strings themselves, rather than to what we do with them.
+This approach has more uses. A blatant example is an error message
+produced by formatting. The format string needs translation, as
+well as some strings inserted through some <SAMP>`%s'</SAMP> specification
+in the format, while the result from <CODE>sprintf</CODE> may have so many
+different instances that it is impractical to list them all in some
+<SAMP>`error_string_out()'</SAMP> routine, say.
+
+</P>
+<P>
+This marking operation has two goals. The first goal of marking
+is for triggering the retrieval of the translation, at run time.
+The keyword are possibly resolved into a routine able to dynamically
+return the proper translation, as far as possible or wanted, for the
+argument string. Most localizable strings are found in executable
+positions, that is, attached to variables or given as parameters to
+functions. But this is not universal usage, and some translatable
+strings appear in structured initializations. See section <A HREF="gettext_3.html#SEC18">3.5 Special Cases of Translatable Strings</A>.
+
+</P>
+<P>
+The second goal of the marking operation is to help <CODE>xgettext</CODE>
+at properly extracting all translatable strings when it scans a set
+of program sources and produces PO file templates.
+
+</P>
+<P>
+The canonical keyword for marking translatable strings is
+<SAMP>`gettext'</SAMP>, it gave its name to the whole GNU <CODE>gettext</CODE>
+package. For packages making only light use of the <SAMP>`gettext'</SAMP>
+keyword, macro or function, it is easily used <EM>as is</EM>. However,
+for packages using the <CODE>gettext</CODE> interface more heavily, it
+is usually more convenient to give the main keyword a shorter, less
+obtrusive name. Indeed, the keyword might appear on a lot of strings
+all over the package, and programmers usually do not want nor need
+their program sources to remind them forcefully, all the time, that they
+are internationalized. Further, a long keyword has the disadvantage
+of using more horizontal space, forcing more indentation work on
+sources for those trying to keep them within 79 or 80 columns.
+
+</P>
+<P>
+Many packages use <SAMP>`_'</SAMP> (a simple underline) as a keyword,
+and write <SAMP>`_("Translatable string")'</SAMP> instead of <SAMP>`gettext
+("Translatable string")'</SAMP>. Further, the coding rule, from GNU standards,
+wanting that there is a space between the keyword and the opening
+parenthesis is relaxed, in practice, for this particular usage.
+So, the textual overhead per translatable string is reduced to
+only three characters: the underline and the two parentheses.
+However, even if GNU <CODE>gettext</CODE> uses this convention internally,
+it does not offer it officially. The real, genuine keyword is truly
+<SAMP>`gettext'</SAMP> indeed. It is fairly easy for those wanting to use
+<SAMP>`_'</SAMP> instead of <SAMP>`gettext'</SAMP> to declare:
+
+</P>
+
+<PRE>
+#include &#60;libintl.h&#62;
+#define _(String) gettext (String)
+</PRE>
+
+<P>
+instead of merely using <SAMP>`#include &#60;libintl.h&#62;'</SAMP>.
+
+</P>
+<P>
+Later on, the maintenance is relatively easy. If, as a programmer,
+you add or modify a string, you will have to ask yourself if the
+new or altered string requires translation, and include it within
+<SAMP>`_()'</SAMP> if you think it should be translated. <SAMP>`"%s: %d"'</SAMP> is
+an example of string <EM>not</EM> requiring translation!
+
+</P>
+
+
+<H2><A NAME="SEC16" HREF="gettext_toc.html#TOC16">3.3 Marking Translatable Strings</A></H2>
+
+<P>
+In PO mode, one set of features is meant more for the programmer than
+for the translator, and allows him to interactively mark which strings,
+in a set of program sources, are translatable, and which are not.
+Even if it is a fairly easy job for a programmer to find and mark
+such strings by other means, using any editor of his choice, PO mode
+makes this work more comfortable. Further, this gives translators
+who feel a little like programmers, or programmers who feel a little
+like translators, a tool letting them work at marking translatable
+strings in the program sources, while simultaneously producing a set of
+translation in some language, for the package being internationalized.
+
+</P>
+<P>
+The set of program sources, targetted by the PO mode commands describe
+here, should have an Emacs tags table constructed for your project,
+prior to using these PO file commands. This is easy to do. In any
+shell window, change the directory to the root of your project, then
+execute a command resembling:
+
+</P>
+
+<PRE>
+etags src/*.[hc] lib/*.[hc]
+</PRE>
+
+<P>
+presuming here you want to process all <TT>`.h'</TT> and <TT>`.c'</TT> files
+from the <TT>`src/'</TT> and <TT>`lib/'</TT> directories. This command will
+explore all said files and create a <TT>`TAGS'</TT> file in your root
+directory, somewhat summarizing the contents using a special file
+format Emacs can understand.
+
+</P>
+<P>
+For packages following the GNU coding standards, there is
+a make goal <CODE>tags</CODE> or <CODE>TAGS</CODE> which constructs the tag files in
+all directories and for all files containing source code.
+
+</P>
+<P>
+Once your <TT>`TAGS'</TT> file is ready, the following commands assist
+the programmer at marking translatable strings in his set of sources.
+But these commands are necessarily driven from within a PO file
+window, and it is likely that you do not even have such a PO file yet.
+This is not a problem at all, as you may safely open a new, empty PO
+file, mainly for using these commands. This empty PO file will slowly
+fill in while you mark strings as translatable in your program sources.
+
+</P>
+<DL COMPACT>
+
+<DT><KBD>,</KBD>
+<DD>
+Search through program sources for a string which looks like a
+candidate for translation.
+
+<DT><KBD>M-,</KBD>
+<DD>
+Mark the last string found with <SAMP>`_()'</SAMP>.
+
+<DT><KBD>M-.</KBD>
+<DD>
+Mark the last string found with a keyword taken from a set of possible
+keywords. This command with a prefix allows some management of these
+keywords.
+
+</DL>
+
+<P>
+The <KBD>,</KBD> (<CODE>po-tags-search</CODE>) command searches for the next
+occurrence of a string which looks like a possible candidate for
+translation, and displays the program source in another Emacs window,
+positioned in such a way that the string is near the top of this other
+window. If the string is too big to fit whole in this window, it is
+positioned so only its end is shown. In any case, the cursor
+is left in the PO file window. If the shown string would be better
+presented differently in different native languages, you may mark it
+using <KBD>M-,</KBD> or <KBD>M-.</KBD>. Otherwise, you might rather ignore it
+and skip to the next string by merely repeating the <KBD>,</KBD> command.
+
+</P>
+<P>
+A string is a good candidate for translation if it contains a sequence
+of three or more letters. A string containing at most two letters in
+a row will be considered as a candidate if it has more letters than
+non-letters. The command disregards strings containing no letters,
+or isolated letters only. It also disregards strings within comments,
+or strings already marked with some keyword PO mode knows (see below).
+
+</P>
+<P>
+If you have never told Emacs about some <TT>`TAGS'</TT> file to use, the
+command will request that you specify one from the minibuffer, the
+first time you use the command. You may later change your <TT>`TAGS'</TT>
+file by using the regular Emacs command <KBD>M-x visit-tags-table</KBD>,
+which will ask you to name the precise <TT>`TAGS'</TT> file you want
+to use. See section `Tag Tables' in <CITE>The Emacs Editor</CITE>.
+
+</P>
+<P>
+Each time you use the <KBD>,</KBD> command, the search resumes from where it was
+left by the previous search, and goes through all program sources,
+obeying the <TT>`TAGS'</TT> file, until all sources have been processed.
+However, by giving a prefix argument to the command (<KBD>C-u
+,)</KBD>, you may request that the search be restarted all over again
+from the first program source; but in this case, strings that you
+recently marked as translatable will be automatically skipped.
+
+</P>
+<P>
+Using this <KBD>,</KBD> command does not prevent using of other regular
+Emacs tags commands. For example, regular <CODE>tags-search</CODE> or
+<CODE>tags-query-replace</CODE> commands may be used without disrupting the
+independent <KBD>,</KBD> search sequence. However, as implemented, the
+<EM>initial</EM> <KBD>,</KBD> command (or the <KBD>,</KBD> command is used with a
+prefix) might also reinitialize the regular Emacs tags searching to the
+first tags file, this reinitialization might be considered spurious.
+
+</P>
+<P>
+The <KBD>M-,</KBD> (<CODE>po-mark-translatable</CODE>) command will mark the
+recently found string with the <SAMP>`_'</SAMP> keyword. The <KBD>M-.</KBD>
+(<CODE>po-select-mark-and-mark</CODE>) command will request that you type
+one keyword from the minibuffer and use that keyword for marking
+the string. Both commands will automatically create a new PO file
+untranslated entry for the string being marked, and make it the
+current entry (making it easy for you to immediately proceed to its
+translation, if you feel like doing it right away). It is possible
+that the modifications made to the program source by <KBD>M-,</KBD> or
+<KBD>M-.</KBD> render some source line longer than 80 columns, forcing you
+to break and re-indent this line differently. You may use the <KBD>O</KBD>
+command from PO mode, or any other window changing command from
+Emacs, to break out into the program source window, and do any
+needed adjustments. You will have to use some regular Emacs command
+to return the cursor to the PO file window, if you want command
+<KBD>,</KBD> for the next string, say.
+
+</P>
+<P>
+The <KBD>M-.</KBD> command has a few built-in speedups, so you do not
+have to explicitly type all keywords all the time. The first such
+speedup is that you are presented with a <EM>preferred</EM> keyword,
+which you may accept by merely typing <KBD><KBD>RET</KBD></KBD> at the prompt.
+The second speedup is that you may type any non-ambiguous prefix of the
+keyword you really mean, and the command will complete it automatically
+for you. This also means that PO mode has to <EM>know</EM> all
+your possible keywords, and that it will not accept mistyped keywords.
+
+</P>
+<P>
+If you reply <KBD>?</KBD> to the keyword request, the command gives a
+list of all known keywords, from which you may choose. When the
+command is prefixed by an argument (<KBD>C-u M-.</KBD>), it inhibits
+updating any program source or PO file buffer, and does some simple
+keyword management instead. In this case, the command asks for a
+keyword, written in full, which becomes a new allowed keyword for
+later <KBD>M-.</KBD> commands. Moreover, this new keyword automatically
+becomes the <EM>preferred</EM> keyword for later commands. By typing
+an already known keyword in response to <KBD>C-u M-.</KBD>, one merely
+changes the <EM>preferred</EM> keyword and does nothing more.
+
+</P>
+<P>
+All keywords known for <KBD>M-.</KBD> are recognized by the <KBD>,</KBD> command
+when scanning for strings, and strings already marked by any of those
+known keywords are automatically skipped. If many PO files are opened
+simultaneously, each one has its own independent set of known keywords.
+There is no provision in PO mode, currently, for deleting a known
+keyword, you have to quit the file (maybe using <KBD>q</KBD>) and reopen
+it afresh. When a PO file is newly brought up in an Emacs window, only
+<SAMP>`gettext'</SAMP> and <SAMP>`_'</SAMP> are known as keywords, and <SAMP>`gettext'</SAMP>
+is preferred for the <KBD>M-.</KBD> command. In fact, this is not useful to
+prefer <SAMP>`_'</SAMP>, as this one is already built in the <KBD>M-,</KBD> command.
+
+</P>
+
+
+<H2><A NAME="SEC17" HREF="gettext_toc.html#TOC17">3.4 Special Comments preceding Keywords</A></H2>
+
+<P>
+In C programs strings are often used within calls of functions from the
+<CODE>printf</CODE> family. The special thing about these format strings is
+that they can contain format specifiers introduced with <KBD>%</KBD>. Assume
+we have the code
+
+</P>
+
+<PRE>
+printf (gettext ("String `%s' has %d characters\n"), s, strlen (s));
+</PRE>
+
+<P>
+A possible German translation for the above string might be:
+
+</P>
+
+<PRE>
+"%d Zeichen lang ist die Zeichenkette `%s'"
+</PRE>
+
+<P>
+A C programmer, even if he cannot speak German, will recognize that
+there is something wrong here. The order of the two format specifiers
+is changed but of course the arguments in the <CODE>printf</CODE> don't have.
+This will most probably lead to problems because now the length of the
+string is regarded as the address.
+
+</P>
+<P>
+To prevent errors at runtime caused by translations the <CODE>msgfmt</CODE>
+tool can check statically whether the arguments in the original and the
+translation string match in type and number. If this is not the case a
+warning will be given and the error cannot causes problems at runtime.
+
+</P>
+<P>
+If the word order in the above German translation would be correct one
+would have to write
+
+</P>
+
+<PRE>
+"%2$d Zeichen lang ist die Zeichenkette `%1$s'"
+</PRE>
+
+<P>
+The routines in <CODE>msgfmt</CODE> know about this special notation.
+
+</P>
+<P>
+Because not all strings in a program must be format strings it is not
+useful for <CODE>msgfmt</CODE> to test all the strings in the <TT>`.po'</TT> file.
+This might cause problems because the string might contain what looks
+like a format specifier, but the string is not used in <CODE>printf</CODE>.
+
+</P>
+<P>
+Therefore the <CODE>xgettext</CODE> adds a special tag to those messages it
+thinks might be a format string. There is no absolute rule for this,
+only a heuristic. In the <TT>`.po'</TT> file the entry is marked using the
+<CODE>c-format</CODE> flag in the <KBD>#,</KBD> comment line (see section <A HREF="gettext_2.html#SEC9">2.2 The Format of PO Files</A>).
+
+</P>
+<P>
+The careful reader now might say that this again can cause problems.
+The heuristic might guess it wrong. This is true and therefore
+<CODE>xgettext</CODE> knows about special kind of comment which lets
+the programmer take over the decision. If in the same line or
+the immediately preceding line of the <CODE>gettext</CODE> keyword
+the <CODE>xgettext</CODE> program find a comment containing the words
+<KBD>xgettext:c-format</KBD> it will mark the string in any case with
+the <KBD>c-format</KBD> flag. This kind of comment should be used when
+<CODE>xgettext</CODE> does not recognize the string as a format string but
+is really is one and it should be tested. Please note that when the
+comment is in the same line of the <CODE>gettext</CODE> keyword, it must be
+before the string to be translated.
+
+</P>
+<P>
+This situation happens quite often. The <CODE>printf</CODE> function is often
+called with strings which do not contain a format specifier. Of course
+one would normally use <CODE>fputs</CODE> but it does happen. In this case
+<CODE>xgettext</CODE> does not recognize this as a format string but what
+happens if the translation introduces a valid format specifier? The
+<CODE>printf</CODE> function will try to access one of the parameter but none
+exists because the original code does not refer to any parameter.
+
+</P>
+<P>
+<CODE>xgettext</CODE> of course could make a wrong decision the other way
+round, i.e. a string marked as a format string actually is not a format
+string. In this case the <CODE>msgfmt</CODE> might give too many warnings and
+would prevent translating the <TT>`.po'</TT> file. The method to prevent
+this wrong decision is similar to the one used above, only the comment
+to use must contain the string <KBD>xgettext:no-c-format</KBD>.
+
+</P>
+<P>
+If a string is marked with <KBD>c-format</KBD> and this is not correct the
+user can find out who is responsible for the decision. See
+section <A HREF="gettext_4.html#SEC20">4.1 Invoking the <CODE>xgettext</CODE> Program</A> to see how the <KBD>--debug</KBD> option can be
+used for solving this problem.
+
+</P>
+
+
+<H2><A NAME="SEC18" HREF="gettext_toc.html#TOC18">3.5 Special Cases of Translatable Strings</A></H2>
+
+<P>
+The attentive reader might now point out that it is not always possible
+to mark translatable string with <CODE>gettext</CODE> or something like this.
+Consider the following case:
+
+</P>
+
+<PRE>
+{
+ static const char *messages[] = {
+ "some very meaningful message",
+ "and another one"
+ };
+ const char *string;
+ ...
+ string
+ = index &#62; 1 ? "a default message" : messages[index];
+
+ fputs (string);
+ ...
+}
+</PRE>
+
+<P>
+While it is no problem to mark the string <CODE>"a default message"</CODE> it
+is not possible to mark the string initializers for <CODE>messages</CODE>.
+What is to be done? We have to fulfill two tasks. First we have to mark the
+strings so that the <CODE>xgettext</CODE> program (see section <A HREF="gettext_4.html#SEC20">4.1 Invoking the <CODE>xgettext</CODE> Program</A>)
+can find them, and second we have to translate the string at runtime
+before printing them.
+
+</P>
+<P>
+The first task can be fulfilled by creating a new keyword, which names a
+no-op. For the second we have to mark all access points to a string
+from the array. So one solution can look like this:
+
+</P>
+
+<PRE>
+#define gettext_noop(String) (String)
+
+{
+ static const char *messages[] = {
+ gettext_noop ("some very meaningful message"),
+ gettext_noop ("and another one")
+ };
+ const char *string;
+ ...
+ string
+ = index &#62; 1 ? gettext ("a default message") : gettext (messages[index]);
+
+ fputs (string);
+ ...
+}
+</PRE>
+
+<P>
+Please convince yourself that the string which is written by
+<CODE>fputs</CODE> is translated in any case. How to get <CODE>xgettext</CODE> know
+the additional keyword <CODE>gettext_noop</CODE> is explained in section <A HREF="gettext_4.html#SEC20">4.1 Invoking the <CODE>xgettext</CODE> Program</A>.
+
+</P>
+<P>
+The above is of course not the only solution. You could also come along
+with the following one:
+
+</P>
+
+<PRE>
+#define gettext_noop(String) (String)
+
+{
+ static const char *messages[] = {
+ gettext_noop ("some very meaningful message",
+ gettext_noop ("and another one")
+ };
+ const char *string;
+ ...
+ string
+ = index &#62; 1 ? gettext_noop ("a default message") : messages[index];
+
+ fputs (gettext (string));
+ ...
+}
+</PRE>
+
+<P>
+But this has some drawbacks. First the programmer has to take care that
+he uses <CODE>gettext_noop</CODE> for the string <CODE>"a default message"</CODE>.
+A use of <CODE>gettext</CODE> could have in rare cases unpredictable results.
+The second reason is found in the internals of the GNU <CODE>gettext</CODE>
+Library which will make this solution less efficient.
+
+</P>
+<P>
+One advantage is that you need not make control flow analysis to make
+sure the output is really translated in any case. But this analysis is
+generally not very difficult. If it should be in any situation you can
+use this second method in this situation.
+
+</P>
+<P><HR><P>
+Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_2.html">previous</A>, <A HREF="gettext_4.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
+</BODY>
+</HTML>