1 files changed, 268 insertions, 0 deletions
diff --git a/doc/gettext_7.html b/doc/gettext_7.html
new file mode 100644
index 0000000..74b8829
--- /dev/null
+++ b/doc/gettext_7.html
@@ -0,0 +1,268 @@
+<HTML>
+<HEAD>
+<!-- This HTML file has been created by texi2html 1.51
+     from gettext.texi on 19 April 2001 -->
+
+<TITLE>GNU gettext utilities - 7  Producing Binary MO Files</TITLE>
+</HEAD>
+<BODY>
+Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_6.html">previous</A>, <A HREF="gettext_8.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
+<P><HR><P>
+
+
+<H1><A NAME="SEC34" HREF="gettext_toc.html#TOC34">7  Producing Binary MO Files</A></H1>
+
+
+
+<H2><A NAME="SEC35" HREF="gettext_toc.html#TOC35">7.1  Invoking the <CODE>msgfmt</CODE> Program</A></H2>
+
+
+<PRE>
+Usage: msgfmt [<VAR>option</VAR>] <VAR>filename</VAR>.po ...
+</PRE>
+
+<DL COMPACT>
+
+<DT><SAMP>`-a <VAR>number</VAR>'</SAMP>
+<DD>
+<DT><SAMP>`--alignment=<VAR>number</VAR>'</SAMP>
+<DD>
+Align strings to <VAR>number</VAR> bytes (default: 1).
+
+<DT><SAMP>`-h'</SAMP>
+<DD>
+<DT><SAMP>`--help'</SAMP>
+<DD>
+Display this help and exit.
+
+<DT><SAMP>`--no-hash'</SAMP>
+<DD>
+Binary file will not include the hash table.
+
+<DT><SAMP>`-o <VAR>file</VAR>'</SAMP>
+<DD>
+<DT><SAMP>`--output-file=<VAR>file</VAR>'</SAMP>
+<DD>
+Specify output file name as <VAR>file</VAR>.
+
+<DT><SAMP>`--strict'</SAMP>
+<DD>
+Direct the program to work strictly following the Uniforum/Sun
+implementation.  Currently this only affects the naming of the output
+file.  If this option is not given, the name of the output file is the
+same as the domain name.  If the strict Uniforum mode is enabled, the
+suffix <TT>`.mo'</TT> is added to the file name if it is not already
+present.
+
+We find this behaviour of Sun's implementation rather silly and so by
+default this mode is <EM>not</EM> selected.
+
+<DT><SAMP>`-v'</SAMP>
+<DD>
+<DT><SAMP>`--verbose'</SAMP>
+<DD>
+Detect and diagnose input file anomalies which might represent
+translation errors.  The <CODE>msgid</CODE> and <CODE>msgstr</CODE> strings are
+studied and compared.  It is considered abnormal that one string
+starts or ends with a newline while the other does not.
+
+Also, if the string is a format string used in a
+<CODE>printf</CODE>-like function, both strings should have the same number of
+<SAMP>`%'</SAMP> format specifiers, with matching types.  This check is
+performed if the flag <CODE>c-format</CODE> or <CODE>possible-c-format</CODE>
+appears in the special comment <KBD>#,</KBD> for this entry.  For example, the
+check will diagnose using <SAMP>`%.*s'</SAMP> against <SAMP>`%s'</SAMP>, or <SAMP>`%d'</SAMP>
+against <SAMP>`%s'</SAMP>, or <SAMP>`%d'</SAMP> against <SAMP>`%x'</SAMP>.  It can even handle
+positional parameters.
+
+Normally the <CODE>xgettext</CODE> program automatically decides whether a
+string is a format string or not.  This algorithm is not perfect,
+though.  It might regard a string as a format string though it is not
+used in a <CODE>printf</CODE>-like function and so <CODE>msgfmt</CODE> might report
+errors where there are none.  Or the other way round: a string is not
+regarded as a format string but it is used in a <CODE>printf</CODE>-like
+function.
+
+To solve this problem, the programmer can dictate the decision to the
+<CODE>xgettext</CODE> program (see section <A HREF="gettext_3.html#SEC17">3.4  Special Comments preceding Keywords</A>).  The translator should not
+consider removing the flag from the <KBD>#,</KBD> line.  This "fix" would be
+reversed as soon as <CODE>msgmerge</CODE> is called the next time.
+
+<DT><SAMP>`-V'</SAMP>
+<DD>
+<DT><SAMP>`--version'</SAMP>
+<DD>
+Output version information and exit.
+
+</DL>
+
+<P>
+If the input file is <SAMP>`-'</SAMP>, standard input is read.  If the output file
+is <SAMP>`-'</SAMP>, output is written to standard output.
+
+</P>
+
+
+<H2><A NAME="SEC36" HREF="gettext_toc.html#TOC36">7.2  The Format of GNU MO Files</A></H2>
+
+<P>
+The format of the generated MO files is best described by a picture,
+which appears below.
+
+</P>
+<P>
+The first two words serve to identify the file.  The magic
+number will always signal GNU MO files.  The number is stored in the
+byte order of the generating machine, so a reader really has to check for
+two numbers: <CODE>0x950412de</CODE> and <CODE>0xde120495</CODE>.  The second
+word describes the current revision of the file format.  For now the
+revision is 0.  This might change in future versions, and ensures
+that the readers of MO files can distinguish new formats from old
+ones, so that both can be handled correctly.  The version is kept
+separate from the magic number, instead of using different magic
+numbers for different formats, mainly because <TT>`/etc/magic'</TT> is
+not updated often.  It might also be better to keep the magic number
+separate from the internal format version identification.
+
+</P>
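+<P>
+A reader could tell the two byte orders apart along the following lines.
+This is only a sketch in Python, not code taken from GNU <CODE>gettext</CODE>;
+the function name <CODE>detect_byte_order</CODE> is made up for illustration.
+
+</P>

```python
import struct

MAGIC = 0x950412de  # magic number as given in the text above


def detect_byte_order(header: bytes) -> str:
    """Return the struct prefix ('<' or '>') matching the file's byte order.

    Hypothetical helper: it only inspects the first 32-bit word.
    """
    if len(header) < 4:
        raise ValueError("file too short to be a GNU MO file")
    # Try the first word as little-endian, then as big-endian.
    (little,) = struct.unpack("<I", header[:4])
    if little == MAGIC:
        return "<"
    (big,) = struct.unpack(">I", header[:4])
    if big == MAGIC:
        return ">"
    raise ValueError("not a GNU MO file")
```

+<P>
+The returned prefix can then be used to decode every other 32-bit word
+in the file, including the revision word at byte 4.
+
+</P>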
+<P>
+A number of pointers to later tables in the file follow, allowing
+for the extension of the prefix part of MO files without having to
+recompile programs reading them.  This might become useful for later
+inserting a few flag bits, an indication of the charset used, new
+tables, or other things.
+
+</P>
+<P>
+Then, at offset <VAR>O</VAR> and offset <VAR>T</VAR> in the picture, two tables
+of string descriptors can be found.  In both tables, each string
+descriptor uses two 32-bit integers, one for the string length,
+another for the offset of the string in the MO file, counting in bytes
+from the start of the file.  The first table contains descriptors
+for the original strings, and is sorted so the original strings
+are in increasing lexicographical order.  The second table contains
+descriptors for the translated strings, and is parallel to the first
+table: to find the corresponding translation one has to access the
+slot in the second table with the same index.
+
+</P>
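+<P>
+The two descriptor tables could be decoded roughly as shown here.  This is
+a sketch under the layout shown in the picture (<VAR>N</VAR>, <VAR>O</VAR> and
+<VAR>T</VAR> at bytes 8, 12 and 16), written in Python rather than taken from
+GNU <CODE>gettext</CODE>; the name <CODE>read_string_tables</CODE> is
+hypothetical.
+
+</P>

```python
import struct


def read_string_tables(data: bytes):
    """Return (originals, translations) as two parallel lists of bytes.

    Hypothetical helper; 'data' is the whole MO file read into memory.
    """
    (magic,) = struct.unpack("<I", data[:4])
    if magic == 0x950412de:
        order = "<"          # file written on a little-endian machine
    elif magic == 0xde120495:
        order = ">"          # file written on a big-endian machine
    else:
        raise ValueError("not a GNU MO file")

    # Header words at bytes 8, 12 and 16: N, O and T.
    n, o, t = struct.unpack(order + "3I", data[8:20])

    def table(offset):
        strings = []
        for i in range(n):
            # Each descriptor is a (length, offset) pair of 32-bit integers.
            length, pos = struct.unpack_from(order + "2I", data, offset + 8 * i)
            # The terminating NUL is not counted in the length.
            strings.append(data[pos:pos + length])
        return strings

    return table(o), table(t)
```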
+<P>
+Having the original strings sorted enables the use of simple binary
+search, for when the MO file does not contain a hashing table, or
+for when it is not practical to use the hashing table provided in
+the MO file.  This also has another advantage: the empty string
+in a GNU <CODE>gettext</CODE> PO file is usually <EM>translated</EM> into
+some system information attached to that particular MO file, and the
+empty string necessarily becomes the first in both the original and
+translated tables, making the system information very easy to find.
+
+</P>
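+<P>
+The binary search over the sorted original strings might look like the
+following sketch.  It assumes the two tables have already been loaded into
+parallel Python lists of byte strings; the function name <CODE>lookup</CODE>
+is hypothetical and this is not the lookup code used by GNU
+<CODE>gettext</CODE>.
+
+</P>

```python
from bisect import bisect_left


def lookup(originals, translations, msgid):
    """Return the translation for msgid, or None if it is not in the catalog.

    'originals' must be sorted in increasing byte order, as in the MO file.
    """
    i = bisect_left(originals, msgid)
    if i < len(originals) and originals[i] == msgid:
        # The translation sits at the same index in the parallel table.
        return translations[i]
    return None
```

+<P>
+Looking up the empty string returns the first translation, which is where
+the system information ends up.
+
+</P>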
+<P>
+The size <VAR>S</VAR> of the hash table can be zero.  In this case, the
+hash table itself is not contained in the MO file.  Some people might
+prefer this because a precomputed hashing table takes disk space, and
+does not gain <EM>that</EM> much speed.  The hash table contains indices
+to the sorted array of strings in the MO file.  Conflict resolution is
+done by double hashing.  The precise hashing algorithm used is fairly
+dependent on the GNU <CODE>gettext</CODE> code, and is not documented here.
+
+</P>
+<P>
+As for the strings themselves, they follow the hash table.  Each
+is terminated with a <KBD>NUL</KBD>, and this <KBD>NUL</KBD> is not counted in
+the length which appears in the string descriptor.  The <CODE>msgfmt</CODE>
+program has an option selecting the alignment for MO file strings.
+With this option, each string is separately aligned so it starts at
+an offset which is a multiple of the alignment value.  On some RISC
+machines, a correct alignment will speed things up.
+
+</P>
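+<P>
+The rounding implied by the alignment option can be sketched as follows.
+The sketch assumes each string is simply moved forward to the next offset
+that is a multiple of the alignment value; the function name
+<CODE>aligned_offset</CODE> is hypothetical and this is not the code used by
+<CODE>msgfmt</CODE>.
+
+</P>

```python
def aligned_offset(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment (alignment >= 1)."""
    if alignment < 1:
        raise ValueError("alignment must be at least 1")
    return (offset + alignment - 1) // alignment * alignment
```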
+<P>
+Plural forms are stored by letting the plural of the original string
+follow the singular of the original string, separated through a
+<KBD>NUL</KBD> byte.  The length which appears in the string descriptor
+includes both.  However, only the singular of the original string
+takes part in the hash table lookup.  The plural variants of the
+translation are all stored consecutively, separated through a
+<KBD>NUL</KBD> byte.  Here also, the length in the string descriptor
+includes all of them.
+
+</P>
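+<P>
+Given the bytes covered by one original descriptor and its matching
+translation descriptor, the plural variants could be separated as in this
+sketch.  The function name <CODE>split_plural_entry</CODE> is hypothetical,
+and the code is an illustration in Python, not part of GNU
+<CODE>gettext</CODE>.
+
+</P>

```python
def split_plural_entry(original: bytes, translation: bytes):
    """Split one catalog entry into (singular, plural, translated_forms).

    'plural' is None when the original string has no plural part.
    """
    # The original holds the singular, optionally followed by NUL + plural.
    singular, sep, plural = original.partition(b"\0")
    # The translation holds all plural variants, separated by NUL bytes.
    forms = translation.split(b"\0")
    return singular, (plural if sep else None), forms
```

+<P>
+Only <CODE>singular</CODE> would be used as the key when searching the table
+of original strings.
+
+</P>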
+<P>
+Nothing prevents an MO file from having embedded <KBD>NUL</KBD>s in strings.
+However, the program interface currently used already presumes
+that strings are <KBD>NUL</KBD> terminated, so embedded <KBD>NUL</KBD>s are
+somewhat useless.  But the MO file format is general enough that other
+interfaces would be possible later, if for example, we ever want to
+implement wide characters right in MO files, where <KBD>NUL</KBD> bytes may
+accidentally appear.  (No, we don't want to have wide characters in MO
+files.  They would make the file unnecessarily large, and the
+<SAMP>`wchar_t'</SAMP> type being platform dependent, MO files would be
+platform dependent as well.)
+
+</P>
+<P>
+This particular issue has been strongly debated in the GNU
+<CODE>gettext</CODE> development forum, and it is to be expected that the MO file
+format will evolve or change over time.  It is even possible that many
+formats may later be supported concurrently.  But surely, we have to
+start somewhere, and the MO file format described here is a good start.
+Nothing is cast in concrete, and the format may later evolve fairly
+easily, so we should feel comfortable with the current approach.
+
+</P>
+
+<PRE>
+        byte
+             +------------------------------------------+
+          0  | magic number = 0x950412de                |
+             |                                          |
+          4  | file format revision = 0                 |
+             |                                          |
+          8  | number of strings                        |  == N
+             |                                          |
+         12  | offset of table with original strings    |  == O
+             |                                          |
+         16  | offset of table with translation strings |  == T
+             |                                          |
+         20  | size of hashing table                    |  == S
+             |                                          |
+         24  | offset of hashing table                  |  == H
+             |                                          |
+             .                                          .
+             .    (possibly more entries later)         .
+             .                                          .
+             |                                          |
+          O  | length &#38; offset 0th string  ----------------.
+      O + 8  | length &#38; offset 1st string  ------------------.
+              ...                                    ...   | |
+O + ((N-1)*8)| length &#38; offset (N-1)th string           |  | |
+             |                                          |  | |
+          T  | length &#38; offset 0th translation  ---------------.
+      T + 8  | length &#38; offset 1st translation  -----------------.
+              ...                                    ...   | | | |
+T + ((N-1)*8)| length &#38; offset (N-1)th translation      |  | | | |
+             |                                          |  | | | |
+          H  | start hash table                         |  | | | |
+              ...                                    ...   | | | |
+  H + S * 4  | end hash table                           |  | | | |
+             |                                          |  | | | |
+             | NUL terminated 0th string  &#60;----------------' | | |
+             |                                          |    | | |
+             | NUL terminated 1st string  &#60;------------------' | |
+             |                                          |      | |
+              ...                                    ...       | |
+             |                                          |      | |
+             | NUL terminated 0th translation  &#60;---------------' |
+             |                                          |        |
+             | NUL terminated 1st translation  &#60;-----------------'
+             |                                          |
+              ...                                    ...
+             |                                          |
+             +------------------------------------------+
+</PRE>
+
+<P><HR><P>
+Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_6.html">previous</A>, <A HREF="gettext_8.html">next</A>, <A HREF="gettext_14.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
+</BODY>
+</HTML>