This is a rather simple filter, which does not even support mathematical formulas (and probably never will, unless somebody helps me implement this functionality). So if you need a more powerful converter from OpenOffice.org Writer to LaTeX, consider using Henrik Just's Writer2LaTeX (http://www.hj-gym.dk/~hj/writer2latex/).
The only reason I released my own LaTeX filter is that I needed a conversion tool that produces maximally clean LaTeX output. Unlike Writer2LaTeX, OfficeFMT does not try to reproduce the layout of the original OpenOffice.org document. Instead, it tries to generate output similar to hand-written LaTeX documents, preserving only a limited set of the most commonly used character and paragraph formatting properties.
To save your document in LaTeX format, select the "File -> Export..." menu item in OpenOffice.org (since this filter only exports files, it is not available in the "File -> Save As..." dialog). In the "Export" dialog box select "OfficeFMT - LaTeX document (.tex)", specify the output file name and click "Export". A filter options dialog box then appears, where you can set the following parameters:

The LaTeX output filter options dialog
Character set. This parameter specifies not only the encoding of the resulting document but also the inputenc package option. Note that canonical encoding names may differ from those used by LaTeX: for example, if you select ISO-8859-1 as the output encoding, the following line is added to your LaTeX preamble:
\usepackage[latin1]{inputenc}
Keep in mind that UTF-8 output is possible, but processing such a file with a standard LaTeX installation requires Dominique Unruh's ucs.sty package (it is not part of the standard LaTeX system, so you may need to install it separately). So if you select this encoding, the following preamble commands are generated:
\usepackage{ucs}
\usepackage[utf8]{inputenc}
(note that utf8 is a LaTeX alias for UTF-8).
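The mapping from the dialog's encoding name to the generated preamble lines can be pictured with the following minimal Python sketch. It is not OfficeFMT's code: the table INPUTENC_OPTIONS and the function encoding_preamble are hypothetical, and only the ISO-8859-1 and UTF-8 rows come from the text above (the other rows are ordinary inputenc option names added for illustration).

```python
# Hypothetical table: only the latin1 and utf8 rows are documented above;
# the other rows are ordinary inputenc option names added for illustration.
INPUTENC_OPTIONS = {
    "ISO-8859-1": "latin1",
    "ISO-8859-2": "latin2",
    "Windows-1251": "cp1251",
    "KOI8-R": "koi8-r",
    "UTF-8": "utf8",
}


def encoding_preamble(charset: str) -> list[str]:
    """Return the preamble lines generated for the selected character set."""
    option = INPUTENC_OPTIONS[charset]
    lines = []
    if option == "utf8":
        # UTF-8 output additionally relies on Dominique Unruh's ucs package.
        lines.append(r"\usepackage{ucs}")
    lines.append(rf"\usepackage[{option}]{{inputenc}}")
    return lines
```

For example, encoding_preamble("UTF-8") yields the two lines \usepackage{ucs} and \usepackage[utf8]{inputenc}, in that order.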
Line break. Here you can choose one of the three line break styles commonly used in plain text files, i.e. DOS (CRLF), UNIX (LF) or Mac (CR).
Text width. Unlike Writer2LaTeX, OfficeFMT can break long lines at a given column to produce nicely formatted output. In this respect OfficeFMT is similar to Chikrii Softlab Word2TeX, which enables this feature by default. Use this field to specify the column at which lines are broken.
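To illustrate how the text width and line break settings work together, here is a minimal Python sketch. It uses the standard textwrap module as a stand-in for whatever wrapping algorithm OfficeFMT really uses, and the function format_paragraph and the style keys are hypothetical.

```python
import textwrap

# Line terminators for the three styles offered in the dialog.
LINE_BREAKS = {"dos": "\r\n", "unix": "\n", "mac": "\r"}


def format_paragraph(text: str, width: int, style: str = "unix") -> str:
    """Break a paragraph at word boundaries so no line exceeds `width`
    (unless a single word is longer), then join with the chosen terminator."""
    lines = textwrap.wrap(
        text,
        width=width,
        break_long_words=False,   # never split a word or a control sequence
        break_on_hyphens=False,   # keep "--" and "---" dashes intact
    )
    return LINE_BREAKS[style].join(lines)
```

Disabling break_long_words matters for LaTeX output: a long control sequence or URL is better left overlong than split in the middle.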
Document class. Here you can specify the class
of the resulting LaTeX document. You can either select one
of the standard LaTeX classes (i.e. article, book, letter, proc,
report or slides) from the combo box, or type in your
own.
Note that this option affects not only the LaTeX preamble, but also
the way headings of the original OpenOffice.org document are mapped
to LaTeX sectioning commands. By default the first level of headings
corresponds to the \chapter command, the second level to
\section, and so on. However, for the article
and proc classes, where the \chapter command
is not defined, the whole level sequence is shifted by one, so that the
first level corresponds to \section, and so on.
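The heading mapping described above, including the shift for article and proc, can be sketched in Python as follows. The function heading_command is hypothetical, and the behaviour for levels deeper than \subparagraph is an assumption, since the text does not say what happens there.

```python
SECTIONING = ["chapter", "section", "subsection",
              "subsubsection", "paragraph", "subparagraph"]
# Classes in which \chapter is not defined (per the text above).
CLASSES_WITHOUT_CHAPTER = {"article", "proc"}


def heading_command(level: int, document_class: str) -> str:
    """Map a 1-based OpenOffice.org heading level to a sectioning command."""
    shift = 1 if document_class in CLASSES_WITHOUT_CHAPTER else 0
    index = level - 1 + shift
    # Assumption: levels deeper than \subparagraph reuse the deepest command.
    index = min(index, len(SECTIONING) - 1)
    return "\\" + SECTIONING[index]
```

For example, a level 2 heading becomes \section in a book or report but \subsection in an article.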
Class options. Here you can specify the document class
options (for example: a4paper,10pt).
No preamble. If enabled, this flag suppresses the output
of the LaTeX preamble and of the opening \begin{document} and closing
\end{document} statements, so that only the document body
is written to the resulting file. This is useful if you plan
to include the resulting file in a main LaTeX document with
the \include command instead of compiling it on its own.
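How the document class, the class options and the "No preamble" flag shape the output file can be shown with a minimal Python sketch. The function wrap_document and its parameters are hypothetical, and the real filter writes more preamble lines (encoding, fonts, languages) than shown here.

```python
from typing import Optional


def wrap_document(body: str, document_class: str, class_options: str = "",
                  packages: Optional[list[str]] = None,
                  no_preamble: bool = False) -> str:
    """Surround the converted body with a preamble unless 'No preamble' is set."""
    if no_preamble:
        # Only the body is written; a main file can pull it in with \include.
        return body
    options = f"[{class_options}]" if class_options else ""
    lines = [rf"\documentclass{options}{{{document_class}}}"]
    lines += [rf"\usepackage{{{name}}}" for name in packages or []]
    lines += [r"\begin{document}", body, r"\end{document}"]
    return "\n".join(lines)
```

With class_options="a4paper,10pt" and the article class, the first line becomes \documentclass[a4paper,10pt]{article}; with no_preamble=True the body is returned untouched.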
Ignore language markup. By default OfficeFMT uses the
\foreignlanguage and \selectlanguage
commands, defined in the Babel package, to reproduce text fragments
marked with a specific language in the original OpenOffice.org document.
This markup is very important for multilingual
documents, but sometimes the language attributes in word processor files
are set so chaotically that converting them makes no sense and may just make
your LaTeX document even more obscure. In this case you can instruct
OfficeFMT to ignore language markup and then add it to the resulting
file manually, if necessary.
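The following minimal Python sketch shows one plausible way the two Babel commands could be applied and how "Ignore language markup" switches them off. It assumes \foreignlanguage is used for inline fragments and \selectlanguage for whole paragraphs; that split, and the functions mark_inline and mark_paragraph, are hypothetical illustrations rather than OfficeFMT's documented behaviour.

```python
def mark_inline(text: str, language: str, current: str, ignore: bool = False) -> str:
    """Wrap an inline fragment whose language differs from the surrounding text."""
    if ignore or language == current:
        return text
    return rf"\foreignlanguage{{{language}}}{{{text}}}"


def mark_paragraph(text: str, language: str, current: str, ignore: bool = False) -> str:
    """Switch the language before a paragraph written in a different language."""
    if ignore or language == current:
        return text
    return rf"\selectlanguage{{{language}}}" + "\n" + text
```

Because \selectlanguage stays in effect until the next switch, a converter has to track the current language (the current parameter here) to switch back when the language changes again.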
Floating objects. In this frame you can set several
common options for floating objects, such as table and
figure environments and paragraph boxes:
Placement specifier. This is an (optional) placement
specifier for table and figure environments
(htbp by default). Refer to your LaTeX documentation for
other possible values.
Scale width to... This is a width ratio used mainly for paragraph boxes (and for table cells in particular). As you probably know, LaTeX documents with a standard layout usually have narrower text columns than word processor documents. So if you reproduce tables or text frames at their natural width, they will very likely look oversized. This parameter lets you reduce the horizontal dimensions of floating objects as desired (a reasonable ratio is normally 0.6 or 0.7). Note that vertical dimensions are not scaled and are always reproduced as they are.
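The effect of the two floating object settings can be sketched in Python as below. The functions float_begin and scaled_parbox are hypothetical, and expressing widths in centimetres is an assumption made for the example.

```python
def float_begin(environment: str, placement: str = "htbp") -> str:
    """Open a table or figure environment with the placement specifier."""
    return rf"\begin{{{environment}}}[{placement}]"


def scaled_parbox(text: str, width_cm: float, ratio: float = 0.7) -> str:
    """Emit a paragraph box whose width is scaled by `ratio`.

    Only the width is scaled; a height, if any, would be copied unchanged.
    """
    return rf"\parbox{{{width_cm * ratio:.2f}cm}}{{{text}}}"
```

With a ratio of 0.6, a 10 cm text frame from the Writer document becomes a 6 cm paragraph box, which fits the narrower LaTeX text column better.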
Additional packages. This frame contains several options that control which packages are loaded (mainly packages responsible for rendering specific kinds of OpenOffice.org document layout):
Font package. This combo box contains a list of some commonly used font packages, including the standard psnfss packages.
Font package options. If your font package requires any specific options, specify them here.
enumerate. The enumerate package defines
an extended syntax for the enumerate environment, so
loading this package makes it possible to partially reproduce the numbering
formats used in your OpenOffice.org document. If this option is
disabled, numbered lists are converted to the enumerate
environment without additional parameters, and their number
formatting is ignored.
endnote. By default endnotes are converted to bibliography items and collected at the end of the document, as in Chikrii Softlab Word2TeX. Note that these "bibliography items" need not contain only bibliographical references: in fact, they are especially convenient for long endnotes consisting of several paragraphs, since you can edit them separately without reflowing the paragraphs of the main text.
An alternative way of rendering endnotes is the
endnote package, which provides the \endnote
command. It behaves similarly to \footnote, but
does not print the note text until a \theendnotes command
occurs in the document. So if you instruct OfficeFMT to use this
package, it puts the endnote text into the argument of an \endnote
command and inserts a \theendnotes command at
the end of the output file.
indentfirst. Enabling this option doesn't affect the LaTeX document body: it just causes a
\usepackage{indentfirst}
line to be inserted into the preamble. This option was added just for convenience: in some typographical traditions the first paragraph of a section should always be indented, so this package has to be loaded in every document.
longtable. This option causes the
longtable package to be loaded in the preamble and the
longtable environment to be used for rendering tables
instead of a pair of the table and tabular
environments.
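The two endnote rendering modes described above (bibliography items by default, or the endnote package) can be contrasted with a minimal Python sketch. The function render_endnotes, the note1, note2, ... keys, and the use of \cite to reference the bibliography items are all hypothetical choices made for illustration.

```python
def render_endnotes(notes: list[str], use_endnote_package: bool) -> tuple[list[str], str]:
    """Return (markers placed in the body, block appended at the end)."""
    if use_endnote_package:
        # endnote package: note text goes into \endnote, printed by \theendnotes.
        markers = [rf"\endnote{{{text}}}" for text in notes]
        return markers, r"\theendnotes"
    # Default mode: each note becomes a bibliography item (hypothetical keys).
    keys = [f"note{i}" for i in range(1, len(notes) + 1)]
    markers = [rf"\cite{{{key}}}" for key in keys]
    items = [rf"\bibitem{{{key}}} {text}" for key, text in zip(keys, notes)]
    widest = str(len(notes))  # widest-label argument of thebibliography
    block = "\n".join([rf"\begin{{thebibliography}}{{{widest}}}", *items,
                       r"\end{thebibliography}"])
    return markers, block
```

The default mode keeps long, multi-paragraph notes out of the main text, which is why editing them does not reflow the surrounding paragraphs.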
For rendering multilingual documents OfficeFMT uses the standard
babel/fontenc/inputenc combination. As mentioned in the
previous section, the inputenc option is selected according to the actual
code page of the LaTeX document. There is no special option for selecting a
font encoding, since the fontenc package options are constructed
automatically from the list of languages used in the document.
The algorithm is rather simple: the T1 encoding is always
used, while T2A and LGR may be added for
Cyrillic and Greek script, respectively.
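The fontenc rule above (T1 always, T2A for Cyrillic, LGR for Greek) fits in a few lines of Python. The language sets and the function fontenc_preamble are hypothetical, and so is the option order: the sketch puts T1 last because fontenc makes the last listed encoding the default, but the text does not say which order OfficeFMT writes.

```python
# Hypothetical language sets; real Babel names, but not necessarily the full
# lists that OfficeFMT recognizes.
CYRILLIC = {"russian", "ukrainian", "bulgarian"}
GREEK = {"greek", "polutonikogreek"}


def fontenc_preamble(languages: list[str]) -> str:
    """Build the fontenc line from the list of document languages."""
    encodings = []
    if any(lang in CYRILLIC for lang in languages):
        encodings.append("T2A")
    if any(lang in GREEK for lang in languages):
        encodings.append("LGR")
    # fontenc makes the last listed encoding the default, so T1 goes last
    # (the order OfficeFMT actually uses is an assumption here).
    encodings.append("T1")
    return r"\usepackage[" + ",".join(encodings) + "]{fontenc}"
```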
The list of languages itself (needed both for the babel package and for
constructing the fontenc package options) is built by iterating
through the paragraph and character styles defined in the document and
analyzing their language attribute. The ISO language codes
are converted to human-readable names in lowercase (since
the lowercase form is used in Babel). The language specified in the
parameters of the default style is treated as the main document language and
placed in the last position of the babel package options.
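The language list construction can be sketched in Python as below, assuming the style languages have already been collected as ISO 639-1 codes. The table ISO_TO_NAME (only a few entries) and the function babel_preamble are hypothetical; putting the main language last matches Babel's rule that the last language option becomes the main language.

```python
# Hypothetical ISO 639-1 code -> language name table (a few entries only).
ISO_TO_NAME = {"en": "English", "de": "German", "fr": "French",
               "ru": "Russian", "el": "Greek"}


def babel_preamble(style_languages: list[str], default_language: str) -> str:
    """Collect style languages and put the default style's language last."""
    names = []
    for code in style_languages:
        name = ISO_TO_NAME[code].lower()  # Babel uses lowercase option names
        if name not in names:
            names.append(name)
    main = ISO_TO_NAME[default_language].lower()
    if main in names:
        names.remove(main)
    names.append(main)  # Babel treats the last option as the main language
    return rf"\usepackage[{','.join(names)}]{{babel}}"
```

For a document whose styles use Russian, English and German, with English in the default style, this produces \usepackage[russian,german,english]{babel}.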
The OfficeFMT LaTeX filter converts Unicode characters to LaTeX
commands and ligatures according to its Unicode character database, stored
in the latex-symbol.xml file. Some characters are always converted: this
applies to all characters that have a special meaning in TeX (e.g. the
percent sign or the backslash) and therefore need to be escaped, and also to some
characters that are rarely used directly in LaTeX files, although such
usage is not prohibited. For example, guillemets are always represented as
<< and >>. All other characters are converted only if
they are not present in the selected output encoding.
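The three-way decision described above (always escape TeX specials, always replace certain characters, otherwise convert only what the output encoding cannot represent) is shown in this minimal Python sketch. The small tables stand in for latex-symbol.xml, whose actual format is not shown here; the table names, the function convert_text, and the use of Python codec names such as "latin-1" are assumptions.

```python
# Characters with a special meaning in TeX: always escaped.
TEX_SPECIALS = {
    "\\": r"\textbackslash{}", "%": r"\%", "$": r"\$", "&": r"\&",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}
# Characters always replaced even when the encoding contains them.
ALWAYS_CONVERTED = {"\u00ab": "<<", "\u00bb": ">>"}
# Hypothetical stand-in for latex-symbol.xml: used only for unencodable characters.
FALLBACK = {"\u20ac": r"\texteuro{}", "\u00e9": r"\'e", "\u2014": "---"}


def convert_text(text: str, codec: str) -> str:
    """Convert text for output in `codec` (a Python codec name, e.g. "latin-1")."""
    out = []
    for ch in text:
        if ch in TEX_SPECIALS:
            out.append(TEX_SPECIALS[ch])
        elif ch in ALWAYS_CONVERTED:
            out.append(ALWAYS_CONVERTED[ch])
        else:
            try:
                ch.encode(codec)  # representable: keep the character as is
                out.append(ch)
            except UnicodeEncodeError:
                out.append(FALLBACK[ch])  # KeyError if the table lacks it
    return "".join(out)
```

So an e-acute is kept as is in ISO-8859-1 output but becomes \'e in ASCII output, while guillemets become << and >> in both.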
Some of the mappings between Unicode characters and LaTeX commands defined in the
database are valid only for a specific script. For example, all Babel support
files for Cyrillic languages define the
\No command for typing the numero sign (U+2116).
However, for all other languages this command is not defined and should be
replaced with \textnumero. Another example is the
`~' symbol, which is commonly used for a non-breaking space in
all languages except polytonic Greek.
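The two script-dependent examples above can be expressed as a language-aware lookup, sketched here in Python. CYRILLIC_LANGUAGES is a hypothetical subset and script_command is a hypothetical name. The text does not say what OfficeFMT emits for a non-breaking space in polytonic Greek, so falling back to the LaTeX kernel command \nobreakspace is purely an assumption.

```python
CYRILLIC_LANGUAGES = {"russian", "ukrainian", "bulgarian"}  # hypothetical subset


def script_command(ch: str, language: str) -> str:
    """Return a LaTeX rendering of `ch` that is valid for `language`."""
    if ch == "\u2116":  # NUMERO SIGN
        return r"\No" if language in CYRILLIC_LANGUAGES else r"\textnumero{}"
    if ch == "\u00a0":  # NO-BREAK SPACE
        # Assumption: fall back to the kernel's \nobreakspace for polytonic Greek.
        return r"\nobreakspace{}" if language == "polutonikogreek" else "~"
    return ch
```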
Of course, in order to select a LaTeX command valid for a specific script, the converter needs to know the language of the text fragment it processes. Note, however, that the language is simply taken from the original OpenOffice.org document and is not validated. So it is your responsibility either to specify the correct language for each document fragment, or to instruct the filter to ignore the language markup entirely (as described above). As a result, you will quite possibly not get valid LaTeX output right away (especially for multilingual documents). You may need to insert or remove some language switching commands to prevent compiler errors like the following:
! LaTeX Error: Command \cyrd unavailable in encoding T1.
The following LaTeX packages are loaded in the preamble automatically whenever the filter decides they are needed, so you cannot control whether they are loaded:
textcomp. This package instructs LaTeX to take some special characters from the TS1 encoding. A reference to textcomp is always inserted into the preamble, since several text commands in the OfficeFMT LaTeX filter's Unicode character database are available only in the TS1 encoding.
multicol. This package defines the
multicols environment, which is needed for rendering
multicolumn sections, so the package is loaded automatically if such
sections are present in the document.
ulem. This package is referenced in the LaTeX preamble
for documents that contain underlined text. Note that it is
always loaded with the normalem option, so that the
\emph command (which is mapped to the
Emphasis character style defined in OpenOffice.org) is
not redefined.
graphicx. This package provides an extended version of
the \includegraphics command. It is automatically loaded
for documents which contain images.
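The rules for these automatically loaded packages amount to a few feature checks, sketched below in Python. The DocumentFeatures class and its fields are hypothetical summaries of what the filter might detect while scanning the document; they are not part of OfficeFMT.

```python
from dataclasses import dataclass


@dataclass
class DocumentFeatures:
    """Hypothetical summary of what the converter found in the document."""
    has_multicolumn_sections: bool = False
    has_underline: bool = False
    has_images: bool = False


def automatic_packages(features: DocumentFeatures) -> list[str]:
    """Return the \\usepackage lines the filter adds on its own."""
    lines = [r"\usepackage{textcomp}"]  # always loaded (TS1 symbols)
    if features.has_multicolumn_sections:
        lines.append(r"\usepackage{multicol}")
    if features.has_underline:
        # normalem keeps \emph (the Emphasis style) from being redefined.
        lines.append(r"\usepackage[normalem]{ulem}")
    if features.has_images:
        lines.append(r"\usepackage{graphicx}")
    return lines
```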