

OfficeFMT XSLT filtering components

OfficeFMT currently provides two XSLT filtering components:

The first component is very similar to the generic XSLT filter bundled with OpenOffice.org 1.1. However, I had to include it in my package in order to implement the following two features, which are not available in the standard XSLT filter:

This component may be used in combination with style sheets designed to transform any type of OpenOffice.org document.

The second filtering component is based on the first one, but is designed specifically for OpenOffice.org Writer documents. This component cleans up the XML code before passing it to a stylesheet, i.e. it scans the XML document for redundant formatting tags and removes them where necessary. It is well known that documents generated by OpenOffice.org are not always as clean as they could be. In particular, the most common problems with XML layout in Writer documents are the following:

The WriterFlatXMLOptimizer filtering component is used by all XSLT-based filters available in OfficeFMT, namely:

OpenOffice.org FlatXML filter

The FlatXML format is nearly identical to the standard OpenOffice.org file format: the only difference is that the XML content, which is split across several files inside an sxw document, is stored in a single uncompressed XML file. Note that generating FlatXML is always the starting point for all XML-based conversions, so this job has already been done before the XML code is passed to a stylesheet. My version of the OpenOffice.org Writer FlatXML filter therefore just extends my XSLT filtering component with a simple office2flat.xsl stylesheet, which reproduces the code almost as is, but additionally does the following two things:

  1. Indentation is added to the XML output in order to make it more human readable.

  2. A link to another XSL stylesheet is added to the generated XML documents. This stylesheet (office2html.xsl) is also available in the xslt/office2html/ subdirectory of the OfficeFMT zipped package. If you extract this file from the archive and put it into the directory where the FlatXML files generated with OfficeFMT are stored, you will be able to preview your FlatXML documents in a Web browser (Mozilla or MSIE).
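The two steps listed above can be sketched with the standard Java XSLT API. This is only a minimal illustration, not the actual office2flat.xsl: the embedded stylesheet, the class name FlatXmlSketch, the method toFlatXml and the sample input are all assumptions made for this example. It copies the input unchanged, asks the serializer to indent the output, and places an xml-stylesheet processing instruction pointing to office2html.xsl before the root element.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class FlatXmlSketch {
    // Hypothetical stand-in for office2flat.xsl (not the real file):
    // writes a stylesheet link first, then copies the whole input as is.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' indent='yes'/>"
      + "<xsl:template match='/'>"
      + "<xsl:processing-instruction name='xml-stylesheet'>"
      + "type=\"text/xsl\" href=\"office2html.xsl\""
      + "</xsl:processing-instruction>"
      + "<xsl:copy-of select='node()'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the sketch stylesheet to an XML string and returns the result.
    static String toFlatXml(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Sample input is made up; a real FlatXML document is much larger.
        System.out.println(toFlatXml("<doc><p>Hello</p></doc>"));
    }
}
```

A browser that supports XSLT reads the processing instruction and applies the referenced stylesheet, which is why the office2html.xsl file has to sit next to the generated XML file.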

Of course, the office2html.xsl stylesheet is designed mainly for rather simple text documents, i.e. those containing only text and tables. However, it can correctly reproduce almost all types of paragraph and character formatting. It also handles footnotes and endnotes correctly (they are collected at the end of the document, and links to them are added to the body text) and, in most cases, correctly reproduces table layout, even for complex tables with cells spanning several columns.
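The footnote handling described above (notes collected at the end, links left in the body text) can be sketched as follows. This is not taken from the OfficeFMT stylesheet: the input vocabulary (doc, p, note), the class name FootnoteSketch and the method toHtml are hypothetical and deliberately much simpler than real Writer XML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class FootnoteSketch {
    // Hypothetical stylesheet for a made-up <doc>/<p>/<note> vocabulary.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      // Body first, then every note in document order at the end.
      + "<xsl:template match='/doc'>"
      + "<div>"
      + "<xsl:apply-templates/>"
      + "<hr/>"
      + "<xsl:for-each select='.//note'>"
      + "<p id='fn{position()}'>"
      + "<xsl:value-of select='position()'/>. <xsl:value-of select='.'/>"
      + "</p>"
      + "</xsl:for-each>"
      + "</div>"
      + "</xsl:template>"
      + "<xsl:template match='p'><p><xsl:apply-templates/></p></xsl:template>"
      // In the body, a note is replaced by a numbered link to its target.
      + "<xsl:template match='note'>"
      + "<xsl:variable name='n' select='count(preceding::note) + 1'/>"
      + "<a href='#fn{$n}'>[<xsl:value-of select='$n'/>]</a>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Transforms the simplified input into an HTML-like fragment.
    static String toHtml(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtml(
            "<doc><p>Alpha<note>First note</note> beta"
          + "<note>Second note</note></p></doc>"));
    }
}
```

The link number and the anchor number are computed independently (count of preceding notes in the body, position in the collected list), so the sketch assumes that notes are never nested inside other notes.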

Note that the XML code passed to the FlatXML filter has always already been cleaned up by the WriterFlatXMLOptimizer component. This cleanup could have been implemented in pure XSLT, but processing such a stylesheet can take a lot of CPU time and memory, especially with Xalan, the XSLT processor that Java (and therefore OpenOffice.org) uses by default. Nevertheless, OfficeFMT also includes a sample stylesheet, writer2flat.xsl, which does the same job as the WriterFlatXMLOptimizer component and the office2flat.xsl file together. You may use this stylesheet, for example, to beautify and clean up XML code produced by any other version of the OpenOffice.org Writer FlatXML filter.
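To give an idea of what a cleanup done in code rather than in XSLT might look like, here is a minimal sketch based on the Java DOM API. It is not the WriterFlatXMLOptimizer implementation. It assumes that one kind of redundancy is a run of adjacent text:span elements carrying the same text:style-name, which can be merged into a single span. The class name SpanMergeSketch and its methods are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SpanMergeSketch {
    // True if the node is a text:span element (parser is not namespace-aware,
    // so the prefixed name is compared literally).
    static boolean isSpan(Node n) {
        return n != null
            && n.getNodeType() == Node.ELEMENT_NODE
            && "text:span".equals(n.getNodeName());
    }

    // True if both spans reference the same character style.
    static boolean sameStyle(Element a, Element b) {
        return a.getAttribute("text:style-name")
                .equals(b.getAttribute("text:style-name"));
    }

    // Walks the tree and merges directly adjacent spans with equal styles.
    static void mergeAdjacentSpans(Node parent) {
        Node child = parent.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();
            if (isSpan(child) && isSpan(next)
                    && sameStyle((Element) child, (Element) next)) {
                // Move the content of the second span into the first one.
                while (next.getFirstChild() != null) {
                    child.appendChild(next.getFirstChild());
                }
                parent.removeChild(next);
                continue; // the merged span may match its new neighbour too
            }
            mergeAdjacentSpans(child);
            child = next;
        }
    }

    // Parses the XML string, applies the cleanup and returns the DOM tree.
    static Document clean(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            mergeAdjacentSpans(doc.getDocumentElement());
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("XML cleanup failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up fragment in the style of OpenOffice.org 1.x Writer XML.
        Document doc = clean(
            "<text:p xmlns:text='http://openoffice.org/2000/text'>"
          + "<text:span text:style-name='T1'>Hel</text:span>"
          + "<text:span text:style-name='T1'>lo </text:span>"
          + "<text:span text:style-name='T2'>world</text:span>"
          + "</text:p>");
        System.out.println(
            doc.getElementsByTagName("text:span").getLength() + " spans: "
          + doc.getDocumentElement().getTextContent());
    }
}
```

Spans separated by any text, including whitespace, are left alone, because merging them would change the document content rather than just its markup.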

OfficeFMT XHTML filter

The same office2html.xsl stylesheet that is designed for displaying FlatXML files in a browser can also be used for direct conversion of OpenOffice.org Writer files into XHTML. Once again, this stylesheet is designed mainly for simple files, but in some cases it can produce better results than the standard XHTML filter (which, for example, simply omits footnotes and endnotes).

