soffice2html.pl

2006-01-22

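A Perl script that converts OpenOffice.org files to html files.
The Goodday (グッデイ) version (0.76) appears to fix a few bugs, but it is unclear whether those fixes are reflected in the latest original version.
Also, the script is written for SXW; whether it works with ODF is untested.
It converts StarOffice/OpenOffice sxw files to html files (including images, tables, and most formatting).
It requires ImageMagick's convert and xsltproc (from the GNOME libxslt project).
Several options are available; run it with -h or with no arguments to display the help.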

Many thanks to Adrend van Beelen jr (http://www.liacs.nl/~dvbeelen) for xslt
fixes and additional styles.

Author: Steve Slaven - http://hoopajoo.net
--------------------------------------------------------------------------------

Usage: $0 [-hqvwpTB] [-e encoding] [-i imagedir] [-s style_base]
      [-t toc_class] [-c convert_path] [-x xsltproc_path] FILE

-h    This help
-q    Quiet mode (deprecated, on by default)
-v    Verbose mode (used to be default)
-i    Image directory (default image/)
-s    CSS base style that all content is wrapped in (default soffice)
-t    Class name to build TOC from (default none)
-e    Output encoding; it is autodetected by default, but you can
    override it with this switch (e.g. -e iso-8859-2)
-p    Output png files instead of jpg files
-w    Wrap output with <html></html>
-c    Path to "convert" binary
-x    Path to "xsltproc" binary
-T    Generate TOC only
-B    Generate body only

Converts the FILE to content.html with no standard body wrapper so it can
be inserted into existing templates.  All images are converted to JPG
(or PNG with -p) and all styles are converted to CSS included in the
content.html.  Requires ImageMagick's 'convert', and 'xsltproc' from libxslt.
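
Since the conversion depends on two external programs, it can help to check
for them before running the script.  The following is only a sketch and is
not part of soffice2html.pl: the function name missing_tools is made up for
this example, and it assumes both programs are looked up on PATH under
their usual names.

```shell
#!/bin/sh
# missing_tools is a hypothetical helper, not part of soffice2html.pl.
# It prints the name of every given program that cannot be found on PATH
# (one per line) and returns 0 only when all of them are present.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return $rc
}

# Check the two programs named in the help text.
if ! missing_tools convert xsltproc; then
    echo "some required tools were not found on PATH" >&2
fi
```

If a tool is installed somewhere outside PATH, the -c and -x options listed
above can be used to point the script at it instead.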

If you use -i, the directory name will be reflected in the html file, but
the images are still written to image/ and you will need to rename that
directory yourself.  I felt that writing to the -i directory directly was
unsafe: the image directory is deleted when doing the conversion, so if -i
was "." it could delete stuff it shouldn't.
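
The manual rename described above could be wrapped in a small helper.  This
is a sketch under assumptions, not something the script provides: the
function name publish_images is invented for the example, and it assumes
the conversion has left its images in image/ in the current directory.

```shell
#!/bin/sh
# publish_images is a hypothetical helper, not part of soffice2html.pl.
# It renames the generated image/ directory to the name that was given
# to -i, refusing targets that could clobber something else.
publish_images() {
    target=${1%/}                       # accept "pics/" as well as "pics"
    case "$target" in
        ''|.|..) echo "refusing unsafe target: '$1'" >&2; return 1 ;;
        image)   return 0 ;;            # default name, nothing to rename
    esac
    if [ -e "$target" ]; then
        echo "target already exists: $target" >&2
        return 1
    fi
    if [ ! -d image ]; then
        echo "no image/ directory found" >&2
        return 1
    fi
    mv image "$target"
}

# Example use after a conversion (report.sxw is a placeholder file name):
#   ./soffice2html.pl -i pics report.sxw
#   publish_images pics
```

Refusing "." and existing targets mirrors the concern stated above: nothing
is overwritten or deleted, the directory is only moved to a fresh name.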



2005-01-05 10:14  bpk

    * soffice2html.xsl: Handlers for table borders, some fixes to the
    xlate-px option in the styles handler, and corrected width
    handling.

2005-01-04 21:08  bpk

    * soffice2html-frontend.pl: Fixed and tested replace list for smart
    quotes

2005-01-04 21:04  bpk

    * soffice2html-frontend.pl: Code to strip smart quotes from
    content.xml before conversion

2005-01-04 20:46  bpk

    * soffice2html-frontend.pl: Command line flags to set path to
    xsltproc/convert, flags to toggle body or toc only generation

2004-12-13 14:02  bpk

    * soffice2html.xsl: increment version number

2004-02-06 18:11  bpk

    * soffice2html-frontend.pl, soffice2html.xsl: Fixed many css
    related length bugs, fixed wrapper code for full body support

2004-02-04 16:04  bpk

    * README, soffice2html-frontend.pl, soffice2html.xsl: Several fixes
    for buggy image support, new list styles, improved TOC handling,
    ability to generate a full HTML doc with html/body tags, and
    several minor formatting enhancements.

2003-05-12 23:04  psocccer

    * MANIFEST: Added manifest file for building

2003-04-30 09:09  psocccer

    * soffice2html-frontend.pl, soffice2html.xsl: Added output encoding
    support and auto detection for non utf-8 charsets

2003-04-29 09:05  psocccer

    * soffice2html-frontend.pl: Moved @params for xsltproc, older
    versions needed them as the first args

2003-04-29 09:01  psocccer

    * make-soffice2html, soffice2html-frontend.pl: Fixed version string
    bug, added gpl

2003-04-28 16:45  psocccer

    * README, make-soffice2html, soffice2html-frontend.pl,
    soffice2html.xsl: Initial revision

2003-04-28 16:45  psocccer

    * README, make-soffice2html, soffice2html-frontend.pl,
    soffice2html.xsl: Initial import