It would be helpful to investigate the problem with the ODT file you've used, but maybe there's a way to find out what went wrong without you disclosing your document to me. For instance, I could try odt2html1 on a large document I make up myself, because I've only tried short ones so far. Of course you're free to use whatever workflow you like ;-) That's one of the goals of the project: users can build their own workflows, so all tools are modular building blocks and can be invoked separately.
I just looked it up: ODT itself doesn't use i or b (the non-semantic, meaningless “bold” and “italic” buttons in OpenOffice/LibreOffice). Internally, that direct formatting of visual appearance at character level is implemented as styles. The names of those styles are not static; they change when the document gets edited. At least the same internal style name seems to get used for all portions of the text which have the same direct formatting. Therefore, if you've used the “bold” and “italic” buttons consistently, you might want a tool which identifies “bold” and “italic” in the ODT and replaces them with a style name of your choice. Naming those styles “bold” and “italic” would be a pretty bad idea, though, and there's still the disadvantage that different meanings behind the use of “bold” or “italic” can't be identified automatically, so additional manual effort could be required to make it a quality ODT file. In HTML, I wouldn't introduce i or b at all, because they're contrary to quality HTML and bad for automatic processing.
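To illustrate what "implemented as styles" means in practice, here is a minimal Python sketch of how such a tool might find the generated styles. It assumes the direct formatting ends up as automatic styles of family "text" in content.xml, carrying fo:font-weight="bold" or fo:font-style="italic". The function names and the sample document are made up for illustration; this is not how odt2html1 works.

```python
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument namespaces used in content.xml
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}


def attr(prefix, name):
    """Build the namespace-qualified attribute name ElementTree expects."""
    return "{%s}%s" % (NS[prefix], name)


def read_content_xml(path):
    """Hypothetical helper: an ODT file is a ZIP archive containing content.xml."""
    with zipfile.ZipFile(path) as odt:
        return odt.read("content.xml")


def find_direct_formatting(content_xml):
    """Map each automatic character style name to {"bold"} and/or {"italic"}."""
    root = ET.fromstring(content_xml)
    found = {}
    for style in root.iterfind("office:automatic-styles/style:style", NS):
        # Only character-level styles are of interest here.
        if style.get(attr("style", "family")) != "text":
            continue
        props = style.find("style:text-properties", NS)
        if props is None:
            continue
        kinds = set()
        if props.get(attr("fo", "font-weight")) == "bold":
            kinds.add("bold")
        if props.get(attr("fo", "font-style")) == "italic":
            kinds.add("italic")
        if kinds:
            found[style.get(attr("style", "name"))] = kinds
    return found


# Made-up sample: T1 and T2 are generated names and may differ after editing.
SAMPLE = """<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0">
  <office:automatic-styles>
    <style:style style:name="T1" style:family="text">
      <style:text-properties fo:font-weight="bold"/>
    </style:style>
    <style:style style:name="T2" style:family="text">
      <style:text-properties fo:font-style="italic"/>
    </style:style>
    <style:style style:name="P1" style:family="paragraph">
      <style:text-properties fo:font-weight="bold"/>
    </style:style>
  </office:automatic-styles>
  <office:body/>
</office:document-content>"""

if __name__ == "__main__":
    print(find_direct_formatting(SAMPLE))
```

For a real file you would pass read_content_xml("your.odt") instead of SAMPLE. Because the generated names can change after every edit, the lookup has to be repeated each time rather than hard-coding names like T1.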
So the options are: to improve the ODT manually (since your document is a result of bad formatting habits, which are promoted by OpenOffice/LibreOffice just as in most other word processors); to use a helper which replaces the direct formatting with styles, if the direct formatting was used consistently and corresponds to a meaning; or to introduce i and b into the output and make it bad quality, for which the ODT would then need to be interpreted, whereas at the moment it only gets transformed. My favorite is of course the first one: to get rid of the bad formatting habit, and ideally to get front-ends like OpenOffice/LibreOffice to stop promoting it by replacing it with a style-based approach. There are front-ends other than OpenOffice/LibreOffice which don't allow direct formatting at all, and OpenOffice/LibreOffice can be configured so that direct formatting is hidden from the user.
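The second option, a helper that swaps direct formatting for styles, could look roughly like the following Python sketch. It assumes the generated style names (T1, T2 here) have already been looked up in content.xml, and that the target styles ("Keyword" and "WorkTitle", both hypothetical names) are defined as named character styles elsewhere in the document, typically in styles.xml. The mapping has to be written by hand, because the meaning behind a “bold” or “italic” can't be detected automatically.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
SPAN = "{%s}span" % TEXT_NS
STYLE_NAME = "{%s}style-name" % TEXT_NS

# Keep the usual "text:" prefix when serializing instead of ns0, ns1, ...
ET.register_namespace("text", TEXT_NS)


def rename_span_styles(content_xml, mapping):
    """Point text:span elements at named styles instead of generated ones.

    mapping: generated style name -> named style of your choice.
    Returns the new XML as a string and the number of spans changed.
    """
    root = ET.fromstring(content_xml)
    changed = 0
    for span in root.iter(SPAN):
        old = span.get(STYLE_NAME)
        if old in mapping:
            span.set(STYLE_NAME, mapping[old])
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Made-up fragment; a real content.xml has many more elements and namespaces.
SAMPLE = (
    '<text:p xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    'A <text:span text:style-name="T1">term</text:span> from '
    '<text:span text:style-name="T2">Some Book</text:span> and '
    '<text:span text:style-name="T9">something else</text:span>.'
    '</text:p>'
)

if __name__ == "__main__":
    new_xml, count = rename_span_styles(SAMPLE, {"T1": "Keyword", "T2": "WorkTitle"})
    print(count)
    print(new_xml)
```

This only works if every use of T1 really means the same thing. Where “bold” was used for several different purposes, the affected spans still need to be sorted out by hand.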
Sorry about your document, but in order to benefit from automatic processing, documents need at least to be converted, and if direct formatting was applied inconsistently, manual work is inevitable. I guess writer2xhtml has to spend quite some code on compensating for the wrong usage of OpenOffice/LibreOffice, whereas odt2html1 is based on the right usage of OpenOffice/LibreOffice.