Quote:
Originally Posted by abeonis
Hmm... a pro using LO and writer2xhtml! I'm curious, so I'll risk the fires of hell and try an off-topic parenthesis:
- Do .docx documents look OK when imported into LO?
- What kind of books, "easy" fiction or "complex" technical?
- Is the XHTML generated by writer2xhtml readable by a human (and so easy to post-process with regex)?
- Can we run writer2xhtml in batch mode?
If you come to Spain, expect to divide your pittance by 4.
Good questions. I'll answer not quite in order, since your 2nd question is most pertinent:
2) Yes, simple, "easy" fiction, or non-fiction such as memoirs. I almost never have to deal with tables, and only occasionally with interior illustrations. Illustrations are no problem, but I don't know how well writer2xhtml handles tables; this may be a deal breaker for you. In my experience, epub in general doesn't handle tables well (given my coding skills and the vast diversity of target reading devices), so when I need them I code them by hand.
1) Given the nature of the docs I deal with, .doc or .docx files look fine in LO / OO. I get a lot of variously formatted documents which must be hammered into shape (in LO) so that all formatting comes from a (more or less) fixed set of paragraph and character styles. I do this by hand, case by case, to produce a "standardized" .odt document that conforms to our house styling standards and, most important, uses the aforementioned standard house styles. LO's fairly powerful search-and-replace functions (including regex) are very useful in this part of the workflow.
3) When the .odt document is in good shape, I use writer2xhtml (W2X) from the command line (because it's much faster; I could also export from within LO) to create the "initial" epub, which I later tweak in Sigil. IMHO the XHTML it produces is very clean and straightforward. My W2X has been configured to recognize the house styles I use in the .odt document and convert them to predefined CSS styles in the epub. So there is very little extra work to do in Sigil, apart from adding extensive metadata to the content.opf of the epub. (I reckon this could easily be incorporated into your scripts.)
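As a rough idea of how the metadata step could be scripted, here is a minimal Python sketch using only the standard library. It assumes an OPF whose `<metadata>` element is unprefixed and already declares the `dc:` namespace (common in EPUB 2 files), and that you pass in the OPF's path inside the zip (the real location is listed in `META-INF/container.xml`). The function names are hypothetical; this is not part of writer2xhtml or Sigil.

```python
import zipfile
from xml.sax.saxutils import escape


def add_metadata(opf_xml: str, fields: dict[str, str]) -> str:
    """Insert <dc:NAME>VALUE</dc:NAME> elements just before </metadata>.

    Assumes an unprefixed <metadata> element with the dc: prefix declared.
    The rest of the file is left byte-for-byte untouched.
    """
    close = "</metadata>"
    pos = opf_xml.find(close)
    if pos == -1:
        raise ValueError("no </metadata> closing tag found")
    new_elements = "".join(
        f"    <dc:{name}>{escape(value)}</dc:{name}>\n" for name, value in fields.items()
    )
    return opf_xml[:pos] + new_elements + opf_xml[pos:]


def update_epub(src: str, dst: str, opf_path: str, fields: dict[str, str]) -> None:
    """Copy an epub entry by entry, rewriting only the OPF file."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        # infolist() keeps archive order, so "mimetype" stays first
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == opf_path:
                data = add_metadata(data.decode("utf-8"), fields).encode("utf-8")
            # writestr(ZipInfo, ...) reuses each entry's original compression,
            # which keeps "mimetype" stored (uncompressed) as EPUB requires
            zout.writestr(item, data)
```

For example, `update_epub("book.epub", "book-final.epub", "OEBPS/content.opf", {"publisher": "My House"})`, where the OPF path is only a guess at a typical layout.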
4) Yes, writer2xhtml can be run from the command line, and it can be customized precisely via config files and so on, BUT it needs an .odt file as input, not (as far as I know) a .doc or .docx file. Once you have that, scripting the conversion from .odt to .epub and onward "should" be trivial. (YMMV, and the solution is left as an exercise!)
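A batch run over a folder of .odt files could look roughly like the Python sketch below. The exact writer2xhtml invocation is not given here, so `W2X_CMD` (the jar name, the `-epub` and `-config` options, and `house-styles.xml`) is a hypothetical placeholder; check it against the usage text of your own writer2xhtml install before relying on it.

```python
import subprocess
from pathlib import Path

# HYPOTHETICAL command prefix: verify the jar name and options against your
# writer2xhtml version; "house-styles.xml" stands in for your own config file.
W2X_CMD = ["java", "-jar", "writer2xhtml.jar", "-epub", "-config", "house-styles.xml"]


def build_commands(odt_files: list[Path], out_dir: Path,
                   base_cmd: list[str] = W2X_CMD) -> list[list[str]]:
    """One command per file: base_cmd + [source.odt, out_dir/source.epub]."""
    return [
        [*base_cmd, str(odt), str(out_dir / (odt.stem + ".epub"))]
        for odt in odt_files
    ]


def convert_all(src_dir: Path, out_dir: Path, base_cmd: list[str] = W2X_CMD) -> None:
    """Convert every .odt in src_dir, stopping at the first failure."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(sorted(src_dir.glob("*.odt")), out_dir, base_cmd):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Keeping command construction separate from execution makes it easy to print the commands first (a dry run) before letting the script loose on a whole folder.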
What I don't know is how one would go about scripting the initial .docx --(via LO)--> .odt conversion. In your case, if all the .docx files have similar formatting, you could simply open each file in LO, save it in .odt format, and then tweak the writer2xhtml config files to suit that (known) formatting.
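For the .docx-to-.odt step, LibreOffice has a headless command-line conversion mode (`soffice --headless --convert-to odt --outdir DIR files...`) that may cover it. The Python sketch below just wraps that call; the executable name (`soffice` here, sometimes `libreoffice` or a full path on Windows) is an assumption about your install. Note that this only changes the file format: any direct formatting in the .docx still comes across, so the house-style cleanup described above is still needed. Also, if a normal LibreOffice window is already open, the headless call may silently do nothing, so closing LO first is worth trying.

```python
import subprocess
from pathlib import Path


def docx_to_odt_cmd(docx_files: list[Path], out_dir: Path,
                    soffice: str = "soffice") -> list[str]:
    """Build one headless LibreOffice call converting all files to .odt."""
    return [soffice, "--headless", "--convert-to", "odt",
            "--outdir", str(out_dir), *(str(p) for p in docx_files)]


def convert_docx_dir(src_dir: Path, out_dir: Path, soffice: str = "soffice") -> None:
    """Convert every .docx in src_dir into out_dir (same base names, .odt)."""
    files = sorted(src_dir.glob("*.docx"))
    if not files:
        return
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(docx_to_odt_cmd(files, out_dir, soffice), check=True)
```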
I don't know if I've covered everything of interest to you, but feel free to ask away.
And in a feeble attempt to return this thread to its original topic, let me add that writer2xhtml --> epub does provide a UUID identifier all by itself.
HTH
Albert
ETA: Oh, and if my "pittance" were to be divided by 4, it would produce what the ancient Algol runtime messages called an "unrequited underflow." Very poetic, I always thought! I guess I'll postpone moving to Spain for now.