Quote:
Originally Posted by retiredbiker
When I find one of these old chestnuts, if it has "mso-whatever" all over it, it's usually a terrible mess. Maybe converted several times, as well - including to/from PDF. Often it's so bad I just have Calibre make an RTF out of it, and start over in LibreOffice and/or gedit - remove all the formatting, preserve italics if possible, and rebuild it. Often faster than trying to correct it in the Calibre editor.
I use Mammoth on DOCX files that I convert from professionally created, public-domain PDF documents from institutional sources (.gov, .org, .edu, etc.). Trying to do the same with commercial PDFs is usually pointless.
Much-converted texts invariably have boatloads of content errors too, such as OCR-induced spelling errors, casing anomalies, broken paragraphs, and quote marks all over the place - straight, bent, missing, mismatched, superfluous, etc.
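For anyone who wants to script a first pass over the quote-mark mess, here is a minimal Python sketch. It is not what I use in Word; the function names are made up for illustration. It straightens "bent" quotes so everything is consistent, then flags paragraphs with an odd number of double quotes for manual review. Treat the flags as hints only, since dialogue that continues over several paragraphs legitimately leaves a quote open.

```python
# Map curly ("bent") quote characters to their straight equivalents.
STRAIGHTEN = str.maketrans({
    "\u201c": '"', "\u201d": '"',   # left/right double quotes
    "\u2018": "'", "\u2019": "'",   # left/right single quotes
})

def straighten_quotes(text: str) -> str:
    """Return text with curly quotes replaced by straight ones."""
    return text.translate(STRAIGHTEN)

def unbalanced_paragraphs(paragraphs):
    """Return indexes of paragraphs holding an odd number of double quotes.

    An odd count suggests a missing or superfluous quote mark, but it is
    only a heuristic: check each flagged paragraph by eye.
    """
    flagged = []
    for i, para in enumerate(paragraphs):
        if straighten_quotes(para).count('"') % 2 != 0:
            flagged.append(i)
    return flagged
```

Run it over the paragraphs of the stripped-down text and fix the flagged ones by hand before re-applying proper typographic quotes.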
Like you, I prefer to start over, but in my case it's always rather than often; I have 30 years' worth of Word usage and macro/add-on gathering at my disposal.
I don't seek to create typographical replicas of the original. The reverse, in fact: I remove embellishments such as graphical scene markers and first-paragraph effects such as drop caps - I do unindent those paragraphs, though. I also like to spell out chapter numbers, e.g. Twenty-two instead of 22.
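The chapter-number spelling can be scripted too. This is a rough Python sketch rather than my actual Word macro. The names spell_out and respell_heading are hypothetical, it assumes headings of the form "Chapter 22" on a line of their own, and it only handles 1 to 99 (anything else is left untouched).

```python
import re

ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def spell_out(n: int) -> str:
    """Spell out 1..99 with only the first letter capitalised."""
    if not 1 <= n <= 99:
        raise ValueError("only 1..99 are supported in this sketch")
    if n < 20:
        words = ONES[n]
    else:
        tens, ones = divmod(n, 10)
        words = TENS[tens] + ("-" + ONES[ones] if ones else "")
    return words.capitalize()

def respell_heading(line: str) -> str:
    """Turn 'Chapter 22' into 'Chapter Twenty-two'; leave other lines alone."""
    def repl(match):
        n = int(match.group(2))
        if 1 <= n <= 99:
            return match.group(1) + spell_out(n)
        return match.group(0)  # out of range: keep the heading as it was
    return re.sub(r"^(Chapter\s+)(\d+)\s*$", repl, line, flags=re.IGNORECASE)
```

Apply respell_heading to each line of the text; lines that are not chapter headings come back unchanged.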
BR