MobileRead Forums

Tex2002ans · 08-20-2019, 08:40 PM

Quote:

Originally Posted by joebob2a

This book has come to be through a pretty roundabout process, as you might suspect. It was originally written in M$ Word/OpenOffice, sucked into Quark Express, and then output as PDF in print form. Many corrections had happened between the original word processing files and the Quark files.

So the Quark file is the up-to-date version?

Quote:

Originally Posted by joebob2a

I used a web utility [...] to get from PDF back to Word, but then I had the page header and footers to worry about, not to mention typesetting issues like no space after periods and embedded hyphens.

A more robust OCR program (like ABBYY FineReader) would avoid most of those issues.

Quote:

Originally Posted by joebob2a

I have the original Quark source, but I haven't found a conversion tool to get it out of that format.

What's the file extension on the Quark file? QXD?

Do you happen to know which version of Quark it used?

(And roughly when was this book published?)

I only worked on one QXD file many years ago, and surprisingly, LibreOffice was able to open it. It still required a lot of elbow grease, but it was a huge step up from having to OCR from scratch.
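If you want to try the LibreOffice route without clicking through the GUI, something like this rough sketch should work. It assumes `soffice` is on your PATH and that your LibreOffice build can actually read your particular Quark version (that part is not guaranteed, hence the version question above). The file name `book.qxd` and the `converted` output folder are just placeholders.

```python
import subprocess
from pathlib import Path


def build_convert_command(source: str, outdir: str = "converted") -> list[str]:
    """Build a headless LibreOffice command that converts `source` to ODT."""
    return [
        "soffice",             # assumed to be on PATH
        "--headless",          # run without opening the GUI
        "--convert-to", "odt",
        "--outdir", outdir,
        source,
    ]


def convert(source: str, outdir: str = "converted") -> None:
    """Run the conversion; raises CalledProcessError if soffice exits non-zero."""
    Path(outdir).mkdir(exist_ok=True)
    subprocess.run(build_convert_command(source, outdir), check=True)


# Example (placeholder file name):
# convert("book.qxd")
```

If LibreOffice can't import the file, you may get no output file at all, so check the output folder before assuming it worked.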

Quote:

Originally Posted by joebob2a

On the Smashwords site it talks about a "nuclear option," i.e. copy and paste the entire document into a Word document and re-convert it.

... no. Just no.

You lose all the important formatting information (bold/italics/superscript), and that underneath-the-surface markup is just as important as the text itself.

And depending on how the PDF was put together, the copy/paste itself might introduce a massive number of issues as well (like the hard hyphens you mentioned).

You'll spend more time cleaning up all those errors than if you just worked from much cleaner OCR in the first place.
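To give a sense of what that cleanup looks like, here is a minimal Python sketch of the two fixes mentioned earlier (missing space after periods, words hyphenated across line breaks). The function name `rough_cleanup` and the regexes are my own illustration, not a complete solution, and both rules have false positives you would still need to check by hand.

```python
import re


def rough_cleanup(text: str) -> str:
    """Apply two crude fixes to text copied out of a PDF."""
    # Rejoin words split by a hyphen at a line break: "conver-\nsion" -> "conversion".
    # Caveat: this also merges real compounds ("well-\nknown" -> "wellknown").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Add the missing space when a period sits between a lowercase letter
    # and a capital: "failed.Then" -> "failed. Then".
    # Caveat: this can break things like "example.Com" style names.
    text = re.sub(r"(?<=[a-z])\.(?=[A-Z])", ". ", text)
    return text


sample = "The conver-\nsion failed.Then it worked."
print(rough_cleanup(sample))
```

Note this does nothing for hyphens embedded mid-line (no line break), since those can't be told apart from real hyphenated words without a dictionary check. That is exactly the kind of manual work cleaner OCR saves you.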