08-20-2019, 07:40 PM   #7
Tex2002ans
Quote:
Originally Posted by joebob2a
This book has come to be through a pretty roundabout process, as you might suspect. It was originally written in M$ Word/OpenOffice, sucked into Quark Express, and then output as PDF in print form. Many corrections had happened between the original word processing files and the Quark files.
So the Quark file is the up-to-date version?

Quote:
Originally Posted by joebob2a
I used a web utility [...] to get from PDF back to Word, but then I had the page header and footers to worry about, not to mention typesetting issues like no space after periods and embedded hyphens.
A more robust OCR program (like ABBYY FineReader) would avoid most of those issues.

Quote:
Originally Posted by joebob2a
I have the original Quark source, but I haven't found a conversion tool to get it out of that format.
What's the file extension on the Quark file? QXD?

Do you happen to know which version of Quark it used?

(And roughly when was this book published?)

I only worked on one QXD file many years ago, and surprisingly, LibreOffice was able to open it. It still required a lot of elbow grease, but it was a huge step up from having to OCR from scratch.

Quote:
Originally Posted by joebob2a
On the Smashwords site it talks about a "nuclear option," i.e. copy and paste the entire document into a Word document and re-convert it.
... no. Just no.

You lose all the important formatting information (bold/italics/superscript), and what's underneath the surface is just as important as the text itself.

And depending on how the PDF was put together, the copy/paste itself might introduce a massive number of issues as well (like the hard hyphens issue you mentioned).

You'll spend more time cleaning up all those errors than if you just worked from much cleaner OCR in the first place.
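
To give a rough feel for why that cleanup is so error-prone, here is a minimal Python sketch of the kind of "quick fix" regexes people reach for on copy/pasted PDF text. It's purely illustrative: the function name `naive_cleanup` and the sample text are made up, and it isn't taken from any of the tools mentioned in this thread.

```python
import re

def naive_cleanup(text: str) -> str:
    """Hypothetical helper: naive fixes for copy/pasted PDF text."""
    # Rejoin words split by a hyphen at a line break: "conver-\nsion" -> "conversion".
    # This cannot tell a line-break hyphen from a real one ("well-known").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Add the missing space after a period that is directly followed by a capital.
    # This would also mangle abbreviations such as "U.S.A".
    text = re.sub(r"\.([A-Z])", r". \1", text)
    return text

sample = "The conver-\nsion failed.Then a well-\nknown bug appeared."
print(naive_cleanup(sample))
# prints: The conversion failed. Then a wellknown bug appeared.
```

Note how "well-known" comes out as "wellknown": the script fixes one problem and quietly creates another, so every single change still has to be checked by hand.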

Last edited by Tex2002ans; 08-20-2019 at 07:52 PM.