08-22-2019, 12:16 PM   #10
joebob2a
Grinding through it

Quote:
Originally Posted by Tex2002ans View Post
So the Quark file is the up-to-date version?
No, I've already invested significant time in cleaning up the PDF. The epub version is pretty close to where I want it, but it has all these technical issues that the validators don't like.

Quote:
What's the file extension on the Quark file? QXD?

Do you happen to know which version of Quark it used?
The source files are .qxd files. I know they were created on a Mac. I don't know which version of Quark was used, but it's more than ten years old.

Quote:
(And ~ when this book was published?)
It went to print in 2009, just as the e-book revolution was turning the corner. I'm working on an e-book version because there's a surge in demand, and I just want it out there.


Quote:
I only worked on one QXD file many years ago, and surprisingly, LibreOffice was able to open it. It still required a lot of elbow grease, but it was a huge step up from having to OCR from scratch.

... no. Just no.
Amen to the No. LibreOffice wanted to turn the PDF files into graphics -- each page an image. The QXD files looked like random bits in LibreOffice.

Quote:
And depending on how the PDF was put together, that copy/paste itself might introduce a massive amount of issues as well (like the hard hyphens issue you mentioned).

You'll spend more time cleaning up all those errors than if you just worked from much cleaner OCR in the first place.
At this point, I'm just looking for something to fix the validation errors. I'm tempted to edit the HTML files in a text editor and use a global search-and-replace to correct the flagged errors, but I need to know the correct replacement for each of those errors. I had an earlier post talking about how Sigil was having trouble consolidating HTML files. Calibre was able to merge the files without breaking things, so that's now a viable option. I now have one HTML file for each of the eight major chapters, as opposed to dozens.
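
For what it's worth, the search-and-replace idea could be sketched roughly like this in Python instead of a text editor. This is only a sketch under assumptions: the two rules are illustrations of markup that XHTML-based validators commonly reject (an unclosed `<br>` and the named entity `&nbsp;`), not the errors actually flagged in this book. The names `RULES`, `apply_rules`, and `fix_folder` are made up for the example.

```python
import re
from pathlib import Path

# Hypothetical rules: (compiled pattern, replacement). These two are only
# illustrations; swap in whatever the validator actually flags.
RULES = [
    (re.compile(r"<br\s*>", re.IGNORECASE), "<br/>"),  # unclosed line break
    (re.compile(r"&nbsp;"), "&#160;"),                 # named entity -> numeric
]


def apply_rules(text, rules=RULES):
    """Apply every rule to text; return (new_text, number_of_replacements)."""
    total = 0
    for pattern, replacement in rules:
        text, count = pattern.subn(replacement, text)
        total += count
    return text, total


def fix_folder(folder, rules=RULES, dry_run=True):
    """Run the rules over each HTML file in folder.

    Returns {filename: replacement_count}. Files are only rewritten
    when dry_run is False, so the first pass can be a harmless report.
    """
    report = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in {".html", ".xhtml", ".htm"}:
            continue
        original = path.read_text(encoding="utf-8")
        fixed, count = apply_rules(original, rules)
        report[path.name] = count
        if count and not dry_run:
            path.write_text(fixed, encoding="utf-8")
    return report
```

Running `fix_folder("some_folder")` first only reports how many hits each chapter file has. Passing `dry_run=False` writes the changes, so it is worth working on a copy of the eight chapter files. The right replacement for each flagged error still has to come from the validator's message; the script only automates applying it.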

What's surprising to me is that there are all these great conversion utilities, yet nothing that addresses the validator errors.

Thanks again for all the help. I'll keep plugging on this.