Thread: pdf to epub
Old 07-18-2015, 03:59 AM   #13
Tex2002ans
Quote:
Originally Posted by crutledge
After about three days of wrestling with FineReader, I finally got down to what might be considered an acceptable ePub.
Glad to hear you finally figured it out. :P

Quote:
Originally Posted by crutledge
I have much to learn. If anyone is familiar with Verification and the Character Table I sure would like to talk.
Character Table (?): What version of Finereader are you using? Perhaps they changed the name slightly in the newer ones. Are you talking about the "Pattern Editor" where you manually recreate the OCR for hard-to-OCR fonts?

Verification: May be helpful, depending on your workflow. I personally never use it, but I guess SOMEONE is getting some benefit out of it. It is one of those things they expanded in Finereader 12.

I would leave more thorough spellchecking for a much later step, with different tools (I much prefer the Spellcheck lists in Sigil + Calibre Editor). And you may prefer spellchecking in your favorite word processor (Word, LibreOffice, etc. etc.).

Quote:
Originally Posted by crutledge
The ePub is attached if anyone would like to throw rocks.
Hard to tell how well you did without a link to the original PDF.

One quick thing of note that I did find was some spacing before/after Right/Left quotation marks. One of the final steps I typically do in Finereader is a search for "LEFT DOUBLE QUOTE + SPACE" and "SPACE + RIGHT DOUBLE QUOTE", and replace with the unspaced version.

I know there is also a setting buried in Finereader to automatically (?) fix those spacing errors, but I never used that checkmark. I always do that as one of the very last "rounds of fixes" (in many books, the left/right single/double quotation marks may be notoriously bad when OCRed).
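For anyone who would rather run that quote-spacing fix outside of Finereader (say, on the text pulled out of the EPUB afterwards), here is a minimal Python sketch of the same search/replace. The function name `fix_quote_spacing` is made up for illustration, and it only handles the curly double quotes (U+201C and U+201D), so treat it as a starting point and not as what Finereader's own setting does.

```python
import re

LEFT_DOUBLE = "\u201c"   # left double quotation mark
RIGHT_DOUBLE = "\u201d"  # right double quotation mark


def fix_quote_spacing(text: str) -> str:
    """Remove stray spaces just inside curly double quotes.

    Hypothetical helper: mirrors the manual search/replace of
    "LEFT DOUBLE QUOTE + SPACE" and "SPACE + RIGHT DOUBLE QUOTE".
    """
    # Left quote followed by one or more spaces/tabs -> left quote alone.
    text = re.sub(LEFT_DOUBLE + r"[ \t]+", LEFT_DOUBLE, text)
    # One or more spaces/tabs followed by a right quote -> right quote alone.
    text = re.sub(r"[ \t]+" + RIGHT_DOUBLE, RIGHT_DOUBLE, text)
    return text


if __name__ == "__main__":
    sample = "\u201c Hello,\u201d she said. \u201cGoodbye \u201d"
    print(fix_quote_spacing(sample))
```

I would still eyeball the results afterwards, since a badly OCRed quote (wrong direction, or a straight quote) will slip right past a blind replace like this.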

Quote:
Originally Posted by crutledge
I would also like to know the sequence used to get from PDF to ePub.
Way back in 2013, I did post a rough draft of an Outline I had written of my workflow at the time, and things to pay attention to while OCRing (planning to put together some sort of PDF -> EPUB Tutorial). It is Post #10 in this topic:

https://www.mobileread.com/forums/sho...d.php?t=223817

Some things have changed, most things haven't... and I would DEFINITELY expand lots of areas since then.

There are also some other alternative workflows/tools that can be used later, like going Finereader -> DOC(X) -> Word -> Toxaris's EPUB Tools -> EPUB. Toxaris initially built up his macros/tools to clean up a lot of Finereader cruft, to really speed up the monotonous merging of paragraphs, and other OCR errors that creep in.

I personally don't use Toxaris's Tools for the Finereader cleaning, but I DO use it for the other fantastic things, like Dialogue Check, which is far and away the best tool for fixing mismatching quotation marks (and now mismatching parentheses/brackets too).
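To give a rough idea of what "mismatching quotation marks" means in practice, here is a tiny Python sketch that flags paragraphs where the number of left and right curly double quotes does not agree. This is NOT how Toxaris's Dialogue Check works (I am not describing its internals); the function name `find_mismatched_quotes` and the paragraph-list input are assumptions made up for this example.

```python
LEFT_DOUBLE = "\u201c"   # left double quotation mark
RIGHT_DOUBLE = "\u201d"  # right double quotation mark


def find_mismatched_quotes(paragraphs):
    """Return (index, opens, closes) for each paragraph whose counts differ.

    Hypothetical helper: a plain count comparison, nothing smarter.
    """
    flagged = []
    for index, para in enumerate(paragraphs):
        opens = para.count(LEFT_DOUBLE)
        closes = para.count(RIGHT_DOUBLE)
        if opens != closes:
            flagged.append((index, opens, closes))
    return flagged


if __name__ == "__main__":
    book = [
        "\u201cHi,\u201d he said.",
        "\u201cOops, he said.",
    ]
    for index, opens, closes in find_mismatched_quotes(book):
        print(f"Paragraph {index}: {opens} opening vs {closes} closing")
```

A flagged paragraph is only a candidate for review. Dialogue that runs across several paragraphs legitimately opens a quote without closing it in the same paragraph, so a simple count like this will produce false positives there.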

I personally still do A LOT of the cleaning and A/B comparison in Finereader, and then do the Finereader -> EPUB -> Manual cleanup with Sigil workflow.

Quote:
Originally Posted by crutledge
The FineReader documentation is lacking in details I need.
If you want to chat over webcam, I could teach you my Finereader ways. Perhaps teaching an interested pupil would revive my 2013 project. I have been looking for an interested "guinea pig" for years!! :P

I have written a heck of a lot on the subject over the many years, but it is scattered over a ton of different topics/posts. Mostly with how I deal with tackling an individual subject X, Y, or Z (Tables, Footnotes, Equations/Formulas, etc. etc.).

Last I remember was one of those massive posts I always point back to:

https://www.mobileread.com/forums/sho...d.php?t=234146

In posts like that, I have been nibbling away at certain pieces here and there (answering the person's questions, and doing my usual expanding into semi-relevant/semi-related topics).

And I never did get around to expanding that Outline at all since 2013. It has mostly just been continually growing and refining in my head.

Last edited by Tex2002ans; 07-18-2015 at 04:28 AM.