02-28-2020, 06:52 AM   #16
ctop
Connoisseur
Posts: 63
Karma: 43710
Join Date: Jun 2008
Device: zaurus->palm->iPad->Sony PRS-T1,T2,T3->KoboForma&Likebook Ares->Palma2
Quote:
Originally Posted by Tex2002ans

But if you want to take steps in making the PDF a proper ebook:

I grabbed this book and ran it through Scan Tailor Advanced + Finereader 12.

[...]

You can compare the text, and see how much more accurate 12 is compared to Archive.org's "EPUB". (Most importantly, the headers+page numbers are nearly all automatically removed and not clogging the text.)

4. I took Finereader's EPUB and ran it through my usual "Finereader cleanup Regex":

Attached it as [Finereader][CodeCleanup].epub.
Thank you, amazing work. This is now really a pleasure to read on my Ares. My takeaway is that it really pays to invest the time in Scan Tailor; the removal of the page headers is especially great. Have you described the regexes you are using somewhere?

Looking forward to your blog :-)

All the best,

Ctop