Quote:
Originally Posted by Tex2002ans
But if you want to take steps in making the PDF a proper ebook:
I grabbed this book and ran it through Scan Tailor Advanced + Finereader 12.
[...]
You can compare the text, and see how much more accurate 12 is compared to Archive.org's "EPUB". (Most importantly, the headers+page numbers are nearly all automatically removed and not clogging the text.)
4. I took Finereader's EPUB and ran it through my usual "Finereader cleanup Regex":
Attached it as [Finereader][CodeCleanup].epub.
Thank you, amazing work! This is now really a pleasure to read on my Ares. My takeaway is that it really pays to invest the time in using Scan Tailor. The removal of the page headers is especially great. Have you described the regexes you are using somewhere?
Looking forward to your blog :-)
All the best,
Ctop