05-31-2021, 11:37 AM #11
Tex2002ans
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by Ghitulescu
So, this is one of the offenders (but almost every single EPUB locks up my ADE):
https://archive.org/details/russoturkishwari01hozi
Yep, you most likely figured it out.

I'm betting the problem is the monolithic HTML file: ~900 KB. If you have an older ereader, a file that size would crash it (they can only handle files of ~300 KB).
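If you want to check an EPUB for this before loading it, here's a rough Python sketch. It takes the ~300 KB figure above as the cutoff (adjust it for your device) and just reads the uncompressed sizes of the HTML/XHTML files inside the EPUB's ZIP container. `oversized_html` and the example filename are made up for illustration.

```python
import zipfile

# Rough cutoff from the post (~300 KB); the real limit varies by device.
LIMIT_KB = 300


def oversized_html(epub, limit_kb=LIMIT_KB):
    """Return (name, size_kb) for each HTML/XHTML file in the EPUB over limit_kb.

    `epub` can be a path or a binary file-like object (an EPUB is a ZIP file).
    """
    flagged = []
    with zipfile.ZipFile(epub) as z:
        for info in z.infolist():
            if info.filename.lower().endswith((".html", ".xhtml", ".htm")):
                size_kb = info.file_size / 1024  # uncompressed size in KB
                if size_kb > limit_kb:
                    flagged.append((info.filename, round(size_kb)))
    return flagged
```

Usage would be something like `print(oversized_html("book.epub"))`. An empty list means every text file is under the cutoff.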

As you also figured out, a simple Calibre EPUB->EPUB conversion with file splitting should take care of that issue.
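The same conversion can also be scripted with Calibre's command-line `ebook-convert`. This is a sketch that assumes Calibre's CLI tools are on your PATH. `--flow-size` is the EPUB output option that sets the split threshold in KB (Calibre's default is 260). The helper names are just for illustration.

```python
import subprocess


def split_command(src, dst, flow_size_kb=260):
    """Build an ebook-convert EPUB->EPUB call that splits large HTML files.

    --flow-size is Calibre's EPUB output option; HTML files larger than
    this many KB get split into smaller ones.
    """
    return ["ebook-convert", src, dst, "--flow-size", str(flow_size_kb)]


def split_epub(src, dst, flow_size_kb=260):
    # Runs Calibre; raises CalledProcessError if the conversion fails.
    subprocess.run(split_command(src, dst, flow_size_kb), check=True)
```

For example, `split_epub("book.epub", "book_split.epub")` writes a new EPUB next to the original instead of overwriting it.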

Also, the book is laid out in two columns, which is usually incredibly hard to OCR correctly. OCR might treat both columns as a single line, so you get sentences that are half from the left column and half from the right, making the ebook completely unreadable.
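To show why column order matters, here's a toy Python sketch with made-up word positions (not real OCR output). Reading each row straight across the page mixes the two columns. Splitting at an assumed gutter position first, then reading each column top to bottom, keeps the sentences intact. Real OCR engines try to detect columns themselves, so this only illustrates the failure mode.

```python
# Hypothetical OCR'd words as (x, y, text); the gutter between columns is near x = 50.
WORDS = [
    (0, 0, "The"), (10, 0, "army"), (60, 0, "Meanwhile"), (75, 0, "the"),
    (0, 1, "crossed"), (10, 1, "the"), (60, 1, "fleet"), (75, 1, "waited"),
    (0, 2, "Danube."), (60, 2, "offshore."),
]


def reading_order(words):
    """Sort words top-to-bottom, then left-to-right."""
    return sorted(words, key=lambda w: (w[1], w[0]))


def naive_reading(words):
    # Treat each row as a single line spanning the whole page width.
    return " ".join(text for _, _, text in reading_order(words))


def column_reading(words, gutter_x):
    # Split at the gutter, then read the left column fully before the right.
    left = [w for w in words if w[0] < gutter_x]
    right = [w for w in words if w[0] >= gutter_x]
    return " ".join(text for _, _, text in reading_order(left) + reading_order(right))


print(naive_reading(WORDS))
# The army Meanwhile the crossed the fleet waited Danube. offshore.
print(column_reading(WORDS, gutter_x=50))
# The army crossed the Danube. Meanwhile the fleet waited offshore.
```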

According to the metadata, it looks like they ran it through FineReader 8.0.

I ran it through FineReader 12 for you, then created a very rough EPUB. This one should be more accurate and at least won't have all the headers/footers clogging up the text.
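If you're cleaning up OCR text yourself, running headers/footers can be roughly stripped in code. Here's a Python sketch that assumes you already have the text split into pages of lines. Any first or last line that repeats on at least half the pages (with digits masked so page numbers match each other) gets dropped. The function name and the 50% threshold are arbitrary choices for illustration, not what FineReader does internally.

```python
from collections import Counter


def strip_running_lines(pages, min_fraction=0.5):
    """Drop first/last lines that repeat on at least min_fraction of pages.

    pages: list of pages, each a list of text lines (hypothetical OCR output).
    """
    def key(line):
        # Mask digits so "12", "13", "14" all count as the same footer.
        return "".join("#" if c.isdigit() else c for c in line.strip())

    edges = Counter()
    for page in pages:
        # Count each distinct edge line at most once per page.
        edges.update({key(line) for line in page[:1] + page[-1:]})
    threshold = max(2, min_fraction * len(pages))
    running = {k for k, n in edges.items() if n >= threshold}

    cleaned = []
    for page in pages:
        body = list(page)
        if body and key(body[0]) in running:
            body = body[1:]
        if body and key(body[-1]) in running:
            body = body[:-1]
        cleaned.append(body)
    return cleaned
```

This misses headers that alternate between left and right pages, so treat the output as a first pass.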

Note: This book's font also has very low-hanging, round 'g's. OCR mistook them for 'O's on their own line, so you'll see lots of those randomly appearing throughout the EPUB.
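Since those stray 'O's sit alone on a line, they're easy to find. Here's a small Python sketch that assumes plain text extracted from the EPUB. It only lists line numbers so you can fix them by hand, since a script can't reliably tell which nearby word lost its 'g'. `find_stray_os` is a hypothetical helper name.

```python
def find_stray_os(text):
    """Return 1-based numbers of lines that contain nothing but a capital 'O'."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if line.strip() == "O"]
```

Lines like "O Lord" or "Over" aren't flagged, because the whole line has to be just the letter.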

Last edited by Tex2002ans; 05-31-2021 at 11:45 AM.