MobileRead Forums - View Single Post

DNSB · 11-08-2023, 01:50 PM

Quote:

Originally Posted by binaryhermit

I believe someone got their hands on a copy of one of the Harry Potter books (IIRC the last one) before release, took pictures of every page using their phone, and there was an unofficial ebook before release day via crowdsourced manual OCR.

I seem to remember the camera used was a Canon DSLR and the original images had the camera serial number in the EXIF information. Oops.

Quote:

Originally Posted by binaryhermit

EDIT: I assume that for ebooks of sufficiently old print editions (AKA old enough that there's no text file of the book from the production process) they do more or less exactly what Quoth described

A lot of the older books get converted to PDFs in which you are viewing images of the pages, with a hidden OCR'd text layer to allow for search. All too often, when the PDF is converted to an EPUB, what you get is that abysmal OCR'd text layer, which in most cases is totally unreadable.
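
For anyone who wants a quick sanity check before trusting a text layer pulled out of a scanned PDF, here is a rough sketch in Python. It is only a crude heuristic, not how any particular converter works: it assumes the text has already been extracted to a plain string by some other tool, and the names `wordlike_ratio` and `looks_garbled`, as well as the 0.8 threshold, are made up for illustration.

```python
import re

# Assumption: a token is "word-like" if it consists only of letters,
# optionally joined by internal apostrophes or hyphens (e.g. "don't").
WORD_RE = re.compile(r"^[A-Za-z]+(?:['-][A-Za-z]+)*$")


def wordlike_ratio(text: str) -> float:
    """Return the fraction of whitespace-separated tokens that look like words."""
    # Strip common surrounding punctuation so "dog." still counts as a word.
    tokens = [t.strip(".,;:!?\"()") for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    good = sum(1 for t in tokens if WORD_RE.match(t))
    return good / len(tokens)


def looks_garbled(text: str, threshold: float = 0.8) -> bool:
    """Flag text whose word-like ratio falls below a hypothetical threshold."""
    return wordlike_ratio(text) < threshold


if __name__ == "__main__":
    clean = "The quick brown fox jumps over the lazy dog."
    noisy = "Th3 qu1ck br0wn f0x ju|nps 0ver t#e l@zy d0g"
    print(wordlike_ratio(clean), looks_garbled(clean))
    print(wordlike_ratio(noisy), looks_garbled(noisy))
```

The check only understands unaccented Latin letters, so it would wrongly flag non-English text or pages full of numbers. A dictionary lookup would be a better test if one is available.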