01-30-2020, 04:01 PM   #14
hobnail
Running with scissors
Quote:
Originally Posted by FrustratedReader
Archive.org is terrible. Usually no proofing. I don't bother downloading epub/mobi if it's only their own OCR of the pdf. If it's from Microsoft or Google Books scan, then PDF is best.
I don't know if everyone else knows this, but PDFs can have this clever thing (clever to me, anyway) where there are two layers. The visible layer is the scanned page image you see when you open the file in a PDF viewer, and the invisible layer is the OCR'd text. You can tell whether a PDF has the OCR'd text by clicking and dragging your mouse over the page: if something gets selected, probably not exactly lined up with the visible text, the text layer is there. If your PDF viewer supports it, you can open the PDF, do a Save As, pick text as the output format, and save the OCR'd text. It's very likely the exact same text you get when you download the .txt file from archive.org, but sometimes it's helpful to have that text at hand while you're looking at the scanned image of the page.
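If your viewer doesn't have a Save As text option, a short script can probably pull out the same hidden layer. This is only a rough sketch: it assumes the third-party pypdf package is installed (pip install pypdf), and the function names and the 20-character threshold are my own inventions for illustration, not anything archive.org provides.

```python
import sys


def extract_text_layer(pdf_path):
    """Return one string per page holding whatever text layer the page has.

    Pages that are only a scanned image (no OCR layer) come back as "".
    """
    # pypdf is a third-party package; it is imported here so the rest of
    # the script still loads when pypdf is not installed.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [(page.extract_text() or "") for page in reader.pages]


def pages_with_text(page_texts, min_chars=20):
    """Return the 1-based numbers of pages that appear to have a text layer.

    min_chars is an arbitrary cutoff (an assumption, tune it as needed) so
    that a stray character or two does not count as real OCR text.
    """
    found = []
    for number, text in enumerate(page_texts, start=1):
        # Count only non-whitespace characters.
        if len("".join(text.split())) >= min_chars:
            found.append(number)
    return found


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: python pdf_text_layer.py input.pdf output.txt")

    texts = extract_text_layer(sys.argv[1])
    hits = pages_with_text(texts)
    print(f"{len(hits)} of {len(texts)} pages appear to have a text layer")

    # Write the text out page by page, roughly what Save As text would give.
    with open(sys.argv[2], "w", encoding="utf-8") as out:
        for number, text in enumerate(texts, start=1):
            out.write(f"--- page {number} ---\n{text}\n")
```

If it reports 0 pages, the PDF is most likely image-only and there is no OCR layer to extract. As with the Save As route, the text will have the same OCR errors as the archive.org .txt file.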