MobileRead Forums - View Single Post

CazMar · 11-16-2010, 06:31 PM

I know what you mean - but I think the problem is with the PDF file, not the converter. Some PDFs are scans stored as image files; others are produced by converting a text document (e.g. Word or RTF). If someone scans an old book, what they produce is an image of each page, with no actual text in it for a converter to work with. If you use OCR (Optical Character Recognition) software you may be able to extract the text, BUT if the original document was old, badly marked, or had underlining and notes scribbled in the margin, the poor old OCR software is going to have a bad time! A lot of Google books have been converted this way and the epub files can be a bit strange - lots of gobbledygook (or should that be googlygook?)
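
If you want a rough idea of how much gobbledygook an OCR'd book contains before you bother converting it, something like the Python sketch below might help. It is only a crude heuristic of my own, not anything a converter or OCR package actually does: the names `looks_garbled` and `garbled_ratio` are made up for illustration, and the rules (letters mixed with digits or symbols, or longer words with no vowels) assume plain English text.

```python
# Hypothetical helpers for illustration only: a crude guess at how much
# of a piece of text is OCR noise. Assumes English-like text.
VOWELS = set("aeiouy")
EDGE_PUNCTUATION = ".,;:!?\"'()[]-"


def looks_garbled(token: str) -> bool:
    """Return True if a whitespace-separated token looks like OCR noise."""
    core = token.strip(EDGE_PUNCTUATION)
    if not core or core.isdigit():
        return False  # pure punctuation or a plain number is fine
    # Allow ordinary apostrophes and hyphens inside words ("don't", "well-read").
    core = core.replace("'", "").replace("-", "")
    if not core.isalpha():
        return True  # letters mixed with digits or symbols, e.g. "f0x"
    if len(core) >= 3 and not (set(core.lower()) & VOWELS):
        return True  # longer word with no vowels, e.g. "brwn"
    return False


def garbled_ratio(text: str) -> float:
    """Fraction of tokens in text that look garbled (0.0 for empty text)."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(looks_garbled(t) for t in tokens) / len(tokens)
```

You would feed it the text extracted from the book, e.g. `garbled_ratio(page_text)`, and treat a high value as a hint that the source was a poor scan. What counts as "high" is something you would have to judge against your own files.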
