08-17-2010, 09:54 PM   #8
tomsem
Grand Sorcerer
Quote:
Originally Posted by beacher
I tried the software suggested, as well as another, seems as soon as it's converted from PDF to anything, there is double spacing in between characters, almost randomly.

Maybe it's only these couple ebooks I'm trying (maybe they are OCR scans)...but it sure stinks. If I don't find any software, I might attempt writing a script that will remove the spaces.
It could be, as you suspect, that the converter is dumping text that was generated by OCR from scanned images. Or it might be some form of copy protection (random invisible whitespace inserted to make conversion less attractive). Load the PDF into Adobe Reader (free), search for some text to see whether it finds things as expected, and check whether copy/paste has the same issue.

Also, Adobe has a free PDF-to-text and PDF-to-HTML service. You might try it for another data point. I suspect the data points will all continue to line up, however.

Online conversion tools for Adobe PDF documents
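
If it does come down to writing a cleanup script, here is a rough sketch in Python of what one might look like. It assumes the stray spacing is either runs of ordinary spaces or invisible zero-width characters; that is a guess, not something confirmed for these particular files, and the function names are made up for illustration.

```python
import re

# Invisible characters that can survive PDF text extraction.
# Which ones (if any) appear in a given file is an assumption; adjust as needed.
INVISIBLE = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def clean_line(line: str) -> str:
    """Remove invisible characters and collapse runs of spaces in one line."""
    line = line.translate(INVISIBLE)      # delete zero-width characters
    line = line.replace("\u00a0", " ")    # no-break space -> ordinary space
    return re.sub(r" {2,}", " ", line)    # two or more spaces -> one space


def clean_text(text: str) -> str:
    """Apply clean_line to every line, keeping the line breaks as they are."""
    return "\n".join(clean_line(line) for line in text.split("\n"))
```

Run the converted text through `clean_text` and compare the result with the original. Note that this will not repair a space dropped into the middle of a word ("w ord"); that would need a dictionary check, which is a much bigger job.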