Quote:
Originally Posted by beacher
I tried the suggested software, as well as another program. It seems that as soon as the file is converted from PDF to anything else, double spaces show up between characters, almost at random.
Maybe it's only these couple of ebooks I'm trying (maybe they are OCR scans)... but it sure stinks. If I don't find any software that works, I might try writing a script that removes the spaces.
Could be. As you suspect, the converter may be dumping text that was generated by OCR from scanned images. Or it might be some form of copy protection (inserting random invisible whitespace to make conversion less attractive). Load the file into Adobe Reader (free), search for some text to see whether it finds things as expected, and check whether copy/paste shows the same issue.
Adobe also has a free PDF-to-text and PDF-to-HTML service. You might try it for another data point, though I suspect the results will all point the same way.
Online conversion tools for Adobe PDF documents
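If you do end up writing that cleanup script, here is a rough sketch of one way to start, in Python. It assumes the converted output is plain text and that the stray whitespace shows up as runs of two or more spaces or tabs; the function name collapse_spaces is just something I made up for illustration. It will not help if the converter also drops single spaces into the middle of words, since telling those apart from real word breaks would need a dictionary check.

```python
import re

def collapse_spaces(text: str) -> str:
    """Collapse runs of 2+ spaces/tabs into one space, line by line.

    Assumption: the unwanted whitespace appears as runs of spaces or
    tabs. Line breaks are preserved; leading and trailing whitespace
    on each line is trimmed.
    """
    cleaned = []
    for line in text.splitlines():
        # Replace any run of two or more spaces/tabs with a single space.
        line = re.sub(r"[ \t]{2,}", " ", line)
        cleaned.append(line.strip())
    return "\n".join(cleaned)
```

To use it, read the converted file into a string, pass it through collapse_spaces, and write the result back out. I would compare a page or two against the original PDF before trusting it on a whole book.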