Old 02-25-2012, 06:11 AM   #3
DSpider
Not necessarily. If it was saved as a tagged PDF (which can also reflow, btw, just like ePub or Mobi), you could extract the content as XML with, say, Adobe Acrobat (perhaps Reader too? idk) and work your way up from there. But yeah, it will require some manual touch-up. If you save it as plain txt, you'll lose the formatting.

Keep in mind that the vast majority of PDF files are not tagged PDFs. They store individual coordinates for individual letters or groups of letters, which makes them very difficult to convert. The text is like objects placed on a blank piece of paper, and conversion programs have to interpret and approximate, which hurts quality and accuracy. You can easily spot them by selecting all the text (Ctrl+A): if the selection is all segmented, then it's not a tagged PDF.
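
If you'd rather check from a script than by eyeballing the selection, here's a rough sketch. It relies on the fact that a tagged PDF's document catalog normally carries a /StructTreeRoot entry plus a /MarkInfo dictionary with /Marked set to true. The function name looks_tagged is made up for this example, and scanning the raw bytes is only a heuristic: if the catalog is stored inside a compressed object stream, this will report a false negative, so treat a "no" as "probably not" rather than proof.

```python
import re


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Heuristic check for a tagged PDF (hypothetical helper, not a real parser).

    Returns True only if the raw bytes contain both a structure tree
    reference and a '/Marked true' entry. Catalogs hidden inside
    compressed object streams will not be detected.
    """
    has_struct_tree = b"/StructTreeRoot" in pdf_bytes
    is_marked = re.search(rb"/Marked\s+true", pdf_bytes) is not None
    return has_struct_tree and is_marked
```

To try it on a file, read the file in binary mode and pass the bytes in, e.g. looks_tagged(open("book.pdf", "rb").read()), where book.pdf stands in for whatever file you're testing.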

This subject has been beaten to death around here, so I'm not going to go any further. Search the forums.