Old 01-24-2012, 04:34 AM   #2
DSpider
I'll just copy-paste this because I'm tired of explaining it every single day, in some form or another:

"PDF is the worst possible format to convert FROM. It was designed as an output format. This subject has been beaten to death around here because a lot of PDFs aren't tagged PDFs - meaning that letters (and a lot of times small groups of letters) resemble something like floating objects on a blank paper, each with their own coordinates and extra baggage. So it's very difficult to get a 1:1 conversion. A lot of formatting will be lost, some will get interpreted wrong, etc..."


Adobe Reader (which is free) can export to .txt, but you'll lose a lot of formatting (italics, bold, etc.), and there's no guarantee you won't end up with paragraphs misplaced at the end of the document or in a different order. It's always better to go back to the original source (the initial .rtf, .doc, .docx, .odt, etc. file) and convert from there using OpenOffice/LibreOffice, Atlantis, Word and so on.

Or you could re-OCR the PDF with ABBYY FineReader and go from there.
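
To illustrate the "floating objects" point, here's a minimal sketch in plain Python. It doesn't read a real PDF: the `fragments` list is made-up data standing in for positioned bits of text, and `naive_extract` / `positional_extract` are hypothetical helpers, not any converter's actual method. It only shows why text stored by coordinates can come out in the wrong order, and why rebuilding the order is guesswork.

```python
from collections import defaultdict

# Hypothetical data: each fragment is (x, y, text), listed in the order it
# happens to be stored in the file. y grows upward, as in PDF page space.
fragments = [
    (72, 700, "Chapter"),
    (72, 650, "Second"),
    (120, 700, "One"),
    (115, 650, "paragraph."),
    (72, 675, "First"),
    (105, 675, "paragraph."),
]


def naive_extract(frags):
    """Join fragments in storage order, ignoring where they sit on the page."""
    return " ".join(text for _, _, text in frags)


def positional_extract(frags, line_tolerance=2):
    """Heuristic: bucket fragments into lines by y, then sort each line by x."""
    lines = defaultdict(list)
    for x, y, text in frags:
        # Fragments whose y falls in the same bucket are treated as one line.
        lines[int(y // line_tolerance)].append((x, text))
    # Highest y first (top of the page), then left to right within a line.
    return "\n".join(
        " ".join(text for _, text in sorted(lines[key]))
        for key in sorted(lines, reverse=True)
    )


print(naive_extract(fragments))
print(positional_extract(fragments))
```

Even the positional version is just a guess: it falls apart with multiple columns, footnotes, rotated text and so on, and it knows nothing about italics or bold. That's the kind of information a tagged PDF or the original source file would carry.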