lyj898
05-26-2009, 12:42 PM
Does anybody know how to convert a PDF file into TXT with proper formatting? The software that I have messes up all the words, with empty spaces strewn all over. The most annoying thing is the page number that cuts in between sentences!
JSWolf
05-27-2009, 08:51 AM
I hate to say this, but converting from PDF will give you errors. It's the nature of the beast. That's part of why PDF is a very poor source format.
Not sure what OS you are using, but if you have Windows, use MobiCreator + Firefox/IE to save the HTML as text:
1) Import the PDF into MobiCreator. This will generate an HTML file.
2) Open the HTML file in Firefox and use Save As... to save it as a text file.
If you have Linux, use pdf2xml with cxpdfhtml for step one.
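If you would rather script step 2 than go through the browser, here is a rough Python sketch using only the standard library. It is not what Firefox does internally, just one way to get similar output. The names TextExtractor, html_to_text and clean_text are made up for this example, and the cleanup rests on two assumptions: that a line containing nothing but 1 to 4 digits is a page number, and that runs of spaces are conversion junk. Check both against your own file before trusting it.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    # Tags treated as line breaks; this list is a guess, adjust as needed.
    BLOCK_TAGS = {"p", "br", "div", "li", "tr",
                  "h1", "h2", "h3", "h4", "h5", "h6"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def clean_text(text):
    """Collapse stray spaces and drop lines that look like page numbers."""
    lines = []
    for line in text.splitlines():
        # Squeeze runs of spaces, tabs and non-breaking spaces into one space.
        line = re.sub(r"[ \t\u00a0]+", " ", line).strip()
        # Assumption: a line holding only 1-4 digits is a page number.
        if re.fullmatch(r"\d{1,4}", line):
            continue
        lines.append(line)
    cleaned = "\n".join(lines)
    # Collapse three or more newlines into a single blank line.
    return re.sub(r"\n{3,}", "\n\n", cleaned).strip()
```

To use it, read the HTML file that step 1 produced, pass its contents through html_to_text and then clean_text, and write the result to a .txt file. Note that this only removes the page-number lines. It does not rejoin a sentence that was split across the page break, so some hand editing is still likely.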
=X=