pdf to txt

05-26-2009, 12:42 PM
Does anybody know how to convert a PDF file into txt with proper formatting? The software I have messes up all the words, with empty spaces strewn all over. The most annoying thing is the page number that cuts in between sentences!

05-27-2009, 08:51 AM
I hate to say this, but converting from PDF will give you errors. It's the nature of the beast. That's part of why PDF is a very poor source format.
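
Some of those errors can be tidied up after the conversion. Here is a rough Python sketch, not a complete fix: it assumes the page numbers end up alone on their own line as short runs of digits, and that a line not ending in sentence punctuation continues on the next line. The function name `clean_extracted_text` is made up for illustration. It will not repair spaces that land inside a word.

```python
import re


def clean_extracted_text(text):
    """Heuristic cleanup for text extracted from a PDF."""
    paragraphs = []
    for raw in text.splitlines():
        # Collapse runs of spaces/tabs into a single space.
        line = re.sub(r"\s+", " ", raw).strip()
        # Assumption: a line holding only 1-4 digits is a stray page number.
        if not line or re.fullmatch(r"\d{1,4}", line):
            continue
        # Assumption: if the previous line did not end a sentence,
        # this line is its continuation, so join the two.
        if paragraphs and not paragraphs[-1].endswith((".", "!", "?", ":")):
            paragraphs[-1] += " " + line
        else:
            paragraphs.append(line)
    return "\n".join(paragraphs)


if __name__ == "__main__":
    sample = "The quick   brown fox jumped\n\n12\n\nover the lazy dog.\nNext   sentence here."
    print(clean_extracted_text(sample))
```

Since these are only heuristics, a genuine line that is nothing but a number (a year, say) would be dropped too, so check the result against the original.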

05-27-2009, 12:32 PM
Not sure what OS you are using, but if you have Windows, use MobiCreator to convert the PDF to HTML, then Firefox/IE to save the HTML as text:

1) Import the PDF into MobiCreator. This will generate an HTML file.
2) Open the HTML file in Firefox and use Save As... to save it as a text file.

If you have Linux, use pdf2xml with cxpdfhtml for step one.
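
If you would rather script step 2 than go through the browser, something like the following should be close. It is a minimal sketch using only Python's standard `html.parser` module. The names `TextExtractor` and `html_to_text` are made up for this example, and the list of tags treated as line breaks is a guess at what the generated HTML contains.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips <script>/<style> contents."""

    # Assumption: these tags should start and end a line in the output.
    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Keep text only when we are outside script/style elements.
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Normalize spacing inside each line and drop empty lines.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    page = "<html><body><p>Hello <b>world</b></p><p>Second &amp; last</p></body></html>"
    print(html_to_text(page))
```

To use it on a real file, read the HTML that step 1 produced, pass the string to `html_to_text`, and write the result to a .txt file.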