pdf to txt


lyj898
05-26-2009, 11:42 AM
Does anybody know how to convert a PDF file into TXT with the proper formatting? The software that I have messed up all the words, with empty spaces strewn all over. The most annoying thing is the page numbers that cut in between sentences! :angry:

JSWolf
05-27-2009, 07:51 AM
I hate to say this, but converting from PDF will give you errors. It's the nature of the beast. That's part of why PDF is a very poor source format.
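Because some damage is unavoidable, a common workaround is to clean the extracted text afterwards. Below is a rough Python sketch (standard library only) aimed at the two problems from the first post. It drops lines that contain only a page number, collapses runs of spaces, and rejoins broken lines into paragraphs. The heuristics are assumptions, not anything a converter guarantees: a page number is 1 to 4 digits alone on a line, and a paragraph ends only at a blank line that follows sentence-ending punctuation.

```python
import re

# Heuristics (assumptions, tune them for your file):
# - a line holding only 1-4 digits is treated as a page number
# - a blank line ends a paragraph only if the text before it ends a sentence
PAGE_NUMBER = re.compile(r"\s*\d{1,4}\s*")
SENTENCE_END = (".", "!", "?", ":", '"', "'")


def clean_extracted_text(raw: str) -> str:
    """Drop page-number lines, squeeze stray spaces, and rejoin broken lines."""
    paragraphs: list[str] = []
    current: list[str] = []
    for line in raw.splitlines():
        if PAGE_NUMBER.fullmatch(line):
            continue  # page number on its own line: drop it
        line = re.sub(r"[ \t]+", " ", line).strip()  # collapse runs of spaces/tabs
        if line:
            current.append(line)
        elif current and current[-1].endswith(SENTENCE_END):
            # Blank line after a finished sentence: close the paragraph.
            paragraphs.append(" ".join(current))
            current = []
        # Otherwise the blank line falls mid-sentence (e.g. where a page
        # break was), so keep joining lines into the same paragraph.
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```

Expect some false positives: a heading without final punctuation gets merged into the following paragraph, and a number that genuinely stands alone on a line is deleted, so skim the output before relying on it.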

=X=
05-27-2009, 11:32 AM
Not sure what OS you are using, but if you have Windows, use
MobiCreator + Firefox/IE to save the HTML as text:

1) Import the PDF into MobiCreator. This will generate an HTML file.
2) Open the HTML file in Firefox and use Save As... to save it as a text file.
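If you'd rather script step 2 than go through a browser, here is a rough sketch using only Python's standard-library `html.parser`. It assumes the HTML that MobiCreator generated is available as a string; which tags count as line breaks (`BLOCK_TAGS`) is my own assumption, not something MobiCreator documents.

```python
from html.parser import HTMLParser

# Assumption: these tags mark line/paragraph boundaries in the generated HTML.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "tr"}
SKIP_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    """Collects visible text, inserting newlines around block-level tags."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True turns &amp; etc. into characters
        self.parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "br" or tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Normalise whitespace inside each line, then squeeze runs of blank lines.
    out: list[str] = []
    for line in "".join(parser.parts).splitlines():
        line = " ".join(line.split())
        if line:
            out.append(line)
        elif out and out[-1]:
            out.append("")  # keep a single blank line between paragraphs
    return "\n".join(out).strip()
```

To use it, read the HTML file with the right encoding, pass its contents to `html_to_text`, and write the result to a `.txt` file. It only strips the markup; anything already mangled during the PDF import (such as page numbers in mid-sentence) will still be there.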

If you have Linux, use pdf2xml with cxpdfhtml for step one.

=X=