MobileRead Forums - View Single Post - Conversion from two-column PDF

avantman42 · 07-04-2012, 08:55 AM

If you have access to pdftohtml & sed, I use the following:

Code:

pdftohtml -c -s -i -xml INPUT_FILE.pdf
sed -e 's/<[^>]*>//g' INPUT_FILE.xml > OUTPUT_FILE.txt

That usually gives a reasonable text file, which can then be worked on if needed and converted to whatever format you wish using ebook-convert.
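
One thing to watch for: stripping the tags leaves any XML entities (such as &amp;amp; for an ampersand) untouched in the text, and it can leave blank lines where tag-only lines used to be. A minimal sketch of a fuller cleanup, assuming the XML only uses the five predefined entities; the function name strip_xml is just something I made up for illustration, not part of pdftohtml or sed:

```shell
# Hypothetical helper (name is an assumption): reads XML on stdin, writes plain text on stdout.
strip_xml() {
    # 1. remove every tag
    # 2. decode the five predefined XML entities (&amp; last, so "&amp;lt;" is not decoded twice)
    # 3. delete lines that are empty or whitespace-only
    sed -e 's/<[^>]*>//g' \
        -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g' \
        -e "s/&apos;/'/g" \
        -e 's/&amp;/\&/g' \
        -e '/^[[:space:]]*$/d'
}

# Usage: strip_xml < INPUT_FILE.xml > OUTPUT_FILE.txt
```

Numeric character references (like &amp;#8217;) are not handled here, so those would still need fixing by hand or with another tool.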
