07-04-2012, 08:55 AM   #14
avantman42
If you have access to pdftohtml and sed, this is what I use:

Code:
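# Convert the PDF to XML (-c complex layout, -s single document, -i ignore images); writes INPUT_FILE.xml
pdftohtml -c -s -i -xml INPUT_FILE.pdf
# Strip every XML tag, leaving only the text between the tags
sed -e 's/<[^>]*>//g' INPUT_FILE.xml > OUTPUT_FILE.txt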
That usually gives a reasonable text file, which you can clean up if needed and then convert to whatever format you wish using calibre's ebook-convert.
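One thing the plain tag-stripping sed leaves behind is XML entities such as `&amp;` and `&lt;`, which can show up in the text file as-is. Below is a rough sketch of the same idea with entity decoding added. The function name `strip_tags` is made up for illustration, the list of entities is my assumption about what is likely to appear in the XML, and the EPUB output in the usage comment is just an example format.

```shell
# strip_tags (hypothetical helper name): reads XML on stdin, writes plain text on stdout.
# Tags are removed first, then a few common XML entities are decoded.
# &amp; is decoded last so that text like "&amp;lt;" is not decoded twice.
strip_tags() {
    sed -e 's/<[^>]*>//g' \
        -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g' \
        -e "s/&apos;/'/g" \
        -e 's/&amp;/\&/g'
}

# Usage (file names are placeholders, as in the commands above):
#   pdftohtml -c -s -i -xml INPUT_FILE.pdf
#   strip_tags < INPUT_FILE.xml > OUTPUT_FILE.txt
#   ebook-convert OUTPUT_FILE.txt OUTPUT_FILE.epub
```

Like the original one-liner, this works line by line, so a tag that is split across two lines would not be removed.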