I have some RPG PDFs that I'd like to be able to read on my Kindle. Converting them is a real pain, because the text is in two columns. After some experimentation, I've found the following set of commands, which appear to produce a plain text file with the text in the correct order:
Code:
pdftohtml -c -s -i -xml INPUT_FILE.pdf
sed -e 's/<[^>]*>//g' INPUT_FILE.xml > OUTPUT_FILE.txt
I've only tested this on a few files, but it worked quite well on those.
The text will probably need some cleaning up and, of course, will contain no formatting, but I found that Calibre was quite good at working out where the headings were when converting the text file to a .mobi.
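As a starting point for that cleanup, here is a rough sketch of a slightly extended version of the sed step. Besides stripping the tags, it decodes the five predefined XML entities (in case any are left in the output) and drops lines that end up empty. The function name clean_xml_text is just something I made up for illustration, and I haven't checked this against a wide range of PDFs, so treat it as a sketch rather than a finished tool.

```shell
# Hypothetical helper (the name is made up): reads pdftohtml's XML on stdin,
# writes plain text on stdout.
clean_xml_text() {
    # 1. Strip anything that looks like a tag.
    # 2. Decode the five predefined XML entities; &amp; goes last so that
    #    something like "&amp;lt;" is not decoded twice.
    # 3. Delete lines that are empty or whitespace-only after the steps above.
    sed -e 's/<[^>]*>//g' \
        -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g' \
        -e "s/&apos;/'/g" \
        -e 's/&amp;/\&/g' \
        -e '/^[[:space:]]*$/d'
}

# Usage (file names are placeholders, as in the commands above):
#   clean_xml_text < INPUT_FILE.xml > OUTPUT_FILE.txt
```

The resulting text file can then be converted in Calibre as before, either through the GUI or with Calibre's ebook-convert command-line tool.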