MobileRead Forums - View Single Post

heddhunter · 01-18-2011, 02:18 PM

Well, I've come up with a solution that works for my particular situation, but I'm pretty sure it won't work for an arbitrary PDF. I wrote a Perl script that takes pdftohtml's XML output and rewrites it as HTML. The XML is fairly easy to clean up, and I use a few simple rules to detect paragraph breaks. I then load the resulting HTML into Calibre and let it run its normal conversion to get the final book onto my Kindle. It's not a simple drag-and-drop procedure, though, and, as I said, I don't think it will work generically.
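The script itself and its exact paragraph rules aren't posted, so what follows is only a rough Python sketch of the same idea, not the original Perl code. It assumes poppler's `pdftohtml -xml` layout, where each `<page number="...">` holds `<text top=".." left=".." height="..">` elements, roughly one per line of text. It also uses two made-up heuristics: a new paragraph starts when a line is indented more than `indent_px` past the page's left margin, or when the top-to-top gap from the previous line exceeds `gap_factor` times that line's height. All function names and thresholds here are hypothetical.

```python
import html
import xml.etree.ElementTree as ET


def read_lines(xml_text):
    """Yield (page, top, left, height, text) for each <text> element.

    Assumes poppler's `pdftohtml -xml` layout: <page number="..."> elements
    containing <text top=".." left=".." height="..">...</text> children.
    """
    root = ET.fromstring(xml_text)
    for page in root.iter("page"):
        number = int(page.get("number", "0"))
        for node in page.iter("text"):
            # itertext() flattens inline <b>/<i> markup into plain text
            text = "".join(node.itertext()).strip()
            if text:
                yield (number, float(node.get("top")), float(node.get("left")),
                       float(node.get("height")), text)


def join_lines(parts):
    """Join one paragraph's lines; a trailing '-' is treated as a split word (heuristic)."""
    out = parts[0]
    for part in parts[1:]:
        if out.endswith("-"):
            out = out[:-1] + part
        else:
            out += " " + part
    return out


def group_paragraphs(lines, gap_factor=1.5, indent_px=10):
    """Hypothetical paragraph-break rules (not the original script's rules)."""
    # Left margin of each page = smallest left offset seen on that page
    margins = {}
    for page, _top, left, _height, _text in lines:
        margins[page] = min(left, margins.get(page, left))

    paragraphs, current, prev = [], [], None
    for page, top, left, height, text in lines:
        indented = left - margins[page] > indent_px
        if prev is None:
            new_para = True
        elif page != prev[0]:
            # Across a page break, only an indent starts a new paragraph
            new_para = indented
        else:
            gap = top - prev[1]
            new_para = indented or gap > gap_factor * prev[2]
        if new_para and current:
            paragraphs.append(join_lines(current))
            current = []
        current.append(text)
        prev = (page, top, height)
    if current:
        paragraphs.append(join_lines(current))
    return paragraphs


def pdf2xml_to_html(xml_text, title="Converted"):
    """Turn pdftohtml XML into a minimal HTML document with one <p> per paragraph."""
    paragraphs = group_paragraphs(list(read_lines(xml_text)))
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (f"<html><head><title>{html.escape(title)}</title></head>\n"
            f"<body>\n{body}\n</body></html>")
```

Running something like `pdftohtml -xml book.pdf book` should produce `book.xml`; feeding its contents to `pdf2xml_to_html` and saving the result as an `.html` file gives Calibre something to import. This sketch drops bold/italic markup, and the thresholds will almost certainly need tuning per book, which is in line with the caveat that this approach doesn't generalize well.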

I guess if people have specific PDFs they want me to look at, I could do that and see whether there's a way to make the conversion procedure simpler.
