MobileRead Forums - View Single Post

miquel · 05-26-2010, 05:18 PM

OK Kovid, I'd like to confirm a couple of things with you, please.
As I understand it, the new PDF engine:

1. Takes the PDF file and passes it to the C plugin implementation of PDF reflow, which returns XML containing the PDF's draw commands (a PDF in XML, if you will).

2. PDFDocument takes the XML and generates the HTML that's used as the base for conversion.

3. The rest of the ebook conversion turns that HTML into whatever other format is needed.

My plan would then be to hack into PDFDocument, take the XML, do the unwrapping and the header and footer detection, and generate the HTML there.
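
To make the plan concrete, here is a rough sketch of what I mean, not actual calibre code. The XML layout (`<pdf>`, `<page>`, `<line>`) is something I made up for illustration, since the real pdfreflow output will look different, and the function names (`strip_headers_footers`, `unwrap`, `xml_to_html`) are hypothetical too. The heuristics are deliberately naive: a first or last line counts as a header or footer if it repeats on most pages once digits are ignored, and a paragraph ends at a line that finishes with sentence punctuation.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from html import escape

# Hypothetical XML layout; the real pdfreflow output may look quite different.
SAMPLE = """<pdf>
  <page><line>My Book</line><line>It was a dark and</line><line>stormy night.</line><line>1</line></page>
  <page><line>My Book</line><line>The rain fell in</line><line>torrents.</line><line>2</line></page>
</pdf>"""


def normalize(text):
    # Collapse digit runs so that page numbers compare equal across pages.
    return re.sub(r"\d+", "#", text.strip())


def strip_headers_footers(pages, threshold=0.6):
    """Drop first/last lines whose normalized text repeats on most pages."""
    first = Counter(normalize(p[0]) for p in pages if p)
    last = Counter(normalize(p[-1]) for p in pages if p)
    need = max(2, threshold * len(pages))  # never strip on a single occurrence
    cleaned = []
    for page in pages:
        page = list(page)
        if page and first[normalize(page[0])] >= need:
            page.pop(0)
        if page and last[normalize(page[-1])] >= need:
            page.pop()
        cleaned.append(page)
    return cleaned


def unwrap(lines):
    """Join lines into paragraphs; a line ending in . ! or ? closes one."""
    paragraphs, current = [], []
    for line in lines:
        current.append(line.strip())
        if line.rstrip().endswith((".", "!", "?")):
            paragraphs.append(" ".join(current))
            current = []
    if current:  # keep trailing text that never reached sentence punctuation
        paragraphs.append(" ".join(current))
    return paragraphs


def xml_to_html(xml_text):
    root = ET.fromstring(xml_text)
    pages = [[ln.text or "" for ln in page.iter("line")]
             for page in root.iter("page")]
    lines = [ln for page in strip_headers_footers(pages) for ln in page]
    return "\n".join("<p>%s</p>" % escape(p) for p in unwrap(lines))


print(xml_to_html(SAMPLE))
```

A real version would presumably use the position and font information in the XML rather than just the text, but the point is that all of this stays in Python, on top of whatever the plugin hands back.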

Is that what you had in mind? Or did you intend the reflow plugin itself to, you know, reflow (i.e. unwrap) the PDF? Personally, I'd prefer pdfreflow to be a pdf-to-xml-that-we-can-work-on-in-python converter.

Did I get it right? What did you have in mind?
Thanks!
