The pdfreflow.so module is a C module that takes a PDF and returns XML. That XML is not quite raw PDF draw commands, since the C code does a little cleanup and consolidation first.
The calibre.ebooks.pdf.reflow Python module then takes that XML and tries to "reflow" it, i.e. do things like unwrapping analysis, identifying structure and so on.
So the best place for you to do your hacking is in calibre.ebooks.pdf.reflow.
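To give a feel for the kind of "unwrap analysis" done at the Python layer, here is a minimal sketch of one common heuristic: join consecutive text lines into a paragraph when the vertical gap between them is small compared to the line height. This is not calibre's actual code. The `<page>`/`<text top=… height=…>` XML shape, the `unwrap_lines` function and the `gap_factor` parameter are all hypothetical, chosen only for illustration; the real pdfreflow.so output format may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML shape, NOT the real pdfreflow.so output format.
# Each <text> element is one visual line with its top coordinate and height.
SAMPLE = """<page>
  <text top="100" height="12">The quick brown fox</text>
  <text top="114" height="12">jumps over the lazy dog.</text>
  <text top="150" height="12">A new paragraph starts here.</text>
</page>"""


def unwrap_lines(xml_text, gap_factor=0.5):
    """Join consecutive lines into paragraphs (hypothetical helper).

    A new paragraph starts when the gap between the bottom of the previous
    line and the top of the current line exceeds gap_factor * line height.
    """
    root = ET.fromstring(xml_text)
    paragraphs = []
    current = []
    prev_bottom = None
    for elem in root.iter("text"):
        top = float(elem.get("top"))
        height = float(elem.get("height"))
        line = (elem.text or "").strip()
        # A large vertical gap means the previous paragraph has ended.
        if prev_bottom is not None and top - prev_bottom > gap_factor * height:
            paragraphs.append(" ".join(current))
            current = []
        current.append(line)
        prev_bottom = top + height
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


if __name__ == "__main__":
    for para in unwrap_lines(SAMPLE):
        print(para)
```

Run on the sample, the first two lines (gap of 2 units) are joined into one paragraph, while the third (gap of 24 units) starts a new one. A real implementation would also need to handle hyphenated line ends, columns, headers/footers and font changes, which is where most of the interesting work in the reflow module lies.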