What I found very useful was using LibreOffice Draw to convert the PDF to a Flat ODG document (.fodg). A Flat ODG file is a single uncompressed XML file, so it is easy to process further or convert to another markup language.
This is the only reliable way I have found to deal with paragraphs, fonts, and page breaks, which automatic conversion tools often get wrong.
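Once you have the .fodg file (for example, by opening the PDF in Draw and saving it with the "Flat XML ODF Drawing" file type), a few lines of Python can pull the text out page by page. The sketch below is only an illustration: it assumes the imported text sits in `text:p` paragraphs inside each `draw:page`. That is how ODF drawings normally store text, but the exact layout Draw's PDF import produces may vary. The function names are hypothetical and only the standard library is used.

```python
import xml.etree.ElementTree as ET

# Standard ODF namespace URIs used inside .fodg files.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}
TEXT_S = f"{{{NS['text']}}}s"            # run of spaces (count in text:c)
TEXT_TAB = f"{{{NS['text']}}}tab"
TEXT_BR = f"{{{NS['text']}}}line-break"


def paragraph_text(elem):
    """Flatten a text:p (or text:span), expanding ODF space/tab/break markers."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == TEXT_S:
            parts.append(" " * int(child.get(f"{{{NS['text']}}}c", "1")))
        elif child.tag == TEXT_TAB:
            parts.append("\t")
        elif child.tag == TEXT_BR:
            parts.append("\n")
        else:
            # e.g. text:span carrying font/style information
            parts.append(paragraph_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


def extract_pages(fodg_xml):
    """Return one list of non-blank paragraph strings per draw:page."""
    root = ET.fromstring(fodg_xml)
    pages = []
    for page in root.iter(f"{{{NS['draw']}}}page"):
        paras = [paragraph_text(p) for p in page.iter(f"{{{NS['text']}}}p")]
        pages.append([p for p in paras if p.strip()])
    return pages
```

Usage would look like `extract_pages(open("document.fodg", encoding="utf-8").read())`. Keeping one list per `draw:page` preserves the original page breaks. Style names on the `text:span` elements are still available in the XML if you also need font information.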