08-28-2011, 02:12 AM #4
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
The 'converter' consists of two components. The first is a third-party utility, Poppler's pdftohtml, which creates some basic HTML markup that is essentially unreadable as an ebook; enable the debug output to see what Poppler's raw HTML looks like. The links are generated by pdftohtml, and all they do is point to a page number rather than to the actual chapter heading.
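
To illustrate what "linking to a page number" means, here is a minimal sketch. The sample markup is hypothetical and only loosely modeled on pdftohtml-style output (a named anchor at the start of each page, TOC links pointing at those page anchors); the names `SAMPLE`, `LINK_RE` and `page_links` are made up for this example and are not part of Poppler or Calibre.

```python
import re

# Hypothetical pdftohtml-style fragment: each page begins with a named anchor,
# and the TOC links target page anchors (#3, #7), not the chapter headings.
SAMPLE = """
<a name=1></a>Contents<br>
<a href="book.html#3">Chapter One</a><br>
<a href="book.html#7">Chapter Two</a><br>
<hr>
<a name=2></a>Chapter One<br>
"""

# Matches <a href="file.html#N">text</a> and captures the page number and text.
LINK_RE = re.compile(r'<a href="[^"#]*#(\d+)">(.*?)</a>')


def page_links(html: str) -> list[tuple[str, int]]:
    """Return (link text, target page number) pairs found in the markup."""
    return [(text, int(page)) for page, text in LINK_RE.findall(html)]


print(page_links(SAMPLE))  # [('Chapter One', 3), ('Chapter Two', 7)]
```

The only thing each link knows is a page number, so once pages are merged into flowing text there is no heading for the link to land on.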

Calibre then does a considerable amount of massaging of that HTML to make something readable as an ebook. One of these steps is rejoining paragraphs and sentences that are split across page breaks and line breaks. Retaining the hyperlinks would break this algorithm (since every page has a hyperlink, whether or not a TOC uses it), so the choice is between keeping your links and having broken sentences. Fixing broken sentences won out.
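
The trade-off can be seen with a toy line-unwrapping heuristic. This is a hypothetical sketch, not Calibre's actual algorithm: it assumes a line should be joined to the previous one when the previous line does not end a sentence and the next line starts with a lowercase letter, and it assumes each page starts with a `<a name=N></a>` anchor. The function `unwrap` and its `strip_anchors` parameter are invented for illustration.

```python
import re

# Hypothetical page-start marker: a named anchor at the top of each page.
ANCHOR_RE = re.compile(r'<a name=\d+></a>')
END_OF_SENTENCE = ('.', '!', '?', ':', '"')


def unwrap(lines: list[str], strip_anchors: bool = True) -> list[str]:
    """Join a line onto the previous one when the previous line does not end
    a sentence and the current line starts with a lowercase letter."""
    out: list[str] = []
    for raw in lines:
        # Optionally drop page anchors before testing whether to join.
        line = ANCHOR_RE.sub('', raw) if strip_anchors else raw
        line = line.strip()
        if not line:
            continue
        if out and not out[-1].endswith(END_OF_SENTENCE) and line[0].islower():
            out[-1] = out[-1] + ' ' + line
        else:
            out.append(line)
    return out


pages = [
    "The storm had not yet reached",
    "<a name=2></a>the harbour when the ship sailed.",
]
print(unwrap(pages))                       # anchor stripped: one joined sentence
print(unwrap(pages, strip_anchors=False))  # anchor kept: sentence stays broken
```

With the anchor kept, the second line starts with `<` instead of a lowercase letter, so the heuristic refuses to join; with it stripped, the sentence is repaired but any link that targeted page 2 no longer has a destination.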

There's always room for improvement, and patches are welcome. But odds are that keeping the links won't happen with the current version of the PDF conversion code.