Old 08-09-2010, 03:43 PM   #3
tomsem
I gather you are using calibre for the conversion.

For reasons I don't understand, some double 'l' sequences in some PDFs are converted successfully, and some are not. In each case where I've investigated the PDF source, they are not encoded as ligatures, and nevertheless many 'll' sequences are converted to 'l ' in the HTML. So it is not about ligatures, and I'm not surprised that the ligature option has no effect.

I haven't seen a problem with other letter combinations, but it's entirely possible that such problems exist.

I can only think that it's a bug in the (open source?) PDF converter code that calibre uses: if I export a 'problem' PDF to HTML from Acrobat Professional, the HTML has the correct sequence of characters, but it is deficient in other respects.

My attempts to reproduce a problem by creating my own PDFs and converting with calibre have been unsuccessful.

It is weird and frustrating, and I'm not sure there are better one-click PDF conversion options available at any reasonable price. You can always clean up the results, of course, but that often involves more time than I'm willing to invest.
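
For what it's worth, part of that cleanup could probably be scripted. Below is a minimal sketch in Python, assuming the converted text is available as a plain string and that you supply your own word list. The names `repair_double_l` and `vocabulary` are made up for illustration and have nothing to do with calibre. It only rejoins a fragment when the result is a known word, it does not handle tokens with punctuation attached, and it collapses runs of whitespace to single spaces.

```python
def repair_double_l(text, vocabulary):
    """Rejoin 'l ' fragments into 'll' when that yields a known word.

    Assumption: `vocabulary` is any iterable of correctly spelled words
    supplied by the caller (hypothetical; not part of calibre).
    """
    vocab = {w.lower() for w in vocabulary}
    tokens = text.split()  # note: this normalizes whitespace
    repaired = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.lower().endswith("l"):
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            # Case 1: the 'll' was inside a word, e.g. "hel o" -> "hello"
            if nxt is not None and (tok + "l" + nxt).lower() in vocab:
                repaired.append(tok + "l" + nxt)
                i += 2
                continue
            # Case 2: the 'll' ended a word, e.g. "wil" -> "will".
            # Skip tokens that are already valid words, such as "cool".
            if tok.lower() not in vocab and (tok + "l").lower() in vocab:
                repaired.append(tok + "l")
                i += 1
                continue
        repaired.append(tok)
        i += 1
    return " ".join(repaired)


if __name__ == "__main__":
    words = ["hello", "it", "will", "be", "small", "cool"]
    print(repair_double_l("hel o it wil be smal", words))
```

A word list check like this will miss anything not in the list and can still guess wrong when both the fragment and the rejoined form are real words, so the output would need a proofreading pass anyway.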