MobileRead Forums - View Single Post - No double LL's-PDF to EPUB

tomsem · 08-09-2010, 03:43 PM

I gather you are using calibre for the conversion.

For reasons I don't understand, some double 'l' sequences in some PDFs are converted correctly and some are not. In every case where I've investigated the PDF source, the 'll' sequences are not encoded as ligatures, yet many of them are converted to 'l ' (an 'l' followed by a space) in the HTML. So it is not a ligature problem, and I'm not surprised that option has no effect.
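If you want to confirm that ligature characters aren't involved in a particular conversion, one rough check is to scan the converted text for the Latin ligature code points in Unicode's Alphabetic Presentation Forms block (U+FB00 to U+FB06: ff, fi, fl, ffi, ffl and two 'st' forms). None of them is an 'll' ligature, which fits the idea that a ligature option wouldn't help here. This is a minimal Python sketch. It assumes you have already read the converted HTML or text into a string. The function name `find_ligatures` is just illustrative.

```python
import unicodedata

# Latin ligature code points in Unicode's Alphabetic Presentation Forms block:
# U+FB00 (ff) through U+FB06 (st). Note there is no 'll' ligature here.
LIGATURE_RANGE = range(0xFB00, 0xFB07)


def find_ligatures(text):
    """Return (index, character, Unicode name) for each ligature code point in text."""
    return [
        (i, ch, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ord(ch) in LIGATURE_RANGE
    ]


# Example: a U+FB01 'fi' ligature is reported, plain letters are not.
print(find_ligatures("The \ufb01le"))  # [(4, 'ﬁ', 'LATIN SMALL LIGATURE FI')]
print(find_ligatures("called"))        # []
```

An empty result on a book that still shows broken 'll' sequences points away from ligatures as the cause.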

I haven't seen a problem with other letter combinations, but it's entirely possible that such problems exist.

I can only assume it's a bug in the (open source?) PDF conversion code that calibre uses: if I export a 'problem' PDF to HTML from Acrobat Professional, the HTML has the correct sequence of characters, although it is deficient in other respects.

My attempts to reproduce the problem by creating my own PDFs and converting them with calibre have been unsuccessful.

It is weird and frustrating, and I'm not sure there are better one-click PDF conversion options available at any reasonable price. You can always clean up the results, of course, but that often takes more time than I'm willing to invest.
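Some of that cleanup can be scripted. This is a minimal Python sketch, not a calibre feature. It assumes the defect always turns 'll' into 'l' plus a space, as described above (so "called" becomes "cal ed" and "will be" becomes "wil  be"). It also assumes you supply a word list to check candidate repairs against. Both `repair_split_ll` and the tiny `words` set are hypothetical stand-ins. A real run would need a proper dictionary, and any split whose rejoined form isn't in that list is left as it is.

```python
import re

# A word ending in 'l', one space, then an optional letter fragment.
SPLIT_LL = re.compile(r"\b([A-Za-z]*l) ([A-Za-z]*)")


def repair_split_ll(text, words):
    """Rejoin 'l ' splits where head + 'l' + tail is a known word.

    `words` is a set of lowercase words (hypothetical; supply your own list).
    """
    def fix(m):
        head, tail = m.group(1), m.group(2)
        # If both pieces are already real words, the space may be genuine.
        if head.lower() in words and tail and tail.lower() in words:
            return m.group(0)
        joined = head + "l" + tail  # restore the lost second 'l'
        return joined if joined.lower() in words else m.group(0)

    return SPLIT_LL.sub(fix, text)


words = {"called", "will", "be", "by", "the", "real", "world", "fellow"}
broken = "It was cal ed by a fel ow who wil  be in the real world."
print(repair_split_ll(broken, words))
# It was called by a fellow who will be in the real world.
```

The dictionary check is the conservative choice here. Blindly replacing every 'l ' with 'll' would damage legitimate text such as "real world".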
