MobileRead Forums - View Single Post - PDF 2 EPUB

user_none · 09-04-2010, 10:02 AM

Quote:

Originally Posted by GrzegorzN

Oh yeah, I've noticed the exact same issue with some PDFs in my collection. It seems that every accented Polish letter is in fact composed of 2 (or more) characters, layered on top of another (is this internally a ligature? probably not...)

It's not a ligature; it's your first idea. The PDF stores the character as two separate characters that it draws on top of one another.

Quote:

Originally Posted by GrzegorzN

According to TeX the way to typeset each 'accented' glyph (and there's a huge number of them) might differ between fonts, but maybe in practice it's not that bad. I see for example that your PDFs are mapping 'ą' in exactly the same way as mine -- using a string that includes two linebreaks. So there might be a common pattern, and if that's the case, it might be possible to create a reverse mapping table. There might even be some industry standard that describes mappings like 'oacute -> ´o'

I don't think Calibre supports anything like that at the moment though (?)

PDF input uses a character mapping to deal with the parent post's issue. However, I only added support for German, Spanish and French characters, because those are the languages I had test books for and am somewhat familiar with. I can easily add other characters to the mapping.

I just need the lowercase and uppercase Unicode characters, and for each one the two characters that the PDF uses to represent it.
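
To illustrate the kind of reverse mapping being discussed, here is a minimal Python sketch. It is not calibre's actual code. It assumes the extracted text contains a standalone (spacing) accent, then optional whitespace or line breaks, then the base letter, as in the 'oacute -> ´o' example quoted above. Instead of listing every lowercase and uppercase pair by hand, it swaps in Unicode NFC normalization to do the composing. The names `SPACING_TO_COMBINING` and `merge_accents` are made up for this example.

```python
import re
import unicodedata

# Hypothetical table: spacing accent as it might appear in extracted PDF
# text -> the matching Unicode combining mark. Illustrative only.
SPACING_TO_COMBINING = {
    "\u00b4": "\u0301",  # acute accent
    "\u00a8": "\u0308",  # diaeresis
    "\u02c6": "\u0302",  # circumflex
    "\u02dc": "\u0303",  # tilde
    "\u00b8": "\u0327",  # cedilla
    "\u02db": "\u0328",  # ogonek
}

# One accent, any amount of whitespace (including line breaks), one ASCII letter.
_ACCENTS = "".join(re.escape(c) for c in SPACING_TO_COMBINING)
_PATTERN = re.compile("([" + _ACCENTS + r"])\s*([A-Za-z])")


def merge_accents(text):
    """Replace accent + letter pairs with a single precomposed character."""

    def _compose(match):
        accent, letter = match.group(1), match.group(2)
        composed = unicodedata.normalize(
            "NFC", letter + SPACING_TO_COMBINING[accent]
        )
        # If Unicode has no single precomposed character for this pair,
        # leave the original text untouched.
        return composed if len(composed) == 1 else match.group(0)

    return _PATTERN.sub(_compose, text)


print(merge_accents("\u02db\n\na"))  # ogonek, two line breaks, 'a'
print(merge_accents("\u00a8U"))      # diaeresis followed by 'U'
```

Because the composing is left to normalization, uppercase and lowercase letters are handled by the same table entry. The trade-off is that any accent character that legitimately appears before a letter in the text would also be merged, so a real converter would probably want a stricter pattern.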