Quote:
Originally Posted by GrzegorzN
Oh yeah, I've noticed the exact same issue with some PDFs in my collection. It seems that every accented Polish letter is in fact composed of 2 (or more) characters, layered on top of another (is this internally a ligature? probably not...)
|
It's not a ligature; it's your first idea. The PDF stores the character as two separate characters that it draws on top of one another.
Quote:
Originally Posted by GrzegorzN
According to TeX the way to typeset each 'accented' glyph (and there's a huge number of them) might differ between fonts, but maybe in practice it's not that bad. I see for example that your PDFs are mapping 'ą' in exactly the same way as mine -- using a string that includes two linebreaks. So there might be a common pattern, and if that's the case, it might be possible to create a reverse mapping table. There might even be some industry standard that describes mappings like 'oacute -> ´o'
I don't think Calibre supports anything like that at the moment though (?)
|
PDF input uses a character mapping for this issue. However, I only added support for German, Spanish and French characters, because those are the languages I had test books for and am somewhat familiar with. I can easily add other characters to the mapping.
For each letter I just need the lowercase and uppercase Unicode characters, plus the two characters that the PDF uses to represent each of them.
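To give a rough idea of what such a mapping looks like, here is a minimal Python sketch. It is not the actual code from PDF input. The names `ACCENT_PAIRS` and `merge_accents` are made up for illustration, the table entries are only examples, and it assumes the PDF emits the accent first and the base letter second with nothing in between (as in the 'oacute -> ´o' example above). Real PDFs may order the pair differently or put whitespace between the two.

```python
import re

# Hypothetical table: (accent, base letter) -> precomposed Unicode character.
# Entries are examples only; the pairs a given PDF really uses may differ.
ACCENT_PAIRS = {
    ("\u00b4", "o"): "\u00f3",  # acute + o -> o with acute
    ("\u00b4", "O"): "\u00d3",  # acute + O -> O with acute
    ("\u00a8", "u"): "\u00fc",  # diaeresis + u -> u with diaeresis
    ("\u00a8", "U"): "\u00dc",  # diaeresis + U -> U with diaeresis
    ("\u02db", "a"): "\u0105",  # ogonek + a -> a with ogonek
    ("\u02db", "A"): "\u0104",  # ogonek + A -> A with ogonek
}


def merge_accents(text, pairs=ACCENT_PAIRS):
    """Replace each known accent+letter pair with a single character."""
    accents = "".join(sorted({accent for accent, _ in pairs}))
    bases = "".join(sorted({base for _, base in pairs}))
    # Match one accent from the table immediately followed by one base letter.
    pattern = re.compile(
        "([" + re.escape(accents) + "])([" + re.escape(bases) + "])"
    )

    def replace(match):
        key = (match.group(1), match.group(2))
        # Combinations missing from the table are left untouched.
        return pairs.get(key, match.group(0))

    return pattern.sub(replace, text)
```

Adding a language would then just mean adding its lowercase and uppercase entries to the table, which is why those are the only pieces of information needed.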