09-04-2010, 10:02 AM #6
user_none
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Quote:
Originally Posted by GrzegorzN
Oh yeah, I've noticed the exact same issue with some PDFs in my collection. It seems that every accented Polish letter is actually composed of two (or more) characters layered on top of one another (is this internally a ligature? probably not...)
It's not a ligature; it's your first idea. The PDF stores the character as two characters that it draws over one another.
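If text extraction happens to return the two overlaid glyphs as a base letter followed by a Unicode *combining* mark (U+0300 block), Python's standard `unicodedata.normalize` with NFC can fuse them back into a single precomposed letter. This is a minimal sketch of a swapped-in technique, not how calibre's PDF input works, and it assumes combining marks in the extracted text. Spacing accents such as '˛' (U+02DB) or '´' (U+00B4), and letters like 'ł' that have no canonical decomposition, are left untouched by NFC and still need an explicit mapping table.

```python
import unicodedata


def recompose(text: str) -> str:
    """Fuse base letters and any following combining marks into
    precomposed characters via Unicode canonical composition (NFC)."""
    return unicodedata.normalize("NFC", text)


# Assumed extraction output: base letter + combining mark.
samples = {
    "a\u0328": "\u0105",  # a + combining ogonek    -> ą
    "z\u0307": "\u017c",  # z + combining dot above -> ż
    "O\u0301": "\u00d3",  # O + combining acute     -> Ó
}

for decomposed, expected in samples.items():
    print(repr(decomposed), "->", recompose(decomposed))
```

Note that a spacing accent such as `"a\u02db"` comes back from `recompose` unchanged, which is why a character mapping is still needed for PDFs that draw the accent as a separate spacing glyph.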

Quote:
Originally Posted by GrzegorzN
According to TeX, the way to typeset each 'accented' glyph (and there's a huge number of them) might differ between fonts, but maybe in practice it's not that bad. I see, for example, that your PDFs map 'ą' in exactly the same way as mine, using a string that includes two linebreaks. So there might be a common pattern, and if that's the case, it might be possible to create a reverse mapping table. There might even be some industry standard that describes mappings like 'oacute -> Žo'.

I don't think Calibre supports anything like that at the moment, though (?)
PDF input already uses a character mapping to handle this issue. However, I have only added support for German, Spanish and French characters, because those are the languages I had test books for and am somewhat familiar with. I can easily add other characters to the mapping.

I just need the lowercase and uppercase Unicode characters and the two characters the PDF uses to represent each of them.
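To illustrate the kind of entries being asked for, here is a minimal sketch of such a reverse mapping table in Python. It is not calibre's actual code: the table name `ACCENT_MAP`, the helper `fix_accents`, and every two-character key are hypothetical. The keys assume the PDF emits a spacing accent first and the base letter second; the real sequences (which, per the quote above, may also contain line breaks) have to be read from the affected PDFs, with one lowercase and one uppercase entry per letter.

```python
import re

# HYPOTHETICAL entries: each key is the pair of characters a PDF might emit
# for one accented letter (spacing accent, then base letter). Replace them
# with the sequences actually found in sample PDFs.
ACCENT_MAP = {
    "\u02dba": "\u0105",  # ogonek + a     -> ą
    "\u02dbA": "\u0104",  # ogonek + A     -> Ą
    "\u00b4o": "\u00f3",  # acute + o      -> ó
    "\u00b4O": "\u00d3",  # acute + O      -> Ó
    "\u02d9z": "\u017c",  # dot above + z  -> ż
    "\u02d9Z": "\u017b",  # dot above + Z  -> Ż
}

# A single regex alternation over all keys, longest first, so that longer
# sequences take priority if keys of different lengths are ever added.
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(ACCENT_MAP, key=len, reverse=True))
)


def fix_accents(text: str) -> str:
    """Replace each known accent sequence with its precomposed Unicode letter."""
    return _PATTERN.sub(lambda m: ACCENT_MAP[m.group(0)], text)


print(fix_accents("\u02d9z\u00b4o\u0142w"))  # żółw
```

Using one compiled pattern means the text is scanned once, rather than once per entry as with chained `str.replace` calls.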