10-17-2010, 10:16 AM   #3
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Actually, Calibre does go through and remove hyphens from words intelligently. It uses the document itself as a dictionary: if a variant of the word without the hyphen appears elsewhere in the text, it deletes the hyphen.
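
To make the idea concrete, here's a rough sketch in Python. It is not Calibre's actual code; the `dehyphenate` function and the `HYPHENATED` pattern are names I made up for illustration, and the real implementation is likely more careful about word boundaries and scoring.

```python
import re

# Hypothetical pattern (an assumption, not taken from Calibre): a word part,
# a hyphen, optional whitespace such as a line break, then another word part.
HYPHENATED = re.compile(r"\b(\w+)-\s*(\w+)\b")


def dehyphenate(text):
    """Join hyphenated words only when the joined form occurs in the text."""
    # The document acts as its own dictionary: collect every word it contains.
    words = set(w.lower() for w in re.findall(r"\w+", text))

    def fix(match):
        joined = match.group(1) + match.group(2)
        # Drop the hyphen only if the unhyphenated variant exists elsewhere.
        if joined.lower() in words:
            return joined
        return match.group(0)

    return HYPHENATED.sub(fix, text)


sample = "A good example. Another exam-\nple here, and a well-known case."
result = dehyphenate(sample)
print(result)
```

Here "exam-ple" gets joined because "example" appears elsewhere in the sample, while "well-known" is left alone because "wellknown" never appears. In this sketch a hyphen followed by some unexpected character doesn't match the pattern at all, so that word would be left as is.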

The problem in this case is that it's a crappy PDF with some other character encoded in addition to the hyphen, so the match fails. Unless this turns out to be a common issue across many PDFs (and I've never seen it in a lot of test cases), it's probably not something that will get covered in the code.