MobileRead Forums - View Single Post - pdf with split words at end of line

ldolse · 10-17-2010, 09:16 AM

Actually, Calibre does go through and remove end-of-line hyphens intelligently. It uses the document itself as a dictionary: for each word split by a hyphen, it checks whether the same word appears elsewhere in the document without the hyphen, and deletes the hyphen if there is a match.
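To make the idea concrete, here is a rough sketch of that kind of check. This is not Calibre's actual code; the function name `dehyphenate` and the regular expressions are made up for illustration, and it only handles plain ASCII letters.

```python
import re


def dehyphenate(text):
    """Join words split by a line-end hyphen, but only when the joined
    form already appears somewhere else in the same document.

    Hypothetical helper for illustration, not taken from Calibre.
    """
    # The "dictionary": every run of letters in the document, lowercased.
    known_words = {w.lower() for w in re.findall(r"[A-Za-z]+", text)}

    def join_if_known(match):
        candidate = match.group(1) + match.group(2)
        if candidate.lower() in known_words:
            # The unhyphenated variant exists elsewhere, so drop the hyphen.
            return candidate
        # No match found: leave the hyphen and line break untouched.
        return match.group(0)

    # A word fragment, a hyphen, a line break, then another word fragment.
    pattern = r"([A-Za-z]+)-[ \t]*\n[ \t]*([A-Za-z]+)"
    return re.sub(pattern, join_if_known, text)
```

With this sketch, "exam-" followed by "ple" on the next line is joined only if "example" shows up somewhere else in the text, while something like "well-" / "known" is left alone unless "wellknown" also occurs.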

The problem in this case is that it's a crappy PDF with some other character encoded in addition to the hyphen. Unless this turns out to be a common issue across many PDFs (and I've never seen it, even across lots of test cases), it's probably not something that will get covered in the code.
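If you want to see what that extra character is, one option is to dump the code points at the end of each hyphenated line in the extracted text. A minimal sketch, assuming you already have the text as a Python string; `describe_hyphen_line_endings` is a hypothetical helper, and the soft hyphen in the usage note is only an example, since I don't know which character this particular PDF contains.

```python
import unicodedata


def describe_hyphen_line_endings(text, width=3):
    """For every line with a hyphen among its last `width` characters,
    return those characters as (code point, Unicode name) pairs.

    Hypothetical diagnostic helper, not part of Calibre.
    """
    report = []
    for line in text.split("\n"):
        tail = line[-width:]
        if "-" not in tail:
            continue
        report.append(
            [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED")) for ch in tail]
        )
    return report
```

For instance, a line ending in "exam-" followed by an invisible soft hyphen would show `U+002D HYPHEN-MINUS` and then `U+00AD SOFT HYPHEN`, which makes the stray character obvious.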
