MobileRead Forums - View Single Post - Heuristic "Remove unnecessary hyphens" not working?

therealjoeblow · 03-05-2012, 07:00 PM

Does this feature actually work?

I converted a retail .PDF book into .htmlz and then fixed all of the broken quotes and paragraphs using my standard regex searches, with no issues there. However, in the .PDF of this particular book, the publisher fubar'ed it by using the same normal 'dash' character for end-of-line hyphenation as for compound words. With over 1100 occurrences of the "-" character, it's not simple to fix.

For example: "tight-lipped" is a proper compound word that should have the 'dash', but "im-mersed" was hyphenated at the end of a line in the original .PDF and should have the 'dash' removed.

I tried enabling the heuristics "Remove unnecessary hyphens" option when I converted the .htmlz to .epub, hoping it would fix this, but it makes no difference: none of the dashes are removed during the conversion.
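
For what it's worth, here is a rough sketch of the kind of dictionary check that could tell the two cases apart outside of the converter. This is not calibre's heuristic, just an illustration: the function name `dehyphenate` and the `known_words` set are made up for the example, and the word list is assumed to come from somewhere else (a system dictionary file, or the words that already appear unhyphenated elsewhere in the book). A hyphen is removed only when the joined form is a known word.

```python
import re

# Two runs of letters joined by a single plain hyphen, e.g. "im-mersed".
HYPHENATED = re.compile(r"\b([A-Za-z]+)-([A-Za-z]+)\b")


def dehyphenate(text, known_words):
    """Remove a hyphen only when the joined form is a known word.

    known_words is a hypothetical word list supplied by the caller.
    """
    known = {w.lower() for w in known_words}

    def fix(match):
        joined = match.group(1) + match.group(2)
        # "immersed" is a word, so "im-mersed" gets joined;
        # "tightlipped" is not, so "tight-lipped" is left alone.
        if joined.lower() in known:
            return joined
        return match.group(0)

    return HYPHENATED.sub(fix, text)


sample = "She stayed tight-lipped, im-mersed in the book."
words = {"she", "stayed", "tight", "lipped", "immersed", "in", "the", "book"}
result = dehyphenate(sample, words)
```

The obvious weak spot is pairs where both forms are real words (for example "re-sign" and "resign"); those would still need checking by hand.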

Any ideas?

Cheers
The REAL Joe
