03-05-2012, 06:00 PM   #1
therealjoeblow
Zealot
therealjoeblow reads XML... blindfolded
 
Posts: 106
Karma: 52102
Join Date: Jun 2010
Device: Samsung Android Tablet w/Moon+ Pro Reader
Heuristic "Remove unnecessary hyphens" not working?

Does this feature actually work?

I converted a retail .PDF book into .htmlz, and then fixed all of the broken quotes and paragraphs using my standard regex searches, no issues there. However, in the .PDF of this particular book, the publisher fubar'ed it by using the same plain hyphen character ("-") for end-of-line hyphenation as for compound words. With over 1100 occurrences of "-" in the book, there's no simple search-and-replace that fixes it.

For example: "tight-lipped" is a proper compound word that should have the 'dash', but "im-mersed" was hyphenated at the end of a line in the original .PDF and should have the 'dash' removed.
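
To make the distinction concrete, here is a rough Python sketch of the kind of check I have in mind: join the two halves and only drop the hyphen if the result is a known word. This is not a description of what calibre's heuristic does internally; the names `KNOWN_WORDS` and `dehyphenate` are made up for illustration, and the tiny word set stands in for a real dictionary file.

```python
import re

# Hypothetical stand-in for a real dictionary word list (assumption).
KNOWN_WORDS = {"immersed", "tight", "lipped"}

# Two runs of letters joined by a single plain hyphen.
HYPHENATED = re.compile(r"\b([A-Za-z]+)-([A-Za-z]+)\b")

def dehyphenate(text, known_words):
    """Drop a hyphen only when the joined halves form a known word."""
    def fix(match):
        joined = match.group(1) + match.group(2)
        if joined.lower() in known_words:
            return joined          # e.g. "im-mersed" -> "immersed"
        return match.group(0)      # e.g. "tight-lipped" stays as-is
    return HYPHENATED.sub(fix, text)

print(dehyphenate("He was im-mersed and tight-lipped.", KNOWN_WORDS))
```

A check like this would still get some cases wrong (for example "re-sign" versus "resign", where both forms are real words), so the results would need a manual review.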

I tried enabling the heuristic "Remove unnecessary hyphens" option when I converted the .htmlz to .epub, hoping it would fix this, but it makes no difference: none of the dashes are removed during the conversion.

Any ideas?

Cheers
The REAL Joe