Originally Posted by ldolse
Based on your original report I thought the hyphens were being retained in all cases. The behavior you're seeing is by design. Hyphens can't be universally eliminated, as some words/phrases are always supposed to be hyphenated. The only way to decide which hyphens can be safely removed is to use a dictionary. However, users all over the world use Calibre in many languages, and most books also use proper names, made-up words, or scientific words which won't appear in any dictionary. To work around all that, Calibre uses the book itself as its dictionary. Hyphens are only removed if the word appears in the book without a hyphen. So where a hyphen is still there, it means that word didn't occur anywhere else in the book without the hyphen. (alla may be an exception, a side effect of some recent work on reducing false positives)
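For anyone curious how the "book as its own dictionary" idea could look in practice, here is a rough Python sketch. It is not Calibre's actual code: the function name `dehyphenate` and both regular expressions are made up for illustration, and it ignores markup, line breaks, and the false-positive handling mentioned above.

```python
import re

# Hypothetical patterns for illustration only, not the ones Calibre uses.
HYPHENATED = re.compile(r"\b(\w+)-(\w+)\b")  # e.g. "exam-ple"
WORD = re.compile(r"\w+")                    # any run of word characters


def dehyphenate(text):
    """Remove a hyphen only if the joined word occurs elsewhere in the text."""
    # The book itself is the dictionary: collect every unhyphenated word.
    # The two halves of "exam-ple" are collected as "exam" and "ple",
    # so a hyphenated word never vouches for itself.
    known = {w.lower() for w in WORD.findall(text)}

    def join_if_known(match):
        joined = match.group(1) + match.group(2)
        # Keep the hyphen unless the joined form was seen in the book.
        return joined if joined.lower() in known else match.group(0)

    return HYPHENATED.sub(join_if_known, text)


if __name__ == "__main__":
    sample = "An exam-ple of a well-known case. Another example here."
    print(dehyphenate(sample))
    # prints: An example of a well-known case. Another example here.
```

In the sample, "exam-ple" is joined because "example" shows up later in the text, while "well-known" keeps its hyphen because "wellknown" never appears.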
Understood. This is a very smart modus operandi ...
BTW, I also used your regular expression to remove the -<br>, and I have to say the final result is perfect.
Thanks for the explanation, it is really appreciated.
I am also reading the FAQ on PDF :-)