Originally Posted by cybmole
On my Kindle, all the genuine hyphenated words appear like this: "xxxxx-xxxxx". All the faulty ones look like this: "xxxxx- xxxx", i.e. only the faulty ones have a space after the hyphen, so maybe an auto-fix IS possible?
UPDATE: I think I may have fixed it. I converted the .mobi to .rtf and began a [find "- ", replace with nothing] process in Word. After doing a few manually, it seemed to be finding only correct items to fix, so I fired off Replace All, which made 1100+ changes. I'll convert back into .mobi now and see how it goes. Well, it improved the text, I think.
But a regex solution would maybe be better. I've preserved an unchanged epub version for possible further experimentation.
I also see that in the epub and mobi conversions some pictures are messed up. This is probably an epub format limitation. The original PDF contains charts that seem to be made of 6 or 7 panels appended together horizontally.
The conversion process has separated those into vertical stacks of picture slices. I guess I'll have to read the PDF to see those correctly.
Just use the remove header/footer regex option to delete the hyphens then.
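For anyone who wants to try the same idea outside the conversion dialog, here is a rough Python sketch of the "hyphen followed by a space" rule described in the quoted post. The pattern and the `dehyphenate` name are my own illustration, not the regex the converter ships with, and it assumes a broken word always looks like letter, hyphen, whitespace, letter. It will also wrongly join deliberate cases such as "pre- and post-war", so test it on a copy first.

```python
import re

# A hyphen followed by whitespace, with a letter on both sides.
# [^\W\d_] means "a word character that is not a digit or underscore", i.e. a letter.
BROKEN_HYPHEN = re.compile(r"(?<=[^\W\d_])-\s+(?=[^\W\d_])")


def dehyphenate(text: str) -> str:
    """Remove the hyphen and the stray whitespace, rejoining the two word halves."""
    return BROKEN_HYPHEN.sub("", text)


sample = "A well-known exam- ple of conver- sion damage - really."
print(dehyphenate(sample))
```

Genuine hyphenated words ("well-known") and free-standing dashes (" - ") are left alone because neither has a letter directly before the hyphen and whitespace directly after it.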
That is a different Unicode code point than the hyphen that typically occurs in most documents. I'll look into adding that to the default de-hyphenation regex.
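To check which hyphen-like character a document actually contains, something along these lines may help. It is only a sketch: the thread does not say which code point was involved, so the `HYPHEN_LIKE` set below (hyphen-minus, soft hyphen, U+2010 hyphen, non-breaking hyphen) is my assumption about the likely candidates, and the function names are made up for illustration.

```python
import re
import unicodedata

# Assumed candidates; the thread does not name the exact code point.
HYPHEN_LIKE = "\u002d\u00ad\u2010\u2011"

# Same "letter, hyphen, whitespace, letter" rule, but accepting any candidate hyphen.
BROKEN_ANY_HYPHEN = re.compile(
    r"(?<=[^\W\d_])[" + re.escape(HYPHEN_LIKE) + r"]\s+(?=[^\W\d_])"
)


def describe_hyphens(text: str) -> list[tuple[str, str]]:
    """List the distinct hyphen-like characters found in text, as (code point, name)."""
    found = {ch for ch in text if ch in HYPHEN_LIKE}
    return sorted((f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in found)


def dehyphenate_any(text: str) -> str:
    """Rejoin words split by any of the candidate hyphens followed by whitespace."""
    return BROKEN_ANY_HYPHEN.sub("", text)


sample = "exam\u2010 ple and well-known"
print(describe_hyphens(sample))
print(dehyphenate_any(sample))
```

If `describe_hyphens` reports something other than HYPHEN-MINUS, a pattern that only looks for the plain ASCII "-" will miss those breaks.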