10-17-2010, 11:43 AM   #5
cybmole
Quote:
Originally Posted by sherman
The main problem is that while many of the end of line hyphens are there to break up words to improve the typography of the book, some will be genuinely hyphenated words that should remain so.

And there probably isn't an automated way of determining this during conversion.
On my Kindle, all the genuinely hyphenated words appear like this: "xxxxx-xxxxx". All the faulty ones appear like this: "xxxxx- xxxx". In other words, only the faulty ones have a space after the hyphen, so maybe an auto-fix IS possible?

UPDATE: I think I may have fixed it. I converted the .mobi to .rtf and began a [find "- ", replace with nothing] process in Word. After doing a few manually, it seemed to be finding only correct items to fix, so I fired off Replace All, which made 1100+ changes. I'll convert back into .mobi now and see how it goes. Well, it improved the text, I think.

But a regex solution would maybe be better. I've preserved an unchanged epub version for possible further experimentation.
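
For anyone who wants to try the regex route, here is a rough Python sketch of the idea, not a tested conversion recipe. It assumes the book text has already been extracted to a plain string, and that a faulty break always looks like letter, hyphen, space(s), lowercase letter. The names BROKEN_HYPHEN, find_candidates and join_broken_hyphens are made up for this example. Requiring a letter directly before the hyphen means a spaced dash such as "a - b" is left alone, which a plain find "- " would not do.

```python
import re

# Assumed pattern for a faulty break: a letter, a hyphen, one or more
# spaces/tabs, then a lowercase letter (e.g. "conver- sion").
# The lookarounds keep the surrounding letters out of the match itself.
BROKEN_HYPHEN = re.compile(r"(?<=[A-Za-z])-[ \t]+(?=[a-z])")


def find_candidates(text, context=15):
    """Return a short snippet around each suspected faulty hyphen,
    so a few can be checked by eye before replacing everything."""
    return [
        text[max(0, m.start() - context): m.end() + context]
        for m in BROKEN_HYPHEN.finditer(text)
    ]


def join_broken_hyphens(text):
    """Delete the hyphen-plus-space so the two word halves are rejoined."""
    return BROKEN_HYPHEN.sub("", text)


if __name__ == "__main__":
    sample = "The conver- sion kept well-known words."
    for snippet in find_candidates(sample):
        print(repr(snippet))
    print(join_broken_hyphens(sample))
```

It shares the weakness of the Word replace: a phrase like "pre- and post-war" would be wrongly joined, and a genuine compound that happened to be split at a line end would lose its hyphen, so reviewing the candidates first is still worthwhile.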

I also see that in the epub and mobi conversions some pictures are messed up; this is probably an epub format limitation. The original PDF contains charts that seem to be made of 6 or 7 panels appended together horizontally.
The conversion process has separated those into vertical stacks of picture slices. I guess I'll have to read the PDF to see those correctly.

Last edited by cybmole; 10-17-2010 at 12:04 PM.