Chang: going by the output you provide I wouldn't bother trying to get SomePDF to work properly. If it can't even handle hyphens correctly it's not worth using.
If you're looking for a free program, have you tried Mobipocket Creator? You can use it to convert a PDF to HTML, and from some brief tests it seems to respect tags reasonably well. Tagged paragraphs that are not separated by a blank line are simply given a break tag at the end, which shows up as a manual line break in Word, but a simple search-and-replace is all that's needed to convert those back into paragraph marks. It also doesn't get confused by hyphens (as long as they're soft hyphens, which any decent PDF-creation program should use for words that are split at line breaks).
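If you'd rather do that clean-up on the HTML itself before it goes anywhere near Word, something like the following rough Python sketch might do it. I'm assuming here that the exported file uses plain <br> tags inside ordinary <p> elements and that soft hyphens come through as the U+00AD character; I haven't checked exactly what markup Mobipocket Creator writes, so treat the patterns as a starting point. The function name clean_exported_html is just something I made up for illustration.

```python
import re

SOFT_HYPHEN = "\u00ad"  # assumption: soft hyphens survive as U+00AD in the HTML


def clean_exported_html(html: str) -> str:
    """Hypothetical helper: turn break tags into paragraph boundaries."""
    # Drop soft hyphens so words that were split at line ends are rejoined.
    html = html.replace(SOFT_HYPHEN, "")
    # Replace each break tag (<br>, <br/>, <br />) with a paragraph boundary.
    html = re.sub(r"<br\s*/?>\s*", "</p>\n<p>", html, flags=re.IGNORECASE)
    # Remove empty paragraphs left over when a break sat right before </p>.
    html = re.sub(r"<p>\s*</p>\n?", "", html, flags=re.IGNORECASE)
    return html


if __name__ == "__main__":
    sample = "<p>First para\u00adgraph.<br>Second paragraph.<br></p>"
    print(clean_exported_html(sample))
```

Regular expressions are fine for a one-off tidy like this, but they won't cope with <p> tags that carry attributes or with breaks you actually want to keep (in verse, say), so check the result before converting further.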
I wouldn't worry about ragged line-ends such as the ones you show in notepad-sample.gif. You're creating reflowable text and the reader will handle the line lengths when it lays out the eBook.
As I said before, a lot depends on whether the PDF was properly tagged when it was initially created. If it wasn't, then there's no magic program to help and nothing for it but to go through the text and correct it by hand.