10-30-2009, 02:58 PM   #10
charleski
Wizard
Posts: 1,186
Karma: 604284
Join Date: Sep 2009
Device: PRS-505
Chang: going by the output you provided, I wouldn't bother trying to get SomePDF to work properly. If it can't even handle hyphens correctly, it's not worth using.

If you're looking for a free program, have you tried Mobipocket Creator? You can use it to convert a PDF to HTML, and from some brief tests it seems to respect tags reasonably well. Tagged paragraphs that are not separated by a blank line are simply given a break tag at the end, which shows up as a manual line break in Word, but a simple search-and-replace is all that's needed to convert those back into paragraph marks. It also doesn't get confused by hyphens (as long as they're soft hyphens, which any decent PDF-creation program should use for words that are split at line breaks).
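
If you'd rather script the search-and-replace than do it in Word, here is a rough Python sketch of the idea. It assumes the converted text is a plain HTML fragment in which each paragraph ends with a break tag; I haven't checked this against Mobipocket Creator's exact markup, so treat the input format as an assumption. The function name `breaks_to_paragraphs` is made up for illustration, and stripping soft hyphens (U+00AD) is an optional extra step of mine, not something the converter requires.

```python
import re

SOFT_HYPHEN = "\u00ad"  # Unicode soft hyphen, invisible unless a line breaks there

def breaks_to_paragraphs(html: str) -> str:
    """Turn <br>-terminated runs of text into separate <p> elements.

    Assumption: the input is a simple fragment with no nested block tags.
    """
    # Optional: drop soft hyphens left over from the PDF's line breaks.
    text = html.replace(SOFT_HYPHEN, "")
    # Split on <br>, <br/> or <br />, in any letter case.
    parts = re.split(r"<br\s*/?>", text, flags=re.IGNORECASE)
    # Discard empty pieces and surrounding whitespace.
    paragraphs = [p.strip() for p in parts if p.strip()]
    return "\n".join(f"<p>{p}</p>" for p in paragraphs)

sample = "First para\u00adgraph.<br>\nSecond paragraph.<BR />\nThird."
print(breaks_to_paragraphs(sample))
```

For anything more complicated than a flat fragment, a proper HTML parser would be safer than a regular expression.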

I wouldn't worry about ragged line-ends such as the ones you show in notepad-sample.gif. You're creating reflowable text and the reader will handle the line lengths when it lays out the eBook.
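
To see why the ragged line ends don't matter, here is a small Python sketch. It only imitates what a reader does when it reflows text (real readers use their own layout engines), and the `unwrap` helper is a made-up name: the hard line breaks inside a paragraph are collapsed to spaces, and the same text is then laid out at two different widths.

```python
import textwrap

def unwrap(text: str) -> str:
    """Collapse every run of whitespace, including line breaks, to one space."""
    return " ".join(text.split())

# A paragraph with arbitrary, ragged hard line breaks.
ragged = "You're creating\nreflowable text and the\nreader will handle\nthe line lengths."

flowed = unwrap(ragged)
for width in (24, 48):
    # The same words are laid out again for each "screen" width.
    print(textwrap.fill(flowed, width=width))
    print("-" * width)
```

Wherever the original breaks fell, the words come out the same at every width, which is all that matters for an eBook.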
As I said before, a lot depends on whether the PDF was properly tagged when it was initially created. If it wasn't, then there's no magic program to help and nothing for it but to go through the text and correct it by hand.