Quote:
Originally Posted by HappyChris
I have downloaded a free PDF book and want to convert it so I can read it on my Kindle or my ePub readers. I used Calibre to convert it to AZW3 and HTMLZ, and both display much the same way: hard-coded line lengths, page headers, page numbers and so on. The result is clumsy line formatting on small 'pages', paragraph spacing instead of line spacing between lines, and chapter titles etc. that are inappropriate to the page size.
I tried one or two online services, but while they got the ToC entries down to one line each, with fewer dots between the chapter title and the page number, they choked on the many diacritics in the text! Calibre handled the diacritics without fault.
I would rather not spend days removing '</p>'s and repeated page header and footer text! Is there a program or script that could speed this process up, or even do it all for me?
Sorry Chris:
My business does this every single day, and there's no magic, fast way. The only way to get there from here is to a) use ABBYY FineReader, which will at least remove the running heads/footers, and b) do the rest by hand. I wish there were a faster/better way, but there isn't.
Having said that, yes, you can write regexes to clean up some of the line-ending </p>'s and so on, but... it's still all human labor, eyes and hands.
Hitch
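For anyone wanting to try the regex route mentioned above, here is a rough Python sketch of that kind of cleanup. It is not a tool from this thread, and it makes several assumptions. The HTML is assumed to be Calibre-style output where each paragraph is a flat `<p ...>text</p>` element. A "running head" is guessed to be any paragraph whose exact text repeats at least `min_repeats` times (a hypothetical threshold you would tune), or a paragraph that is only a page number. A false paragraph break is guessed to be one where the first paragraph doesn't end in sentence punctuation and the next starts with a lowercase letter. The function names are made up for illustration. Headers that include the page number, hyphenated words split across pages, and Roman-numeral folios are not handled, so you still need eyes on the result.

```python
import re
from collections import Counter

# Assumption: Calibre-style HTML where each paragraph is a flat
# <p ...>text</p> element (no nested <p> tags).
PARA_RE = re.compile(r'<p\b[^>]*>(.*?)</p>', re.DOTALL)
TAG_RE = re.compile(r'<[^>]+>')

# A paragraph boundary: whitespace, </p>, whitespace, <p ...>, whitespace.
# The lookbehind/lookahead capture the characters on either side without
# consuming them, so back-to-back breaks can all be checked.
BREAK_RE = re.compile(r'(?<=(\S))\s*</p>\s*<p\b[^>]*>\s*(?=(\w))')

# Characters that plausibly end a real paragraph. '>' is included so we
# never join after an inline closing tag such as </i> (conservative).
ENDERS = '.!?"\u201d>'


def _text(inner):
    """Plain text of a paragraph's contents, with inline tags stripped."""
    return TAG_RE.sub('', inner).strip()


def strip_running_heads(html, min_repeats=3):
    """Drop paragraphs that are bare page numbers, or whose exact text
    occurs at least min_repeats times (assumed to be running heads/feet)."""
    counts = Counter(_text(m.group(1)) for m in PARA_RE.finditer(html))

    def keep_or_drop(m):
        text = _text(m.group(1))
        if text.isdigit() or (text and counts[text] >= min_repeats):
            return ''
        return m.group(0)

    return PARA_RE.sub(keep_or_drop, html)


def join_broken_paragraphs(html):
    """Merge a paragraph into the next one when the first does not end in
    sentence punctuation and the next starts with a lowercase letter.
    str.islower() is Unicode-aware, so accented letters count too."""
    def fix(m):
        before, after = m.group(1), m.group(2)
        if before not in ENDERS and after.islower():
            return ' '
        return m.group(0)

    return BREAK_RE.sub(fix, html)


def clean(html, min_repeats=3):
    # Strip heads/feet first so they no longer sit between sentence halves.
    return join_broken_paragraphs(strip_running_heads(html, min_repeats))
```

To use it on an HTMLZ, you would unzip it, run `clean()` over the HTML file inside (typically `index.html`), re-zip, convert again in Calibre, and then proofread. Calibre's conversion options also have a heuristic "unwrap lines" setting and a search & replace section where regexes like these can be entered, which may be worth trying first.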