View Single Post
Old 12-05-2017, 10:18 AM   #5
Hitch
Bookmaker & Cat Slave
Hitch ought to be getting tired of karma fortunes by now.Hitch ought to be getting tired of karma fortunes by now.Hitch ought to be getting tired of karma fortunes by now.Hitch ought to be getting tired of karma fortunes by now.Hitch ought to be getting tired of karma fortunes by now.Hitch ought to be getting tired of karma fortunes by now.Hitch ought to be getting tired of karma fortunes by now.Hitch ought to be getting tired of karma fortunes by now.Hitch ought to be getting tired of karma fortunes by now.Hitch ought to be getting tired of karma fortunes by now.Hitch ought to be getting tired of karma fortunes by now.
 
Hitch's Avatar
 
Posts: 11,461
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Originally Posted by HappyChris View Post
I have downloaded a free pdf book and wish to convert it to be read via my Kindle or my epub readers. I used Calibre to convert it to AZW3 and HTMLZ, both of which display similarly, that is, with hard-coded line lengths and page headers, numbers etc., displaying clumsy line formatting on small 'pages' with paragraph, not line- spacing between lines and chapter titles, etc. inappropriate to the page size.

I tried one or two online services but, while they got the ToC lines down to one line each with fewer dots between the chapter title and the page number, they choked when faced with the many diacritics in the text! Calibre handled the diacritics without fault.

I would rather not spend days removing '</p>'s and repeating page header and footer texts! Is there a program or a script that could speed this process up or even do it all for me?

Sorry Chris:

My business does this every single day, and there's no magic, fast way. The only way to get there from here is to a) use Abbyy Fine Reader, which at least will remove the running heads/footers, and b) do the rest by hand. I wish there were a faster/better way, but there isn't.

Having said that, yes, you can write regex to clean up some portion of the line-ending </p>'s, and all that, but...it's still all human labor, eyes and hands.


Hitch
Hitch is offline   Reply With Quote