01-27-2014, 08:08 AM   #514
Blossom
Treasure Seeker
Posts: 18,708
Karma: 26026435
Join Date: Mar 2010
Device: Kobo HD Glo, Kindles, Kindle Fires, Android Devices
So you remember that horrible Bobbi Smith Topaz book, Forbidden Fires, that I bought for 99 cents? After working on it all day yesterday, I now have a good, readable copy, and I learned a lot in the process.

First off, it was a PITA to do, but I feel like I've conquered the format now.

I first had to extract the SVG images, then use Notepad++ batch replace to remove the scripts from each page. I used this post as a guide, doing only the first 3 steps. I used Acrobat 8 to merge the PDF files and then crop them. I then ran the result through OCR software. I use Able2Extract, which isn't free but is cheaper than ABBYY. I find it does a much better job of interpreting letters, with fewer weird characters than ABBYY. It takes about an hour to convert, but it's really worth it not to have to fix OCR errors on every page and still keep my italics.
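For anyone who would rather script the "remove the scripts from each page" step than do it with a batch replace in an editor, here is a rough Python sketch. It is not what I actually ran; the function names and the assumption that the extracted pages are `*.svg` files in one folder are made up for illustration, and the regex only handles ordinary `<script>` elements.

```python
import re
from pathlib import Path

# Matches either a self-closing <script .../> or a full
# <script ...>...</script> element, across multiple lines.
SCRIPT_RE = re.compile(
    r"<script\b[^>]*?(?:/>|>.*?</script\s*>)",
    re.DOTALL | re.IGNORECASE,
)

def strip_scripts(svg_text: str) -> str:
    """Return the SVG markup with every <script> element removed."""
    return SCRIPT_RE.sub("", svg_text)

def strip_scripts_in_folder(folder: str) -> int:
    """Clean every *.svg file in `folder` in place (hypothetical layout).

    Returns the number of files that were actually changed.
    """
    changed = 0
    for path in sorted(Path(folder).glob("*.svg")):
        original = path.read_text(encoding="utf-8")
        cleaned = strip_scripts(original)
        if cleaned != original:
            path.write_text(cleaned, encoding="utf-8")
            changed += 1
    return changed
```

Work on a copy of the extracted pages, since the files are overwritten in place.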

Once that was done, I ran the HTML file through PSPad, cleaning up the styles using Tidy, then opened it in Word and fixed the formatting and broken sentences using my macros and regular expressions. I then put the finished HTML file in Calibre and converted it to MOBI and EPUB.
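To give an idea of the kind of regex that fixes broken sentences, here is a small Python sketch. It is not my Word macro, just an illustration of the same idea, and it assumes the OCR output wraps each line in `<p>` tags. It joins two paragraphs when the first one doesn't end in sentence punctuation or a closing quote and the next one starts with a lowercase letter.

```python
import re

# A paragraph break is treated as "broken" when the text before </p>
# does not end in sentence punctuation or a closing quote, and the
# next paragraph starts with a lowercase letter.
BROKEN_PARA_RE = re.compile(
    r"(?<![\s.!?:;\"'\u201d\u2019])\s*</p>\s*<p[^>]*>\s*(?=[a-z])"
)

def join_broken_paragraphs(html: str) -> str:
    """Merge paragraphs that OCR split in the middle of a sentence."""
    return BROKEN_PARA_RE.sub(" ", html)

sample = (
    "<p>She walked to the</p>\n"
    "<p>door and opened it.</p>\n"
    "<p>Then she left.</p>"
)
print(join_broken_paragraphs(sample))
```

A rule like this will still miss some breaks and wrongly join others (for example, words hyphenated across a break), so the result needs a read-through afterwards.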

Now that I've got the method down, I can probably do this in 3 hours if I don't run into problems.

So that's one Topaz book down and several more to go!