Old 01-28-2014, 04:20 PM   #531
Difflugia
Testate Amoeba
Posts: 3,049
Karma: 27300000
Join Date: Sep 2012
Device: Many Android devices, Kindle 2, Toshiba e755 PocketPC
Quote:
Originally Posted by Blossom
So you remember that horrible Bobbi Smith Topaz book Forbidden Fires I bought for 99 cents? After working on it all yesterday I now have a good readable copy and learned a lot in the process.

First off it was a PITA to do but I feel like I've conquered the format now.

I first had to extract the SVG images, then use Notepad+ batch replace to remove the scripts from each page. I used this post as a guide, only doing the first 3 steps. I used Acrobat 8 to merge the PDF files, then cropped them. I then ran the result through OCR software. I use Able2Extract (which isn't free, but is cheaper than ABBYY). I find it does a much better job of interpreting letters and produces fewer weird characters than ABBYY. It takes about an hour to convert, but it's really worth it not to have to fix OCR errors on every page and still keep my italics.

Once that was done, I ran the HTML file through PSPad, cleaning up the styles using Tidy, then opened it in Word and fixed the formatting and broken sentences using my macros and regular expressions. I then put the finished HTML file in Calibre and converted it to MOBI and EPUB.

Now that I've got the method down, I can probably do this in 3 hours if I don't run into problems.

So that's one Topaz book down and several more to go!

I started with the same tutorial and ended up using Inkscape instead of Prince. I didn't bother mentioning it in that thread because it was old and I didn't think anyone else was still tinkering with Topaz files.

Anyway, if you remove everything from each page except the part that starts with <svg id="svgimg"... and ends with </svg> then Inkscape will convert it to a PDF that uses the glyphs, so it's almost an order of magnitude smaller than the one created by Prince. Inkscape is also scriptable, so most of the process can run unattended. It takes about an hour to run a 200-page book.
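
For anyone who wants to script that step, here is a rough Python sketch of the idea, not the exact script I ran. The function names, the `page*.xhtml` file pattern, and the directory layout are all made up for illustration. It also assumes each page has exactly one `<svg id="svgimg"...>` element with no nested `<svg>` inside it, and that the element carries its own namespace declarations. The `--export-pdf` option is the one from the older Inkscape 0.4x/0.9x command line; Inkscape 1.0 and later use `--export-filename` instead, so adjust for your version.

```python
import re
import subprocess
from pathlib import Path

# Matches the part described above: from <svg id="svgimg" ...> through </svg>.
# Assumption: one such element per page, with no nested <svg> inside it.
SVG_RE = re.compile(r'<svg\s+id="svgimg".*?</svg>', re.DOTALL)


def extract_svg(page_html):
    """Return only the <svg id="svgimg">...</svg> part of a page, or None."""
    match = SVG_RE.search(page_html)
    return match.group(0) if match else None


def inkscape_command(svg_path, pdf_path):
    """Build the command line for one page (older --export-pdf option)."""
    return ["inkscape", "--export-pdf=" + pdf_path, svg_path]


def convert_pages(page_dir, out_dir, pattern="page*.xhtml"):
    """Strip each page down to its SVG and convert it; return the PDFs written.

    The file pattern is hypothetical; use whatever your extracted pages are called.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    pdfs = []
    for page in sorted(Path(page_dir).glob(pattern)):
        svg = extract_svg(page.read_text(encoding="utf-8"))
        if svg is None:
            continue  # no svgimg element on this page, skip it
        svg_path = out / (page.stem + ".svg")
        pdf_path = out / (page.stem + ".pdf")
        svg_path.write_text(svg, encoding="utf-8")
        subprocess.run(inkscape_command(str(svg_path), str(pdf_path)), check=True)
        pdfs.append(pdf_path)
    return pdfs
```

Calling `convert_pages("pages", "pdf_out")` would leave one PDF per page in `pdf_out`, which still need to be merged afterwards.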

The only problem is that Inkscape calculates the page size from the dimensions of the SVG image and I haven't figured out how to tell it otherwise. The PDF looks fine on the screen, but it reports its size as being 60 by 90 inches.
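
One possible workaround, which I have not tested against real Topaz output, is to rewrite the root `<svg>` tag before handing it to Inkscape: give it a physical width and height and keep the original coordinate system in a `viewBox`. Older Inkscape versions treat unitless SVG lengths as 90 pixels per inch, so a 60 by 90 inch page would correspond to an image of roughly 5400 by 8100 units. That figure is my guess from the reported size, not something I measured. The sketch below uses a made-up `set_page_size` helper and a made-up 6 by 9 inch target.

```python
import re


def _plain_number(value):
    """Return the numeric part of a unitless or px length, or None for anything else."""
    m = re.match(r"^(\d+(?:\.\d+)?)(?:px)?$", value or "")
    return m.group(1) if m else None


def set_page_size(svg_text, width_in, height_in):
    """Rewrite the first <svg> tag to a physical size, keeping the drawing's coordinates."""
    tag_match = re.search(r"<svg\b[^>]*>", svg_text)
    if tag_match is None:
        return svg_text  # nothing to rewrite
    tag = tag_match.group(0)

    def attr(name):
        m = re.search(r'\s%s="([^"]*)"' % name, tag)
        return m.group(1) if m else None

    old_w = _plain_number(attr("width"))
    old_h = _plain_number(attr("height"))

    # Drop the old width/height and put the physical size right after "<svg".
    stripped = re.sub(r'\s(?:width|height)="[^"]*"', "", tag)
    extra = ' width="%gin" height="%gin"' % (width_in, height_in)
    # Only add a viewBox when there is none and the old size was a plain number.
    if attr("viewBox") is None and old_w and old_h:
        extra += ' viewBox="0 0 %s %s"' % (old_w, old_h)
    new_tag = stripped[:4] + extra + stripped[4:]
    return svg_text[:tag_match.start()] + new_tag + svg_text[tag_match.end():]
```

With that, `set_page_size(svg, 6, 9)` would be applied to each extracted SVG before conversion. Whether Inkscape then reports the smaller page size in the PDF is something to check on one page before running a whole book.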