MobileRead Forums - View Single Post

Coconut · 01-25-2010, 12:48 PM

Quote:

Originally Posted by KevinH

Hi Clarknova,

Would it be any help (PDF file-size-wise) to start with the HTML version of the book, with only the critical areas converted to SVGs and the main part of the book left as straight HTML?

For example, a new version of flatxml2html using the code used for the ornate letter A issue can automatically create svg images for just the "fixed" regions on the page and put img src style links to them right into the html while letting the bulk of the document remain html. This did wonders for the need to hand edit anything in my book but at the expense of more svg images and less ability to search for things (since they might be in images).

The question is: would this result in a significantly smaller PDF once converted? Or would it buy us nothing?

Thanks,

KevinH

It's probably even simpler than that. The HTML output at the end of the conversion process retains the pagination information. It should be fairly straightforward to transform the HTML so that the actual page breaks, and the images, land in the right places.

First I'm going to take a look at the PDFs we can produce, and then move on from there. There's nothing to stop me from feeding that PDF back through OCR and outputting a text-based PDF with images.

Edit:
Using Ubuntu, I installed the librsvg2-bin package, which I used for the conversion. The command line I used, run from inside the svg directory, was "for i in page*.svg; do rsvg-convert -a -f pdf $i -o `echo $i | sed -e ' s/svg$/pdf/'`; done"
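
For anyone repeating this, here is a slightly more defensive sketch of the same loop. It is not the exact command I ran. It quotes the filenames so that names with spaces survive, and derives the output name with shell parameter expansion instead of sed. The function name svg_pages_to_pdf and the SVG2PDF variable are made up for illustration. SVG2PDF is not part of librsvg; it is only a hook for swapping in a different converter, and it defaults to rsvg-convert with the same -a, -f and -o options as above.

```shell
#!/bin/sh
# Sketch: convert every page*.svg in a directory to a PDF with the same base name.
# SVG2PDF is a hypothetical override hook (not a librsvg feature); default is rsvg-convert.
svg_pages_to_pdf() {
    dir=${1:-.}                            # directory holding the page*.svg files
    for svg in "$dir"/page*.svg; do
        [ -e "$svg" ] || continue          # skip the literal pattern if nothing matched
        pdf="${svg%.svg}.pdf"              # page001.svg -> page001.pdf
        "${SVG2PDF:-rsvg-convert}" -a -f pdf "$svg" -o "$pdf" || return 1
    done
}
```

Run from inside the svg directory, "svg_pages_to_pdf ." should do the same job as the one-liner, stopping at the first page that fails to convert.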

This created an individual PDF for each page: 305 pages in total, at 197 megabytes. I combined those using Acrobat, and then ran 'optimize for OCR'. The resulting file is beautiful and smooth, with all the images, and weighs in at 3407K. Awesome.