06-09-2010, 01:36 AM   #9
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Hmm, I just went through some of the XHTML with the SVG data. For some of the content (title pages, contents, copyright pages, etc.) it would make a lot of sense to use this instead of the OCR. It looks like it might be as simple as dropping the XHTML files from the script into the appropriate points of the book using Sigil... Will give it a go.

There is some JavaScript in there for zooming in/out and changing pages. I'm not sure if it got added by the script or if the original content used it. Anyway, if I recall correctly, the readers ignore JavaScript. The other speedbump is that a fair number of the XHTML files don't contain anything of value, at least in the one book I looked at.
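
For anyone who wants to script that cleanup rather than do it by hand, here is a rough Python sketch of the two checks: dropping the script elements and flagging pages that hold nothing useful. It is only an illustration under a few assumptions. The function names `strip_scripts` and `has_content` are made up, the files are assumed to be well-formed XHTML with inline SVG, and the "nothing of value" test is just a guess (no visible body text and no SVG drawing elements). I haven't checked it against the actual output of the scripts.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical helper: remove <script> elements in place and return how many
# were removed. Inline handlers such as onclick attributes are left alone,
# and any text directly following a removed element is dropped with it.
def strip_scripts(root):
    script_tags = {f"{{{XHTML_NS}}}script", f"{{{SVG_NS}}}script", "script"}
    removed = 0
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in script_tags:
                parent.remove(child)
                removed += 1
    return removed

# Hypothetical heuristic: a page is worth keeping if it contains an SVG
# drawing element or any visible text in <body>. Call strip_scripts first,
# otherwise script source inside <body> counts as text.
def has_content(root):
    drawing = {"path", "image", "text", "use", "rect", "polygon"}
    for el in root.iter():
        ns, _, local = el.tag.rpartition("}")
        if ns == "{" + SVG_NS and local in drawing:
            return True
    body = root.find(f"{{{XHTML_NS}}}body")
    text = "".join(body.itertext()) if body is not None else ""
    return bool(text.strip())

# Made-up sample page, not taken from a real book.
sample = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>p1</title><script type="text/javascript">function zoom(){}</script></head>
<body><svg xmlns="http://www.w3.org/2000/svg"><path d="M0 0L1 1"/></svg></body>
</html>"""

root = ET.fromstring(sample)
print("scripts removed:", strip_scripts(root))
print("keep this page:", has_content(root))
```

Writing the cleaned tree back out is left off here, since the namespace prefixes would need some care to keep the files readable in Sigil.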

It does seem like the ideal in the long run would be to offer two different types of EPUB output, one based on the OCR'd text and another based on the SVG output. I'm really curious whether this sort of SVG content will wind up being fully compatible with the various EPUB renderers.

The info from the original developer was good. I'd already read his blog post, but I hadn't seen him participating in the other discussion before. I don't quite get some of the comments about dealing with layout, as these scripts do a great job of extracting images and putting them in the right places with HTML. I also don't really understand how Topaz works from a reflow perspective: reading it on an iPhone or a Kindle, you wouldn't have any idea that the native format/view for this data is the original scanned page.

Last edited by ldolse; 06-09-2010 at 01:47 AM.