Old 02-12-2011, 12:28 PM   #43
user_none
Sigil & calibre developer
Here are my results:

Upload time

Accurate

14 books - 29 sec
5 books - 8 sec
3 books - 5 sec


Fast

14 books - 13 sec
5 books - 6 sec
3 books - 3 sec

Looking at the set of 14 books, the accurate method takes roughly twice as long to transfer (29 sec vs. 13 sec). This is due to decompressing and parsing the text. My parser only looks at the amount of visible text. Each page consists of 32 lines, and each line can hold up to 70 characters. A paragraph always starts on a new line. The accuracy could be increased by adding support for <div class="mbp_pagebreak" /> and <br> tags.
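To make the counting rule concrete, here is a rough sketch of it in Python. This is not the actual generator code, only an illustration of the rule described above (32 lines per page, up to 70 characters per line, every paragraph starting on a new line). The function names are made up for this example, the input is assumed to be a list of already-extracted visible-text paragraphs, lines are assumed to wrap strictly by character count rather than at word boundaries, and the offsets are positions in the concatenated visible text, which may not match what ends up in the APNX file.

```python
import math

LINES_PER_PAGE = 32   # from the post: each page is 32 lines
CHARS_PER_LINE = 70   # from the post: each line holds up to 70 characters


def paragraph_lines(text, chars_per_line=CHARS_PER_LINE):
    """Number of lines a paragraph occupies.

    Assumption: hard wrap by character count; an empty paragraph
    still takes up one line.
    """
    return max(1, math.ceil(len(text) / chars_per_line))


def page_starts(paragraphs, lines_per_page=LINES_PER_PAGE,
                chars_per_line=CHARS_PER_LINE):
    """Return the character offset at which each estimated page begins.

    `paragraphs` is a list of visible-text strings (hypothetical input;
    markup is assumed to have been stripped already).
    """
    starts = [0]        # the first page starts at offset 0
    line_on_page = 0    # lines already used on the current page
    offset = 0          # offset of the current paragraph in the visible text
    for para in paragraphs:
        for i in range(paragraph_lines(para, chars_per_line)):
            if line_on_page == lines_per_page:
                # Current page is full: this line opens a new page.
                starts.append(offset + i * chars_per_line)
                line_on_page = 0
            line_on_page += 1
        offset += len(para)
    return starts
```

With this sketch, `len(page_starts(paragraphs))` would give the estimated page count for a book. Handling the two extra elements would presumably mean forcing a new page or a new line at those points, which this sketch does not attempt.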

In one of my test books, page 105 mapped 1 to 1 to the print version. I did not do extensive testing of the other pages. Other books mapped very closely to the print edition.

Overall, the more accurate APNX generator gives much closer results. With handling for the two additional elements, OCRed texts and ones that use the same typesetting as their print counterparts should give nearly 1 to 1 page mappings.

Now for the big question: is doubling the time it takes to transfer a book to the device worth a more accurate mapping? The mapping will be thrown off if the print book's physical dimensions differ from the average paperback size I'm using, so a hardcover, or a larger or smaller paperback, will cause the mapping to be off.

Also, if the accurate parser is worth the extra time, would it be worth increasing the time even more by changing the parser to handle the two previously mentioned elements and make it even more accurate?