06-04-2009, 03:08 AM   #1
=X=
HOWTO: Improve performance on calibre-generated ePUBs

Hi All,
NOTE1: A Perl script with the fix is now attached
NOTE2: Added executable! Thank you nrapallo!

Note: I decided to make my post #517 its own thread here in the SONY section.

I've found the TOC on ePUBs generated by calibre to be intolerably slow. An ePUB with forty TOC entries can take up to 90 sec to load.

Below is what I've found:


TOC entries that use the "#HREF" syntax (a "#anchor" part at the end of the link) make opening the ePUB extremely slow. With a large enough TOC file this will take a long time or even cause the reader to crash.
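To make clear what the "#HREF" part is, here is a small Python sketch that splits a TOC link into the file part and the part after the hash. The function name `split_toc_link` and the example link are made up for illustration; only `urllib.parse.urldefrag` comes from the standard library.

```python
from urllib.parse import urldefrag


def split_toc_link(src):
    """Split a TOC link into (file part, part after the '#')."""
    # urldefrag returns the link without the fragment, plus the fragment itself
    url, fragment = urldefrag(src)
    return url, fragment


# Hypothetical link of the kind found in a toc.ncx file
file_part, anchor = split_toc_link("chapter1.html#calibre_toc_2")
print(file_part)  # the link without the "#..." part
print(anchor)     # the anchor name, empty if the link had no "#"
```

A link with an empty second value is the fast kind; a link with a non-empty one is the kind that was slow in my tests.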


PROBLEM:
I've noticed a big performance hit every time I open an ePUB book and use the TOC. You mentioned on a different thread that it was due to the #HREF.

TEST:
Okay, I've done a few tests to see how true this is and whether there is a good way to resolve it.

Attached are 3 files:
Test File.epub (unmodified calibre-generated TOC)
Test File_NOREF.epub (#HREF removed from all URLs in the toc.ncx file)
Test File_noREF_Capter.epub (only the top-level chapters have the #HREF removed; sub-chapters still have the #HREF)

Measured time to open the TOC of an ePUB book created by calibre:
  • Test File.epub: 110 sec (1 min 50 sec)
  • Test File_NOREF.epub: instant
  • Test File_noREF_Capter.epub: instant for top-level chapters. Sub-chapters varied depending on how many sub-elements each had. The last chapter had 40 items and took 1.5 sec.


SOLUTION
There is a HUGE performance increase from just removing the #HREF part of the URL from the top-level TOC entries. While there is still a hit on the sub-TOC entries, it is small and tolerable.

To do this unzip the epub. Open the toc.ncx XML file.
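If you would rather not unzip by hand, the step above can be sketched in Python, since an ePUB is a zip archive. This is only a sketch: `read_toc_ncx` is a made-up helper name, and it assumes the TOC is the first file in the archive whose name ends in ".ncx" and that it is UTF-8 encoded.

```python
import zipfile


def read_toc_ncx(epub):
    """Return (name inside the archive, XML text) of the first .ncx file."""
    with zipfile.ZipFile(epub) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".ncx"):
                # Assumption: the NCX file is UTF-8 encoded
                return name, archive.read(name).decode("utf-8")
    raise FileNotFoundError("no .ncx file found in the ePUB")
```

Calling `read_toc_ncx("Test File.epub")` would give back the location of toc.ncx inside the archive and its contents as text.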

Go to the navMap section (the navPoint entries live under navMap, not docTitle).
Then move to the child node at the navMap/navPoint/content XPath:
<navMap>
<navPoint>
<content src="URL"/>

Remove the #HREF portion of the URL in the src attribute of the content node (i.e. at the end of the URL there is something like "http://....#calibre_..."). Remove everything from the hash (#) to the end of the URL.

This only has to be done for the top level navPoints to increase the performance.
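The manual edit above can be sketched in Python as follows. This is not the attached Perl script, just an illustration under a few assumptions: the function name `strip_top_level_fragments` is made up, and the toc.ncx is assumed to use the standard NCX namespace. Note that ElementTree drops the XML declaration and DOCTYPE when writing the result, so you may need to add them back.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"


def strip_top_level_fragments(ncx_xml):
    """Remove '#...' from the content src of top-level navPoints only."""
    ET.register_namespace("", NCX_NS)  # keep NCX as the default namespace on output
    root = ET.fromstring(ncx_xml)
    ns = {"ncx": NCX_NS}
    # navMap/navPoint only matches direct children of navMap (the top level);
    # nested navPoints (sub-chapters) are left untouched
    for content in root.findall("ncx:navMap/ncx:navPoint/ncx:content", ns):
        src = content.get("src", "")
        # Drop everything from the hash (#) to the end of the URL
        content.set("src", src.split("#", 1)[0])
    return ET.tostring(root, encoding="unicode")
```

The returned text would then replace toc.ncx before the ePUB is zipped back up; that repacking step is not shown here.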

Have Fun,

=X=
Attached Files
File Type: epub Test File.epub (30.5 KB, 415 views)
File Type: zip Test File_NOREF.epub.zip (30.2 KB, 431 views)
File Type: zip Test File_noREF_Capter.epub.zip (30.4 KB, 413 views)
File Type: zip ePub_TOC_enhancer_v0.01.zip (1.6 KB, 419 views)
File Type: zip ePub_TOC_enhancer_v0.01.exe.zip (1.54 MB, 454 views)

Last edited by =X=; 06-05-2009 at 12:47 PM. Reason: Added Note1 and script to fix the TOC