#1 |
Wizard
Posts: 3,671
Karma: 12205348
Join Date: Mar 2008
Device: Galaxy S, Nook w/CM7
|
HOWTO: Improve performance on calibre generated ePUBs
Hi All,
NOTE1: A Perl script with the FIX is now added.
NOTE2: Added an executable! Thank you nrapallo!
Note: I decided to make my post #517 its own thread here in the SONY section.

I've found the TOC on ePUBs generated by calibre to be intolerably slow. An ePUB with forty TOC entries can take up to 90 seconds. Below is what I've found.

A TOC with the "#HREF" syntax makes opening the ePUB extremely slow. With a large enough TOC file this will take a long time or even cause the reader to crash.

PROBLEM: I've noticed a big performance hit every time I try to open an ePUB book and use the TOC. You mentioned on a different thread that it was due to the #HREF.

TEST: Okay, I've done a few tests to see how true this is and whether there is a good way to resolve it.
Attached are 3 files:
Test File.epub (unmodified calibre generated TOC)
Test File_NOREF.epub (ALL #HREF removed from all URLs in the toc.ncx file)
Test File_noREF_Capter.epub (only the top level chapters have the #HREF removed; sub chapters keep the #HREF)

Measured time to the TOC from an ePUB book created from calibre.
SOLUTION
There is a HUGE performance increase from just removing the #HREF part of the URL from the top level TOC entries. There is still a hit on the sub-TOC entries, but it is small and tolerable.

To do this:
Unzip the epub.
Open the toc.ncx XML file.
Go to the navMap section.
Then move to the child node navMap/navPoint/content:
XPath <navMap> <navPoint> <content src="URL">
Remove the #HREF portion located in the URL text of the content node (i.e. at the end of the URL there is something like "http://....#calibre_..."). Remove everything from the hash (#) to the end of the URL.

This only has to be done for the top level navPoints to increase the performance.

Have Fun,
=X=
Last edited by =X=; 06-05-2009 at 12:47 PM. Reason: Added Note1 and script to fix the TOC |
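The manual steps above can be sketched in Python. This is not the attached Perl script, just a minimal illustration under some assumptions: the TOC file ends in `.ncx`, it uses the standard NCX namespace, and the function names `strip_top_level_fragments` and `fix_epub` are made up for this example. ElementTree does not preserve the DOCTYPE or comments, so treat the output as a rough equivalent of the hand edit.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by NCX table-of-contents files.
NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"


def strip_top_level_fragments(ncx_bytes: bytes) -> bytes:
    """Drop '#...' from the content src of top-level navPoints only."""
    ET.register_namespace("", NCX_NS)  # keep the default namespace on output
    root = ET.fromstring(ncx_bytes)
    nav_map = root.find(f"{{{NCX_NS}}}navMap")
    if nav_map is None:
        return ncx_bytes  # not the layout this sketch expects; leave untouched
    # findall() with a bare tag matches direct children only, so nested
    # (sub-chapter) navPoints keep their fragments.
    for nav_point in nav_map.findall(f"{{{NCX_NS}}}navPoint"):
        content = nav_point.find(f"{{{NCX_NS}}}content")
        if content is not None and "#" in content.get("src", ""):
            content.set("src", content.get("src").split("#", 1)[0])
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def fix_epub(src_path: str, dst_path: str) -> None:
    """Copy an ePUB to dst_path, rewriting any .ncx entry on the way."""
    with zipfile.ZipFile(src_path) as zin, zipfile.ZipFile(dst_path, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename.lower().endswith(".ncx"):
                data = strip_top_level_fragments(data)
            # Passing the ZipInfo keeps each entry's name and compression type.
            zout.writestr(item, data)
```

Calling `fix_epub("Test File.epub", "Test File_fixed.epub")` would write a new copy and leave the original alone (the output name here is only an example), which makes it easy to compare TOC times before and after.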
#2 |
Wizard
Posts: 3,671
Karma: 12205348
Join Date: Mar 2008
Device: Galaxy S, Nook w/CM7
|
I started thinking this can easily be done with a Perl script.
Is there any interest in such a script? =X= |
#3 |
Resident Curmudgeon
Posts: 79,249
Karma: 145488788
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
I'd be more interested in a Python script. But I would be interested in such a fix.
|
#4 |
Wizard
Posts: 3,671
Karma: 12205348
Join Date: Mar 2008
Device: Galaxy S, Nook w/CM7
|
Hi Jon,
Unfortunately I don't know how to program in Python. I can manage my way through existing code, but writing code from scratch is a different story. I'm almost done with the Perl script, just polishing it up. It should be ready by tonight or tomorrow morning. =X= |
#5 | |
Resident Curmudgeon
Posts: 79,249
Karma: 145488788
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
#6 |
Guru
Posts: 714
Karma: 2003751
Join Date: Oct 2008
Location: Ottawa, ON
Device: Kobo Glo HD
|
ADE on Sony is really sensitive to the size of the "chunk", which is the reason that I always-always-always hand-edit the results of the conversion to have a single file per TOC entry. The smallest possible size of the xhtml chunk really helps.
|
#7 | |
Wizard
Posts: 3,671
Karma: 12205348
Join Date: Mar 2008
Device: Galaxy S, Nook w/CM7
|
Quote:
I used the Expat XML parser, one of the first XML parsers out there. It's easy to use and, more importantly, it is usually included with Perl distributions. The catch is that it does not support DOM, so moving or copying nodes is much harder. I'll look into it, though; it shouldn't be too hard. =X= |
|
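To illustrate the trade-off mentioned above, here is a small sketch using Python's binding to the same Expat library (`xml.parsers.expat`). It is not the Perl script. Because Expat only delivers start/end events, the code has to track nesting depth itself to tell top level navPoints from sub chapters. The function name `top_level_sources` is made up for this example, and the sketch assumes the NCX elements are written without a namespace prefix.

```python
import xml.parsers.expat


def top_level_sources(ncx_bytes: bytes) -> list:
    """Collect the content src of top-level navPoints using Expat events."""
    depth = 0      # how many navPoint elements are currently open
    sources = []

    def start(name, attrs):
        nonlocal depth
        if name == "navPoint":
            depth += 1
        elif name == "content" and depth == 1:
            # depth == 1 means we are directly inside a top-level navPoint
            sources.append(attrs.get("src", ""))

    def end(name):
        nonlocal depth
        if name == "navPoint":
            depth -= 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(ncx_bytes, True)  # True marks this as the final chunk
    return sources
```

Reading values this way is simple; rearranging or copying nodes is where the lack of a tree (DOM) starts to hurt, since nothing is kept in memory unless the handlers store it.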
#8 |
Wizard
Posts: 3,671
Karma: 12205348
Join Date: Mar 2008
Device: Galaxy S, Nook w/CM7
|
First version of the ePUB TOC calibre enhancer
Okay, here is my first pass. Run it as:
ePub_TOC_enhancer.pl eBookName.epub
It also has a recursive mode: just put the '-R' switch after the executable to update all ePUBs in the current directory and all its subdirectories.
Code:
usage: ePub_TOC_enhancer.pl eBookName.epub
  -h : this (help) message
  -R : Recursively search for ePUBs

Example: ePub_TOC_enhancer.pl -R
  Will recursively search for ePUBs in the current directory and all its children.

Example: ePub_TOC_enhancer.pl MYeBook.epub YOUReBook.epub OUReBOOK.epub
  Will only fix the eBooks specified.
|
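For anyone who would rather have this in Python, the command line described above could be mirrored roughly as follows. This only sketches the argument handling and the recursive search, not the TOC fix itself. The names `is_epub`, `find_epubs`, `build_parser` and `select_targets` are hypothetical, and the real Perl script's behaviour may differ in details.

```python
import argparse
from pathlib import Path


def is_epub(path) -> bool:
    """True if the file name ends in .epub (case-insensitive)."""
    return Path(path).suffix.lower() == ".epub"


def find_epubs(root="."):
    """Return every ePUB under root, including all subdirectories."""
    return sorted(p for p in Path(root).rglob("*") if p.is_file() and is_epub(p))


def build_parser() -> argparse.ArgumentParser:
    # argparse adds the -h help switch automatically.
    parser = argparse.ArgumentParser(
        prog="epub_toc_enhancer",
        description="Fix the TOC of calibre generated ePUBs.",
    )
    parser.add_argument("-R", dest="recursive", action="store_true",
                        help="recursively search for ePUBs from the current directory")
    parser.add_argument("books", nargs="*", help="ePUB files to fix")
    return parser


def select_targets(argv):
    """Turn command line arguments into the list of files to process."""
    args = build_parser().parse_args(argv)
    targets = [Path(book) for book in args.books]
    if args.recursive:
        targets.extend(find_epubs("."))
    return targets
```

With this, `select_targets(["-R"])` would gather every ePUB below the current directory, while `select_targets(["MYeBook.epub", "YOUReBook.epub"])` returns only the files named.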
#9 | |
Wizard
Posts: 3,671
Karma: 12205348
Join Date: Mar 2008
Device: Galaxy S, Nook w/CM7
|
Quote:
What my first post is getting at is that the #HREF in the TOC entries makes SONY/ADE performance really bad, more so than whether each TOC entry has its own HTML file. Try the attached fix on a calibre generated ePUB and you will see the difference. =X= |
|
#10 | |
Guru
Posts: 714
Karma: 2003751
Join Date: Oct 2008
Location: Ottawa, ON
Device: Kobo Glo HD
|
Quote:
Unlike what calibre does, if your book is split at TOC entries (one XHTML file per TOC entry), there is no need for #HREF tags in the TOC entries. All of them are referenced as per your fix. |
|
#11 |
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
|
=X=
As requested, here is a Windows executable of your Perl script, plus a batch file to reproduce it using PAR-Packer-588 (v0.973) from CPAN. Now attached to post #1 above. Last edited by nrapallo; 06-05-2009 at 02:34 PM. Reason: attachment moved to post #1 |
#12 |
The Introvert
Posts: 8,307
Karma: 1000077497
Join Date: Jan 2007
Location: United Kingdom
Device: Sony Reader PRS-650 & 505 & 500
|
Guys, you are geniuses!
|
#13 |
Wizard
Posts: 1,731
Karma: 3472866
Join Date: Apr 2008
Device: Sony PRS-650 & 350; Kindle Voyage; Kobo Aura HD, Aura One, and Forma
|
I am looking forward to trying this when I get home tonight!
Thanks!!! dordale |
#14 |
01000100 01001010
Posts: 1,889
Karma: 2400000
Join Date: Mar 2009
Device: Polyamorous
|
I've been trying but can't get the script to work. Will have to try when I don't have a migraine.
|
#15 |
Resident Curmudgeon
Posts: 79,249
Karma: 145488788
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
I don't recall the error, but 0.01 did not work on an ePub I tried it on that had no DRM.
|