11-20-2011, 04:19 PM | #1 |
Junior Member
Posts: 7
Karma: 10
Join Date: Nov 2011
Device: none
|
10k files bible
I have a Bible version with ~8,000 notes stored as single HTML files, which are referred to by the other 2,000 files (which contain the rest of the text), and of course there are thousands of links across all these 10k files... Huge, I know... I am wondering what my options are for creating a "simpler" epub file, not one with 10k files inside.
After moving to a somewhat better computer I got past the "hung" error and managed to create an epub file. It's only 12 MB, but the OPF contains 10k links and the zip 10k files. When I try to load the epub in FBReader (an epub reader for Android), it crashes. Even the Calibre viewer takes a long time to load it.
I tried to:
a) Convert the book to htmlz, but Calibre crashed with: File "site-packages\calibre\ebooks\htmlz\output.py", line 63, in convert MemoryError
b) Convert the book to fb2 (hoping for an fb2.zip later), but Calibre crashed again with MemoryError: File "site-packages\calibre\ebooks\fb2\fb2ml.py", line 71, in clean_text File "re.py", line 151, in sub MemoryError
(The computer is dual core, 3 GB RAM, WinXP.)
Can anyone suggest a better way to make a "working" (read: efficient) epub, or to convert to htmlz (which I want to convert to epub again afterwards, hoping for a much better-performing epub with only a few, but bigger, HTML files inside)? If I created ~50-100 folders and spread the files across them (in a logical way), would that improve the epub's opening performance? I am hoping that FBReader (which is a very powerful and well-tested epub reader) would then be able to manage it.
(Note: I have other Bible epubs, but since they have <100 files (no annotations), they work pretty well.) Once again, the problem is not the size but most probably the huge number of files. Thanks in advance for your suggestion(s). Last edited by aplicatii.ro; 11-20-2011 at 04:30 PM. Reason: Added possible idea... |
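To confirm how many files an epub really contains, the archive can be inspected directly, since an epub is a zip file. The following is only a minimal sketch using the Python standard library; the function name `count_epub_entries` and the path `bible.epub` are made up for illustration and are not part of calibre.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def count_epub_entries(epub_file):
    """Count the entries of an epub (a zip archive), grouped by file extension.

    `epub_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(epub_file) as archive:
        return Counter(
            PurePosixPath(name).suffix.lower()
            for name in archive.namelist()
            if not name.endswith("/")  # skip directory entries
        )


if __name__ == "__main__":
    # "bible.epub" is a placeholder path, not a file from this thread.
    for extension, total in count_epub_entries("bible.epub").most_common():
        print(extension or "(no extension)", total)
```

A count in the thousands for `.html` would match the symptom described above, whatever the total size of the archive.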
11-20-2011, 04:27 PM | #2 |
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
|
Try converting it to .mobi format first (using calibre or Mobipocket Creator) as that will combine all your source .html files into one file (used internally within the .mobi ebook) and THEN convert that .mobi ebook to .htmlz.
|
11-21-2011, 07:55 AM | #3 |
Junior Member
Posts: 7
Karma: 10
Join Date: Nov 2011
Device: none
|
Thanks for the tip, unfortunately not working
Thanks for the tip.
I created a mobi (from the zip) and then created an epub from the mobi. Unfortunately, the epub still ended up with 10k files. I can't understand why, but it does. The files are named "..._split...html". I did select the option "split if it's over 260Kb", but 99% of the files are <100 KB, actually <1 KB. If I modified the book to have a few hundred folders, would that speed up the epub loading? Or is the issue that the index file is too big (has too many entries)? |
11-21-2011, 04:42 PM | #4 | |
Junior Member
Posts: 7
Karma: 10
Join Date: Nov 2011
Device: none
|
Lacking ideas...
Quote:
|
|
11-21-2011, 09:33 PM | #5 |
Wizard
Posts: 1,759
Karma: 30063305
Join Date: Dec 2006
Location: Singapore
Device: Boyue
|
Disable the option to split on page breaks when converting to epub.
|
11-28-2011, 09:45 AM | #6 |
Junior Member
Posts: 7
Karma: 10
Join Date: Nov 2011
Device: none
|
OOM in different area now
Hi,
I keep going from one memory error to the next, in totally different situations. After many failures I managed to import the HTML using the command line (calibredb add index.htm), but now, when I try to export to epub (with the splitting option on), I get a memory error. In the beginning there were too many files, so I worked to merge them into a single 28 MB file (7 MB zipped), but now calibre has a hard time splitting it. Maybe the algorithm is not well tuned for splitting big books, or there are too many links/references; I have no clue what's going on... "MemoryError in split.py"
I would expect around 130 HTML pieces of ~240 KB each, but there is an issue in the splittree... PS: The machine has 2 cores, 3 GB RAM and lots of swap. |
11-29-2011, 12:44 AM | #7 |
Junior Member
Posts: 7
Karma: 10
Join Date: Nov 2011
Device: none
|
trying to convert to mobi instead of epub, no success
It seems that no matter what I try, I run out of memory. I have removed all the CSS and cleaned up everything I could in the HTML, but it still fails with out-of-memory errors in completely different places. E.g. now, after many hours of work, it crashed with:
Converting XHTML to Mobipocket markup...
File "site.py", line 132, in main
File "site.py", line 109, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 187, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 31, in gui_convert_override
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 1087, in run
File "site-packages\calibre\ebooks\mobi\output.py", line 167, in convert
File "site-packages\calibre\ebooks\mobi\mobiml.py", line 111, in __call__
File "site-packages\calibre\ebooks\mobi\mobiml.py", line 133, in mobimlize_spine
File "site-packages\calibre\ebooks\oeb\stylizer.py", line 296, in __init__
File "site-packages\calibre\ebooks\oeb\stylizer.py", line 459, in style
File "site-packages\calibre\ebooks\oeb\stylizer.py", line 488, in __init__
MemoryError
Does anyone have any suggestions? |
11-29-2011, 01:32 AM | #8 |
Kindler of the Flame
Posts: 582
Karma: 646016
Join Date: Oct 2009
Location: US of A
Device: K DX,3,KT,KP,KF, KFHD; Nook C, PRS600, iPad, Xoom, N900, N810, Zaurus
|
Can you tell us what Bible translation it is?
=== To create a source file, I typically combine files into one html/xml (copy *.html new.html) because it is easier for a human to work that way and to know for sure that I don't have garbage tags in there. Then I use emeditor with its robust regular expressions engine to clean the source and to chisel out an ebook with great formatting and navigation. Last edited by osnova; 11-29-2011 at 01:37 AM. |
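A plain `copy *.html new.html` concatenates the files but leaves the cross-file links pointing at files that no longer exist. The following is a minimal sketch of one way to merge and relink at the same time, assuming the files are simple enough for regular expressions, all links use double-quoted `href` attributes, and link targets are given as bare file names. The names `merge_html` and `anchor_for`, and the `f-...` anchor scheme, are invented for this illustration.

```python
import re

BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)
HREF_RE = re.compile(r'href="([^"#]+\.html?)(?:#([^"]*))?"', re.IGNORECASE)


def anchor_for(filename):
    """Derive an in-document anchor id from a file name (hypothetical scheme)."""
    return "f-" + re.sub(r"[^A-Za-z0-9]+", "-", filename).strip("-")


def merge_html(files):
    """Merge a {filename: html} mapping into one HTML string.

    Each file's body becomes a <div> whose id is derived from the file name,
    and links to merged files are rewritten to point inside the new document.
    """
    def rewrite(match):
        target, fragment = match.group(1), match.group(2)
        if target not in files:
            return match.group(0)  # external or unknown link: leave untouched
        # Keep an explicit fragment if present, else jump to the file's div.
        return 'href="#%s"' % (fragment or anchor_for(target))

    parts = []
    for name, html in files.items():
        found = BODY_RE.search(html)
        body = found.group(1) if found else html
        parts.append(
            '<div id="%s">%s</div>' % (anchor_for(name), HREF_RE.sub(rewrite, body))
        )
    return "<html><body>\n%s\n</body></html>" % "\n".join(parts)
```

Two caveats with this sketch: fragment ids that repeat across the source files would collide after merging and would need prefixing, and the whole result is held in memory, which at ~20 MB of text is acceptable for a standalone script.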
11-30-2011, 03:56 AM | #9 |
Junior Member
Posts: 7
Karma: 10
Join Date: Nov 2011
Device: none
|
No Split requested, still I get split MemoryError
During my bible creation Saga, I have done the following:
1. Cleaned up as much as possible from the files (I used the full power of Linux and Perl regexes, plus this great website to test/learn regex: http://gskinner.com/RegExr/ ). This way I removed: - all the CSS I knew of (style) - fonts, colors, JS. It's simple HTML now, nothing more. I don't think there is anything more I can clean (besides the text itself).
2. Merged all the files into one (this gave me a ~20 MB HTML file).
3. Created the TOC at the beginning of the file (so the TOC can be built when I set breadth first instead of depth first).
4. Imported it into calibre (with calibredb, as the GUI crashes), and now it's a zip.
Now I tried:
5a. To export it to epub (with split on) -> after a few steps in the split process, it gives the MemoryError (see the log in my previous post).
5b. To export it to mobi -> gives an error (see the log in my previous post).
5c. To export it to epub without split (I set the split size above the size of the HTML, e.g. 30 MB); it still tries to split for some reason and I again get a MemoryError on split (right at the beginning of the split).
I am completely out of ideas... I think I have found the book that is best suited for making calibre crash.
Here is the book (it's in Romanian, but I think this doesn't matter if you want to see how clean the HTML is):
a) The book before I import it into calibre: HERE - but it will take a few hours to import.
b) The book as it appears in the calibre repository: HERE (this is the one I tried to export to various formats: epub, mobi, epub without split).
Note: There are some places where the characters are non-ASCII (in around 20 words across the 20 MB), but they never caused any issue. If anyone can offer some help/ideas on what I'm doing wrong or what else I should try, or can review the html/zip above, please let me know. |
11-30-2011, 04:40 AM | #10 |
Wizard
Posts: 4,552
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
|
I think that if a document is too complex, or has too many cross-links (which I am guessing this document might have), then it may be beyond Calibre's ability to convert without running out of memory. Note that it is complexity that appears to cause the memory problem, not simply document size. Calibre has been optimised to handle the vast majority of conversions, which involve far simpler documents, but the consequence is that it can fail on very complex ones. You may therefore simply be fighting a losing battle in trying to convert this particular book with Calibre.
As an aside, you mentioned increasing the split value. If you do not keep it below 300KB then the book is likely to fail on the vast majority of reading devices, even if it appears to convert successfully. |
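If calibre's own splitter runs out of memory, the merged file could in principle be cut into pieces below the size limit before import. The following is a minimal sketch of greedy packing at heading boundaries, not calibre's algorithm; the function name `split_by_size`, the choice of `<h1>`/`<h2>` as boundaries, and the 260 KB default (the split value mentioned earlier in the thread) are assumptions for illustration.

```python
import re


def split_by_size(html_body, max_bytes=260 * 1024, boundary=r"(?=<h[12][\s>])"):
    """Cut body markup into chunks of at most max_bytes (UTF-8), at headings.

    Sections are packed greedily in document order. A single section that is
    larger than max_bytes is kept whole, so it becomes an oversized chunk.
    Requires Python 3.7+, where re.split accepts zero-width matches.
    """
    sections = [s for s in re.split(boundary, html_body, flags=re.IGNORECASE) if s]
    chunks, current, size = [], [], 0
    for section in sections:
        section_size = len(section.encode("utf-8"))
        if current and size + section_size > max_bytes:
            # Adding this section would exceed the limit: close the chunk.
            chunks.append("".join(current))
            current, size = [], 0
        current.append(section)
        size += section_size
    if current:
        chunks.append("".join(current))
    return chunks
```

Each chunk would still need its own HTML wrapper, and any link that crosses a chunk boundary would have to be rewritten to name the target chunk's file, which this sketch does not attempt.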
11-30-2011, 12:05 PM | #11 | |
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
|
Quote:
I've done exactly what you tried with calibre on my Webster's Dictionary 1913 and similarly hit memory limitations on most conversions. |
|