I have a Bible edition with ~8000 notes stored as individual HTML files, which are referred to by the other 2000 files (which contain the rest of the text), and of course there are thousands of links across all these 10k files... Huge, I know...
I am wondering what my options are for creating a "simpler" epub file, not one with 10k files inside.
After moving to a somewhat better computer I got past the "hung" error and managed to create an epub file. It's only 12 MB, but the OPF lists 10k entries and the zip contains 10k files.
When I try to load the epub in FBReader (an epub reader for Android), it crashes. Even the calibre viewer takes a long time to load it.
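For anyone who wants to check the file count on their own copy, here is a minimal sketch that tallies the entries inside an epub by extension. It relies only on the fact that an epub is a zip archive; the function name `count_entries` and the file name in the usage line are made up for illustration.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def count_entries(epub_path):
    """Return a Counter of file extensions for the entries inside an epub (zip)."""
    with zipfile.ZipFile(epub_path) as archive:
        return Counter(
            # Entries without an extension (e.g. "mimetype") are grouped as "(none)".
            PurePosixPath(info.filename).suffix.lower() or "(none)"
            for info in archive.infolist()
            if not info.is_dir()
        )
```

Usage would be something like `print(count_entries("bible.epub"))`, where `bible.epub` is a placeholder path.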
I tried to:
a) Convert the book to htmlz, but Calibre crashed with:
File "site-packages\calibre\ebooks\htmlz\output.py", line 63, in convert
MemoryError
Full log:
calibre, version 0.8.27
ERROR: Conversion Error: <b>Failed</b>: Convert book 1 of 1 (Biblia Ortodoxa Bartolomeu Anania)
Convert book 1 of 1 (Biblia Ortodoxa Bartolomeu Anania)
Processing archive...
Resolved conversion options
calibre version: 0.8.27
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0.0,
'book_producer': None,
'breadth_first': False,
'change_justification': u'original',
'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., 'chapter|book|section|part|prologue|epilogue\\s+', 'i')) or @class = 'chapter']",
'chapter_mark': u'pagebreak',
'comments': None,
'cover': None,
'debug_pipeline': None,
'dehyphenate': True,
'delete_blank_paragraphs': True,
'disable_font_rescaling': False,
'dont_package': False,
'duplicate_links_in_toc': False,
'enable_heuristics': False,
'extra_css': None,
'filter_css': u'',
'fix_indents': True,
'font_size_mapping': None,
'format_scene_breaks': True,
'html_unwrap_factor': 0.4,
'htmlz_class_style': u'external',
'htmlz_css_type': u'class',
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x0403F0D0>,
'insert_blank_line': False,
'insert_blank_line_size': 0.5,
'insert_metadata': False,
'isbn': None,
'italicize_common_cases': True,
'keep_ligatures': False,
'language': None,
'level1_toc': u'//h:h1',
'level2_toc': u'//h:h2',
'level3_toc': u'//h:h3',
'line_height': 0.0,
'linearize_tables': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'markup_chapter_headings': True,
'max_levels': 5,
'max_toc_links': 50,
'minimum_line_height': 120.0,
'no_chapters_in_toc': False,
'no_inline_navbars': False,
'output_profile': <calibre.customize.profiles.GenericEink object at 0x0403F2D0>,
'page_breaks_before': u"//*[name()='h1' or name()='h2']",
'prefer_metadata_cover': False,
'pretty_print': False,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': 'c:\\temp\\calibre_0.8.27_tmp_tqfgsu\\ah023b.opf',
'remove_fake_margins': True,
'remove_first_image': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'renumber_headings': True,
'replace_scene_breaks': u'',
'series': None,
'series_index': None,
'smarten_punctuation': False,
'sr1_replace': None,
'sr1_search': None,
'sr2_replace': None,
'sr2_search': None,
'sr3_replace': None,
'sr3_search': None,
'tags': None,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'unsmarten_punctuation': False,
'unwrap_lines': True,
'use_auto_toc': False,
'verbose': 2}
InputFormatPlugin: HTML Input running
on c:\temp\calibre_0.8.27_tmp_tqfgsu\qtlpes_plumber_archive\content.opf
Parsing all content...
Manifest item 'toc.ncx' not found
Parsing index-D.php-id%3dVT-Sof-02-05%26c%3d01.htm ...
Parsing index-D.php-id%3dVT-Sir-44-20%26c%3d01.htm ...
Parsing index-D.php-id%3dVT-Sir-44-21%26c%3d01.htm ...
[..........]
Parsing index-D.php-id%3dVT-Dn-06-01%26c%3d01.htm ...
Parsing index-D.php-id%3dVT-Dn-06-02%26c%3d01.htm ...
Referenced file 'index-C.php-id%3dXX-Co-02.htm' not found
Referenced file 'index-D.php-id%3dVT-Ps-106-25%26a%3dr01' not found
Referenced file 'index-C.php-id%3dNT-In-22.htm' not found
[...........]
Referenced file 'index-C.php-id%3dVT-Idt-24.htm' not found
Referenced file 'index-C.php-id%3dNT-Lc-1.htm' not found
Generating default TOC from spine...
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 93 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Parsing stylesheet.css ...
Property: Invalid value for "CSS Level 2.1" property: lightyellow [3:1: background-color]
Found 1 items of level: div_8
Found 11999 items of level: div_1
Found 5 items of level: div_2
Found 27 items of level: div_5
Found 42576 items of level: div_4
Found 643 items of level: p_4
Found 108 items of level: p_5
Found 2400 items of level: p_2
Found 5 items of level: p_1
Ignoring level div_8
Ignoring level p_1
div_1 left margin stats: Counter()
div_1 right margin stats: Counter()
div_2 left margin stats: Counter()
div_2 right margin stats: Counter()
div_5 left margin stats: Counter({u'': 27})
div_5 right margin stats: Counter({u'': 27})
div_4 left margin stats: Counter({u'': 42576})
div_4 right margin stats: Counter({u'': 42576})
p_4 left margin stats: Counter({u'0': 643})
p_4 right margin stats: Counter({u'0': 643})
p_5 left margin stats: Counter({u'0': 108})
p_5 right margin stats: Counter({u'0': 108})
p_2 left margin stats: Counter({u'0': 2400})
p_2 right margin stats: Counter({u'0': 2400})
Cleaning up manifest...
Trimming unused files from manifest...
Creating HTMLZ Output...
Converting OEB book to HTML...
Converting index.htm to HTML...
Converting index-C.php-id%3dIN.htm to HTML...
Converting index-C.php-id%3dIN-PRE.htm to HTML...
Converting index-C.php-id%3dIN-CUVANT_LAMURITOR.htm to HTML...
Converting index-C.php-id%3dIN-INDREPTAR.htm to HTML...
Converting index-C.php-id%3dVT.htm to HTML...
Converting index-C.php-id%3dVT-Fc.htm to HTML...
Converting index-D.php-id%3dVT-Fc%26a%3dobs.htm to HTML...
Converting index-C.php-id%3dVT-Fc-01.htm to HTML...
Converting index-C.php-id%3dVT-Fc-02.htm to HTML...
Converting index-C.php-id%3dVT-Fc-03.htm to HTML...
[.............]
Converting index-C.php-id%3dAN-CONC-INVIERE%26obs%3dtrue.htm to HTML...
Converting index-C.php-id%3dAN-CONC-MARTURISIRE%26obs%3dtrue.htm to HTML...
Python function terminated unexpectedly
(Error Code: 1)
Traceback (most recent call last):
File "site.py", line 132, in main
File "site.py", line 109, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 187, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 31, in gui_convert_override
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 1087, in run
File "site-packages\calibre\ebooks\htmlz\output.py", line 63, in convert
MemoryError
b) Convert the book to fb2 (hoping for an fb2.zip later), but Calibre crashed again with a MemoryError:
File "site-packages\calibre\ebooks\fb2\fb2ml.py", line 71, in clean_text
File "re.py", line 151, in sub
MemoryError
Full log:
Converting index-C.php-id%3dAN-CONC-MARTURISIRE%26obs%3dtrue.htm to FictionBook2 XML
Python function terminated unexpectedly
(Error Code: 1)
Traceback (most recent call last):
File "site.py", line 132, in main
File "site.py", line 109, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 187, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 31, in gui_convert_override
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 1087, in run
File "site-packages\calibre\ebooks\fb2\output.py", line 175, in convert
File "site-packages\calibre\ebooks\fb2\fb2ml.py", line 55, in extract_content
File "site-packages\calibre\ebooks\fb2\fb2ml.py", line 62, in fb2mlize_spine
File "site-packages\calibre\ebooks\fb2\fb2ml.py", line 71, in clean_text
File "re.py", line 151, in sub
MemoryError
(The computer is dual core, 3 GB RAM, WinXP.)
Can anyone suggest a better idea on how to make a "working" (i.e. efficient to read) epub, or how to convert to an htmlz (which I want to convert back into an epub afterwards, hoping for a much better-performing epub with only a few, but bigger, HTML files inside)?
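To make the "few but bigger HTML files" idea concrete, here is a rough sketch of the kind of merge I have in mind, done outside calibre. It is only an illustration under several assumptions: the names `merge_pages`, `anchor_for`, `per_chunk` and the `chunk_0000.html` naming are invented here, it handles only double-quoted `href` attributes, it uses regular expressions rather than a real HTML parser, and it does not deal with ids that collide once files are merged. Regenerating the OPF manifest and spine for the merged files is not covered.

```python
import re
from urllib.parse import unquote

BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)
HREF_RE = re.compile(r'href="([^"#]*)(#[^"]*)?"', re.IGNORECASE)


def anchor_for(name):
    """Turn a file name into a string that is safe to use as an id."""
    return "f-" + re.sub(r"[^A-Za-z0-9]+", "-", name).strip("-")


def merge_pages(pages, per_chunk=100):
    """Merge many small pages into a few larger ones.

    pages: list of (file_name, html_text) tuples in reading order.
    Returns {chunk_name: merged_html} with cross-file links rewritten.
    """
    # Map every source file to the chunk that will contain it.
    location = {}
    for index, (name, _) in enumerate(pages):
        location[name] = "chunk_%04d.html" % (index // per_chunk)

    def rewrite(match):
        target, fragment = match.group(1), match.group(2)
        if target not in location:
            target = unquote(target)  # the link may be percent-encoded
        if target not in location:
            return match.group(0)  # external or missing target: leave as-is
        # Point at the wrapper div unless the link already names a fragment.
        return 'href="%s%s"' % (location[target], fragment or "#" + anchor_for(target))

    chunks = {}
    for name, html in pages:
        found = BODY_RE.search(html)
        body = found.group(1) if found else html
        body = HREF_RE.sub(rewrite, body)
        # Wrap each original file in a div so that links can still land on it.
        section = '<div id="%s">%s</div>\n' % (anchor_for(name), body)
        chunks.setdefault(location[name], []).append(section)

    return {
        chunk: "<html><body>\n%s</body></html>" % "".join(parts)
        for chunk, parts in chunks.items()
    }
```

With `per_chunk=100`, 10k source files would end up as roughly 100 chunk files, and a link that used to open a separate note file would jump to that note's `div` inside a chunk instead.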
If I created ~50-100 folders and spread the files (in a logical way) across them, would that improve the time it takes to open the epub?
I ask in the hope that FBReader (which is a very powerful and well-tested epub reader) will then be able to manage it.
(Note: I have other Bible epubs, but since they have fewer than 100 files each (no annotations), they work pretty well.)
Once again, the problem is not the size but most probably the huge number of files.
Thanks in advance for your suggestion(s).