Thread: 10k files bible
11-28-2011, 09:45 AM   #6
aplicatii.ro
OOM in different area now

Hi,

I keep running from one memory error into the next, in totally different situations.

After many failures I managed to import the HTML from the command line (calibredb add index.htm), but now, when I try to convert it to EPUB (with the splitting option on), I get a memory error again.

In the beginning there were too many files, so I merged everything into a single 28 MB file (7 MB zipped), but now calibre has a hard time splitting it.
Maybe the algorithm is not well tuned for splitting big books, or there are too many links/references; I have no clue what's going on...
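
To make clear what "splitting to size" means here, below is a minimal sketch of a greedy size-bounded split. It is not calibre's algorithm (split.py works on the parsed tree and searches for split points); the function name `group_by_size` and the sample input are made up for illustration only.

```python
def group_by_size(blocks, limit_bytes):
    """Greedily pack consecutive blocks into groups of at most limit_bytes.

    A single block larger than the limit ends up alone in its own group.
    """
    groups, current, current_size = [], [], 0
    for block in blocks:
        size = len(block.encode("utf-8"))
        # Start a new group when adding this block would exceed the limit.
        if current and current_size + size > limit_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(block)
        current_size += size
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Hypothetical input: many small top-level blocks of markup.
    blocks = ["<div>verse %d</div>" % i for i in range(1000)]
    groups = group_by_size(blocks, limit_bytes=4096)
    print(len(groups), "groups")
```

Each block is measured once and the text is walked a single time, so the cost grows linearly with the input size.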

"MemoryError in split.py":
calibre, version 0.8.28
ERROR: Conversion Error: <b>Failed</b>: Convert book 1 of 1 (Biblia Ortodoxa adnotata Bartolomeu Anania)

Convert book 1 of 1 (Biblia Ortodoxa adnotata Bartolomeu Anania)
Processing archive...
Resolved conversion options
calibre version: 0.8.28
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0.0,
'book_producer': None,
'breadth_first': False,
'change_justification': u'original',
'chapter': u'/',
'chapter_mark': u'none',
'comments': None,
'cover': None,
'debug_pipeline': None,
'dehyphenate': True,
'delete_blank_paragraphs': True,
'disable_font_rescaling': False,
'dont_package': False,
'dont_split_on_page_breaks': False,
'duplicate_links_in_toc': False,
'enable_heuristics': False,
'epub_flatten': False,
'extra_css': None,
'extract_to': None,
'filter_css': u'',
'fix_indents': True,
'flow_size': 260,
'font_size_mapping': None,
'format_scene_breaks': True,
'html_unwrap_factor': 0.4,
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x04D54690>,
'insert_blank_line': False,
'insert_blank_line_size': 0.5,
'insert_metadata': False,
'isbn': None,
'italicize_common_cases': True,
'keep_ligatures': False,
'language': None,
'level1_toc': u'//h:h1',
'level2_toc': u'//h:h2',
'level3_toc': u'//h:h3',
'line_height': 0.0,
'linearize_tables': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'markup_chapter_headings': True,
'max_levels': 5,
'max_toc_links': 100,
'minimum_line_height': 120.0,
'no_chapters_in_toc': False,
'no_default_epub_cover': False,
'no_inline_navbars': False,
'no_svg_cover': False,
'output_profile': <calibre.customize.profiles.GenericEink object at 0x04D54890>,
'page_breaks_before': u'/',
'prefer_metadata_cover': False,
'preserve_cover_aspect_ratio': False,
'pretty_print': True,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': 'c:\\temp\\calibre_0.8.28_tmp_0ujpin\\_2p9bs.opf',
'remove_fake_margins': True,
'remove_first_image': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'renumber_headings': True,
'replace_scene_breaks': u'',
'series': None,
'series_index': None,
'smarten_punctuation': False,
'sr1_replace': None,
'sr1_search': None,
'sr2_replace': None,
'sr2_search': None,
'sr3_replace': None,
'sr3_search': None,
'tags': None,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'unsmarten_punctuation': False,
'unwrap_lines': True,
'use_auto_toc': False,
'verbose': 2}
InputFormatPlugin: HTML Input running
on c:\temp\calibre_0.8.28_tmp_0ujpin\1nplup_plumber_archive\content.opf
Parsing all content...
Manifest item 'toc.ncx' not found
Parsing index.htm ...
Parsing _allhtm.htm ...
Generating default TOC from spine...
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 93 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Parsing stylesheet.css ...
Found 541 items of level: p_10
Found 103 items of level: p_11
Found 8 items of level: div_1
Found 4222 items of level: div_3
Found 27 items of level: div_7
Found 42505 items of level: div_6
Found 1 items of level: div_10
Found 20 items of level: p_8
Found 14 items of level: p_9
Found 65 items of level: p_6
Found 21 items of level: p_7
Found 1071 items of level: p_3
Found 1336 items of level: p_1
Ignoring level div_10
Ignoring level p_7
Ignoring level p_8
Ignoring level p_9
p_10 left margin stats: Counter({u'0': 541})
p_10 right margin stats: Counter({u'0': 541})
p_11 left margin stats: Counter({u'0': 103})
p_11 right margin stats: Counter({u'0': 103})
div_1 left margin stats: Counter()
div_1 right margin stats: Counter()
div_3 left margin stats: Counter({u'': 4222})
div_3 right margin stats: Counter({u'': 4222})
div_7 left margin stats: Counter({u'': 27})
div_7 right margin stats: Counter({u'': 27})
div_6 left margin stats: Counter({u'': 42505})
div_6 right margin stats: Counter({u'': 42505})
p_6 left margin stats: Counter({u'0': 65})
p_6 right margin stats: Counter({u'0': 65})
p_3 left margin stats: Counter({u'0': 1071})
p_3 right margin stats: Counter({u'0': 1071})
p_1 left margin stats: Counter({u'0': 1336})
p_1 right margin stats: Counter({u'0': 1336})
Cleaning up manifest...
Trimming unused files from manifest...
Creating EPUB Output...
Rescaling image from 861x1159 to 558x751 06-palestina-vechiului-testament.jpg
Rescaling image from 1749x2370 to 554x751 07-palestina-noului-testament.jpg
Rescaling image from 945x613 to 566x367 01-vechiul-orient.jpg
Rescaling image from 1732x2376 to 547x751 03-ierusalimul-noului-testament.jpg
Rescaling image from 1704x1278 to 566x425 04-calatoriile-misionare-ale-apostolului-pavel.jpg
Rescaling image from 1722x958 to 566x315 05-calatoria-captivitatii-apostolului-pavel.jpg
Looking for large trees in index.htm...
No large trees found
Looking for large trees in _allhtm.htm...
Found large tree #0
Splitting...
Split point: {http://www.w3.org/1999/xhtml}div /*/*[2]/*[720]
Split tree still too large: 622 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}div /*/*[2]/*[717]
Split tree too small
Splitting...
Split point: {http://www.w3.org/1999/xhtml}div /*/*[2]/*[718]
Split tree too small
Splitting...
Split point: {http://www.w3.org/1999/xhtml}div /*/*[2]/*[683]
Split tree still too large: 573 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}div /*/*[2]/*[682]
Split tree too small
Splitting...
Split point: {http://www.w3.org/1999/xhtml}p /*/*[2]/*[369]
Split tree still too large: 319 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}p /*/*[2]/*[205]
Committed sub-tree #1 (170 KB)
Committed sub-tree #2 (149 KB)
Committed sub-tree #3 (253 KB)
Committed sub-tree #4 (49 KB)
Split tree still too large: 29567 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}div /*/*[2]/*[483]
Split tree still too large: 284 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}div /*/*[2]/*[482]
Split tree too small
Splitting...
Split point: {http://www.w3.org/1999/xhtml}div /*/*[2]/*[481]
Split tree too small
Splitting...
Split point: {http://www.w3.org/1999/xhtml}p /*/*[2]/*[242]
Committed sub-tree #5 (164 KB)
Committed sub-tree #6 (120 KB)
Python function terminated unexpectedly
(Error Code: 1)
Traceback (most recent call last):
File "site.py", line 132, in main
File "site.py", line 109, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 187, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 31, in gui_convert_override
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 1087, in run
File "site-packages\calibre\ebooks\epub\output.py", line 169, in convert
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 57, in __call__
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 67, in split_item
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 205, in __init__
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 425, in split_to_size
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 414, in split_to_size
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 350, in is_page_empty
File "lxml.etree.pyx", line 2860, in lxml.etree.tostring (src/lxml/lxml.etree.c:53681)
File "serializer.pxi", line 95, in lxml.etree._tostring (src/lxml/lxml.etree.c:87055)
File "serializer.pxi", line 63, in lxml.etree._textToString (src/lxml/lxml.etree.c:86837)
MemoryError


I would expect to end up with around 130 HTML pieces of ~240 KB each, but something goes wrong while the tree is being split...
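
A rough check of that estimate, using only figures from the log above. It assumes the 622 KB and 29567 KB trees are the two halves of the first split (so the whole file is about 30189 KB) and that 'flow_size': 260 is a limit in KB; the helper name `expected_pieces` is made up for this sketch.

```python
import math


def expected_pieces(total_kb, piece_kb):
    """Lower bound on the number of pieces if every piece were exactly piece_kb."""
    return math.ceil(total_kb / piece_kb)


# Sizes reported by the splitter in the log (assumed to be the two halves).
total_kb = 622 + 29567

print(expected_pieces(total_kb, 260))  # pieces at the configured flow_size
print(expected_pieces(total_kb, 240))  # pieces at ~240 KB each
```

Real pieces come out smaller than the limit (the committed sub-trees in the log range from 49 KB to 253 KB), so the actual count would be somewhat higher than these lower bounds.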

PS: The machine has 2 cores, 3 GB of RAM and lots of swap.