MobileRead Forums - View Single Post

During my Bible-creation saga, I have done the following:

1. Cleaned up the files as much as possible (I used the full power of Linux and Perl regexes, plus this great website for testing and learning regex: http://gskinner.com/RegExr/ ). This way I removed:
- all the CSS I knew of (style)
- fonts, colors, JS.
It's now plain HTML, nothing more. I don't think there is anything more I can clean (besides the text itself).
2. Merged all the files into one (this gave me a ~20 MB HTML file).
3. Created the TOC at the beginning of the file (so that the TOC can be built when I set breadth-first instead of depth-first traversal).
4. Imported it into calibre (with calibredb, as the GUI crashes); it is now stored as a zip.
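
For anyone wanting to reproduce the clean-up in step 1, here is a minimal Python sketch of that kind of regex pass. The patterns and the function name clean_html are assumptions for illustration, not the Perl expressions actually used. Regex-based HTML clean-up is only approximate: an attribute value containing ">" would confuse it, for example.

```python
import re

# Illustrative patterns (assumptions, not the original Perl regexes).
# Whole <script>...</script> and <style>...</style> blocks, including content.
SCRIPT_OR_STYLE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
# Opening and closing <font> tags; the enclosed text is kept.
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)
# Any opening tag, so attribute removal never touches body text.
OPEN_TAG = re.compile(r"<[a-zA-Z][^>]*>")
# Presentational attributes with double-quoted, single-quoted or bare values.
PRESENTATION_ATTR = re.compile(
    r"""\s+(?:style|color|bgcolor|face)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)


def _strip_attrs(match):
    """Remove presentational attributes from a single opening tag."""
    return PRESENTATION_ATTR.sub("", match.group(0))


def clean_html(html):
    """Return html without scripts, styles, font tags and style attributes."""
    html = SCRIPT_OR_STYLE.sub("", html)
    html = FONT_TAG.sub("", html)
    return OPEN_TAG.sub(_strip_attrs, html)


sample = (
    "<html><head><style>p{color:red}</style><script>alert(1)</script></head>"
    "<body><p style=\"margin:0\" bgcolor='#fff'>"
    "<font color=\"red\">Text color = red</font></p></body></html>"
)
print(clean_html(sample))
```

The attribute pass runs only inside opening tags, so ordinary text such as "color = red" in a verse is left alone.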

Now I tried:
5a. To export it to EPUB (with split on) -> after a few steps in the split process, it gives a MemoryError (see the log in my previous post).
5b. To export it to MOBI -> gives an error (see the log in my previous post).
5c. To export it to EPUB without split (I set the split threshold above the size of the HTML, e.g. 30 MB); it still tries to split for some reason, and I again get a MemoryError on split (right at the beginning of the split) -> see the log here:

Spoiler:

I am completely out of ideas... I think I have found the book that is best suited for making calibre crash.
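
One thing that may be worth checking for 5c is the unit of the split threshold. The ebook-convert documentation describes --flow-size in KB, and the log for this conversion shows 'flow_size': 30000. The arithmetic below is only a sketch: it assumes the value is multiplied by 1024 to get bytes and that calibre compares it against the serialized tree rather than the file on disk, which would be consistent with the "Found large tree #0" line in the log. The helper names and the growth factor of 2.0 are hypothetical.

```python
import math

KIB = 1024  # assumption: flow size is given in KB and converted with * 1024


def flow_size_bytes(flow_size_kb):
    """Convert a flow size in KB to the byte threshold it represents."""
    return flow_size_kb * KIB


def flow_size_with_headroom(source_bytes, growth_factor=2.0):
    """Return a flow size in KB that stays above source_bytes * growth_factor.

    growth_factor is a guess at how much larger the parsed and re-serialized
    tree is than the source file; it is not a measured value.
    """
    return math.ceil(source_bytes * growth_factor / KIB)


source_bytes = 20 * 1024 * 1024          # the ~20 MB merged HTML file
threshold = flow_size_bytes(30000)       # value taken from the log

print(threshold)                             # 30720000
print(round(threshold / source_bytes, 2))    # 1.46
print(flow_size_with_headroom(source_bytes)) # 40960
```

Under these assumptions the threshold is only about 1.46 times the source size, so a tree that grows by roughly half during processing would already count as "large". Passing a bigger value to --flow-size might avoid the split attempt, though it says nothing about whether the rest of the conversion fits in memory.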

Here is the book (it's in Romanian, but I don't think that matters if you just want to see how clean the HTML is):
a) The book before I import it into calibre: HERE (but it will take a few hours to import)
b) The book as it appears in the calibre library: HERE (this is the one I tried to export to various formats: EPUB, MOBI, EPUB without split).

Note: Non-ASCII characters appear in a few places (in around 20 words across the 20 MB), but they have never caused any issue.
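
To double-check a count like this, a short Python sketch along the following lines could list the words that contain non-ASCII characters. The function names and the UTF-8 default are assumptions; the actual encoding of the file is not stated here.

```python
import re
from collections import Counter

# A run of word characters containing at least one character outside ASCII.
NON_ASCII_WORD = re.compile(r"\w*[^\x00-\x7F]\w*")


def non_ascii_words(text):
    """Return every word in text that contains a non-ASCII character."""
    return NON_ASCII_WORD.findall(text)


def count_non_ascii_words(path, encoding="utf-8"):
    """Count non-ASCII words in a file (encoding is an assumption)."""
    counts = Counter()
    with open(path, encoding=encoding) as handle:
        for line in handle:  # line by line, so a 20 MB file is not a problem
            counts.update(non_ascii_words(line))
    return counts


print(non_ascii_words("<p>Biblia sau Sfânta Scriptură adnotata</p>"))
```

Because the pattern is built from word characters, surrounding tags such as the paragraph markup are not included in the matches.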

If anyone can give some help/ideas on what I'm doing wrong or what else I should try, or review the html/zip above, please let me know.

11-30-2011, 04:56 AM	#9
aplicatii.ro Junior Member Posts: 7 Karma: 10 Join Date: Nov 2011 Device: none
No Split requested, still I get split MemoryError

Spoiler (log for 5c):

calibre, version 0.8.27
ERROR: Conversion Error: <b>Failed</b>: Convert book 1 of 1 (Biblia Ortodoxa sau Sfânta Scriptură adnotata Bartolomeu Anania)

Convert book 1 of 1 (Biblia Ortodoxa sau Sfânta Scriptură adnotata Bartolomeu Anania)
Processing archive...
Resolved conversion options
calibre version: 0.8.27
{'asciiize': False, 'author_sort': None, 'authors': None, 'base_font_size': 0.0, 'book_producer': None, 'breadth_first': False, 'change_justification': u'original', 'chapter': u'/', 'chapter_mark': u'none', 'comments': None, 'cover': None, 'debug_pipeline': None, 'dehyphenate': True, 'delete_blank_paragraphs': True, 'disable_font_rescaling': False, 'dont_package': False, 'dont_split_on_page_breaks': True, 'duplicate_links_in_toc': False, 'enable_heuristics': False, 'epub_flatten': False, 'extra_css': None, 'extract_to': None, 'filter_css': u'', 'fix_indents': True, 'flow_size': 30000, 'font_size_mapping': None, 'format_scene_breaks': True, 'html_unwrap_factor': 0.4, 'input_encoding': None, 'input_profile': <calibre.customize.profiles.InputProfile object at 0x03F631B0>, 'insert_blank_line': False, 'insert_blank_line_size': 0.5, 'insert_metadata': False, 'isbn': None, 'italicize_common_cases': True, 'keep_ligatures': False, 'language': None, 'level1_toc': u'//h:h1', 'level2_toc': u'//h:h2', 'level3_toc': u'//h:h3', 'line_height': 0.0, 'linearize_tables': False, 'margin_bottom': 5.0, 'margin_left': 5.0, 'margin_right': 5.0, 'margin_top': 5.0, 'markup_chapter_headings': True, 'max_levels': 5, 'max_toc_links': 100, 'minimum_line_height': 120.0, 'no_chapters_in_toc': False, 'no_default_epub_cover': True, 'no_inline_navbars': False, 'no_svg_cover': False, 'output_profile': <calibre.customize.profiles.GenericEink object at 0x03F633B0>, 'page_breaks_before': u'/', 'prefer_metadata_cover': False, 'preserve_cover_aspect_ratio': False, 'pretty_print': True, 'pubdate': None, 'publisher': None, 'rating': None, 'read_metadata_from_opf': 'c:\\temp\\calibre_0.8.27_tmp_b5wlxt\\e3wgx7.opf', 'remove_fake_margins': True, 'remove_first_image': False, 'remove_paragraph_spacing': False, 'remove_paragraph_spacing_indent_size': 1.5, 'renumber_headings': True, 'replace_scene_breaks': u'', 'series': None, 'series_index': None, 'smarten_punctuation': False, 'sr1_replace': None, 'sr1_search': None, 'sr2_replace': None, 'sr2_search': None, 'sr3_replace': None, 'sr3_search': None, 'tags': None, 'timestamp': None, 'title': None, 'title_sort': None, 'toc_filter': None, 'toc_threshold': 6, 'unsmarten_punctuation': False, 'unwrap_lines': True, 'use_auto_toc': False, 'verbose': 2}
InputFormatPlugin: HTML Input running
on c:\temp\calibre_0.8.27_tmp_b5wlxt\zffyot_plumber_archive\content.opf
Parsing all content...
Manifest item 'toc.ncx' not found
Parsing _allhtm.htm ...
Parsing index.htm ...
Generating default TOC from spine...
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 93 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Parsing stylesheet.css ...
Found 541 items of level: p_10
Found 103 items of level: p_11
Found 8 items of level: div_1
Found 4222 items of level: div_3
Found 27 items of level: div_7
Found 42505 items of level: div_6
Found 1 items of level: div_10
Found 20 items of level: p_8
Found 14 items of level: p_9
Found 65 items of level: p_6
Found 21 items of level: p_7
Found 1071 items of level: p_3
Found 1336 items of level: p_1
Ignoring level div_10
Ignoring level p_7
Ignoring level p_8
Ignoring level p_9
p_10 left margin stats: Counter({u'0': 541})
p_10 right margin stats: Counter({u'0': 541})
p_11 left margin stats: Counter({u'0': 103})
p_11 right margin stats: Counter({u'0': 103})
div_1 left margin stats: Counter()
div_1 right margin stats: Counter()
div_3 left margin stats: Counter({u'': 4222})
div_3 right margin stats: Counter({u'': 4222})
div_7 left margin stats: Counter({u'': 27})
div_7 right margin stats: Counter({u'': 27})
div_6 left margin stats: Counter({u'': 42505})
div_6 right margin stats: Counter({u'': 42505})
p_6 left margin stats: Counter({u'0': 65})
p_6 right margin stats: Counter({u'0': 65})
p_3 left margin stats: Counter({u'0': 1071})
p_3 right margin stats: Counter({u'0': 1071})
p_1 left margin stats: Counter({u'0': 1336})
p_1 right margin stats: Counter({u'0': 1336})
Cleaning up manifest...
Trimming unused files from manifest...
Creating EPUB Output...
Rescaling image from 861x1159 to 558x751 06-palestina-vechiului-testament.jpg
Rescaling image from 945x613 to 566x367 01-vechiul-orient.jpg
Rescaling image from 1722x958 to 566x315 05-calatoria-captivitatii-apostolului-pavel.jpg
Rescaling image from 1732x2376 to 547x751 03-ierusalimul-noului-testament.jpg
Rescaling image from 1704x1278 to 566x425 04-calatoriile-misionare-ale-apostolului-pavel.jpg
Rescaling image from 1749x2370 to 554x751 07-palestina-noului-testament.jpg
Looking for large trees in _allhtm.htm...
Found large tree #0
Splitting...
Split point: {http://www.w3.org/1999/xhtml}div //[2]/*[720]
Python function terminated unexpectedly
(Error Code: 1)
Traceback (most recent call last):
  File "site.py", line 132, in main
  File "site.py", line 109, in run_entry_point
  File "site-packages\calibre\utils\ipc\worker.py", line 187, in main
  File "site-packages\calibre\gui2\convert\gui_conversion.py", line 31, in gui_convert_override
  File "site-packages\calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
  File "site-packages\calibre\ebooks\conversion\plumber.py", line 1087, in run
  File "site-packages\calibre\ebooks\epub\output.py", line 169, in convert
  File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 57, in __call__
  File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 67, in split_item
  File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 205, in __init__
  File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 406, in split_to_size
  File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 27, in tostring
  File "lxml.etree.pyx", line 2860, in lxml.etree.tostring (src/lxml/lxml.etree.c:53681)
  File "serializer.pxi", line 139, in lxml.etree._tostring (src/lxml/lxml.etree.c:87439)
MemoryError