
MobileRead Forums > E-Book Software > Calibre > Conversion

11-20-2011, 04:19 PM #1
aplicatii.ro
Junior Member
Posts: 7
Karma: 10
Join Date: Nov 2011
Device: none
10k files bible



I have a Bible edition with ~8,000 notes, each stored as a single HTML file and referenced from the other ~2,000 files (which contain the rest of the text), plus thousands of links across all these 10k files... Huge, I know.

I am wondering what my options are for creating a "simpler" EPUB file, not one with 10k files inside.

After moving to a somewhat better computer I got past the "hung" error and managed to create an EPUB file. It's only 12 MB, but the OPF contains 10k entries and the ZIP contains 10k files.
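To see where the bulk comes from, one can count the ZIP entries and the OPF manifest items directly. The sketch below uses only the Python standard library; `epub_stats` is a hypothetical helper, and it assumes the EPUB follows the usual OCF layout, where `META-INF/container.xml` names the OPF package file.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the OCF container file and the OPF package file.
CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"
OPF_NS = "{http://www.idpf.org/2007/opf}"

def epub_stats(path):
    """Return (ZIP entries, OPF manifest items, HTML/XHTML manifest items)."""
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
        # container.xml points at the OPF file via <rootfile full-path="...">.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(f".//{CONTAINER_NS}rootfile").get("full-path")
        opf = ET.fromstring(zf.read(opf_path))
        items = opf.findall(f".//{OPF_NS}manifest/{OPF_NS}item")
        html_items = [i for i in items
                      if i.get("media-type") in ("application/xhtml+xml", "text/html")]
        return len(names), len(items), len(html_items)
```

Usage: `print(epub_stats("book.epub"))` (placeholder file name) prints the three counts as a tuple.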

When I try to load the EPUB in FBReader (an EPUB reader for Android), it crashes. Even the calibre viewer takes a long time to load it.

I tried to:
a) Convert the book to htmlz, but Calibre crashed with:
File "site-packages\calibre\ebooks\htmlz\output.py", line 63, in convert
MemoryError
calibre, version 0.8.27
ERROR: Conversion Error: <b>Failed</b>: Convert book 1 of 1 (Biblia Ortodoxa Bartolomeu Anania)

Convert book 1 of 1 (Biblia Ortodoxa Bartolomeu Anania)
Processing archive...
Resolved conversion options
calibre version: 0.8.27
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0.0,
'book_producer': None,
'breadth_first': False,
'change_justification': u'original',
'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., 'chapter|book|section|part|prologue|epilogue\\s+', 'i')) or @class = 'chapter']",
'chapter_mark': u'pagebreak',
'comments': None,
'cover': None,
'debug_pipeline': None,
'dehyphenate': True,
'delete_blank_paragraphs': True,
'disable_font_rescaling': False,
'dont_package': False,
'duplicate_links_in_toc': False,
'enable_heuristics': False,
'extra_css': None,
'filter_css': u'',
'fix_indents': True,
'font_size_mapping': None,
'format_scene_breaks': True,
'html_unwrap_factor': 0.4,
'htmlz_class_style': u'external',
'htmlz_css_type': u'class',
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x0403F0D0>,
'insert_blank_line': False,
'insert_blank_line_size': 0.5,
'insert_metadata': False,
'isbn': None,
'italicize_common_cases': True,
'keep_ligatures': False,
'language': None,
'level1_toc': u'//h:h1',
'level2_toc': u'//h:h2',
'level3_toc': u'//h:h3',
'line_height': 0.0,
'linearize_tables': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'markup_chapter_headings': True,
'max_levels': 5,
'max_toc_links': 50,
'minimum_line_height': 120.0,
'no_chapters_in_toc': False,
'no_inline_navbars': False,
'output_profile': <calibre.customize.profiles.GenericEink object at 0x0403F2D0>,
'page_breaks_before': u"//*[name()='h1' or name()='h2']",
'prefer_metadata_cover': False,
'pretty_print': False,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': 'c:\\temp\\calibre_0.8.27_tmp_tqfgsu\\ah023b.opf',
'remove_fake_margins': True,
'remove_first_image': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'renumber_headings': True,
'replace_scene_breaks': u'',
'series': None,
'series_index': None,
'smarten_punctuation': False,
'sr1_replace': None,
'sr1_search': None,
'sr2_replace': None,
'sr2_search': None,
'sr3_replace': None,
'sr3_search': None,
'tags': None,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'unsmarten_punctuation': False,
'unwrap_lines': True,
'use_auto_toc': False,
'verbose': 2}
InputFormatPlugin: HTML Input running
on c:\temp\calibre_0.8.27_tmp_tqfgsu\qtlpes_plumber_archive\content.opf
Parsing all content...
Manifest item 'toc.ncx' not found
Parsing index-D.php-id%3dVT-Sof-02-05%26c%3d01.htm ...
Parsing index-D.php-id%3dVT-Sir-44-20%26c%3d01.htm ...
Parsing index-D.php-id%3dVT-Sir-44-21%26c%3d01.htm ...
[..........]
Parsing index-D.php-id%3dVT-Dn-06-01%26c%3d01.htm ...
Parsing index-D.php-id%3dVT-Dn-06-02%26c%3d01.htm ...
Referenced file 'index-C.php-id%3dXX-Co-02.htm' not found
Referenced file 'index-D.php-id%3dVT-Ps-106-25%26a%3dr01' not found
Referenced file 'index-C.php-id%3dNT-In-22.htm' not found
[...........]
Referenced file 'index-C.php-id%3dVT-Idt-24.htm' not found
Referenced file 'index-C.php-id%3dNT-Lc-1.htm' not found
Generating default TOC from spine...
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 93 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Parsing stylesheet.css ...
Property: Invalid value for "CSS Level 2.1" property: lightyellow [3:1: background-color]
Found 1 items of level: div_8
Found 11999 items of level: div_1
Found 5 items of level: div_2
Found 27 items of level: div_5
Found 42576 items of level: div_4
Found 643 items of level: p_4
Found 108 items of level: p_5
Found 2400 items of level: p_2
Found 5 items of level: p_1
Ignoring level div_8
Ignoring level p_1
div_1 left margin stats: Counter()
div_1 right margin stats: Counter()
div_2 left margin stats: Counter()
div_2 right margin stats: Counter()
div_5 left margin stats: Counter({u'': 27})
div_5 right margin stats: Counter({u'': 27})
div_4 left margin stats: Counter({u'': 42576})
div_4 right margin stats: Counter({u'': 42576})
p_4 left margin stats: Counter({u'0': 643})
p_4 right margin stats: Counter({u'0': 643})
p_5 left margin stats: Counter({u'0': 108})
p_5 right margin stats: Counter({u'0': 108})
p_2 left margin stats: Counter({u'0': 2400})
p_2 right margin stats: Counter({u'0': 2400})
Cleaning up manifest...
Trimming unused files from manifest...
Creating HTMLZ Output...
Converting OEB book to HTML...
Converting index.htm to HTML...
Converting index-C.php-id%3dIN.htm to HTML...
Converting index-C.php-id%3dIN-PRE.htm to HTML...
Converting index-C.php-id%3dIN-CUVANT_LAMURITOR.htm to HTML...
Converting index-C.php-id%3dIN-INDREPTAR.htm to HTML...
Converting index-C.php-id%3dVT.htm to HTML...
Converting index-C.php-id%3dVT-Fc.htm to HTML...
Converting index-D.php-id%3dVT-Fc%26a%3dobs.htm to HTML...
Converting index-C.php-id%3dVT-Fc-01.htm to HTML...
Converting index-C.php-id%3dVT-Fc-02.htm to HTML...
Converting index-C.php-id%3dVT-Fc-03.htm to HTML...
[.............]
Converting index-C.php-id%3dAN-CONC-INVIERE%26obs%3dtrue.htm to HTML...
Converting index-C.php-id%3dAN-CONC-MARTURISIRE%26obs%3dtrue.htm to HTML...
Python function terminated unexpectedly
(Error Code: 1)
Traceback (most recent call last):
File "site.py", line 132, in main
File "site.py", line 109, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 187, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 31, in gui_convert_override
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 1087, in run
File "site-packages\calibre\ebooks\htmlz\output.py", line 63, in convert
MemoryError


b) Convert the book to FB2 (hoping for an fb2.zip later), but Calibre crashed again with a MemoryError:
File "site-packages\calibre\ebooks\fb2\fb2ml.py", line 71, in clean_text
File "re.py", line 151, in sub
MemoryError

Converting index-C.php-id%3dAN-CONC-MARTURISIRE%26obs%3dtrue.htm to FictionBook2 XML
Python function terminated unexpectedly
(Error Code: 1)
Traceback (most recent call last):
File "site.py", line 132, in main
File "site.py", line 109, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 187, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 31, in gui_convert_override
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 1087, in run
File "site-packages\calibre\ebooks\fb2\output.py", line 175, in convert
File "site-packages\calibre\ebooks\fb2\fb2ml.py", line 55, in extract_content
File "site-packages\calibre\ebooks\fb2\fb2ml.py", line 62, in fb2mlize_spine
File "site-packages\calibre\ebooks\fb2\fb2ml.py", line 71, in clean_text
File "re.py", line 151, in sub
MemoryError

(The computer is dual-core, 3 GB RAM, Windows XP.)
Can anyone suggest a better way to make a "working" (read: efficient) EPUB, or to convert to HTMLZ? I want to convert the HTMLZ back to EPUB afterwards, hoping for a much better-performing EPUB with only a few (but bigger) HTML files inside.

If I created ~50-100 folders and spread the files across them (in a logical way), would that improve the EPUB's opening performance?

This is in the hope that FBReader (a very powerful and well-tested EPUB reader) will be able to handle it.
(Note: I have other Bible EPUBs, but since they have fewer than 100 files (no annotations), they work pretty well.)

Once again, the problem is not the size; most probably it is the huge number of files.

Thanks in advance for your suggestion(s).

Last edited by aplicatii.ro; 11-20-2011 at 04:30 PM. Reason: Added possible idea...
11-20-2011, 04:27 PM #2
nrapallo
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200, EBW1150, T1, NSTG, iLiad_v2, NC, Asus_TF, Next1, WPDN
Try converting it to .mobi format first (using calibre or Mobipocket Creator) as that will combine all your source .html files into one file (used internally within the .mobi ebook) and THEN convert that .mobi ebook to .htmlz.
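The two-step route can be scripted with calibre's `ebook-convert` command-line tool, which picks the output format from the target file's extension. This is a sketch: `convert_chain` is a hypothetical helper, the file names are placeholders, and it assumes `ebook-convert` is on the PATH. Passing `run=False` only returns the commands without executing them.

```python
import subprocess

def convert_chain(source, stem, formats=("mobi", "htmlz"), run=True):
    """Convert `source` through each format in turn with ebook-convert.

    Each step's output becomes the next step's input, e.g.
    book.epub -> book.mobi -> book.htmlz. Returns the command lists.
    """
    commands = []
    current = source
    for fmt in formats:
        target = f"{stem}.{fmt}"
        commands.append(["ebook-convert", current, target])
        current = target
    if run:
        for cmd in commands:
            # check=True raises CalledProcessError if a conversion fails.
            subprocess.run(cmd, check=True)
    return commands
```

For example, `convert_chain("bible.epub", "bible")` would run the MOBI conversion first and then turn that MOBI into HTMLZ.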
11-21-2011, 07:55 AM #3
aplicatii.ro
Thanks for the tip, unfortunately not working

Thanks for the tip.
I created a MOBI (from the ZIP) and then created an EPUB from the MOBI.

Unfortunately, the EPUB still came out with 10k files.
I can't understand why, but it does. The files are named "..._split...html".
I did set the option "split if over 260 KB", but 99% of the files are under 100 KB, in fact under 1 KB.

If I modified the book to use a few hundred folders, would this speed up EPUB loading?
Or is the issue that the index file is too big (has too many entries)?
11-21-2011, 04:42 PM #4
aplicatii.ro
Lacking ideas...

Quote:
Originally Posted by aplicatii.ro
Thanks for the tip.
I created a MOBI (from the ZIP) and then created an EPUB from the MOBI.

Unfortunately, the EPUB still came out with 10k files.
I can't understand why, but it does. The files are named "..._split...html".
I did set the option "split if over 260 KB", but 99% of the files are under 100 KB, in fact under 1 KB.

If I modified the book to use a few hundred folders, would this speed up EPUB loading?
Or is the issue that the index file is too big (has too many entries)?
After reading more, it seems that FBReaderJ does not work properly with subfolders and actually needs all the files in the root, so my proposed solution is not valid...
11-21-2011, 09:33 PM #5
Dopedangel
Wizard
Posts: 1,798
Karma: 30548723
Join Date: Dec 2006
Location: Singapore
Device: Boyue
Disable the option to split on page breaks when converting to epub.
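On the command line this corresponds to the EPUB output option `--dont-split-on-page-breaks` of `ebook-convert`; the size-based split threshold (the `flow_size` value in the logs, in KB) is set with `--flow-size`. A minimal sketch that builds such a command; `epub_command` and `convert_to_epub` are hypothetical helpers and the file names are placeholders.

```python
import subprocess

def epub_command(source, target, flow_size_kb=None, split_on_page_breaks=False):
    """Build an ebook-convert command line for EPUB output."""
    cmd = ["ebook-convert", source, target]
    if not split_on_page_breaks:
        # Corresponds to the dont_split_on_page_breaks conversion option.
        cmd.append("--dont-split-on-page-breaks")
    if flow_size_kb is not None:
        # HTML files larger than this many KB are split (flow_size option).
        cmd += ["--flow-size", str(flow_size_kb)]
    return cmd

def convert_to_epub(source, target, **kwargs):
    subprocess.run(epub_command(source, target, **kwargs), check=True)
```

Note that this only disables splitting on page breaks; files above the flow size are still split by size.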
11-28-2011, 09:45 AM #6
aplicatii.ro
OOM in different area now

Hi,

I keep running from one memory error to another, in totally different cases.

After many failures I managed to import the HTML from the command line (calibredb add index.htm), but now, when I try to export to EPUB (with the splitting option on), I get a memory error.

In the beginning there were too many files, so I worked to produce a single 28 MB (7 MB zipped) file, but now calibre has a hard time splitting it.
Maybe the algorithm is not tuned for splitting big books, or there are too many links/references; I have no clue what's going on...

"MemoryError in split.py":
calibre, version 0.8.28
ERROR: Conversion Error: <b>Failed</b>: Convert book 1 of 1 (Biblia Ortodoxa adnotata Bartolomeu Anania)

Convert book 1 of 1 (Biblia Ortodoxa adnotata Bartolomeu Anania)
Processing archive...
Resolved conversion options
calibre version: 0.8.28
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0.0,
'book_producer': None,
'breadth_first': False,
'change_justification': u'original',
'chapter': u'/',
'chapter_mark': u'none',
'comments': None,
'cover': None,
'debug_pipeline': None,
'dehyphenate': True,
'delete_blank_paragraphs': True,
'disable_font_rescaling': False,
'dont_package': False,
'dont_split_on_page_breaks': False,
'duplicate_links_in_toc': False,
'enable_heuristics': False,
'epub_flatten': False,
'extra_css': None,
'extract_to': None,
'filter_css': u'',
'fix_indents': True,
'flow_size': 260,
'font_size_mapping': None,
'format_scene_breaks': True,
'html_unwrap_factor': 0.4,
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x04D54690>,
'insert_blank_line': False,
'insert_blank_line_size': 0.5,
'insert_metadata': False,
'isbn': None,
'italicize_common_cases': True,
'keep_ligatures': False,
'language': None,
'level1_toc': u'//h:h1',
'level2_toc': u'//h:h2',
'level3_toc': u'//h:h3',
'line_height': 0.0,
'linearize_tables': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'markup_chapter_headings': True,
'max_levels': 5,
'max_toc_links': 100,
'minimum_line_height': 120.0,
'no_chapters_in_toc': False,
'no_default_epub_cover': False,
'no_inline_navbars': False,
'no_svg_cover': False,
'output_profile': <calibre.customize.profiles.GenericEink object at 0x04D54890>,
'page_breaks_before': u'/',
'prefer_metadata_cover': False,
'preserve_cover_aspect_ratio': False,
'pretty_print': True,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': 'c:\\temp\\calibre_0.8.28_tmp_0ujpin\\_2p9bs.opf',
'remove_fake_margins': True,
'remove_first_image': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'renumber_headings': True,
'replace_scene_breaks': u'',
'series': None,
'series_index': None,
'smarten_punctuation': False,
'sr1_replace': None,
'sr1_search': None,
'sr2_replace': None,
'sr2_search': None,
'sr3_replace': None,
'sr3_search': None,
'tags': None,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'unsmarten_punctuation': False,
'unwrap_lines': True,
'use_auto_toc': False,
'verbose': 2}
InputFormatPlugin: HTML Input running
on c:\temp\calibre_0.8.28_tmp_0ujpin\1nplup_plumber_archive\content.opf
Parsing all content...
Manifest item 'toc.ncx' not found
Parsing index.htm ...
Parsing _allhtm.htm ...
Generating default TOC from spine...
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 93 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Parsing stylesheet.css ...
Found 541 items of level: p_10
Found 103 items of level: p_11
Found 8 items of level: div_1
Found 4222 items of level: div_3
Found 27 items of level: div_7
Found 42505 items of level: div_6
Found 1 items of level: div_10
Found 20 items of level: p_8
Found 14 items of level: p_9
Found 65 items of level: p_6
Found 21 items of level: p_7
Found 1071 items of level: p_3
Found 1336 items of level: p_1
Ignoring level div_10
Ignoring level p_7
Ignoring level p_8
Ignoring level p_9
p_10 left margin stats: Counter({u'0': 541})
p_10 right margin stats: Counter({u'0': 541})
p_11 left margin stats: Counter({u'0': 103})
p_11 right margin stats: Counter({u'0': 103})
div_1 left margin stats: Counter()
div_1 right margin stats: Counter()
div_3 left margin stats: Counter({u'': 4222})
div_3 right margin stats: Counter({u'': 4222})
div_7 left margin stats: Counter({u'': 27})
div_7 right margin stats: Counter({u'': 27})
div_6 left margin stats: Counter({u'': 42505})
div_6 right margin stats: Counter({u'': 42505})
p_6 left margin stats: Counter({u'0': 65})
p_6 right margin stats: Counter({u'0': 65})
p_3 left margin stats: Counter({u'0': 1071})
p_3 right margin stats: Counter({u'0': 1071})
p_1 left margin stats: Counter({u'0': 1336})
p_1 right margin stats: Counter({u'0': 1336})
Cleaning up manifest...
Trimming unused files from manifest...
Creating EPUB Output...
Rescaling image from 861x1159 to 558x751 06-palestina-vechiului-testament.jpg
Rescaling image from 1749x2370 to 554x751 07-palestina-noului-testament.jpg
Rescaling image from 945x613 to 566x367 01-vechiul-orient.jpg
Rescaling image from 1732x2376 to 547x751 03-ierusalimul-noului-testament.jpg
Rescaling image from 1704x1278 to 566x425 04-calatoriile-misionare-ale-apostolului-pavel.jpg
Rescaling image from 1722x958 to 566x315 05-calatoria-captivitatii-apostolului-pavel.jpg
Looking for large trees in index.htm...
No large trees found
Looking for large trees in _allhtm.htm...
Found large tree #0
Splitting...
Split point: {http://www.w3.org/1999/xhtml}div /*/*[2]/*[720]
Split tree still too large: 622 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}div /*/*[2]/*[717]
Split tree too small
Splitting...
Split point: {http://www.w3.org/1999/xhtml}div /*/*[2]/*[718]
Split tree too small
Splitting...
Split point: {http://www.w3.org/1999/xhtml}div /*/*[2]/*[683]
Split tree still too large: 573 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}div /*/*[2]/*[682]
Split tree too small
Splitting...
Split point: {http://www.w3.org/1999/xhtml}p /*/*[2]/*[369]
Split tree still too large: 319 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}p /*/*[2]/*[205]
Committed sub-tree #1 (170 KB)
Committed sub-tree #2 (149 KB)
Committed sub-tree #3 (253 KB)
Committed sub-tree #4 (49 KB)
Split tree still too large: 29567 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}div /*/*[2]/*[483]
Split tree still too large: 284 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}div /*/*[2]/*[482]
Split tree too small
Splitting...
Split point: {http://www.w3.org/1999/xhtml}div /*/*[2]/*[481]
Split tree too small
Splitting...
Split point: {http://www.w3.org/1999/xhtml}p /*/*[2]/*[242]
Committed sub-tree #5 (164 KB)
Committed sub-tree #6 (120 KB)
Python function terminated unexpectedly
(Error Code: 1)
Traceback (most recent call last):
File "site.py", line 132, in main
File "site.py", line 109, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 187, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 31, in gui_convert_override
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 1087, in run
File "site-packages\calibre\ebooks\epub\output.py", line 169, in convert
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 57, in __call__
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 67, in split_item
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 205, in __init__
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 425, in split_to_size
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 414, in split_to_size
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 350, in is_page_empty
File "lxml.etree.pyx", line 2860, in lxml.etree.tostring (src/lxml/lxml.etree.c:53681)
File "serializer.pxi", line 95, in lxml.etree._tostring (src/lxml/lxml.etree.c:87055)
File "serializer.pxi", line 63, in lxml.etree._textToString (src/lxml/lxml.etree.c:86837)
MemoryError


I would expect around 130 HTML pieces of ~240 KB each, but something goes wrong in the split tree...

PS: Machine: dual-core, 3 GB RAM & lots of swap.
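One possible workaround is to pre-split the merged file yourself so each piece is already below the flow size before calibre sees it. The sketch below is an illustration under several assumptions, not calibre's splitting algorithm: it requires well-formed XHTML with the content as direct children of `<body>`, ignores loose text directly inside `<body>`, and does not rewrite internal links, so an `href="#id"` pointing into another piece would still have to be changed to something like `partN.xhtml#id` afterwards. `split_body` and `chunk_to_document` are hypothetical names.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
# Serialize XHTML elements with a default namespace instead of "ns0:".
ET.register_namespace("", XHTML)

def split_body(xhtml_text, max_bytes):
    """Group the direct children of <body> into chunks whose serialized
    size stays under max_bytes; an oversized child gets its own chunk."""
    root = ET.fromstring(xhtml_text)
    body = root.find(f"{{{XHTML}}}body")
    chunks, current, size = [], [], 0
    for child in list(body):
        n = len(ET.tostring(child, encoding="utf-8"))
        if current and size + n > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(child)
        size += n
    if current:
        chunks.append(current)
    return chunks

def chunk_to_document(chunk, title="part"):
    """Wrap one chunk of body children in a minimal XHTML document."""
    html = ET.Element(f"{{{XHTML}}}html")
    head = ET.SubElement(html, f"{{{XHTML}}}head")
    ET.SubElement(head, f"{{{XHTML}}}title").text = title
    body = ET.SubElement(html, f"{{{XHTML}}}body")
    body.extend(chunk)
    return ET.tostring(html, encoding="unicode")
```

With `max_bytes` around 240 * 1024, each chunk could be written to its own file and listed in the spine in order.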
11-29-2011, 12:44 AM #7
aplicatii.ro
trying to convert to mobi instead of epub, no success



It seems that whatever I try, I run out of memory.
I removed all CSS and cleaned up everything I could in the HTML, but it still fails with out-of-memory errors in completely different places.

E.g. now, after many hours of work, it crashed with:
Converting XHTML to Mobipocket markup...
File "site.py", line 132, in main
File "site.py", line 109, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 187, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 31, in gui_convert_override
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 1087, in run
File "site-packages\calibre\ebooks\mobi\output.py", line 167, in convert
File "site-packages\calibre\ebooks\mobi\mobiml.py", line 111, in __call__
File "site-packages\calibre\ebooks\mobi\mobiml.py", line 133, in mobimlize_spine
File "site-packages\calibre\ebooks\oeb\stylizer.py", line 296, in __init__
File "site-packages\calibre\ebooks\oeb\stylizer.py", line 459, in style
File "site-packages\calibre\ebooks\oeb\stylizer.py", line 488, in __init__
MemoryError

Full error here:

calibre, version 0.8.28
ERROR: Conversion Error: <b>Failed</b>: Convert book 1 of 1 (Biblia Ortodoxa adnotata Bartolomeu Anania)

Convert book 1 of 1 (Biblia Ortodoxa adnotata Bartolomeu Anania)
Processing archive...
Resolved conversion options
calibre version: 0.8.28
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0.0,
'book_producer': None,
'breadth_first': False,
'change_justification': u'original',
'chapter': u'/',
'chapter_mark': u'none',
'comments': None,
'cover': None,
'debug_pipeline': None,
'dehyphenate': True,
'delete_blank_paragraphs': True,
'disable_font_rescaling': False,
'dont_compress': False,
'dont_package': False,
'duplicate_links_in_toc': False,
'enable_heuristics': False,
'extra_css': None,
'extract_to': None,
'filter_css': u'',
'fix_indents': True,
'font_size_mapping': None,
'format_scene_breaks': True,
'html_unwrap_factor': 0.4,
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x04D4A690>,
'insert_blank_line': False,
'insert_blank_line_size': 0.5,
'insert_metadata': False,
'isbn': None,
'italicize_common_cases': True,
'keep_ligatures': False,
'language': None,
'level1_toc': u'//h:h1',
'level2_toc': u'//h:h2',
'level3_toc': u'//h:h3',
'line_height': 0.0,
'linearize_tables': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'markup_chapter_headings': True,
'max_levels': 5,
'max_toc_links': 100,
'minimum_line_height': 120.0,
'mobi_ignore_margins': False,
'mobi_toc_at_start': False,
'no_chapters_in_toc': False,
'no_inline_navbars': False,
'no_inline_toc': False,
'output_profile': <calibre.customize.profiles.GenericEink object at 0x04D4A890>,
'page_breaks_before': u'/',
'personal_doc': u'[PDOC]',
'prefer_author_sort': False,
'prefer_metadata_cover': False,
'pretty_print': False,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': 'c:\\temp\\calibre_0.8.28_tmp_0ujpin\\zs9jyu.opf',
'remove_fake_margins': True,
'remove_first_image': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'renumber_headings': True,
'replace_scene_breaks': u'',
'rescale_images': False,
'series': None,
'series_index': None,
'share_not_sync': False,
'smarten_punctuation': False,
'sr1_replace': None,
'sr1_search': None,
'sr2_replace': None,
'sr2_search': None,
'sr3_replace': None,
'sr3_search': None,
'tags': None,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'toc_title': None,
'unsmarten_punctuation': False,
'unwrap_lines': True,
'use_auto_toc': False,
'verbose': 2}
InputFormatPlugin: HTML Input running
on c:\temp\calibre_0.8.28_tmp_0ujpin\_0s0gc_plumber_archive\content.opf
Parsing all content...
Manifest item 'toc.ncx' not found
Parsing index.htm ...
Parsing _allhtm.htm ...
Generating default TOC from spine...
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 93 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Parsing stylesheet.css ...
Found 541 items of level: p_10
Found 103 items of level: p_11
Found 8 items of level: div_1
Found 4222 items of level: div_3
Found 27 items of level: div_7
Found 42505 items of level: div_6
Found 1 items of level: div_10
Found 20 items of level: p_8
Found 14 items of level: p_9
Found 65 items of level: p_6
Found 21 items of level: p_7
Found 1071 items of level: p_3
Found 1336 items of level: p_1
Ignoring level div_10
Ignoring level p_7
Ignoring level p_8
Ignoring level p_9
p_10 left margin stats: Counter({u'0': 541})
p_10 right margin stats: Counter({u'0': 541})
p_11 left margin stats: Counter({u'0': 103})
p_11 right margin stats: Counter({u'0': 103})
div_1 left margin stats: Counter()
div_1 right margin stats: Counter()
div_3 left margin stats: Counter({u'': 4222})
div_3 right margin stats: Counter({u'': 4222})
div_7 left margin stats: Counter({u'': 27})
div_7 right margin stats: Counter({u'': 27})
div_6 left margin stats: Counter({u'': 42505})
div_6 right margin stats: Counter({u'': 42505})
p_6 left margin stats: Counter({u'0': 65})
p_6 right margin stats: Counter({u'0': 65})
p_3 left margin stats: Counter({u'0': 1071})
p_3 right margin stats: Counter({u'0': 1071})
p_1 left margin stats: Counter({u'0': 1336})
p_1 right margin stats: Counter({u'0': 1336})
Cleaning up manifest...
Trimming unused files from manifest...
Creating MOBI Output...
Generating in-line TOC...
Applying case-transforming CSS...
Parsing manglecase.css ...
Parsing tocstyle.css ...
Rasterizing SVG images...
Converting XHTML to Mobipocket markup...
Python function terminated unexpectedly
(Error Code: 1)
Traceback (most recent call last):
File "site.py", line 132, in main
File "site.py", line 109, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 187, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 31, in gui_convert_override
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 1087, in run
File "site-packages\calibre\ebooks\mobi\output.py", line 167, in convert
File "site-packages\calibre\ebooks\mobi\mobiml.py", line 111, in __call__
File "site-packages\calibre\ebooks\mobi\mobiml.py", line 133, in mobimlize_spine
File "site-packages\calibre\ebooks\oeb\stylizer.py", line 296, in __init__
File "site-packages\calibre\ebooks\oeb\stylizer.py", line 459, in style
File "site-packages\calibre\ebooks\oeb\stylizer.py", line 488, in __init__
MemoryError


Does anyone have any suggestions?
11-29-2011, 01:32 AM #8
osnova
Kindler of the Flame
Posts: 582
Karma: 646016
Join Date: Oct 2009
Location: US of A
Device: K DX,3,KT,KP,KF, KFHD; Nook C, PRS600, iPad, Xoom, N900, N810, Zaurus
Can you tell us what Bible translation it is?

===
To create a source file, I typically combine the files into one HTML/XML file (copy *.html new.html) because it is easier for a human to work that way and to know for sure that there are no garbage tags in there. Then I use EmEditor with its robust regular expression engine to clean the source and to chisel out an ebook with great formatting and navigation.
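For reference, `copy *.html new.html` simply concatenates the files, leaving one `<html>`/`<body>` wrapper per source file for the regex cleanup to remove. A rough Python sketch that merges the files and keeps a single wrapper is below; `merge_html_files` is a hypothetical helper, and it assumes UTF-8 input, that file-name order matches reading order, and that a regex is good enough to find each `<body>` (reasonable for simple generated HTML, not for arbitrary markup).

```python
import re
from pathlib import Path

# Captures everything between <body ...> and </body>, across newlines.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)

def merge_html_files(folder, pattern="*.html", title="Merged"):
    """Concatenate the <body> contents of all matching files, in name order,
    into one HTML document with a single html/head/body wrapper."""
    parts = []
    for path in sorted(Path(folder).glob(pattern)):
        text = path.read_text(encoding="utf-8")
        m = BODY_RE.search(text)
        # Fall back to the whole file if it has no <body> element.
        parts.append(m.group(1) if m else text)
    return ("<html>\n<head><title>%s</title></head>\n<body>\n%s\n</body>\n</html>\n"
            % (title, "\n".join(parts)))
```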

Last edited by osnova; 11-29-2011 at 01:37 AM.
11-30-2011, 03:56 AM #9
aplicatii.ro
No Split requested, still I get split MemoryError

During my Bible-creation saga, I have done the following:

1. Cleaned up as much as possible from the files (I use the full power of Linux and Perl regexes, plus this great website to test/learn regexes: http://gskinner.com/RegExr/ ). This way I removed:
- all the CSS I knew of (style)
- fonts, colors, JS
It's simple HTML, nothing more. I don't think there is anything more I can clean (besides the text itself).
2. Merged all the files into one (this way I reached a ~20 MB HTML file).
3. Created the TOC at the beginning of the file (so the TOC can be built when I choose breadth-first instead of depth-first).
4. Imported it into calibre (with calibredb, as the GUI crashes); now it's a ZIP.
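Step 1 can be approximated in Python as well (the poster used Perl). This sketch removes `<style>` and `<script>` blocks, a few presentational attributes, and `<font>` tags while keeping their text; the attribute list and the name `strip_presentation` are assumptions, and regexes like these are only safe on simple, machine-generated HTML like the one described.

```python
import re

# Whole <script>/<style> elements, including their content.
BLOCKS = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
# Presentational attributes (quoted values only) on any tag.
ATTRS = re.compile(r"""\s(?:style|color|face|bgcolor)\s*=\s*(?:"[^"]*"|'[^']*')""",
                   re.IGNORECASE)
# <font> open/close tags; the text between them is kept.
FONT_TAGS = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)

def strip_presentation(html):
    html = BLOCKS.sub("", html)
    html = ATTRS.sub("", html)
    html = FONT_TAGS.sub("", html)
    return html
```

For example, `strip_presentation('<p style="color:red">Hi <font color="blue">there</font></p>')` returns `'<p>Hi there</p>'`.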

Now I tried:
5a. Export to EPUB (with splitting on) -> after a few steps of the split process, it gives a MemoryError (see the log in my previous post).
5b. Export to MOBI -> gives an error (see the log in my previous post).
5c. Export to EPUB without splitting (I set the split size above the size of the HTML, e.g. 30 MB): it still tries to split for some reason, and I again get a MemoryError on split (right at the beginning of splitting) -> see log here:
calibre, version 0.8.27
ERROR: Conversion Error: <b>Failed</b>: Convert book 1 of 1 (Biblia Ortodoxa sau Sfânta Scriptură adnotata Bartolomeu Anania)

Convert book 1 of 1 (Biblia Ortodoxa sau Sfânta Scriptură adnotata Bartolomeu Anania)
Processing archive...
Resolved conversion options
calibre version: 0.8.27
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0.0,
'book_producer': None,
'breadth_first': False,
'change_justification': u'original',
'chapter': u'/',
'chapter_mark': u'none',
'comments': None,
'cover': None,
'debug_pipeline': None,
'dehyphenate': True,
'delete_blank_paragraphs': True,
'disable_font_rescaling': False,
'dont_package': False,
'dont_split_on_page_breaks': True,
'duplicate_links_in_toc': False,
'enable_heuristics': False,
'epub_flatten': False,
'extra_css': None,
'extract_to': None,
'filter_css': u'',
'fix_indents': True,
'flow_size': 30000,
'font_size_mapping': None,
'format_scene_breaks': True,
'html_unwrap_factor': 0.4,
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x03F631B0>,
'insert_blank_line': False,
'insert_blank_line_size': 0.5,
'insert_metadata': False,
'isbn': None,
'italicize_common_cases': True,
'keep_ligatures': False,
'language': None,
'level1_toc': u'//h:h1',
'level2_toc': u'//h:h2',
'level3_toc': u'//h:h3',
'line_height': 0.0,
'linearize_tables': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'markup_chapter_headings': True,
'max_levels': 5,
'max_toc_links': 100,
'minimum_line_height': 120.0,
'no_chapters_in_toc': False,
'no_default_epub_cover': True,
'no_inline_navbars': False,
'no_svg_cover': False,
'output_profile': <calibre.customize.profiles.GenericEink object at 0x03F633B0>,
'page_breaks_before': u'/',
'prefer_metadata_cover': False,
'preserve_cover_aspect_ratio': False,
'pretty_print': True,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': 'c:\\temp\\calibre_0.8.27_tmp_b5wlxt\\e3wgx7.opf',
'remove_fake_margins': True,
'remove_first_image': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'renumber_headings': True,
'replace_scene_breaks': u'',
'series': None,
'series_index': None,
'smarten_punctuation': False,
'sr1_replace': None,
'sr1_search': None,
'sr2_replace': None,
'sr2_search': None,
'sr3_replace': None,
'sr3_search': None,
'tags': None,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'unsmarten_punctuation': False,
'unwrap_lines': True,
'use_auto_toc': False,
'verbose': 2}
InputFormatPlugin: HTML Input running
on c:\temp\calibre_0.8.27_tmp_b5wlxt\zffyot_plumber_archive\content.opf
Parsing all content...
Manifest item 'toc.ncx' not found
Parsing _allhtm.htm ...
Parsing index.htm ...
Generating default TOC from spine...
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 93 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Parsing stylesheet.css ...
Found 541 items of level: p_10
Found 103 items of level: p_11
Found 8 items of level: div_1
Found 4222 items of level: div_3
Found 27 items of level: div_7
Found 42505 items of level: div_6
Found 1 items of level: div_10
Found 20 items of level: p_8
Found 14 items of level: p_9
Found 65 items of level: p_6
Found 21 items of level: p_7
Found 1071 items of level: p_3
Found 1336 items of level: p_1
Ignoring level div_10
Ignoring level p_7
Ignoring level p_8
Ignoring level p_9
p_10 left margin stats: Counter({u'0': 541})
p_10 right margin stats: Counter({u'0': 541})
p_11 left margin stats: Counter({u'0': 103})
p_11 right margin stats: Counter({u'0': 103})
div_1 left margin stats: Counter()
div_1 right margin stats: Counter()
div_3 left margin stats: Counter({u'': 4222})
div_3 right margin stats: Counter({u'': 4222})
div_7 left margin stats: Counter({u'': 27})
div_7 right margin stats: Counter({u'': 27})
div_6 left margin stats: Counter({u'': 42505})
div_6 right margin stats: Counter({u'': 42505})
p_6 left margin stats: Counter({u'0': 65})
p_6 right margin stats: Counter({u'0': 65})
p_3 left margin stats: Counter({u'0': 1071})
p_3 right margin stats: Counter({u'0': 1071})
p_1 left margin stats: Counter({u'0': 1336})
p_1 right margin stats: Counter({u'0': 1336})
Cleaning up manifest...
Trimming unused files from manifest...
Creating EPUB Output...
Rescaling image from 861x1159 to 558x751 06-palestina-vechiului-testament.jpg
Rescaling image from 945x613 to 566x367 01-vechiul-orient.jpg
Rescaling image from 1722x958 to 566x315 05-calatoria-captivitatii-apostolului-pavel.jpg
Rescaling image from 1732x2376 to 547x751 03-ierusalimul-noului-testament.jpg
Rescaling image from 1704x1278 to 566x425 04-calatoriile-misionare-ale-apostolului-pavel.jpg
Rescaling image from 1749x2370 to 554x751 07-palestina-noului-testament.jpg
Looking for large trees in _allhtm.htm...
Found large tree #0
Splitting...
Split point: {http://www.w3.org/1999/xhtml}div /*/*[2]/*[720]
Python function terminated unexpectedly
(Error Code: 1)
Traceback (most recent call last):
File "site.py", line 132, in main
File "site.py", line 109, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 187, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 31, in gui_convert_override
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 1087, in run
File "site-packages\calibre\ebooks\epub\output.py", line 169, in convert
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 57, in __call__
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 67, in split_item
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 205, in __init__
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 406, in split_to_size
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 27, in tostring
File "lxml.etree.pyx", line 2860, in lxml.etree.tostring (src/lxml/lxml.etree.c:53681)
File "serializer.pxi", line 139, in lxml.etree._tostring (src/lxml/lxml.etree.c:87439)
MemoryError


I am completely out of ideas... I think I have found the book best suited to making calibre crash.

Here is the book, if you want to see how clean the HTML is (it's in Romanian, but I don't think that matters):
a) The book before importing it into calibre: HERE (but it takes a few hours to import)
b) The book as it appears in the calibre repository: HERE (this is the one I tried to export in various formats: EPUB, MOBI, EPUB without splitting).

Note: there are a few places with non-ASCII characters (around 20 words across the 20 MB), but they have never caused any issues.
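
If those non-ASCII words ever need checking (for example, to rule out an encoding problem), a small Python sketch can list them with the first line each appears on. It assumes the merged file has already been read as text with the correct encoding (UTF-8 is an assumption; the post does not say), and characters written as HTML entities (e.g. `&#259;`) are not detected:

```python
import re

WORD_RE = re.compile(r"\w+")  # \w is Unicode-aware for str patterns in Python 3

def non_ascii_words(text: str) -> dict[str, int]:
    """Map each distinct word containing a non-ASCII character to its first line number."""
    found: dict[str, int] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        for word in WORD_RE.findall(line):
            if not word.isascii() and word not in found:
                found[word] = lineno
    return found
```

For example, `non_ascii_words(Path("_allhtm.htm").read_text(encoding="utf-8"))` (with `from pathlib import Path`) would cover the merged file named in the log.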

If anyone can give some help or ideas on what I'm doing wrong or what else I should try, or can review the HTML/ZIP above, please let me know.
Old 11-30-2011, 04:40 AM   #10
itimpi
Wizard
itimpi ought to be getting tired of karma fortunes by now.
 
Posts: 4,553
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
I think that if a document is too complex, or has too many cross-links (which I am guessing this one does), it may be beyond calibre's ability to convert without running out of memory. Note that it is complexity, not simply document size, that appears to cause the memory problem. Calibre has been optimised for the vast majority of conversions, which involve far simpler documents; the consequence is that it can fail on very complex ones. You may therefore simply be fighting a losing battle trying to convert this particular book with calibre.
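
To get a rough idea of how "complex" the merged file is before handing it to calibre, one can count elements, ids and links. A minimal stdlib-only Python sketch; the metrics are an illustrative assumption, not anything calibre itself measures or reports:

```python
from collections import Counter
from html.parser import HTMLParser

class ComplexityCounter(HTMLParser):
    """Counts start tags, id attributes and <a href> links in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: Counter = Counter()
        self.ids = 0
        self.internal_links = 0  # href="#..."
        self.external_links = 0  # any other href

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <br/> via handle_startendtag.
        self.tags[tag] += 1
        attrs = dict(attrs)
        if attrs.get("id"):
            self.ids += 1
        href = attrs.get("href")
        if tag == "a" and href:
            if href.startswith("#"):
                self.internal_links += 1
            else:
                self.external_links += 1

def complexity(html: str) -> dict:
    p = ComplexityCounter()
    p.feed(html)
    p.close()
    return {
        "elements": sum(p.tags.values()),
        "divs": p.tags["div"],
        "ids": p.ids,
        "internal_links": p.internal_links,
        "external_links": p.external_links,
    }
```

Running it on `_allhtm.htm` (the merged file named in the log) would show whether the tens of thousands of `div` elements reported during the conversion are also heavily cross-linked.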

As an aside, you mentioned increasing the split value. If you do not keep it below 300 KB, the book is likely to fail on the vast majority of reading devices, even if it appears to convert successfully.
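
One way to respect that limit without relying on calibre's own splitter (the traceback above ends in `split.py` while serializing the tree) is to split the book yourself before conversion. A minimal Python sketch, assuming the book is available as an ordered list of top-level HTML blocks (for example the per-file `<div>`s of the merged file), that ids and fragment links use double-quoted attributes, and that the output names `partNNNN.html` are hypothetical. Wrapper markup is added later, so leave some headroom below 300 KB:

```python
import re

# Regex-based: a sketch for simple, clean HTML, not a general HTML parser.
ID_RE = re.compile(r'\bid="([^"]+)"')
FRAG_HREF_RE = re.compile(r'href="#([^"]+)"')

def pack(blocks: list[str], limit: int = 300 * 1024) -> list[list[str]]:
    """Group consecutive blocks into chunks of at most `limit` UTF-8 bytes.

    A single block larger than `limit` still gets a chunk of its own.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    size = 0
    for block in blocks:
        n = len(block.encode("utf-8"))
        if current and size + n > limit:
            chunks.append(current)
            current, size = [], 0
        current.append(block)
        size += n
    if current:
        chunks.append(current)
    return chunks

def split_book(blocks: list[str], limit: int = 300 * 1024) -> dict[str, str]:
    """Return {filename: body_html}, rewriting '#id' links to 'partNNNN.html#id'."""
    chunks = pack(blocks, limit)
    names = [f"part{i:04d}.html" for i in range(1, len(chunks) + 1)]
    # First pass: record which output file each id ends up in.
    where: dict[str, str] = {}
    for name, chunk in zip(names, chunks):
        for block in chunk:
            for ident in ID_RE.findall(block):
                where[ident] = name

    # Second pass: point each fragment link at the file that now holds its target.
    def fix(m: re.Match) -> str:
        target = where.get(m.group(1))
        return f'href="{target}#{m.group(1)}"' if target else m.group(0)  # dangling: keep

    return {name: FRAG_HREF_RE.sub(fix, "\n".join(chunk)) for name, chunk in zip(names, chunks)}
```

Greedy packing keeps the reading order intact; the two-pass design is needed because a note can link forward to a block that has not been assigned to a file yet.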
Old 11-30-2011, 12:05 PM   #11
nrapallo
GuteBook/Mobi2IMP Creator
nrapallo ought to be getting tired of karma fortunes by now.
 
 
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
Quote:
Originally Posted by aplicatii.ro


It seems that regardless of what I try, I run out of memory.
I have removed all the CSS and cleaned up everything I could from the HTML, and it still fails with out-of-memory errors in completely different places.

E.g. now, after many hours of work, it crashed with:
Does anyone have any suggestions?
Create a .mobi file using Mobipocket Creator instead of calibre. Then use that with FBReader. You can even convert it to support simple dictionary look-ups as discussed here and here.

I tried exactly what you have tried with calibre on my Webster's Dictionary 1913, and similarly hit memory limits on most conversions.


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.