Cannot Convert HTML to RTF

LightGuard · 06-27-2010, 05:39 AM

I've got several HTML files that I'm trying to convert to RTF. Why? The eReader I'm using (Freda) doesn't work as well as Mobile Office Word for reading files.

This worked in 0.7.4. The problem isn't limited to HTML files with multiple chapters; single-chapter files fail as well.

This is the output I receive in the error message for a multiple chapter story:
ERROR: Conversion Error: Failed: Convert book 1 of 1 (Weapon of Mass Destruction)

Convert book 1 of 1 (Weapon of Mass Destruction)
Processing archive...
Resolved conversion options
calibre version: 0.7.5
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 12.0,
'book_producer': None,
'breadth_first': False,
'change_justification': u'original',
'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., 'chapter|book|section|part\\s+', 'i')) or @class = 'chapter']",
'chapter_mark': u'pagebreak',
'comments': None,
'cover': None,
'debug_pipeline': None,
'disable_font_rescaling': False,
'dont_package': False,
'dont_split_on_page_breaks': False,
'extra_css': None,
'extract_to': None,
'flow_size': 260,
'font_size_mapping': u'5.0, 7.0, 9.0, 12.0, 13.5, 17.0, 20.0, 22.0, 24.0',
'footer_regex': u'(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>)',
'header_regex': u'(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>)',
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x03AAAA10>,
'insert_blank_line': False,
'insert_metadata': False,
'isbn': None,
'keep_ligatures': False,
'language': None,
'level1_toc': None,
'level2_toc': None,
'level3_toc': None,
'line_height': 0.0,
'linearize_tables': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'max_levels': 5,
'max_toc_links': 50,
'no_chapters_in_toc': False,
'no_default_epub_cover': False,
'no_inline_navbars': False,
'no_svg_cover': False,
'output_profile': <calibre.customize.profiles.OutputProfile object at 0x03AAABF0>,
'page_breaks_before': u'/',
'prefer_metadata_cover': False,
'preprocess_html': False,
'preserve_cover_aspect_ratio': False,
'pretty_print': True,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': 'c:\\windows\\temp\\calibre_0.7.5_uaz92u.opf',
'remove_first_image': False,
'remove_footer': False,
'remove_header': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'series': None,
'series_index': None,
'tags': None,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'unwrap_factor': 0.0,
'use_auto_toc': False,
'verbose': 2}
InputFormatPlugin: HTML Input running
on c:\windows\temp\calibre_0.7.5_ewxexo_plumber\content.opf
Parsing all content...
Manifest item 'toc.ncx' not found
Parsing 002.html ...
Parsing 003.html ...
Parsing 007.html ...
Parsing 012.html ...
Parsing 013.html ...
Parsing 005.html ...
Parsing 006.html ...
Parsing 014.html ...
Parsing 001.html ...
Parsing 004.html ...
Parsing index.html ...
Merging multiple <head> and <body> sections
Parsing 008.html ...
Parsing 011.html ...
Parsing 009.html ...
Parsing 010.html ...
Generating default TOC from spine...
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 3 entries.
Flattening CSS and remapping font sizes...
Python function terminated unexpectedly
maximum recursion depth exceeded in cmp (Error Code: 1)
Traceback (most recent call last):
File "site.py", line 103, in main
File "site.py", line 85, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 99, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 24, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 898, in run
File "site-packages\calibre\ebooks\oeb\transforms\flatcss.py", line 123, in __call__
File "site-packages\calibre\ebooks\oeb\transforms\flatcss.py", line 166, in baseline_spine
File "site-packages\calibre\ebooks\oeb\transforms\flatcss.py", line 155, in baseline_node
<Above line repeated nearly ad infinitum. Deleted excess for brevity>
File "site-packages\calibre\ebooks\oeb\transforms\flatcss.py", line 151, in baseline_node
File "site-packages\calibre\ebooks\oeb\stylizer.py", line 442, in __getitem__
File "site-packages\calibre\ebooks\oeb\stylizer.py", line 452, in _get
File "site-packages\calibre\ebooks\oeb\stylizer.py", line 452, in _get
File "site-packages\calibre\ebooks\oeb\stylizer.py", line 448, in _get
RuntimeError: maximum recursion depth exceeded in cmp

I found something about updating cssutils at some point in my Google-Fu search. Is this something I need to do?

I'm using Windows XP SP3 and calibre 0.7.5 on a factory Dell Optiplex GX270. The HTML files were generated by FanfictionUpdater 0.6 C2, downloading from Mediaminer.org.

kovidgoyal · 06-27-2010, 10:37 AM

There's a bug in the HTML input of 0.7.5; stay with 0.7.4 until 0.7.6 is released.
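
For anyone puzzled by the traceback: it shows baseline_node in flatcss.py calling itself over and over until Python's recursion limit is hit. The sketch below is not calibre's code and says nothing about the actual cause of the 0.7.5 bug. It only illustrates, under simple assumptions, how a recursive walk over a very deep tree produces this kind of error and how an explicit stack avoids it. Node, build_chain, count_recursive and count_iterative are made-up names for illustration.

```python
import sys


class Node:
    """Hypothetical stand-in for an element in a parsed document tree."""

    def __init__(self, children=None):
        self.children = children or []


def build_chain(depth):
    """Build a tree with `depth` levels of single-child nesting."""
    root = Node()
    node = root
    for _ in range(depth - 1):
        child = Node()
        node.children.append(child)
        node = child
    return root


def count_recursive(node):
    """Count nodes recursively; uses call frames for every tree level."""
    return 1 + sum(count_recursive(c) for c in node.children)


def count_iterative(root):
    """Count nodes with an explicit stack instead of the call stack."""
    total, stack = 0, [root]
    while stack:
        node = stack.pop()
        total += 1
        stack.extend(node.children)
    return total


# A tree nested deeper than the interpreter's recursion limit.
deep = build_chain(sys.getrecursionlimit() * 2)

try:
    count_recursive(deep)
    recursive_failed = False
except RuntimeError:  # on Python 3 this is RecursionError, a RuntimeError subclass
    recursive_failed = True

iterative_total = count_iterative(deep)
print(recursive_failed, iterative_total)
```

The recursive version fails on the deep tree while the iterative one finishes, because the iterative walk keeps its pending nodes in an ordinary list rather than on the call stack.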



