MobileRead Forums - View Single Post

myusername · 09-02-2010, 12:42 AM

I'm trying to convert a PRC document to EPUB and I'm getting the error below. Does anyone have an idea what is causing it? Is there a way to add some debugging to find out exactly which strings trigger the "All strings must be XML compatible" error? The document is fairly large, so I'm not sure where to start looking. Any help would be much appreciated.

ERROR: Conversion Error: Failed: Convert book 1 of 1 (2011ef)

Convert book 1 of 1 (2011ef)
Resolved conversion options
calibre version: 0.7.16
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0.0,
'book_producer': None,
'change_justification': u'original',
'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., 'chapter|book|section|part\\s+', 'i')) or @class = 'chapter']",
'chapter_mark': u'pagebreak',
'comments': None,
'cover': 'c:\\users\\cg\\appdata\\local\\temp\\calibre_0.7.16_tmp_gtmwgm\\calibre_0.7.16_bxxkp9.jpeg',
'debug_pipeline': None,
'disable_font_rescaling': False,
'dont_split_on_page_breaks': False,
'extra_css': None,
'extract_to': None,
'flow_size': 260,
'font_size_mapping': None,
'footer_regex': u'(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>)',
'header_regex': u'(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>)',
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x04F70C30>,
'insert_blank_line': False,
'insert_metadata': False,
'isbn': None,
'keep_ligatures': False,
'language': None,
'level1_toc': None,
'level2_toc': None,
'level3_toc': None,
'line_height': 0.0,
'linearize_tables': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'max_toc_links': 50,
'no_chapters_in_toc': False,
'no_default_epub_cover': False,
'no_inline_navbars': False,
'no_svg_cover': False,
'output_profile': <calibre.customize.profiles.OutputProfile object at 0x04F70E10>,
'page_breaks_before': u"//*[name()='h1' or name()='h2']",
'prefer_metadata_cover': False,
'preprocess_html': False,
'preserve_cover_aspect_ratio': False,
'pretty_print': True,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': 'c:\\users\\cg\\appdata\\local\\temp\\calibre_0.7.16_tmp_gtmwgm\\calibre_0.7.16__labo5.opf',
'remove_first_image': False,
'remove_footer': False,
'remove_header': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'series': None,
'series_index': None,
'tags': None,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'use_auto_toc': False,
'verbose': 2}
InputFormatPlugin: MOBI Input running
on C:\ef\mps\2011ef.prc
Extracting text...
Adding anchors...
Extracting images...
Cleaning up HTML...
Parsing HTML...
Malformed markup, parsing using BeautifulSoup
MOBI markup appears to contain random bytes. Stripping.
Extracting text...
Adding anchors...
Extracting images...
Cleaning up HTML...
Parsing HTML...
Malformed markup, parsing using BeautifulSoup
MOBI markup appears to contain random bytes. Stripping.
Python function terminated unexpectedly
All strings must be XML compatible: Unicode or ASCII, no NULL bytes (Error Code: 1)
Traceback (most recent call last):
File "site.py", line 103, in main
File "site.py", line 85, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 99, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 24, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 815, in run
File "site-packages\calibre\customize\conversion.py", line 211, in __call__
File "site-packages\calibre\ebooks\mobi\input.py", line 27, in convert
File "site-packages\calibre\ebooks\mobi\reader.py", line 333, in extract_content
File "site-packages\lxml\html\soupparser.py", line 23, in fromstring
File "site-packages\lxml\html\soupparser.py", line 67, in _parse
File "site-packages\lxml\html\soupparser.py", line 77, in _convert_tree
File "site-packages\lxml\html\soupparser.py", line 87, in _convert_children
File "site-packages\lxml\html\soupparser.py", line 87, in _convert_children
File "site-packages\lxml\html\soupparser.py", line 87, in _convert_children
File "site-packages\lxml\html\soupparser.py", line 89, in _convert_children
File "site-packages\lxml\html\soupparser.py", line 103, in _append_text
File "lxml.etree.pyx", line 836, in lxml.etree._Element.tail.__set__ (src/lxml/lxml.etree.c:33020)
File "apihelpers.pxi", line 667, in lxml.etree._setTailText (src/lxml/lxml.etree.c:15438)
File "apihelpers.pxi", line 1242, in lxml.etree._utf8 (src/lxml/lxml.etree.c:19848)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes
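
One way to narrow down which strings are at fault, offered as a rough sketch rather than anything calibre or lxml does internally: scan the extracted text for characters that fall outside the XML 1.0 character range and print each one with a little surrounding context. The function names `find_bad_chars` and `strip_bad_chars` are hypothetical, the code assumes Python 3 and an already-decoded string, and the check lxml actually performs may be narrower than the full XML 1.0 rule used here.

```python
import re

# Characters permitted by the XML 1.0 "Char" production:
# tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
# The negated class matches everything else: NULL bytes, most other
# control characters, lone surrogates, and U+FFFE/U+FFFF.
_XML_INCOMPATIBLE = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_bad_chars(text, context=20):
    """Yield (offset, character, snippet) for each XML-incompatible character.

    `snippet` is the text within `context` characters on either side,
    which helps locate the spot in a large document.
    """
    for match in _XML_INCOMPATIBLE.finditer(text):
        start = max(0, match.start() - context)
        end = match.end() + context
        yield match.start(), match.group(), text[start:end]


def strip_bad_chars(text):
    """Return `text` with every XML-incompatible character removed."""
    return _XML_INCOMPATIBLE.sub("", text)


if __name__ == "__main__":
    # Hypothetical sample standing in for markup extracted from the PRC file.
    sample = "<p>Chapter 1\x00 begins\x0b here</p>"
    for offset, char, snippet in find_bad_chars(sample, context=10):
        print(offset, repr(char), repr(snippet))
    print(repr(strip_bad_chars(sample)))
```

Reporting offsets first and stripping second keeps the two concerns separate: the report shows where the source file is damaged, while the stripped text is only a workaround to get a parse through.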

Post #1 · Thread: Error on PRC > EPUB conversion · Device: Kindle