09-16-2010, 01:47 PM   #1
rheostaticsfan
Zealot
Posts: 104
Karma: 591
Join Date: May 2008
Device: kindle, iOS, Blackberry, Sony DPT (pdfs)
conversion error

I am trying to convert a book that I have as a PDF. I first ran OCR on it and saved it as a .doc, then manually stripped the header and footer information and fixed the OCR errors. I then used Mobipocket Creator to make a .prc file.

That .prc opens fine in Mobipocket Desktop and everything looks good. I imported it into calibre and am now trying to convert it to EPUB, but I get the following error:

ERROR: Conversion Error: <b>Failed</b>: Convert book 1 of 1 (The Yiddish Policeman's Union)

Convert book 1 of 1 (XXXXXXXXXXXXXXXXXX)
Resolved conversion options
calibre version: 0.7.18
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0.0,
'book_producer': None,
'change_justification': u'original',
'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., 'chapter|book|section|part\\s+', 'i')) or @class = 'chapter']",
'chapter_mark': u'pagebreak',
'comments': None,
'cover': 'c:\\docume~1\\XXX\\locals~1\\temp\\calibre_0.7.18_tmp_pduwsy\\calibre_0.7.18_6qfgxh.jpeg',
'debug_pipeline': u'C:/Documents and Settings/XXX/Desktop/temp',
'disable_font_rescaling': False,
'dont_split_on_page_breaks': False,
'extra_css': None,
'extract_to': None,
'flow_size': 260,
'font_size_mapping': None,
'footer_regex': u'(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>)',
'header_regex': u'(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>)',
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x03C87B90>,
'insert_blank_line': False,
'insert_metadata': False,
'isbn': None,
'keep_ligatures': False,
'language': None,
'level1_toc': None,
'level2_toc': None,
'level3_toc': None,
'line_height': 0.0,
'linearize_tables': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'max_toc_links': 50,
'no_chapters_in_toc': False,
'no_default_epub_cover': False,
'no_inline_navbars': False,
'no_svg_cover': False,
'output_profile': <calibre.customize.profiles.CybookG3Output object at 0x03C87D30>,
'page_breaks_before': u"//*[name()='h1' or name()='h2']",
'prefer_metadata_cover': False,
'preprocess_html': False,
'preserve_cover_aspect_ratio': False,
'pretty_print': True,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': 'c:\\docume~1\\moira\\locals~1\\temp\\calibre_0.7.18_tmp_pduwsy\\calibre_0.7.18_tv_wxv.opf',
'remove_first_image': False,
'remove_footer': False,
'remove_header': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'series': None,
'series_index': None,
'tags': None,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'use_auto_toc': False,
'verbose': 2}
InputFormatPlugin: MOBI Input running
on D:\personal\My Sugarsync\Calibre library\XXX\XXX (519)\XXXXXX.prc
Extracting text...
Adding anchors...
Extracting images...
Cleaning up HTML...
Parsing HTML...
Malformed markup, parsing using BeautifulSoup
MOBI markup appears to contain random bytes. Stripping.
Extracting text...
Adding anchors...
Extracting images...
Cleaning up HTML...
Parsing HTML...
Malformed markup, parsing using BeautifulSoup
MOBI markup appears to contain random bytes. Stripping.
Python function terminated unexpectedly
All strings must be XML compatible: Unicode or ASCII, no NULL bytes (Error Code: 1)
Traceback (most recent call last):
File "site.py", line 103, in main
File "site.py", line 85, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 99, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 24, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 815, in run
File "site-packages\calibre\customize\conversion.py", line 211, in __call__
File "site-packages\calibre\ebooks\mobi\input.py", line 27, in convert
File "site-packages\calibre\ebooks\mobi\reader.py", line 333, in extract_content
File "site-packages\lxml\html\soupparser.py", line 23, in fromstring
File "site-packages\lxml\html\soupparser.py", line 67, in _parse
File "site-packages\lxml\html\soupparser.py", line 77, in _convert_tree
File "site-packages\lxml\html\soupparser.py", line 87, in _convert_children
File "site-packages\lxml\html\soupparser.py", line 87, in _convert_children
File "site-packages\lxml\html\soupparser.py", line 87, in _convert_children
File "site-packages\lxml\html\soupparser.py", line 87, in _convert_children
File "site-packages\lxml\html\soupparser.py", line 87, in _convert_children
File "site-packages\lxml\html\soupparser.py", line 87, in _convert_children
File "site-packages\lxml\html\soupparser.py", line 87, in _convert_children
File "site-packages\lxml\html\soupparser.py", line 89, in _convert_children
File "site-packages\lxml\html\soupparser.py", line 101, in _append_text
File "lxml.etree.pyx", line 821, in lxml.etree._Element.text.__set__ (src/lxml/lxml.etree.c:32944)
File "apihelpers.pxi", line 645, in lxml.etree._setNodeText (src/lxml/lxml.etree.c:15265)
File "apihelpers.pxi", line 1242, in lxml.etree._utf8 (src/lxml/lxml.etree.c:19848)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes
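
For illustration: the final ValueError suggests that some text handed to lxml still contained a NULL byte or another character that XML does not allow, even after the "Stripping" step in the log. The following is a minimal Python 3 sketch of how such characters could be located and removed from a string. It is not calibre's own code; the names `find_xml_illegal` and `strip_xml_illegal` are hypothetical, and the character set covers only the common control characters, not every code point that XML 1.0 forbids.

```python
import re

# Assumption for this sketch: treat NULL and the C0 control characters as
# illegal, except tab (\x09), newline (\x0a) and carriage return (\x0d),
# which XML 1.0 permits. U+FFFE and U+FFFF are also excluded by XML 1.0.
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def find_xml_illegal(text):
    """Return (offset, repr) pairs for each XML-incompatible character."""
    return [(m.start(), repr(m.group())) for m in _XML_ILLEGAL.finditer(text)]


def strip_xml_illegal(text):
    """Return text with the XML-incompatible characters removed."""
    return _XML_ILLEGAL.sub("", text)


if __name__ == "__main__":
    sample = "Chap\x00ter\x0b 1\n"
    print(find_xml_illegal(sample))
    print(repr(strip_xml_illegal(sample)))
```

Running a check like `find_xml_illegal` over the HTML extracted from the .prc would show where the stray bytes sit, which may help in tracing them back to the OCR or .doc stage.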


What can I do now?