Conversion from Mobi to ePub errors

erik_reader · 08-03-2010, 05:54 AM

I've been using Calibre to convert my various eBooks to ePub for Stanza with great success, up until one book that errors out with the following message. Any tips, or should I file a bug report? The mobi file opens fine with the Stanza desktop application and (mostly fine) with the Calibre eBook reader. I'm currently running 0.7.12 under 64-bit Ubuntu.

ERROR: Conversion Error: Failed: Convert book 1 of 1 (Tongues of Serpents)

Convert book 1 of 1 (Tongues of Serpents)
Resolved conversion options
calibre version: 0.7.12
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0.0,
'book_producer': None,
'change_justification': u'original',
'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., 'chapter|book|section|part\\s+', 'i')) or @class = 'chapter']",
'chapter_mark': u'pagebreak',
'comments': None,
'cover': '/tmp/calibre_0.7.12_KJ9fhK.jpeg',
'debug_pipeline': None,
'disable_font_rescaling': False,
'dont_split_on_page_breaks': False,
'extra_css': None,
'extract_to': None,
'flow_size': 260,
'font_size_mapping': None,
'footer_regex': u'(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s* <a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>)',
'header_regex': u'(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s* <a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>)',
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x35f5d90>,
'insert_blank_line': False,
'insert_metadata': False,
'isbn': None,
'keep_ligatures': False,
'language': None,
'level1_toc': None,
'level2_toc': None,
'level3_toc': None,
'line_height': 0.0,
'linearize_tables': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'max_toc_links': 50,
'no_chapters_in_toc': False,
'no_default_epub_cover': False,
'no_inline_navbars': False,
'no_svg_cover': False,
'output_profile': <calibre.customize.profiles.OutputProfile object at 0x35f8190>,
'page_breaks_before': u"//*[name()='h1' or name()='h2']",
'prefer_metadata_cover': False,
'preprocess_html': False,
'preserve_cover_aspect_ratio': False,
'pretty_print': True,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': '/tmp/calibre_0.7.12_YL0Cia.opf',
'remove_first_image': False,
'remove_footer': False,
'remove_header': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'series': None,
'series_index': None,
'tags': None,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'use_auto_toc': False,
'verbose': 2}
InputFormatPlugin: MOBI Input running
on /home/erik/Dropbox/CalibreBookshelf/Naomi Novik/Tongues of Serpents (574)/Tongues of Serpents - Naomi Novik.mobi
Extracting text...
Adding anchors...
Extracting images...
Cleaning up HTML...
Parsing HTML...
Malformed markup, parsing using BeautifulSoup
MOBI markup appears to contain random bytes. Stripping.
Extracting text...
Adding anchors...
Extracting images...
Cleaning up HTML...
Parsing HTML...
Malformed markup, parsing using BeautifulSoup
MOBI markup appears to contain random bytes. Stripping.
Traceback (most recent call last):
  File "/tmp/init.py", line 48, in <module>
  File "/home/kovid/build/calibre/src/calibre/utils/ipc/worker.py", line 99, in main
  File "/home/kovid/build/calibre/src/calibre/gui2/convert/gui_conversion.py", line 24, in gui_convert
  File "/home/kovid/build/calibre/src/calibre/ebooks/conversion/plumber.py", line 815, in run
  File "/home/kovid/build/calibre/src/calibre/customize/conversion.py", line 207, in __call__
  File "/home/kovid/build/calibre/src/calibre/ebooks/mobi/input.py", line 27, in convert
  File "/home/kovid/build/calibre/src/calibre/ebooks/mobi/reader.py", line 333, in extract_content
  File "/usr/lib64/python2.6/site-packages/lxml/html/soupparser.py", line 23, in fromstring
  File "/usr/lib64/python2.6/site-packages/lxml/html/soupparser.py", line 66, in _parse
  File "/usr/lib64/python2.6/site-packages/BeautifulSoup.py", line 1499, in __init__
  File "/usr/lib64/python2.6/site-packages/BeautifulSoup.py", line 1230, in __init__
  File "/usr/lib64/python2.6/site-packages/BeautifulSoup.py", line 1263, in _feed
  File "/usr/lib64/python2.6/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib64/python2.6/HTMLParser.py", line 148, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib64/python2.6/HTMLParser.py", line 226, in parse_starttag
    endpos = self.check_for_whole_start_tag(i)
  File "/usr/lib64/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag
    self.error("malformed start tag")
  File "/usr/lib64/python2.6/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: malformed start tag, at line 23, column 116
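
The parser reports only a position (line 23, column 116), not the markup that tripped it. Below is a minimal sketch of how one might print the text around such a position, assuming the HTML extracted from the MOBI file is already available as a string. The helper name `context_at` and the sample markup are made up for illustration and are not part of calibre. Positions are taken as HTMLParser reports them: 1-based lines and 0-based columns.

```python
def context_at(text, line, column, width=40):
    """Return the text surrounding a (line, column) parser position.

    line is 1-based and column is 0-based, matching HTMLParser.getpos().
    """
    lines = text.split("\n")
    if not 1 <= line <= len(lines):
        raise ValueError("line %d is outside the document" % line)
    target = lines[line - 1]
    # Clamp the window so it never runs past either end of the line.
    start = max(0, column - width)
    end = min(len(target), column + width)
    return target[start:end]


# Hypothetical sample with an unterminated <p tag on its second line.
sample = "<html>\n<body><p>fine</p><p <b>broken</p></body>\n</html>"
print(context_at(sample, 2, 18, width=10))
```

Seeing the bytes around the reported column should show whether the tag is truncated or contains stray characters.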

kovidgoyal · 08-03-2010, 11:49 AM

That error indicates a damaged file, usually one damaged during DRM removal. Since the calibre viewer and the converter use the same code, I don't see how it is possible for the viewer to work but not the converter.

erik_reader · 08-06-2010, 01:17 AM

Definitely odd then. I can read the file in several readers, including Calibre, but there's obviously an error blocking the conversion process.

Possibly an issue with extended character encoding in the book? Would you like to see the file for testing?
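
One rough way to test the encoding theory is to check whether the raw text decodes cleanly under the encodings a MOBI file normally declares (CP1252 or UTF-8). The sketch below assumes the book text has already been extracted to bytes. `first_bad_offset` is a hypothetical helper, not a calibre function, and the sample bytes are made up.

```python
def first_bad_offset(data, encoding):
    """Return the byte offset of the first undecodable byte, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte the codec rejected.
        return err.start
    return None


# Made-up sample: 0xE9 is "é" in CP1252 but an incomplete sequence in UTF-8.
raw = b"caf\xe9 <p>text</p>"
for enc in ("utf-8", "cp1252"):
    print(enc, first_bad_offset(raw, enc))
```

A clean decode under the declared encoding would make an encoding problem less likely, though it would not rule out damaged markup.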

Carlj · 08-07-2010, 01:43 AM

I'm wondering why a novel I paid for at B&N and then transferred to Kindle has certain typesetting errors (e.g. po liti cal, de vil, Eng lish) intermittently throughout the document. There was discussion of errors from DRM removal, but it's not that.

This same annoying problem is not only noticeable in the MOBI document on the Kindle 2 and in Calibre; I also rechecked the original EPUB document, searched for an error word like "po liti cal", and there it was! So this is what B&N originally sent me?

Has anyone else had this problem with downloaded files they bought? Is it common or just B&N?
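
The check described above (searching the original EPUB for a broken word) can be scripted, since an EPUB is a ZIP archive of HTML files. This is only a sketch: the function name `find_in_epub` is made up, it assumes the content files end in .html, .xhtml or .htm and are UTF-8 text, and it searches the raw markup, so a word split by tags rather than plain spaces would not match.

```python
import re
import zipfile


def find_in_epub(path, phrase):
    """Yield (member name, hit count) for each HTML file containing phrase."""
    pattern = re.compile(re.escape(phrase))
    with zipfile.ZipFile(path) as book:
        for name in book.namelist():
            # Skip images, stylesheets and metadata; assumed extensions only.
            if not name.lower().endswith((".html", ".xhtml", ".htm")):
                continue
            text = book.read(name).decode("utf-8", errors="replace")
            hits = len(pattern.findall(text))
            if hits:
                yield name, hits
```

Calling `list(find_in_epub("book.epub", "po liti cal"))`, with `book.epub` standing in for the purchased file, would list each chapter file where the split word appears in the file as delivered.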

DoctorOhh · 08-07-2010, 01:58 AM

Quote:

Originally Posted by Carlj

This same annoying problem is not only noticeable in the MOBI document on the Kindle 2 and in Calibre; I also rechecked the original EPUB document, searched for an error word like "po liti cal", and there it was! So this is what B&N originally sent me?

I don't believe that B&N formats the book; the publisher does. I would write to the publisher, B&N, and the author with the details and your dissatisfaction with the product.

JSWolf · 08-07-2010, 02:03 AM

If you wanted ePub, why did you purchase Mobipocket?
