'utf8' codec can't decode byte 0xb1 in position 18: invalid start byte

paul.westland · 11-06-2012, 11:49 AM

This is the error I am getting.

calibre, version 0.9.5 (win32, isfrozen: True)
Conversion Error: Failed: Convert book 1 of 1 (Means of Ascent)

Convert book 1 of 1 (Means of Ascent)
Resolved conversion options
calibre version: 0.9.5
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0.0,
'book_producer': None,
'change_justification': u'original',
'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., '\\s*((chapter|book|section|part)\\s+)|((prolog|prologue|epilogue)(\\s+|$))', 'i')) or @class = 'chapter']",
'chapter_mark': u'pagebreak',
'comments': None,
'cover': None,
'debug_pipeline': None,
'dehyphenate': True,
'delete_blank_paragraphs': True,
'disable_font_rescaling': False,
'dont_compress': False,
'duplicate_links_in_toc': False,
'embed_font_family': None,
'enable_heuristics': False,
'extra_css': None,
'extract_to': None,
'filter_css': u'',
'fix_indents': True,
'font_size_mapping': None,
'format_scene_breaks': True,
'html_unwrap_factor': 0.4,
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x0370C490>,
'insert_blank_line': False,
'insert_blank_line_size': 0.5,
'insert_metadata': False,
'isbn': None,
'italicize_common_cases': True,
'keep_ligatures': False,
'language': None,
'level1_toc': None,
'level2_toc': None,
'level3_toc': None,
'line_height': 0.0,
'linearize_tables': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'markup_chapter_headings': True,
'max_toc_links': 50,
'minimum_line_height': 120.0,
'mobi_file_type': u'old',
'mobi_ignore_margins': False,
'mobi_keep_original_images': False,
'mobi_toc_at_start': False,
'no_chapters_in_toc': False,
'no_inline_navbars': True,
'no_inline_toc': False,
'output_profile': <calibre.customize.profiles.KindleOutput object at 0x0370C7D0>,
'page_breaks_before': u'/',
'personal_doc': u'[PDOC]',
'prefer_author_sort': False,
'prefer_metadata_cover': False,
'pretty_print': False,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': u'C:\\Users\\m147146\\AppData\\Local\\Temp\\calibre_0.9.5_tmp_of8z0m\\ltspdi.opf',
'remove_fake_margins': True,
'remove_first_image': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'renumber_headings': True,
'replace_scene_breaks': u'',
'search_replace': '[]',
'series': None,
'series_index': None,
'share_not_sync': False,
'smarten_punctuation': False,
'sr1_replace': None,
'sr1_search': None,
'sr2_replace': None,
'sr2_search': None,
'sr3_replace': None,
'sr3_search': None,
'start_reading_at': None,
'tags': None,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'toc_title': None,
'unsmarten_punctuation': False,
'unwrap_lines': True,
'use_auto_toc': False,
'verbose': 2}
InputFormatPlugin: EPUB Input running
on C:\Users\m147146\AppData\Local\Temp\calibre_0.9.5_tmp_of8z0m\ty0jpc.epub
Python function terminated unexpectedly
'utf8' codec can't decode byte 0xb1 in position 18: invalid start byte (Error Code: 1)
Traceback (most recent call last):
File "site.py", line 132, in main
File "site.py", line 109, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 186, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 31, in gui_convert_override
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 1000, in run
File "site-packages\calibre\customize\conversion.py", line 239, in __call__
File "site-packages\calibre\ebooks\conversion\plugins\epub_input.py", line 153, in convert
File "site-packages\calibre\utils\zipfile.py", line 751, in __init__
File "site-packages\calibre\utils\zipfile.py", line 786, in _GetContents
File "site-packages\calibre\utils\zipfile.py", line 847, in _RealGetContents
File "site-packages\calibre\utils\zipfile.py", line 388, in _decodeFilename
File "encodings\utf_8.py", line 16, in decode
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb1 in position 18: invalid start byte

Any information I can get on how to fix it would be appreciated.

KevinH · 11-06-2012, 12:19 PM

Hi,

There are many broken epubs out there (especially from B&N)! These epubs do NOT follow the zip or epub specifications. Epubs are supposed to be zip files.

One form of breakage is to use garbage chars or full utf-16 unicode in the zip central directory filenames and then set the flag that indicates the names are utf-8 encoded. Another form of breakage is to not have the zip central directory filename match the local filename, and most zip access programs use the broken central directory name over the local name to prevent security attacks.
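
To see which entries trip this up, a minimal Python 3 sketch can read the central directory directly (the standard zipfile module raises on the first bad name, so it cannot be used for this) and test every name whose UTF-8 flag, general purpose bit 11, is set. The function names are hypothetical, and the sketch assumes a single-disk, non-ZIP64 archive whose comment does not contain the end-of-central-directory signature.

```python
import struct

EOCD_SIG = b"PK\x05\x06"  # end of central directory record
CDH_SIG = b"PK\x01\x02"   # central directory file header
UTF8_FLAG = 0x800         # general purpose bit 11: "name is UTF-8"


def central_entries(data):
    """Yield (flags, raw_name) for each central directory entry.

    Sketch assumptions: no ZIP64, single-disk archive, and the archive
    comment does not itself contain the EOCD signature.
    """
    eocd = data.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("end of central directory record not found")
    # sig, disk, cd_disk, entries_this_disk, entries_total, cd_size, cd_offset, comment_len
    fields = struct.unpack("<4s4H2LH", data[eocd:eocd + 22])
    count, pos = fields[4], fields[6]
    for _ in range(count):
        header = struct.unpack("<4s6H3L5H2L", data[pos:pos + 46])
        if header[0] != CDH_SIG:
            raise ValueError("bad central directory header at offset %d" % pos)
        flags = header[3]
        name_len, extra_len, comment_len = header[10], header[11], header[12]
        yield flags, data[pos + 46:pos + 46 + name_len]
        # skip the fixed header, the name, the extra field and the comment
        pos += 46 + name_len + extra_len + comment_len


def undecodable_names(data):
    """Return (raw_name, bad_byte_offset) for names flagged UTF-8 that are not."""
    bad = []
    for flags, raw in central_entries(data):
        if flags & UTF8_FLAG:
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as exc:
                bad.append((raw, exc.start))
    return bad
```

The offset it reports counts bytes within the raw filename, which is the same kind of "position" that appears in calibre's `can't decode byte 0xb1 in position 18` message.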

This completely breaks the python standard library for accessing zips (zipfile.py). The only way around this is to create your own zipfile.py and look for and catch central filename decoding errors to work around this nonsense.
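
One way such a catch could look, as a hypothetical helper rather than calibre's actual zipfile.py: trust the UTF-8 flag only when the bytes really decode as UTF-8, and otherwise fall back to cp437, the zip format's legacy default encoding for names without the flag.

```python
UTF8_FLAG = 0x800  # general purpose bit 11: "name is UTF-8"


def decode_zip_name(raw, flags):
    """Decode a raw zip filename without ever raising.

    Hypothetical helper: if the UTF-8 flag is set but the bytes are not
    valid UTF-8, fall back to cp437, which maps all 256 byte values.
    """
    if flags & UTF8_FLAG:
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            pass  # the flag is lying; fall through to the legacy codec
    return raw.decode("cp437")
```

Because cp437 assigns a character to every byte value, the fallback never raises. The price is a mangled name (for example `OEBPS/▒.html` for a stray 0xb1 byte), which may no longer match what the epub's OPF manifest refers to.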

If you are desperate, we can post an ePubFixer program for you that will read in the broken epub and write out a fixed epub, which should then work with calibre properly. It requires Python 2 with Tk widgets installed (on Windows, see ActiveState ActivePython 2.7; Macs and Linux are all set to go).
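
The write-out half of such a repair can be sketched with Python 3's standard zipfile module. This is not the ePubFixer program; `write_epub` is a hypothetical name, and the sketch assumes the entries have already been recovered from the broken file as (name, bytes) pairs. It follows the EPUB container rule that `mimetype` must be the first entry and stored uncompressed.

```python
import io
import zipfile


def write_epub(entries, out):
    """Write (name, data) pairs to `out` as an EPUB-style zip.

    Hypothetical helper: `mimetype` goes first and is stored
    uncompressed; every other entry is deflated. zipfile sets the
    UTF-8 flag only on names that need it, so the new central
    directory and local headers agree with each other.
    """
    entries = dict(entries)  # preserves insertion order (Python 3.7+)
    mimetype = entries.pop("mimetype", b"application/epub+zip")
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("mimetype", mimetype, compress_type=zipfile.ZIP_STORED)
        for name, data in entries.items():
            zf.writestr(name, data)


if __name__ == "__main__":
    buf = io.BytesIO()
    write_epub([("OEBPS/content.opf", b"<package/>"),
                ("mimetype", b"application/epub+zip")], buf)
    with zipfile.ZipFile(buf) as zf:
        print(zf.namelist())
```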

The long term solution is for calibre to implement its own zipfile.py code (if it does not do that already) and handle the special case of improper utf-8 flags being set on garbage central directory filenames.

The better solution, if this is a B&N epub, is to send the ebook back and request an epub that actually meets the epub specification!

Hope this helps,

KevinH

paul.westland · 11-06-2012, 12:37 PM

If it is a book I bought off of Google Play, can I send it back and ask that they provide a new one?

Is the only way to find out to try?

And thank you for the quick reply. I'm a little peeved at all this, so unknowingly you've made my day much better by being johnny on the spot.

Also, just to tell you a little about the process I'm using to get these: I'm downloading the ACSM files from Google Play, using Adobe Digital Editions to find the file path to the epub on my computer, and then adding those to the Calibre library.

KevinH · 11-06-2012, 12:55 PM

Hi,

You are right: when given an ACSM file, you give that file to a properly registered Adobe Digital Editions program, and it will properly download the epub and add the correct rights.xml file to allow it to be read.

To verify it works, please open the file in ADE and check that you can read it. If you can, there might still be a problem with the epub, but it would be hard to argue that, since it can be read in the program it was designed to be read in.

If it is not readable in Adobe Digital Editions, then you should send it back and ask for a working version.

It seems both ADE and B&N's ebook software use their own zip access routines that simply walk the local file headers, extracting files by their local filenames, and basically ignore the central directory of the zip (which is against all of the security rules, but ...).
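
Walking an archive that way can be sketched in Python 3: start at offset 0, read each local file header, and skip over its data using the compressed size stored in that header, never touching the central directory. `local_names` is a hypothetical name; the sketch assumes no ZIP64 entries and no data descriptors (general purpose bit 3), because with bit 3 set the local header does not hold the size needed to find the next entry.

```python
import struct

LOCAL_SIG = b"PK\x03\x04"  # local file header signature
UTF8_FLAG = 0x800          # general purpose bit 11: "name is UTF-8"
DATA_DESCRIPTOR = 0x08     # bit 3: sizes follow the data instead of the header


def local_names(data):
    """List entry names by walking local file headers from offset 0.

    Sketch assumptions: no ZIP64 and no data descriptors. Names that
    claim UTF-8 but do not decode fall back to cp437.
    """
    names = []
    pos = 0
    while data[pos:pos + 4] == LOCAL_SIG:
        (_, _, flags, _, _, _, _, csize, _, name_len, extra_len) = struct.unpack(
            "<4s5H3L2H", data[pos:pos + 30])
        if flags & DATA_DESCRIPTOR:
            raise ValueError("entry at offset %d uses a data descriptor" % pos)
        raw = data[pos + 30:pos + 30 + name_len]
        try:
            name = raw.decode("utf-8") if flags & UTF8_FLAG else raw.decode("cp437")
        except UnicodeDecodeError:
            name = raw.decode("cp437")
        names.append(name)
        # skip the fixed header, the name, the extra field and the compressed data
        pos += 30 + name_len + extra_len + csize
    return names
```

The loop stops at the first central directory header, so a reader built like this never sees the central directory names at all, which is why a mismatch or a bad UTF-8 flag there goes unnoticed.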

Hope this helps,

Kevin

paul.westland · 11-06-2012, 03:55 PM

I can open it in Adobe, but when I try to open the epub in Calibre it tells me there is an invalid start byte. Odd?

paul.westland · 11-06-2012, 04:12 PM

If I just try to open the book in calibre, without trying to convert it, I get this error.

calibre, version 0.9.5
ERROR: Could not open ebook: invalid start byte

Traceback (most recent call last):
File "site-packages\calibre\gui2\viewer\main.py", line 40, in run
File "threading.py", line 504, in run
File "site-packages\calibre\ebooks\oeb\iterator\book.py", line 99, in __enter__
File "site-packages\calibre\customize\conversion.py", line 239, in __call__
File "site-packages\calibre\ebooks\conversion\plugins\epub_input.py", line 153, in convert
File "site-packages\calibre\utils\zipfile.py", line 751, in __init__
File "site-packages\calibre\utils\zipfile.py", line 786, in _GetContents
File "site-packages\calibre\utils\zipfile.py", line 847, in _RealGetContents
File "site-packages\calibre\utils\zipfile.py", line 388, in _decodeFilename
File "encodings\utf_8.py", line 16, in decode
UnicodeDecodeError: 'utf8' codec can't decode byte 0x89 in position 19: invalid start byte

susan_cassidy · 11-06-2012, 04:15 PM

If you're opening it in ADE, it has DRM, which means that it is encrypted. Calibre won't be able to open an encrypted book.

paul.westland · 11-06-2012, 04:18 PM

There are ways around that, I hear, and those ways may be utilized.

kovidgoyal · 11-06-2012, 11:13 PM

Quote:

Originally Posted by KevinH

The long term solution is for calibre to implement its own zipfile.py code (if it does not do that already) and handle the special case of improper utf-8 flags being set on garbage central directory filenames.

calibre does use its own modified version of zipfile.py. You are welcome to submit a patch against it for this issue, if you have epubs that have the issue. Note that to properly solve this, you will not only have to ignore the central directory but also correctly decode the local names to unicode. This is because, on Windows, calibre has to use unicode filenames to avoid encoding issues in the filesystem.

kovidgoyal · 11-07-2012, 12:40 AM

FYI: Using

zip -FF bad.epub --out fixed.epub

should fix the central directory issue.

kovidgoyal · 11-07-2012, 06:53 AM

And http://bazaar.launchpad.net/~kovid/c...revision/13642

oj829 · 11-07-2012, 10:50 AM

Quote:

Originally Posted by paul.westland

If it is a book I bought off of Google Play, can I send it back and ask that they provide a new one?

I just bought a Google Play book this week and I'm having the same exact problem with the downloaded epub - first time I EVER had a problem importing any epub into Calibre, I might add, and this particular book is about my 12th or 13th Google Play epub.

oj829 · 11-11-2012, 02:04 PM

Quote:

Originally Posted by kovidgoyal

FYI: Using

zip -FF bad.epub --out fixed.epub

should fix the central directory issue.

I see that the release notes for 0.9.6 indicate an attempt to incorporate some code to tackle the 'utf8 problem'. Thank you. In the end, though, I had to download 7-Zip, which was able to examine the epub that was giving me trouble and, more importantly, delete the offending file and repack the archive.

Attached is a partial listing of the troublesome epub TOC, thanks to 7-Zip.

Interestingly, this book traveled as is from my ADE 2.0 installation to my Adobe-authorized Kobo Touch without so much as a hiccup. A new speedbump from publishers, perhaps??

oj829 · 11-11-2012, 02:11 PM

Quote:

Originally Posted by oj829

7-Zip solution

Incidentally, while 7-Zip willingly deleted the file with the corrupt name from the archive, it wouldn't extract it (I thought I could fix the name and put it back).

7-Zip bombed out with: 'Unsupported compression method for 'OEBPS\OEBPS\Images\Acit_9780767[etc.]

kovidgoyal · 11-11-2012, 02:16 PM

If 0.9.6 did not work with that file, open a bug report and attach the file; I'd be interested in taking a look at it.
