06-09-2010, 12:14 PM | #1 |
Junior Member
Posts: 2
Karma: 10
Join Date: Jun 2010
Device: none
Malformed byte sequence: Invalid byte 2 of 3-byte UTF-8 sequence. Check encoding
I am having trouble getting an ePUB file to validate. I get the following error message:
Malformed byte sequence: Invalid byte 2 of 3-byte UTF-8 sequence. Check encoding.
Can anyone tell me what I should do to my source file to correct this?
Thanks,
Kevin
06-09-2010, 01:39 PM | #2 |
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
Somewhere in your production chain you have an editor that isn't handling UTF-8 properly and is inserting bytes that aren't valid UTF-8 (typically characters saved in a legacy encoding such as Windows-1252 or Latin-1 inside a file that is supposed to be UTF-8). You need to fix this, or you'll keep running into encoding errors in the future.
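For what it's worth, here is a minimal Python sketch of how a legacy-encoded character can produce exactly this kind of message. It assumes the file was saved as Windows-1252 (an assumption; other single-byte encodings behave the same way), and `describe_utf8_error` is a hypothetical helper that only imitates the wording of the epubcheck error; it is not epubcheck's code. In Windows-1252, "é" is the single byte 0xE9, but a UTF-8 decoder reads 0xE9 as the start of a 3-byte sequence, so the byte after it (here an ordinary space, 0x20) becomes "invalid byte 2 of 3-byte UTF-8 sequence".

```python
from typing import Optional


def describe_utf8_error(data: bytes) -> Optional[str]:
    """Describe the first invalid UTF-8 sequence in `data`, or return None if it is valid.

    Hypothetical helper; the wording only imitates epubcheck's message.
    """
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        start = err.start  # byte offset where the bad sequence begins

    lead = data[start]
    # The lead byte announces how long the sequence should be (RFC 3629 ranges).
    if 0xC2 <= lead <= 0xDF:
        length = 2
    elif 0xE0 <= lead <= 0xEF:
        length = 3
    elif 0xF0 <= lead <= 0xF4:
        length = 4
    else:
        # A stray continuation byte, or a byte that can never start a sequence.
        return f"offset {start}: invalid byte 1 of 1-byte UTF-8 sequence (0x{lead:02X})"

    # Every following byte of the sequence must be a continuation byte (0x80-0xBF).
    for i in range(1, length):
        pos = start + i
        if pos >= len(data):
            return f"offset {start}: {length}-byte UTF-8 sequence cut off by end of data"
        if not 0x80 <= data[pos] <= 0xBF:
            return (f"offset {start}: invalid byte {i + 1} of {length}-byte UTF-8 sequence "
                    f"(0x{data[pos]:02X} after lead byte 0x{lead:02X})")

    # Shape is fine, but the value is forbidden (overlong form, UTF-16 surrogate, > U+10FFFF).
    return f"offset {start}: disallowed {length}-byte UTF-8 sequence"


sample = "café au lait".encode("cp1252")  # b'caf\xe9 au lait'
print(describe_utf8_error(sample))
```

By contrast, a UTF-16 surrogate encoded as UTF-8 (for example the bytes ED A0 80) has correct continuation bytes and ends up in the last branch, which is a different complaint from "invalid byte 2".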
To fix the current problem, open the affected file in Notepad++ and use it to convert the encoding (in the Format menu). You may also need to track down and fix any character that the misbehaving editor has already mangled.
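A whole-file conversion works when the entire file is in one legacy encoding. If the file is mostly valid UTF-8 with a few mangled characters, converting everything would garble the good parts. Here is a minimal Python sketch of a narrower repair, assuming (again, an assumption) that the stray bytes are Windows-1252: decode as UTF-8 and reinterpret only the bytes the decoder rejects. The handler name `cp1252_fallback` and the function `repair_utf8` are made up for this example.

```python
import codecs
from typing import Tuple


def _cp1252_fallback(err: UnicodeError) -> Tuple[str, int]:
    """Decode the bytes UTF-8 rejected as Windows-1252 instead, then resume after them."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # errors="replace" covers the few byte values cp1252 leaves undefined (e.g. 0x81).
    return bad.decode("cp1252", errors="replace"), err.end


codecs.register_error("cp1252_fallback", _cp1252_fallback)


def repair_utf8(data: bytes) -> bytes:
    """Keep valid UTF-8 as-is; reinterpret only invalid bytes as Windows-1252."""
    text = data.decode("utf-8", errors="cp1252_fallback")
    return text.encode("utf-8")


mixed = b"na\xefve caf\xc3\xa9"  # 0xEF is cp1252 'ï'; 0xC3 0xA9 is already UTF-8 'é'
print(repair_utf8(mixed).decode("utf-8"))
```

Repairing byte by byte keeps the already-correct UTF-8 untouched, but it can only guess at what the mangled byte was meant to be, so the result is still worth a quick look.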
04-25-2011, 08:02 PM | #3 |
Junior Member
Posts: 1
Karma: 10
Join Date: Jun 2010
Device: none
What is missing from the epubcheck error message
One of the problems with this version of epubcheck (all versions through 1.2, as of April 25, 2011) is that the error message doesn't give enough information to help you (or me) know what to do with the file.
What would help is the exact line number and the character position within that line (i.e., "row and column") where the problem is. Without that information, and with a huge file, it's much more of a guessing game. Compare the usefulness of this error message I get from epubcheck:
Code:
$ epubcheck fp.epub
ERROR: fp.epub/Ops/037.html: Malformed byte sequence: Invalid byte 1 of 1-byte UTF-8 sequence. Check encoding
with this one from isutf8:
Code:
$ isutf8 037.html
037.html: line 19, char 1, byte offset 1921: invalid UTF-8 code
However, in my tests, I find that even isutf8 is not as helpful as it could be: the problem is on line 19, but it is not at character 1 of that line in my sample file. It is much further along line 19 (a long line in my file). The software that worked for me was Emacs, because when I opened the file and then tried to save it, it gave me this message:
Code:
These default coding systems were tried to encode text
in the buffer `037.html':
(utf-8-dos (63433 . 4194300))
However, each of them encountered characters it couldn't encode:
utf-8-dos cannot encode these: \374
Click on a character (or switch to this window by `C-x o'
and select the characters by RET) to jump to the place it appears,
where `C-u C-x =' will give information about it.
Select one of the safe coding systems listed below,
or cancel the writing with C-g and edit the buffer
to remove or modify the problematic characters,
or specify any other coding system (and risk losing
the problematic characters).
raw-text emacs-mule no-conversion
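The Emacs message does identify the culprit, just indirectly. As far as I understand Emacs's conventions, each pair is (buffer position . character), and a byte that could not be decoded is kept as a "raw byte" character in the range #x3FFF80 to #x3FFFFF, i.e. the byte value plus #x3FFF00. A small Python sketch of that arithmetic (`emacs_raw_byte` is a hypothetical name) turns 4194300 back into the byte that `\374` (octal) refers to:

```python
def emacs_raw_byte(char_code: int) -> int:
    """Map an Emacs raw-byte character code (#x3FFF80-#x3FFFFF) back to the byte it stands for."""
    if not 0x3FFF80 <= char_code <= 0x3FFFFF:
        raise ValueError(f"{char_code} is not an Emacs raw-byte character")
    return char_code - 0x3FFF00


byte = emacs_raw_byte(4194300)
print(hex(byte), oct(byte))              # 0xfc 0o374
print(bytes([byte]).decode("latin-1"))   # ü
```

So the offending byte is 0xFC, which is "ü" in Latin-1/Windows-1252, consistent with text saved in a legacy encoding rather than UTF-8. Note that 63433 is a character position in the Emacs buffer, not a byte offset, so it will not line up directly with isutf8's byte offset.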
Take-away for all of us programmers: when we write error messages, it is much more helpful and time-saving for end users if we take the time to say exactly where the problem is and what to do about it, e.g.: "Please open file 037.html with a UTF-8 capable text editor (or a hex editor, etc.), navigate to line 19, character 171, see what is under the cursor at that point, and replace it with a character that is correctly encoded in UTF-8." Yes, this takes one person (us!) some time, but it saves humanity many times that.
Last edited by jastern; 04-25-2011 at 08:18 PM.
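A minimal Python sketch of the kind of message suggested above, assuming the whole file is read into memory as bytes. It reports the first invalid byte with its line, its character position within that line (counting decoded characters, not bytes), and its byte offset. `utf8_error_report` is a hypothetical helper, not part of epubcheck or isutf8.

```python
from typing import Optional


def utf8_error_report(filename: str, data: bytes) -> Optional[str]:
    """Return a human-friendly message for the first invalid UTF-8 byte, or None if valid."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        offset = err.start  # byte offset of the first bad sequence

    line = data.count(b"\n", 0, offset) + 1        # 1-based line number
    line_start = data.rfind(b"\n", 0, offset) + 1  # rfind gives -1 on the first line, so this is 0
    # Everything before the first error is valid UTF-8, so this decode cannot fail.
    column = len(data[line_start:offset].decode("utf-8")) + 1  # 1-based character position
    return (
        f"Please open file {filename} with a UTF-8 capable text editor (or a hex editor), "
        f"go to line {line}, character {column} (byte offset {offset}), and replace the byte "
        f"0x{data[offset]:02X} there with a character that is correctly encoded in UTF-8."
    )


page = b"<p>ok</p>\n<p>gr\xfc\xdfe</p>\n"
print(utf8_error_report("037.html", page))
```

Counting characters rather than bytes is closer to what most text editors display as the cursor column; the byte offset is included as well for anyone using a hex editor.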
04-26-2011, 03:07 AM | #4 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
Please, throw epubcheck out of the window. Its messages are too cryptic to be usable.
FlightCrew usually gives better results, with messages that are easier to understand.