Old 06-09-2010, 12:14 PM   #1
digireads
Junior Member
 
Posts: 2
Karma: 10
Join Date: Jun 2010
Device: none
Malformed byte sequence: Invalid byte 2 of 3-byte UTF-8 sequence. Check encoding

I am having trouble getting an ePUB file to validate. I get the following error message:

Malformed byte sequence: Invalid byte 2 of 3-byte UTF-8 sequence. Check encoding.

Can anyone tell me what I should do to my source file to correct this?

Thanks,

Kevin
Old 06-09-2010, 01:39 PM   #2
charleski
Wizard
 
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
Somewhere in your production chain you have an editor that's not handling UTF-8 properly and is inserting garbage that's being interpreted as a UTF-16 surrogate. You need to fix this or you'll run into encoding errors again in the future.

To fix the current problem, open the affected file in Notepad++ and use that to convert the encoding (in the Format menu). You may need to track down and change the character that's been mangled by the misbehaving editor.
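
If you would rather script the conversion than do it in an editor, here is a minimal Python sketch of the same idea. It assumes the file was saved entirely in a single-byte Windows encoding (cp1252 is a guess, not something the error message tells you), and the function name `reencode_to_utf8` is made up for this example.

```python
from pathlib import Path


def reencode_to_utf8(path, legacy_encoding="cp1252"):
    """Rewrite `path` as UTF-8 if it is not already valid UTF-8.

    Returns True if the file was rewritten, False if it was left alone.
    `legacy_encoding` is an assumption about how the file was saved.
    """
    raw = Path(path).read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8, nothing to do
    except UnicodeDecodeError:
        pass
    # Bytes that are undefined in the legacy encoding become U+FFFD
    # instead of aborting the conversion.
    text = raw.decode(legacy_encoding, errors="replace")
    Path(path).write_bytes(text.encode("utf-8"))
    return True
```

Work on a copy of the file. If the file is mostly valid UTF-8 with only one or two stray bytes, converting the whole thing this way will mangle the characters that were already correct, so in that case it is safer to find and fix the individual bytes by hand.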
Old 04-25-2011, 08:02 PM   #3
jastern
Junior Member
 
Posts: 1
Karma: 10
Join Date: Jun 2010
Device: none
what is missing from the epubcheck error message

one of the problems with this version of epubcheck (all versions through 1.2 as of april 25, 2011) is that the error message fails to give enough information to help you (or me) know what to do with the file.

information that would be helpful would be the exact line number and character number in that line (i.e., "row and column") where the problem exists.

without that information, and with a huge file, it's much more of a guessing game.

compare the usefulness of this error message i get from epubcheck:

Code:
$ epubcheck fp.epub
ERROR: fp.epub/Ops/037.html: Malformed byte sequence: Invalid byte 1 of 1-byte UTF-8 sequence. Check encoding
$
with the error message i get from the command-line utility, "isutf8" (available, for instance, in the "moreutils" package on Ubuntu Linux):

Code:
$ isutf8 037.html
037.html: line 19, char 1, byte offset 1921: invalid UTF-8 code
$
doesn't it seem much more helpful to know exactly which line, and which character on that line, is causing the problem? i'll bet if you had that, you wouldn't even have had to post the question.

however, in my tests, i find that even isutf8 is not as helpful as it could be, since the problem, while it is on line 19, is not at character 1 on that line in my sample file. it is much further out on line 19 (that's a long line in my file).
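
For anyone who wants an exact row and column without relying on either tool, here is a rough Python sketch. It leans on the fact that Python's `UnicodeDecodeError` records the byte offset where decoding failed; the function name `find_invalid_utf8` and the sample bytes are invented for this example, and the column is counted in characters, starting from 1.

```python
def find_invalid_utf8(data: bytes):
    """Locate the first byte that breaks UTF-8 decoding.

    Returns (line, column, byte_offset, byte_value), or None if the
    data is valid UTF-8. Line and column are 1-based.
    """
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        offset = err.start  # offset of the first offending byte

    # Everything before `offset` decoded cleanly, so it is safe to decode.
    prefix = data[:offset]
    line = prefix.count(b"\n") + 1
    line_start = prefix.rfind(b"\n") + 1  # 0 when there is no earlier newline
    column = len(prefix[line_start:].decode("utf-8")) + 1
    return line, column, offset, data[offset]


# Hypothetical sample: a stray 0xFC byte on the second line.
sample = "abc\nd\u00e9f".encode("utf-8") + b"\xfc" + b"gh\n"
print(find_invalid_utf8(sample))
```

To check a real file, pass it the result of `open("037.html", "rb").read()`. It only reports the first bad byte, so rerun it after each fix.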

the particular software that worked for me was emacs, because when i opened the file and then tried to save it, it gave me this message:

Code:
These default coding systems were tried to encode text
in the buffer `037.html':
  (utf-8-dos (63433 . 4194300))
However, each of them encountered characters it couldn't encode:
  utf-8-dos cannot encode these: \374

Click on a character (or switch to this window by `C-x o'
and select the characters by RET) to jump to the place it appears,
where `C-u C-x =' will give information about it.

Select one of the safe coding systems listed below,
or cancel the writing with C-g and edit the buffer
   to remove or modify the problematic characters,
or specify any other coding system (and risk losing
   the problematic characters).

  raw-text emacs-mule no-conversion
and when i clicked on the \374, it took me to precisely the place in the buffer where the exact problem was. i could see it needed to be replaced with a "ü".
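
The `\374` that emacs shows is an octal escape for a single byte. A small Python sketch of why that byte turns out to be a "ü", assuming the byte was originally written in Latin-1 (ISO-8859-1); the helper name `reinterpret_byte` is made up for this example:

```python
def reinterpret_byte(value: int, legacy_encoding: str = "latin-1"):
    """Decode one byte in an assumed legacy encoding and return
    the character together with its correct UTF-8 encoding."""
    char = bytes([value]).decode(legacy_encoding)
    return char, char.encode("utf-8")


# Octal 374 is hex FC, decimal 252.
char, utf8_bytes = reinterpret_byte(0o374)
print(f"{0o374:#04x} -> {char!r} -> {utf8_bytes!r}")
# prints: 0xfc -> 'ü' -> b'\xc3\xbc'
```

In other words, the file contained the lone byte FC where valid UTF-8 needs the two bytes C3 BC, which is what you would expect if that one character was typed or pasted by a program that was not writing UTF-8.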

take-away for all of us programmers: when we create error messages, it is so much more helpful and time-saving for end-users if we take the time to:
  1. tell the end-user exactly where the problem is,
  2. tell them what to do about it, if at all possible, and
  3. make sure that information is accurate.

e.g., "Please open file 037.html with a UTF-8 capable text editor, or hex editor, etc., and navigate to line 19, character 171, and see what is under the cursor at that point, and replace it with a character which is encoded correctly in UTF-8."

yes, this takes one person (us!) some time. but it saves humanity many times that.

Last edited by jastern; 04-25-2011 at 08:18 PM.
Old 04-26-2011, 03:07 AM   #4
Toxaris
Wizard
 
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
Please, throw epubcheck out of the window. The messages are too cryptic to be usable.

FlightCrew usually gives better results, and its messages are easier to understand.
All times are GMT -4.