

Old 05-17-2012, 03:13 PM   #1
mr ploppy
Feral Underclass
Posts: 3,542
Karma: 26555555
Join Date: Jan 2010
Location: Yorkshire, tha noz
Device: 2nd hand paperback
What is an illegal multibyte sequence?

Trying to turn this:

http://jakonrath.blogspot.co.uk/2012...m-schreck.html

into an epub with this:

http://web2fb2.net/

but it just says this:

ERROR: 'big5' codec can't decode bytes in position 37378-37379: illegal multibyte sequence
Old 05-17-2012, 03:21 PM   #2
frostschutz
Linux User
Posts: 787
Karma: 2109877
Join Date: Sep 2010
Device: Kobo H2O, iriver StoryHD
It would be a badly encoded file, except it's trying to use the wrong codec in the first place.

So it's a problem with web2fb2.net.

EDIT:

The character in question is in the comment section though ("Yeni �kan"), at least that's where iconv fails when trying to translate from utf8 to latin1 or big5. If you don't want comments in your epub you should probably go for a more selective approach...
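
The error text looks like a Python UnicodeDecodeError, so the codec mismatch can be sketched in Python. This is only an illustration under assumptions: nothing here is known about how web2fb2.net works internally, the helper name try_decode is made up, and the sample string is hypothetical rather than the actual text from the blog page.

```python
def try_decode(data: bytes, encoding: str):
    """Return (text, None) on success, or (None, error) if decoding fails."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as err:
        return None, err


# Hypothetical sample: ASCII text followed by one non-ASCII character,
# stored as UTF-8 (the euro sign becomes the three bytes e2 82 ac).
sample = "price: €".encode("utf-8")

# Decoding with the codec the bytes were written in works.
text, err = try_decode(sample, "utf-8")
print(text)

# Decoding the same bytes as Big5 fails: e2 is a plausible Big5 lead byte,
# but 82 is not a valid second byte, so the pair is rejected.
text, err = try_decode(sample, "big5")
print(err)
```

The bytes themselves are fine in this sketch; only the choice of codec is wrong, which matches the point above about the wrong codec being used in the first place.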

Last edited by frostschutz; 05-17-2012 at 03:30 PM.
Old 05-17-2012, 04:01 PM   #3
pholy
Booklegger
Posts: 1,800
Karma: 7999034
Join Date: Jun 2009
Location: Toronto, Ontario, Canada
Device: BeBook(1 & 2010), PEZ, PRS-505, Kobo BT, PRS-T1, Playbook, Kobo Touch
The web site appears to be using UTF-8 encoding, so there are rules about what values can occur in each position of a multi-byte sequence. This is so that you can always find the start of a multi-byte sequence even if you get plopped down in the middle of a file.
According to Table 3-6 in Section 3.9 of the Unicode Book (available from www.unicode.org as a set of PDF files), the first (and only) byte of a single-byte code must start with a zero bit, i.e. 0xxxxxxx. For a two-byte code, the first byte is 110yyyyy and the second is 10xxxxxx. For a three-byte code, the first byte is 1110zzzz, the second is 10yyyyyy, and the third is 10xxxxxx. The 16-bit code value is formed by concatenating the zzzz bits as the high-order bits, then the y's and finally the x's, with leading zeroes of course. The chart in the book shows it better.
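
The bit patterns just described can be tried out with a small Python sketch. It is deliberately incomplete in the same way as the description: it only handles one-, two- and three-byte sequences, and it skips the extra checks a real decoder makes (overlong forms, surrogates, four-byte sequences). The function name decode_utf8_prefix is made up for the illustration.

```python
def decode_utf8_prefix(data: bytes, pos: int = 0):
    """Decode one 1-, 2- or 3-byte UTF-8 sequence starting at pos.

    Returns (code_point, length). Raises ValueError when a byte does not
    fit the expected bit pattern.
    """
    first = data[pos]
    if (first & 0b1000_0000) == 0:                # 0xxxxxxx
        return first, 1
    if (first & 0b1110_0000) == 0b1100_0000:      # 110yyyyy
        length, value = 2, first & 0b0001_1111
    elif (first & 0b1111_0000) == 0b1110_0000:    # 1110zzzz
        length, value = 3, first & 0b0000_1111
    else:
        raise ValueError(f"byte {pos} (0x{first:02x}) cannot start a 1-3 byte sequence")

    # Every following byte must look like 10xxxxxx; its low six bits are
    # appended to the value collected so far.
    for i in range(pos + 1, pos + length):
        if i >= len(data) or (data[i] & 0b1100_0000) != 0b1000_0000:
            raise ValueError(f"byte {i} is not a 10xxxxxx continuation byte")
        value = (value << 6) | (data[i] & 0b0011_1111)
    return value, length


code_point, length = decode_utf8_prefix("€".encode("utf-8"))
print(hex(code_point), length)
```

Because continuation bytes always start with 10 and lead bytes never do, a reader dropped into the middle of a file can skip forward to the next byte that is not 10xxxxxx and resume from there.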
You might need a hex editor to find that byte pair, because most text editors deal in lines; but a good Unicode editor should point out the problem when it opens the file. Then you can fix it however you choose.
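
If no hex editor is handy, a few lines of Python can show the bytes around a reported position. This is a rough sketch, not a tool recommendation: hex_context is a made-up helper, and the in-memory sample stands in for a saved copy of the page (for a real file you would read it with open(path, "rb").read() and pass the positions from the error message, e.g. 37378 and 37379).

```python
def hex_context(data: bytes, start: int, end: int, margin: int = 8) -> str:
    """Show bytes start..end (inclusive) in hex, with `margin` bytes either side.

    The bytes inside the reported range are wrapped in square brackets.
    """
    lo = max(0, start - margin)
    hi = min(len(data), end + 1 + margin)
    parts = []
    for i in range(lo, hi):
        cell = f"{data[i]:02x}"
        parts.append(f"[{cell}]" if start <= i <= end else cell)
    return " ".join(parts)


# Hypothetical sample with a suspicious byte pair at positions 3-4.
sample = b"abc\xe2\x82def"
print(hex_context(sample, 3, 4, margin=2))
```

Working on the raw bytes avoids the problem with line-oriented editors, since the position in the error message is a byte offset into the whole file.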
I'm not sure why it is using a 'big5' codec; that's usually used for Chinese texts.

edit:
Dang! Beaten to the punch. So much for the long-winded (and still incomplete) explanation.

Last edited by pholy; 05-17-2012 at 04:03 PM.