12-26-2024, 12:57 PM   #8
Shohreh
Yes, data must be read in binary and with the right decoder.

The extension only supports utf-8 and doesn't throw an error if a web page uses another encoding, eg. Latin1/ISO-8859-1. It's the first time I've had the issue in the weeks I've been using it, so it's no biggie. It was an opportunity to understand how both encodings work.
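
A quick way to see the problem in Python. This is only a sketch: the sample word "École" is my own example, not text from the page in question, and `errors="replace"` just mimics a decoder that silently substitutes bad bytes instead of raising an error.

```python
# Bytes as a Latin1/ISO-8859-1 page would send them (sample text is an assumption).
raw = "École".encode("iso-8859-1")

# Wrong decoder: 0xC9 is not followed by a valid continuation byte,
# so it is replaced with U+FFFD instead of raising an error.
wrong = raw.decode("utf-8", errors="replace")

# Right decoder: every byte maps directly to the code point of the same value.
right = raw.decode("iso-8859-1")

# U+FFFD itself is the three bytes EF BF BD once encoded as utf-8.
replacement_bytes = "\ufffd".encode("utf-8")
```

With the wrong decoder the "É" is lost for good, which is why the data has to be kept as bytes until the real encoding is known.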

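For the curious in the audience, here's how an ISO-8859-1 byte is converted to utf-8:
1. If a byte is worth 0-127 (plain ASCII), it remains untouched
2. If it's 128-255, it becomes a two-byte combo: a leading byte (0xC2 for 128-191, 0xC3 for 192-255) followed by a trailing byte in the range 0x80-0xBF
3. If such a byte is instead read as utf-8 without being converted, it usually doesn't form a valid sequence, and a lenient decoder replaces it with U+FFFD, ie. the sequence "0xEFBFBD", "�"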

For instance, "É" in ISO-8859-1 is 0xC9 or 11001001 in binary. To convert it to utf-8, the first two bits (11) are put at the end of the leading byte (11000011) and the other six bits (001001) are put in the trailing byte (10001001) → 0xC389.
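
The same bit shuffling for a single ISO-8859-1 byte can be sketched in Python. The function name `latin1_byte_to_utf8` is made up for illustration; in real code you would simply decode as ISO-8859-1 and re-encode as utf-8.

```python
def latin1_byte_to_utf8(b: int) -> bytes:
    """Convert one ISO-8859-1 byte value (0-255) to its utf-8 byte sequence."""
    if not 0 <= b <= 0xFF:
        raise ValueError("expected a single byte value (0-255)")
    if b < 0x80:
        # 0-127: ASCII, identical in both encodings.
        return bytes([b])
    # 128-255: two-byte form 110xxxxx 10xxxxxx.
    lead = 0b11000000 | (b >> 6)           # top two bits go in the leading byte
    trail = 0b10000000 | (b & 0b00111111)  # low six bits go in the trailing byte
    return bytes([lead, trail])


e_acute = latin1_byte_to_utf8(0xC9)  # "É"
```

Here `e_acute` comes out as the bytes C3 89, matching the manual conversion, and the result agrees with Python's built-in codecs for every byte value from 0 to 255.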

https://en.wikipedia.org/wiki/UTF-8#Description

Last edited by Shohreh; 12-26-2024 at 12:59 PM.