Quote:
Originally Posted by NiLuJe
@sherman: Yeah, for some reason, I was worried it'd flag stuff in the middle of multi-byte characters, but no, after a good night's sleep and some testing, things appear to work well (at least with Latin scripts, so, 2 or 3 bytes).
This just made me mad at Kobo again for their broken libc, because I have to swap to a custom busybox shell to actually be able to input UTF-8 in my terminal, but, oh, well.
That's the nice property that UTF-8 has. Every byte of a multi-byte sequence has its high bit set
(1xxxxxxx), so an ASCII byte (0xxxxxxx) will never be found within a multi-byte sequence.
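
If you want to convince yourself, here is a rough Python sketch (not anything from the actual patch; the function names and the sample string are made up for illustration). It checks that every byte of a multi-byte character is 0x80 or higher, and that a plain byte search for an ASCII character such as "/" only ever lands on real slashes:

```python
def multibyte_bytes_have_high_bit(text):
    """True if every byte of every multi-byte UTF-8 character is >= 0x80."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return False
    return True


def ascii_byte_offsets(data, ascii_char):
    """Byte offsets at which the single ASCII character occurs in data."""
    target = ord(ascii_char)
    if target >= 0x80:
        raise ValueError("ascii_char must be in the ASCII range")
    return [i for i, b in enumerate(data) if b == target]


# Hypothetical sample: 2-byte characters (e-acute, i-diaeresis) and a
# 3-byte one (euro sign), written as escapes to avoid encoding surprises.
sample = "caf\u00e9/na\u00efve/\u20ac5"
encoded = sample.encode("utf-8")

print(multibyte_bytes_have_high_bit(sample))
print(ascii_byte_offsets(encoded, "/"))
```

The byte offsets differ from the character indices because the accented characters take more than one byte, but the number of hits is the same as the number of slashes in the decoded string.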