MobileRead Forums

DiapDealer · 11-18-2020, 01:27 PM

Python 3.8 right now. I'm seeing it in our Windows bundled Python 3.8, and Python 3.8.6 on Arch. They've not updated to Python 3.9 quite yet (or hadn't as of this morning, anyway).

It sure acts like the "surrogateescape" unicode error-handling strategy on decoding/encoding, but "strict" is supposed to be the default strategy, so I don't get it.

Quote:

strict: this is the default error handler that just raises UnicodeDecodeError for decoding problems and UnicodeEncodeError for encoding problems.

surrogateescape: this is the error handler that Python uses for most OS facing APIs to gracefully cope with encoding problems in the data supplied by the OS. It handles decoding errors by squirreling the data away in a little-used part of the Unicode code point space (for those interested in more detail, see PEP 383). When encoding, it translates those hidden away values back into the exact original byte sequence that failed to decode correctly. Just as this is useful for OS APIs, it can make it easier to gracefully handle encoding problems in other contexts.

backslashreplace: this is an encoding error handler that converts code points that can’t be represented in the target encoding to the equivalent Python string numeric escape sequence. It makes it easy to ensure that UnicodeEncodeError will never be thrown, but doesn’t lose much information while doing so (since we don’t want encoding problems hiding error output, this error handler is enabled on sys.stderr by default).
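
For anyone who wants to poke at the three handlers quoted above, here is a minimal sketch. The sample bytes and variable names are made up for illustration and have nothing to do with the actual data that triggered the behavior in this thread.

```python
import sys

# Hypothetical sample: "café" encoded as Latin-1, so the trailing
# 0xE9 byte is not valid UTF-8.
raw = b"caf\xe9"

# strict (the default): invalid input raises UnicodeDecodeError.
try:
    raw.decode("utf-8")
    strict_decode_raised = False
except UnicodeDecodeError:
    strict_decode_raised = True

# surrogateescape: the undecodable byte 0xE9 is tucked away as the
# lone surrogate U+DCE9 instead of raising.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
roundtrip = text.encode("utf-8", errors="surrogateescape")

# Under strict, that lone surrogate cannot be encoded to UTF-8.
try:
    text.encode("utf-8")
    strict_encode_raised = False
except UnicodeEncodeError:
    strict_encode_raised = True

# backslashreplace: unencodable code points become escape sequences.
escaped = "caf\u00e9".encode("ascii", errors="backslashreplace")

# Which handler stderr is using in this process (may differ if
# stderr has been replaced or reconfigured).
stderr_errors = getattr(sys.stderr, "errors", None)

print(strict_decode_raised, repr(text), roundtrip, strict_encode_raised, escaped)
print("sys.stderr.errors:", stderr_errors)
```

If a string that should have failed a strict decode shows up containing code points in the U+DC80 to U+DCFF range, that is a reasonable hint that it went through surrogateescape somewhere upstream.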