ISO-8859-1 is a one-byte-per-character text encoding. It is not compatible with UTF-8, which is a multibyte encoding. The lower 128 characters (0-127, i.e. plain ASCII) do map byte for byte to UTF-8, but none of the characters above 127 do: in UTF-8 each of them takes two bytes.
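A quick way to check this in Python is to take every possible Latin-1 byte, decode it, re-encode it as UTF-8 and count what happens. This is only an illustration; `byte_counts` is a made-up helper name, not a library function.

```python
def byte_counts():
    """Return (identical, two_byte) counts over all 256 Latin-1 byte values."""
    identical = 0
    two_byte = 0
    for b in range(256):
        ch = bytes([b]).decode("latin-1")  # Python's latin-1 codec maps every byte 0-255
        utf8 = ch.encode("utf-8")
        if utf8 == bytes([b]):
            identical += 1                 # same single byte in both encodings
        elif len(utf8) == 2:
            two_byte += 1                  # one Latin-1 byte became two UTF-8 bytes
    return identical, two_byte

print(byte_counts())  # → (128, 128)
```

For example, "é" is the single byte `e9` in Latin-1 but `c3 a9` in UTF-8.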
If you open an ISO-8859-1 (Latin-1) encoded file in a text editor that wrongly guesses UTF-8, you are on a one-way path to encoding hell. Once the editor has mangled the accented characters and you save, there is no way to recover without manual editing.
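The sketch below simulates why this cannot be undone. It assumes an editor that reads Latin-1 bytes as UTF-8 and substitutes the Unicode replacement character (U+FFFD) for every invalid byte before saving; real editors vary, and `editor_roundtrip` is a hypothetical name used only for the demonstration.

```python
def editor_roundtrip(text: str) -> bytes:
    """Simulate an editor that misreads Latin-1 bytes as UTF-8, then saves."""
    latin1_bytes = text.encode("latin-1")                  # what is really on disk
    misread = latin1_bytes.decode("utf-8", errors="replace")  # invalid bytes -> U+FFFD
    return misread.encode("utf-8")                         # what the editor writes back

print(editor_roundtrip("café!"))
print(editor_roundtrip("cafè!"))
```

Both calls produce the same bytes: "é" and "è" have each been replaced by the same U+FFFD character, so nothing in the saved file says which letter was there originally.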
Which is why in Python I would open and read the Latin-1 file as binary data (bytes), then use Python's "decode" to convert it to a full Unicode string, which you can then encode to UTF-8 bytes and write to the new file, again as binary.
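A minimal sketch of those steps, assuming the source file really is Latin-1. The function names and file names are hypothetical.

```python
from pathlib import Path


def latin1_bytes_to_utf8(raw: bytes) -> bytes:
    """Convert Latin-1 encoded bytes to UTF-8 encoded bytes."""
    text = raw.decode("latin-1")  # bytes -> full Unicode string
    return text.encode("utf-8")   # Unicode string -> UTF-8 bytes


def convert_file(src: str, dst: str) -> None:
    """Read src as binary Latin-1 and write dst as binary UTF-8."""
    raw = Path(src).read_bytes()                       # no text-mode guessing
    Path(dst).write_bytes(latin1_bytes_to_utf8(raw))   # written back as binary
```

Usage would be something like `convert_file("book-latin1.txt", "book-utf8.txt")`. Note that Python's latin-1 codec maps every byte value 0-255, so `decode("latin-1")` never raises; it will not warn you if the file was actually in some other single-byte encoding such as cp1252.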
Last edited by KevinH; 12-24-2024 at 03:20 PM.