Old 12-24-2024, 02:03 PM   #7
KevinH
Sigil Developer
ISO-8859-1 is a one byte per char text encoding. It is incompatible with utf-8, which is a multibyte encoding, although the lower 128 chars (0 to 127, plain ASCII) do map byte for byte to utf-8. The chars over 127 do not: utf-8 needs two bytes for each of them.
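
To make the byte-level difference concrete, here is a small sketch. The sample word and variable names are just illustrative assumptions; only the built-in `str.encode` and `bytes.decode` are used.

```python
# Illustrative sample: "cafe" with an e-acute (U+00E9), a char above 127.
text = "caf\u00e9"

latin1_bytes = text.encode("latin-1")  # one byte per char
utf8_bytes = text.encode("utf-8")      # the e-acute becomes two bytes

# The ASCII part is identical in both encodings.
ascii_part_matches = latin1_bytes[:3] == utf8_bytes[:3]

# Reading the latin-1 bytes as if they were utf-8 fails, because the
# lone 0xE9 byte is not a valid utf-8 sequence.
try:
    latin1_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
```

Here `latin1_bytes` is 4 bytes long and `utf8_bytes` is 5, and the strict utf-8 decode raises instead of returning text.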

Any text editor that opens an iso-8859-1 (latin-1) encoded file and wrongly guesses utf-8 will create a one way path to encoding hell. There is no way to recover from it without manual editing.
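
A rough sketch of why the damage is one way, assuming the editor substitutes the Unicode replacement character for bytes it cannot decode (what a given editor really does may differ). The sample bytes are made up for illustration.

```python
# Two different latin-1 words: e-acute (0xE9) versus e-grave (0xE8).
original_a = b"caf\xe9"
original_b = b"caf\xe8"

# Assumed editor behaviour: decode as utf-8, replacing invalid bytes
# with U+FFFD, then save the result as utf-8.
saved_a = original_a.decode("utf-8", errors="replace").encode("utf-8")
saved_b = original_b.decode("utf-8", errors="replace").encode("utf-8")

# Both files now contain the same bytes, so nothing in the saved file
# tells you which accented char was there originally.
information_lost = saved_a == saved_b
```

Once two distinct inputs collapse to the same output, no script can undo it, which is where the manual editing comes in.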

Which is why in python I would open and read the latin-1 file as binary data (bytes). Then use python "decode" to convert it to a full unicode string, which you can then encode back to utf-8 bytes and write out to the new file as binary.
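
The steps above could be sketched roughly like this. The helper name `latin1_to_utf8` and the file names are hypothetical, and the demo writes its own throwaway latin-1 file into a temp directory so it can be run as is.

```python
import tempfile
from pathlib import Path


def latin1_to_utf8(src_path, dst_path):
    """Hypothetical helper: re-encode a latin-1 file as utf-8."""
    raw = Path(src_path).read_bytes()      # read as binary, no guessing
    text = raw.decode("latin-1")           # bytes -> unicode string
    Path(dst_path).write_bytes(text.encode("utf-8"))  # write as binary


# Demo with made-up file names in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "old_latin1.txt"
    dst = Path(tmp) / "new_utf8.txt"
    src.write_bytes(b"caf\xe9\n")          # latin-1 encoded sample
    latin1_to_utf8(src, dst)
    converted = dst.read_bytes()
```

One caveat: decoding as latin-1 never raises an error, since every byte value maps to some char, so this only gives the right result if the source file really is latin-1.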

Last edited by KevinH; 12-24-2024 at 03:20 PM.