slowsmile,
Sigil uses the following code to identify the encoding of an html file when you open one with File->Open:
https://github.com/Sigil-Ebook/Sigil...ngResolver.cpp
The algorithm looks roughly like this:
- read file in bytes
- check the first 4 bytes for a byte order mark to identify utf-8, utf-16le, utf-16be, utf-32le, or utf-32be
- convert up to the first 1024 bytes to a string as utf-8, ignoring errors, to create a text snippet
- use regular expressions on the snippet to look for encoding or charset attributes, with or without delimiters, to extract the encoding name, then use that codec to convert the file
- if all else fails, quickly parse the entire file as utf-8 and, if there are no errors, use utf-8
- finally just use the local encoding
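
To make the steps above concrete, here is a rough Python sketch of the same sequence. It is not Sigil's actual code (that is C++/Qt, at the link above); the name guess_encoding, the regular expression, and the use of locale.getpreferredencoding() as a stand-in for "the local encoding" are my own assumptions for illustration.

```python
import codecs
import locale
import re

# BOMs are checked longest-first so a UTF-32LE BOM (FF FE 00 00)
# is not mistaken for a UTF-16LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Hypothetical pattern: matches encoding="..." or charset=...,
# with or without quotes around the value.
DECL_RE = re.compile(
    r"""(?:encoding|charset)\s*=\s*["']?\s*([-\w]+)""", re.IGNORECASE
)


def guess_encoding(data: bytes) -> str:
    """Return a Python codec name for the raw bytes of an html file."""
    # 1. Byte order mark at the very start of the file.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name

    # 2. Decode up to the first 1024 bytes as utf-8, ignoring errors,
    #    and look for a declared encoding in that snippet.
    snippet = data[:1024].decode("utf-8", errors="ignore")
    match = DECL_RE.search(snippet)
    if match:
        try:
            # Normalise the declared name; skip it if Python has no such codec.
            return codecs.lookup(match.group(1)).name
        except LookupError:
            pass

    # 3. Try the whole file as strict utf-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # 4. Last resort: the system's preferred encoding (assumed stand-in
    #    for "the local encoding").
    return locale.getpreferredencoding(False)
```

You would call it on the raw bytes, e.g. guess_encoding(open(path, "rb").read()), and then decode the bytes with the codec name it returns. Note that a wrong declaration in the file wins over the utf-8 check, since the declaration is looked at first.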
Hope this helps,
KevinH