Old 05-29-2024, 09:28 AM   #9
KevinH
Sigil Developer
I still do not understand how an E1 byte that is not valid UTF-8 got into the file in the first place. Either the original XHTML was cp1251 or latin-1 encoded and did not declare that, so it could not be properly converted to UTF-8 when it was read in, or text copied from a cp1251 or latin-1 source was pasted in without proper conversion.
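
To illustrate what "proper conversion" means here, a minimal Python sketch. The sample bytes and the helper names `is_valid_utf8` and `to_utf8` are made up for this example, and none of this is Sigil's own code. It only shows that a lone E1 byte is rejected by a UTF-8 decoder, and that the same byte turns into different characters depending on which source encoding you assume.

```python
# Hypothetical sample: a lone 0xE1 byte sitting between two ASCII letters.
raw = b"a\xe1b"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode using the known source encoding, then re-encode as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# 0xE1 starts a three-byte UTF-8 sequence, so on its own it is invalid.
print(is_valid_utf8(raw))

# The same byte is a different character in each legacy encoding.
as_latin1 = to_utf8(raw, "latin-1")  # 0xE1 is U+00E1 (a with acute)
as_cp1251 = to_utf8(raw, "cp1251")   # 0xE1 is U+0431 (Cyrillic be)
print(as_latin1, as_cp1251)
```

Which of the two results is correct depends entirely on knowing where the text came from, which is why the source encoding has to be declared or supplied.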

Either way, the find-and-replace step should not be needed unless an earlier step broke somewhere.

The font actually used has nothing to do with reading in a text file and encoding it properly. The problem typically comes from not declaring the file's original encoding inside the file, near the top (for XHTML, that means the encoding attribute of the XML declaration or a meta charset tag). Without that, Sigil's auto-detection code can sometimes guess the input encoding incorrectly. Telling latin-x/cp-125x apart from utf-8 is actually quite hard from small snippets of text.
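
A rough Python sketch of why guessing is hard. This is not Sigil's detection code; `plausible_encodings` and the candidate list are assumptions made for illustration. It simply tries each candidate and keeps the ones that decode without error.

```python
def plausible_encodings(data: bytes,
                        candidates=("utf-8", "cp1251", "latin-1")) -> list:
    """Return the candidate encodings that decode the bytes without error."""
    ok = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok


# Valid UTF-8 for a single accented letter: every candidate accepts it,
# so a clean decode alone says nothing about which encoding was intended.
print(plausible_encodings("\u00e9".encode("utf-8")))

# A lone 0xE1 byte rules out UTF-8, but cp1251 and latin-1 both remain.
print(plausible_encodings(b"a\xe1b"))
```

Since latin-1 accepts every byte value and cp1251 accepts nearly all of them, a detector has to fall back on statistics about which characters are likely, and a short snippet gives it very little to work with.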