MobileRead Forums - View Single Post - Using Regex Find/Change to change by Unicode

KevinH · 05-28-2024, 11:20 PM

I am confused. Are you talking about changing font lookup tables, or about converting non-UTF-8 encoded files to UTF-8?

If the latter, and the 8-bit non-UTF-8 encoding is properly specified in the xhtml character set metadata, Sigil should recognize it and properly re-encode all xhtml files from that encoding to UTF-8.

If you are talking about pasting Latin-1 or some other code-page-encoded text into Sigil and then trying to fix it there with regular expression find and replace, you can do that as well: the PCRE2 library Sigil uses supports hex escapes such as \xe1, so you can match a character by its hex code and replace it with whatever Unicode character you want.

Just look up any good reference on regular expressions, or the online documentation for the PCRE2 library.

For example:

https://www.pcre.org/current/doc/html/pcre2unicode.html

where you can find \x and other escapes.
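
To make the idea concrete outside of Sigil, here is a rough sketch in Python. Note the assumptions: Python's re module is not PCRE2, it just happens to accept the same \xhh escape, and the mapping table and the fix_pasted_text name are made up for illustration (they assume Windows-1252 punctuation that ended up as raw \x91-\x96 characters after a bad paste). In Sigil itself you would put the \x.. escape in the Find field with Regex mode selected and the wanted character in Replace.

```python
import re

# Hypothetical mapping for illustration: Windows-1252 punctuation that
# arrived as raw C1 control characters, and the Unicode it should be.
CP1252_FIXES = {
    "\x91": "\u2018",  # left single quote
    "\x92": "\u2019",  # right single quote
    "\x93": "\u201c",  # left double quote
    "\x94": "\u201d",  # right double quote
    "\x96": "\u2013",  # en dash
}

# \xhh inside the pattern matches the character with that hex code.
PATTERN = re.compile(r"[\x91-\x94\x96]")


def fix_pasted_text(text: str) -> str:
    """Replace each matched character with its mapped Unicode character."""
    return PATTERN.sub(lambda m: CP1252_FIXES[m.group(0)], text)


if __name__ == "__main__":
    sample = "\x93caf\xe9\x94 \x96 it\x92s"
    print(fix_pasted_text(sample))
```

Characters that are already correct, like the \xe9 (é) in the sample, are left alone because they are not in the character class.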
