Old 10-10-2024, 02:57 PM   #1
Mister L
Groupie
Posts: 179
Karma: 91148
Join Date: Jun 2010
Device: Sony 350
2.3.1: Accented character encoding automatically updated?

Edit: Oops, I checked the update log for 2.3.0 and I think this is probably the answer to my question:
"we are now using NFC unicode normalization form for all content, links, urls, and file paths."
Sorry, I didn't make the connection when I read it the first time. Please delete this thread.


Thanks for the latest version, it looks great.

Spoiler:
This is just a request for confirmation.

TLDR:
Questions:

1. Does Sigil now automatically convert all accented characters (çéàùïöÔôÂâ etc.) to proper (precomposed) Unicode characters, regardless of source encoding? On open as well as on paste?

2. If Sigil now catches them on its own, I don't have to worry about it. But if some of them can still sneak through, is it possible to restore my backup .ini of my saved searches (with the old encoding)? If so, how?

I usually run those searches routinely on all files just in case and don't bother to check the reports. I just want to know whether I should start checking the reports again in future.




Context:

Some old Mac / InDesign files add accents as a separate character:
è = e + `

In Notepad++, when moving the cursor with the arrow keys, it takes 2 key presses to get past each of these double characters, and they used to be listed in the Characters Used report in Sigil. They can (occasionally) display wrong (accent next to the letter instead of above it), so I have saved searches to replace them all with the correct Unicode characters.
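For illustration, here is a minimal Python sketch of the difference between the two forms, using the standard `unicodedata` module. It only shows what NFC normalization does to a decomposed character; it is not a description of Sigil's internals, and the helper name `to_nfc` is made up for this example.

```python
import unicodedata

decomposed = "e\u0300"   # 'e' followed by U+0300 COMBINING GRAVE ACCENT (2 code points)
precomposed = "\u00e8"   # U+00E8 LATIN SMALL LETTER E WITH GRAVE (1 code point)

def to_nfc(text: str) -> str:
    """Hypothetical helper: return text in Unicode NFC (composed) form."""
    return unicodedata.normalize("NFC", text)

print(len(decomposed), len(precomposed))    # 2 1
print(decomposed == precomposed)            # False: they look alike but differ
print(to_nfc(decomposed) == precomposed)    # True: NFC merges letter + accent
```

Both strings render as è, which is why the decomposed form is easy to miss by eye.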

I've just noticed that in Sigil 2.3.1 the ini file has been modified:

Code:
111\Name=bases/suppl\xe9mentaires/accents en 2 cars/E aigu 2c
111\Find=E\x301
111\Replace=\xc9
Instead of this:

Code:
115\Name=bases/supplémentaires/accents en 2 cars/E aigu 2c
115\Find=É
115\Replace=É
(Find is 2 characters, E plus a combining accent; pretty sure that is not preserved here.)
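As a rough sketch of what the escaped entries contain, the Python below decodes the `\x…` sequences and compares the Find and Replace values. It assumes each `\x` is followed by a run of hex digits naming one code point, read greedily; that is an assumption about the ini escaping, and `decode_ini_escapes` is a hypothetical helper, not part of Sigil.

```python
import re
import unicodedata

# Assumption: "\x" followed by one or more hex digits encodes a single code point.
_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]+)")

def decode_ini_escapes(value: str) -> str:
    # Hypothetical helper: replace every \xNNN escape with the character it names.
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)

find = decode_ini_escapes(r"E\x301")    # 'E' + U+0301 COMBINING ACUTE ACCENT
replace = decode_ini_escapes(r"\xc9")   # U+00C9, precomposed E with acute
name = decode_ini_escapes(r"bases/suppl\xe9mentaires/accents en 2 cars/E aigu 2c")

print(len(find), len(replace))                          # 2 1
print(find == replace)                                  # False
print(unicodedata.normalize("NFC", find) == replace)    # True
```

Under that assumption the new ini still stores the two-character Find, but once that pattern is NFC-normalized it becomes identical to the Replace value.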

Now those searches return ALL accented characters (including the correct Unicode ones).

I pasted some of the double characters from Notepad++ into Sigil and only the correct Unicode character was listed in the report, but if I copy just the floating accent on its own, it is still displayed.

[Screenshot, Code view: 2024-10-10_20h02_04.jpg]
[Screenshot, Report: 2024-10-10_20h03_43.jpg]

So it seems like those characters are now automatically updated when they are pasted into Code View, but I want to be sure so I don't accidentally miss any.
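One way to double-check outside Sigil is to scan the text for leftover combining marks. The sketch below is a minimal Python example under the assumption that the file contents are already loaded as a string; `find_combining_marks` is a hypothetical name, and the check relies only on the standard `unicodedata` module.

```python
import unicodedata

def find_combining_marks(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for every combining character in text."""
    return [
        (i, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.combining(ch)  # non-zero combining class = floating accent
    ]

sample = "E\u0301cole et \u00c9cole"   # first word decomposed, second precomposed
print(find_combining_marks(sample))
# [(1, 'COMBINING ACUTE ACCENT')]

# After NFC, the letter + accent pair is merged and nothing is reported.
print(find_combining_marks(unicodedata.normalize("NFC", sample)))
# []

# A floating accent with no base letter to merge into survives NFC.
print(find_combining_marks(unicodedata.normalize("NFC", "\u0301")))
# [(0, 'COMBINING ACUTE ACCENT')]
```

The last case is consistent with the observation above that a floating accent pasted on its own still shows up.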

Thanks for any information you can give.

Last edited by Mister L; 10-10-2024 at 04:33 PM.