MobileRead Forums - View Single Post

Azraelo · 08-21-2023, 07:18 AM

Hello,

I've got an EPUB containing a number of odd Unicode characters and want to use a regex to get rid of them in the EPUB editor.

Example of text in the epub:
𝒷.𝓬𝑶𝐦

According to my research, these characters should correspond to the following Unicode code points:
\u1D4B7
\u002E
\u1D4EC
\u1D476
\u1D426

But no matter what I try, the search function never matches those characters.
I opened a text file in the EPUB editor, entered "\u1D4B7" in the search field and set the mode to "Regex".
The search finds nothing.
If I search for "[\u1D400-\u1D4FF]" instead, all the normal characters (a-zA-Z) are reported as matches.

What is the logic behind this?
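One possible explanation, assuming the editor's regex engine follows Python conventions (that is an assumption, the post does not say which editor is used): `\u` takes exactly four hex digits, so "\u1D4B7" is read as U+1D4B followed by a literal "7", and "[\u1D400-\u1D4FF]" becomes a class containing U+1D40, the range "0" to U+1D4F, and "F". That range covers all ASCII letters. A minimal sketch with Python's `re` module reproduces both observations:

```python
import re

# The sample text from the post, written with 8-digit \U escapes.
text = "abc \U0001D4B7.\U0001D4EC\U0001D476\U0001D426 xyz"

# \u consumes only four hex digits: this means U+1D4B followed by "7".
four_digit = re.compile(r"\u1D4B7")
no_match = four_digit.search(text)

# Parsed as: U+1D40, the range "0"..U+1D4F, and the literal "F".
# The range "0"..U+1D4F includes a-z and A-Z but not U+1D4B7 etc.
misparsed = re.compile(r"[\u1D400-\u1D4FF]")
ascii_hits = misparsed.findall(text)

# The 8-digit form \UXXXXXXXX addresses code points above U+FFFF.
eight_digit = re.compile(r"\U0001D4B7")
real_match = eight_digit.search(text)

print(no_match)                  # the 4-digit escape finds nothing
print(ascii_hits)                # only plain ASCII letters are hit
print(real_match is not None)    # the 8-digit escape finds the character
```

Other engines spell this differently (PCRE-style engines typically use `\x{1D4B7}`), so the exact escape depends on the editor.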

My intention was to search for something like "[\u1D400-\u1D4FF\u002E]{4,20}" and replace it with nothing.
Can you please give me a hint on how to accomplish this?
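For illustration, here is a sketch of that search-and-replace with the range written using 8-digit escapes, again assuming Python-style regex syntax. The function name `strip_fancy_runs` is hypothetical and only there for the example:

```python
import re

# Same idea as the pattern above, but with \UXXXXXXXX so the range really
# spans U+1D400..U+1D4FF; \u002E is the ordinary full stop.
FANCY_RUN = re.compile(r"[\U0001D400-\U0001D4FF\u002E]{4,20}")


def strip_fancy_runs(text: str) -> str:
    """Remove runs of 4 to 20 characters from the class above."""
    return FANCY_RUN.sub("", text)


sample = "Visit \U0001D4B7.\U0001D4EC\U0001D476\U0001D426 now"
cleaned = strip_fancy_runs(sample)
print(repr(cleaned))
```

Because "." is part of the class, a run of four or more plain dots would be removed as well; a normal three-dot ellipsis is left alone by the {4,20} minimum.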

Regards
Azraelo

Thread: Search for unicode character (ranges)