#1 |
Junior Member
Posts: 5
Karma: 10
Join Date: Jun 2023
Device: Kobo Clara HD
|
Hello,
I've got an epub with multiple weird Unicode characters and wanted to use a regex to get rid of them in the epub editor.
Example of text in the epub: 𝒷.𝓬𝑶𝐦
According to my research, these characters should correspond to the following Unicode escapes: \u1D4B7 \u002E \u1D4EC \u1D476 \u1D426
But no matter what I try, the search function never matches those characters. I opened a text file within the epub editor, put "\u1D4B7" into the search field and changed the mode to "Regex". When searching, nothing is found. If I search for "[\u1D400-\u1D4FF]", then all normal characters (a-zA-Z) are listed as matches. What is the logic behind this?
My intention was to search for something like "[\u1D400-\u1D4FF\u002E]{4,20}" and replace it with nothing. Can you please give me a hint on how to accomplish this?
Regards
Azraelo |
#2 |
Fanatic
Posts: 517
Karma: 2268308
Join Date: Nov 2015
Device: none
|
Try \U instead.
|
#3 |
Junior Member
Posts: 5
Karma: 10
Join Date: Jun 2023
Device: Kobo Clara HD
|
If I try "\U1D4B7", then I get the error message:
calibre, version 6.21.0 ERROR: Invalid regex: The regular expression you entered is invalid: \U1D4B7 with error: incomplete escape \U1D4B7 at position 2
So this doesn't really help. |
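The "incomplete escape" message is consistent with how Python-style regular expressions read `\U`: it has to be followed by exactly eight hex digits, so U+1D4B7 would be written `\U0001D4B7`. Below is a minimal sketch using Python's standard `re` module. It assumes, without having checked it inside calibre itself, that the editor's Regex mode follows the same escape rules.

```python
import re

sample = "foo 𝒷.𝓬𝑶𝐦 bar"  # example text from the first post

# \U followed by only five hex digits is rejected as an incomplete escape.
try:
    re.compile(r"\U1D4B7")
    short_form_error = None
except re.error as exc:
    short_form_error = str(exc)

# Padding the code point to eight hex digits gives a valid pattern.
match = re.search(r"\U0001D4B7", sample)
matched_char = match.group() if match else None

print(short_form_error)
print(matched_char)
```

If the assumption holds, entering `\U0001D4B7` in the search field should find the script "b".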
#4 |
Bibliophagist
Posts: 46,190
Karma: 168983734
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Did you try escaping the \? i.e. \\U1D4B7, since most regex implementations use the backslash as a special character. Though I'm not sure if calibre allows use of that format for a Unicode character.
Alternatively, just copy/paste the bscr ( �� ) into the search box.
Interestingly, while I can see the character while entering the message, it does not show when I post the message, and if I quote the OP's message it still doesn't show.
Last edited by DNSB; 08-21-2023 at 02:43 PM. |
#7 |
Still reading
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 14,025
Karma: 105092227
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
|
If I just quote, it's OK.
If I copy and paste, it makes the entire post bad? 𝒷.𝓬𝑶𝐦 OK in preview |
#8 |
Still reading
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 14,025
Karma: 105092227
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
|
Baffling!
. . |
#9 |
Still reading
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 14,025
Karma: 105092227
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
|
Baffling!
𝒷.𝓬𝑶𝐦 𝒷.𝓬𝑶𝐦 This time "Go Advanced" |
#10 |
Still reading
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 14,025
Karma: 105092227
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
|
So it's messed up unless you "Go Advanced" to see preview.
A site bug. |
#11 |
Wizard
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 1,611
Karma: 9500498
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
|
It is strange how searching for "Mathematical Script Small B" \u1D4B7 won't work.
But \u00A0 for a No Break Space will work. (Calibre Win10) |
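One possible explanation, assuming calibre's search follows Python's escape rules: `\u` reads exactly four hex digits. `\u00A0` has four, so it works. `\u1D4B7` is read as U+1D4B followed by a literal "7", and `[\u1D400-\u1D4FF]` becomes U+1D40, the range from "0" to U+1D4F, and a literal "F". That range covers a-z and A-Z, which would explain why every normal letter matched. A small sketch with the standard `re` module:

```python
import re

# \u reads exactly four hex digits, so this pattern means U+1D4B followed by "7".
four_digit = re.compile(r"\u1D4B7")
matches_two_chars = four_digit.fullmatch("\u1D4B" + "7") is not None
matches_script_b = four_digit.search("\U0001D4B7") is not None

# The class is read as: U+1D40, the range "0" to U+1D4F, and a literal "F".
char_class = re.compile(r"[\u1D400-\u1D4FF]")
matches_ascii_letters = all(char_class.fullmatch(c) for c in "azAZ")

# \u00A0 has exactly four digits, so it means what it says.
matches_nbsp = re.search(r"\u00A0", "a\u00A0b") is not None

print(matches_two_chars, matches_script_b, matches_ascii_letters, matches_nbsp)
```

Code points above U+FFFF need the eight-digit form, e.g. `\U0001D4B7`.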
#12 |
Junior Member
![]() Posts: 5
Karma: 10
Join Date: Jun 2023
Device: Kobo Clara HD
|
Unfortunately, searching for "\\U1D4B7" didn't provide any results.
I created this ebook from a website which randomly inserts its domain into the text and replaces the characters with similar-looking Unicode characters. This is also done randomly, so the combination is never the same. So just copying those characters directly for searching won't work for more than one occurrence. Because of this, I want to use the Unicode range to catch those strange characters with a regex and remove them all at once. |
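The range-based removal described here can be sketched with eight-digit `\U` escapes. This is only an illustration in plain Python, not a tested calibre recipe. The name `strip_obfuscated` and the added lookahead are hypothetical, and the range U+1D400 to U+1D4FF is simply the one from the first post (the full Mathematical Alphanumeric Symbols block runs to U+1D7FF).

```python
import re

# Assumption: the obfuscated letters all fall in U+1D400..U+1D4FF, as in the example.
# The lookahead requires at least one such letter, so a plain "...." is left alone.
OBFUSCATED = re.compile(
    r"(?=\.*[\U0001D400-\U0001D4FF])[\U0001D400-\U0001D4FF.]{4,20}"
)


def strip_obfuscated(text: str) -> str:
    """Remove runs of 4 to 20 math-styled letters and dots."""
    return OBFUSCATED.sub("", text)


cleaned = strip_obfuscated("He left 𝒷.𝓬𝑶𝐦the room....")
print(cleaned)
```

The pattern text itself could presumably be pasted into the editor's Regex mode with an empty replacement, though that is not verified here. One caveat: a genuine full stop sitting directly before the inserted domain would be removed along with it.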
#13 |
Still reading
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 14,025
Karma: 105092227
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
|
Spelling Check?
|
#14 |
Junior Member
![]() Posts: 5
Karma: 10
Join Date: Jun 2023
Device: Kobo Clara HD
|
Nice idea, I hadn't thought of this yet.
Unfortunately, there are hundreds of combinations of the URL and the spelling check doesn't seem to provide a batch edit function for similar words. If I wanted to use the editor, I would still have to edit hundreds of entries manually. |
#15 | |
null operator (he/him)
Posts: 21,724
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
BR |
|