11-14-2019, 03:18 AM | #1 |
Enthusiast
Posts: 26
Karma: 38
Join Date: Nov 2019
Location: Paris, France
Device: none
|
Editor plugin : problem with regex and special characters
Inside an editor plugin I'm running regexes loaded from a JSON file, like saved searches.
Everything works fine except for Unicode characters at higher code points. For example, I have:
Code:
{ "case_sensitive": false, "dot_all": false, "find": "(‘)", "mode": "regex", "name": "LEFT SINGLE QUOTATION MARK REPLACE", "replace": "'" },
My JSON file is UTF-8 encoded. I extract the pattern with:
Code:
pattern=unicode(searches["find"])
I'm using the regex module and my compilation flags are:
regex.VERSION1 | regex.WORD | regex.FULLCASE | regex.MULTILINE | regex.UNICODE
The same problem occurs with all Unicode characters above \u2000. Any idea how to get it working? Thanks |
11-14-2019, 04:10 AM | #2 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Hard to say without looking at your code.
|
11-14-2019, 06:12 AM | #3 |
Enthusiast
Posts: 26
Karma: 38
Join Date: Nov 2019
Location: Paris, France
Device: none
|
The code is rather long, but I can give the key points:
I extract the editor text with
Code:
data=current_container.raw_data(file, decode=True, normalize_to_nfc=True)
Code:
pattern = regex.compile(unicode(search['find']), flags)
match = pattern.search(data)
Tell me if you want more. |
11-14-2019, 06:54 AM | #4 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Looks fine to me. Check if data actually contains the character you are looking for using the in operator. And check what is in search['find']
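A minimal sketch of those two checks, written for Python 3 rather than the Python 2 used in the thread. The sample `data` string, the inline JSON, and the helper name `check_inputs` are made-up stand-ins for the plugin's real text and search file, not code from the plugin.

```python
import json

# Hypothetical stand-ins for the plugin's real inputs.
data = "He said \u2018hello\u2019 and left."
search = json.loads('{"find": "(\u2018)", "replace": "\'"}')

def check_inputs(data, search):
    """Return (is the character in data?, escaped view of the pattern string)."""
    target = "\u2018"  # LEFT SINGLE QUOTATION MARK
    # ascii() escapes non-ASCII characters, so the exact code points are visible.
    return target in data, ascii(search["find"])

found, shown = check_inputs(data, search)
print(found, shown)
```

If the first value is False, the text does not contain the character at all (for example because it holds an entity or a different quote character); if the second value does not show the expected code point, the problem is in how the JSON was read.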
|
11-14-2019, 07:52 AM | #5 |
Enthusiast
Posts: 26
Karma: 38
Join Date: Nov 2019
Location: Paris, France
Device: none
|
Damn! Everything is fine and works.
The only problem was that in my real code I have replace: "\\1" and was only reporting a match if match != replace. Obviously that could never be the case, since the replacement is then identical to the matched text. Thank you, Kovid, for your tips and for pointing me in the right direction. Sorry for the inconvenience. |
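The pitfall described in the last post can be reproduced in a few lines. This is only a guess at the shape of the original check, written for Python 3 and using the standard-library `re` module in place of `regex`; the function names `naive_hits` and `all_hits` and the sample text are hypothetical.

```python
import re

text = "\u2018quoted\u2019"
pattern = re.compile("(\u2018)")

def naive_hits(pattern, text, replace):
    """Count a match only when its replacement differs from the matched text."""
    return sum(
        1 for m in pattern.finditer(text)
        if m.expand(replace) != m.group(0)
    )

def all_hits(pattern, text):
    """Count every match, whatever the replacement would produce."""
    return sum(1 for _ in pattern.finditer(text))

print(naive_hits(pattern, text, r"\1"))  # replacement equals the match
print(naive_hits(pattern, text, "'"))    # replacement differs from the match
print(all_hits(pattern, text))
```

With a replacement of `\1` the expanded text is always equal to the match, so a "did it change?" test hides every hit even though the regex itself matches correctly.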