|  11-14-2019, 03:18 AM | #1 | 
| Enthusiast  Posts: 26 Karma: 38 Join Date: Nov 2019 Location: Paris, France Device: none | 
				
				Editor plugin: problem with regex and special characters
			 
			
			Inside an editor plugin I'm running regexes loaded from a JSON file, like saved searches. Everything works fine except for Unicode characters with high code points. For example, I have:
			Code:
			{
			  "case_sensitive": false,
			  "dot_all": false,
			  "find": "(‘)",
			  "mode": "regex",
			  "name": "LEFT SINGLE QUOTATION MARK REPLACE",
			  "replace": "'"
			},
			My JSON file is UTF-8 encoded. I extract the pattern with:
			Code:
			pattern=unicode(searches["find"])
			I'm using the regex module and my compilation flags are:
			regex.VERSION1 | regex.WORD | regex.FULLCASE | regex.MULTILINE | regex.UNICODE
			I have the same problem with all Unicode characters above \u2000. Any idea how to get it working? Thanks. | 
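For anyone who wants to reproduce the setup described above, here is a minimal sketch. It assumes Python 3 and uses the standard-library `re` module as a stand-in for the third-party `regex` module, so the `VERSION1`, `WORD` and `FULLCASE` flags are left out. The `compile_saved_search` helper and the way the JSON keys are mapped to flags are assumptions for illustration, not the plugin's actual code.

```python
import json
import re

# One saved search, as in the post. In Python 3, json.loads returns str
# objects that are already Unicode, so no unicode() call is needed.
SAVED_SEARCH_JSON = """
{
  "case_sensitive": false,
  "dot_all": false,
  "find": "(‘)",
  "mode": "regex",
  "name": "LEFT SINGLE QUOTATION MARK REPLACE",
  "replace": "'"
}
"""


def compile_saved_search(search):
    """Hypothetical helper: build a compiled pattern from one saved search."""
    flags = re.MULTILINE
    if not search["case_sensitive"]:
        flags |= re.IGNORECASE
    if search["dot_all"]:
        flags |= re.DOTALL
    return re.compile(search["find"], flags)


search = json.loads(SAVED_SEARCH_JSON)
pattern = compile_saved_search(search)

# Sample text containing U+2018 and U+2019.
sample = "He said \u2018hello\u2019."
result = pattern.sub(search["replace"], sample)
```

When the JSON comes from a file rather than a string, opening it with an explicit `encoding="utf-8"` avoids depending on the platform's default encoding.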
|   |   | 
|  11-14-2019, 04:10 AM | #2 | 
| creator of calibre            Posts: 45,598 Karma: 28548962 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			Hard to say without looking at your code.
		 | 
|   |   | 
|  11-14-2019, 06:12 AM | #3 | 
| Enthusiast  Posts: 26 Karma: 38 Join Date: Nov 2019 Location: Paris, France Device: none | 
			
			The code is rather long, but I can give some crucial points. I extract the editor text with:
			Code:
			data=current_container.raw_data(file, decode=True, normalize_to_nfc=True)
			Code:
			pattern = regex.compile(unicode(search['find']), flags)
			match = pattern.search(data)
			Tell me if you want more. | 
|   |   | 
|  11-14-2019, 06:54 AM | #4 | 
| creator of calibre            Posts: 45,598 Karma: 28548962 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			Looks fine to me. Check if data actually contains the character you are looking for using the in operator. And check what is in search['find']
		 | 
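The two checks suggested above could look roughly like the following sketch. The values of `data` and `search` are hypothetical stand-ins for what the plugin would hold at that point, and `describe` is an illustrative helper, not part of calibre. Printing code points rather than the characters themselves makes it easier to spot a character that was mangled on the way in.

```python
def describe(text):
    """Return the code points of text, e.g. 'U+2018', for easy comparison."""
    return " ".join("U+%04X" % ord(ch) for ch in text)


# Hypothetical stand-ins for the plugin's values.
data = "<p>\u2018Quoted\u2019 text</p>"
search = {"find": "(\u2018)"}

needle = "\u2018"  # LEFT SINGLE QUOTATION MARK

# Check 1: is the character really present in the text being searched?
found_in_data = needle in data

# Check 2: what does the pattern loaded from JSON really contain?
pattern_codepoints = describe(search["find"])
```

If `found_in_data` is true and `pattern_codepoints` shows `U+2018` between the parentheses, the text and the pattern agree, and the problem is more likely in the logic around the search than in the encoding.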
|   |   | 
|  11-14-2019, 07:52 AM | #5 | 
| Enthusiast  Posts: 26 Karma: 38 Join Date: Nov 2019 Location: Paris, France Device: none | 
			
			Damn! Everything is fine and works. The only problem was that in my real code I have replace: "\\1" and was only counting a match when match != replace. With that replacement the result is identical to the matched text, so that condition could never be true. Thank you, Kovid, for your tips and for pointing me in the right direction. Sorry for the inconvenience. | 
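One plausible reading of that bug, sketched with the standard-library `re` module in Python 3 (the variable names and the exact comparison are assumptions, since the plugin code was not posted): when the replacement template is `\1` and group 1 covers the whole match, the expanded replacement is always equal to the matched text, so a "match differs from replacement" test hides every hit.

```python
import re

pattern = re.compile("(\u2018)")
replace = r"\1"  # template that re-inserts group 1 unchanged
data = "\u2018Quoted\u2019"

m = pattern.search(data)

# Flawed check: the expanded replacement equals the matched text here,
# so a genuine match is filtered out.
flawed_hit = m is not None and m.group(0) != m.expand(replace)

# Safer check: a match object means the pattern matched, whatever the
# replacement would produce.
is_hit = m is not None
```

Deciding whether a match exists from the match object alone keeps the search logic independent of the replacement, which matters for any template that reproduces the matched text.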
|   |   | 