
Go Back   MobileRead Forums > E-Book Software > Calibre > Development

Old 11-14-2019, 03:18 AM   #1
EbookMakers
Enthusiast
Posts: 26
Karma: 38
Join Date: Nov 2019
Location: Paris, France
Device: none
Editor plugin: problem with regex and special characters

Inside an editor plugin, I'm running regexes read from a JSON file, similar to saved searches.
Everything works fine except for high-range Unicode characters. For example, I have:

Code:
{
      "case_sensitive": false, 
      "dot_all": false, 
      "find": "(‘)", 
      "mode": "regex", 
      "name": "LEFT SINGLE QUOTATION MARK REPLACE", 
      "replace": "'"
    },
Problem: this character is never found, even if I replace it with \u2018.
My JSON file is UTF-8 encoded. I extract the pattern with:
Code:
pattern=unicode(searches["find"])
I even tried ur'pattern'; nothing works.
I'm using the regex module, and my compilation flags are: regex.VERSION1 | regex.WORD | regex.FULLCASE | regex.MULTILINE | regex.UNICODE

I have the same problem with all Unicode characters above \u2000.

Any idea how to get this working?
Thanks
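
For reference, a minimal sketch of how the JSON side can be checked on its own. It assumes Python 3 and the standard json and re modules, not the Python 2 unicode() call and the regex module used in the plugin, and the sample text is made up for illustration. It shows that the literal character and the \u2018 escape load as the same pattern, and that this pattern matches:

```python
import json
import re

# Two hypothetical versions of the saved search. In the first, Python turns
# \u2018 into the literal character before JSON sees it; in the second (a raw
# string) the JSON parser itself decodes the \u2018 escape.
literal_json = '{"find": "(\u2018)"}'
escaped_json = r'{"find": "(\u2018)"}'

find_literal = json.loads(literal_json)["find"]
find_escaped = json.loads(escaped_json)["find"]

# Both spellings produce the same one-group pattern.
same_pattern = find_literal == find_escaped

# Made-up sample text containing LEFT SINGLE QUOTATION MARK.
text = "He said \u2018hello\u2019."
match = re.search(find_escaped, text)
```

If both checks pass in the real plugin too, the loading step is probably not where the problem lies.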
Old 11-14-2019, 04:10 AM   #2
kovidgoyal
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Hard to say without looking at your code.
Old 11-14-2019, 06:12 AM   #3
EbookMakers
The code is rather long, but I can give the crucial points.
I extract the editor text with:
Code:
data=current_container.raw_data(file, decode=True, normalize_to_nfc=True)
Then I apply the search to it:
Code:
pattern = regex.compile(unicode(search['find']), flags)
match = pattern.search(data)
I get no error, except when I replace ‘ or \u2018 with \xE2\x80\x98.
Tell me if you want to see more.
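
The thread does not say which error \xE2\x80\x98 produced, so the following is only a guess at two likely causes, sketched in Python 3 with the standard json module. E2 80 98 is the UTF-8 byte sequence for U+2018, so in a text string those escapes denote three different characters, and \x is not a valid escape in JSON at all:

```python
import json

quote = "\u2018"

# The UTF-8 encoding of U+2018 is the three bytes E2 80 98.
utf8_bytes = quote.encode("utf-8")

# In a text string, the same escapes mean three separate code points
# (U+00E2, U+0080, U+0098), which is not the quotation mark.
as_code_points = "\xe2\x80\x98"
differs = as_code_points != quote

# JSON only knows \uXXXX escapes; \x is rejected by the parser.
try:
    json.loads(r'{"find": "(\xE2\x80\x98)"}')
    json_accepts_x_escape = True
except json.JSONDecodeError:
    json_accepts_x_escape = False
```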
Old 11-14-2019, 06:54 AM   #4
kovidgoyal
Looks fine to me. Check whether data actually contains the character you are looking for, using the in operator, and check what is in search['find'].
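
The two checks suggested here could look roughly like this. It is a sketch in Python 3, where data and search are small stand-ins for the plugin's real values and describe() is a hypothetical helper, not part of calibre:

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) for every non-ASCII character."""
    return [
        ("U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]


# Stand-ins for the text returned by raw_data() and the loaded search entry.
data = "<p>\u2018Hello\u2019</p>"
search = {"find": "(\u2018)"}

# Check 1: is the character really in the text being searched?
present = "\u2018" in data

# Check 2: what does the pattern actually contain?
pattern_chars = describe(search["find"])
```

Listing code points and names makes it easy to spot a pattern that holds mis-decoded characters instead of the intended one.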
Old 11-14-2019, 07:52 AM   #5
EbookMakers
Damn! Everything is fine and works.
The only problem was that in my real code I have replace: "\\1", and I was only detecting a match if match != replace.

Obviously that could never be the case, since "\\1" puts back exactly the text that was matched.

Thank you, Kovid, for your tips and for pointing me in the right direction.
Sorry for the inconvenience.
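
The plugin code was not posted, so this is only a guess at the shape of the faulty check, written in Python 3 with the standard re module instead of regex. With a replacement of \1 and a pattern that is a single capture group, the replaced text always equals the matched text, so comparing the two never reports a hit:

```python
import re

pattern = re.compile("(\u2018)")
replacement = r"\1"           # puts back exactly what group 1 captured
data = "\u2018quoted\u2019"   # made-up sample text

match = pattern.search(data)

# Resolve the backreference against this particular match.
replaced = match.expand(replacement)

# Assumed form of the faulty test: the two strings are identical here,
# so the match goes unreported.
buggy_found = match.group(0) != replaced

# Testing the match object itself does not depend on the replacement.
found = match is not None
```

Checking the match object directly keeps detection independent of whatever the replacement string happens to be.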