11-14-2019, 04:18 AM   #1
EbookMakers
Editor plugin: problem with regex and special characters

Inside an editor plugin I'm running regexes loaded from a JSON file, like saved searches.
Everything works fine, except for Unicode characters with high code points. For example, I have:

Code:
{
      "case_sensitive": false, 
      "dot_all": false, 
      "find": "(‘)", 
      "mode": "regex", 
      "name": "LEFT SINGLE QUOTATION MARK REPLACE", 
      "replace": "'"
    },
Problem: this character is never found, even if I replace it with \u2018.
My JSON file is UTF-8 encoded. I extract the pattern with:
Code:
pattern=unicode(searches["find"])
I even tried ur'pattern'; nothing works.
I'm using the regex module and my compilation flags are: regex.VERSION1 | regex.WORD | regex.FULLCASE | regex.MULTILINE | regex.UNICODE

Same problem with all Unicode characters above \u2000.

Any idea how to get it working?
Thanks
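
For reference, a minimal Python 3 sketch of how such a saved-search entry loads (the post itself uses Python 2's unicode()). It uses the standard-library json and re modules rather than the regex module mentioned above, and the JSON text is a trimmed, hypothetical copy of the entry. Under those assumptions, the literal character and the \u2018 escape decode to the same pattern:

```python
import json
import re

# Hypothetical trimmed copies of the saved-search entry, one with the
# literal character and one with the JSON escape \u2018.
literal_entry = '{"find": "(\u2018)", "replace": "\'"}'
escaped_entry = '{"find": "(\\u2018)", "replace": "\'"}'

find_literal = json.loads(literal_entry)["find"]
find_escaped = json.loads(escaped_entry)["find"]

# Both spellings decode to the same three-character pattern.
assert find_literal == find_escaped == "(\u2018)"

# The stdlib re module stands in for regex here (an assumption made so
# the sketch runs without third-party packages).
pattern = re.compile(find_literal, re.MULTILINE | re.UNICODE)
sample = "He said \u2018hello\u2019."
match = pattern.search(sample)
print(match.start(), repr(match.group(1)))
```

So the JSON encoding of the character is unlikely to be the cause by itself; the decoded pattern is the same either way.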
11-14-2019, 05:10 AM   #2
kovidgoyal
Hard to say without looking at your code.
11-14-2019, 07:12 AM   #3
EbookMakers
The code is rather long, but I can give some crucial points:
I extract the editor text with
Code:
data=current_container.raw_data(file, decode=True, normalize_to_nfc=True)
Then I apply the search to it:
Code:
pattern = regex.compile(unicode(search['find']), flags)
match = pattern.search(data)
I get no error, except if I replace ‘ or \u2018 with \xE2\x80\x98.
Tell me if you want more.
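
A short Python 3 sketch of why the \xE2\x80\x98 spelling is not equivalent to \u2018: those are the three UTF-8 bytes of U+2018, and they only mean ‘ once decoded. Written as escapes in a text pattern, they are three separate code points. The sample text is made up for illustration, and the stdlib re module is used in place of regex:

```python
import re

quote = "\u2018"                      # LEFT SINGLE QUOTATION MARK
utf8_bytes = quote.encode("utf-8")    # the three bytes E2 80 98

# In a text (unicode) string, \xE2\x80\x98 is three code points:
# U+00E2, U+0080, U+0098, not the single character U+2018.
mistaken = "\xe2\x80\x98"

text = "an \u2018example\u2019"  # hypothetical decoded editor text

found_with_code_point = re.search(quote, text) is not None
found_with_byte_escapes = re.search(mistaken, text) is not None
print(utf8_bytes, len(mistaken), found_with_code_point, found_with_byte_escapes)
```

Since raw_data(..., decode=True) returns decoded text, the pattern should use the code point form, not the byte form.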
11-14-2019, 07:54 AM   #4
kovidgoyal
Looks fine to me. Check whether data actually contains the character you are looking for, using the in operator. And check what is in search['find'].
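
The two checks suggested here might look like the following Python 3 sketch. The values of data and search are hypothetical stand-ins for the plugin's variables, not the poster's real data:

```python
# Hypothetical stand-ins for the plugin's variables.
data = "<p>\u2018Quoted\u2019 text</p>"   # decoded editor text
search = {"find": "(\u2018)"}               # entry loaded from JSON

# 1. Is the character actually in the data?
has_quote = "\u2018" in data

# 2. What exactly is in search['find']? ascii() shows non-ASCII
#    characters as escapes, so look-alike characters are easy to spot.
find_repr = ascii(search["find"])
code_points = [hex(ord(ch)) for ch in search["find"]]

print(has_quote, find_repr, code_points)
```

If both checks come out as expected, the pattern and the data are fine and the problem lies in what the code does after the search.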
11-14-2019, 08:52 AM   #5
EbookMakers
Damn! Everything is fine and works.
The only problem was: in my real code I have replace: "\\1", and I was only counting something as a match if match != replace.

Obviously that could never be the case, since the replacement is the match itself.
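
A minimal Python 3 sketch of this pitfall, under one reading of the check described above (comparing the matched text with its replacement). It uses the stdlib re module instead of regex, and the sample text is hypothetical:

```python
import re

pattern = re.compile("(\u2018)")
replacement = r"\1"          # what "\\1" in the JSON file decodes to
data = "\u2018quoted\u2019"  # hypothetical editor text

match = pattern.search(data)
replaced = match.expand(replacement)

# The replacement reproduces the match, so a "did it change?" test
# reports nothing even though the pattern matched.
changed = match.group(0) != replaced

# Counting matches does not depend on the replacement text.
new_data, match_count = pattern.subn(replacement, data)
print(changed, match_count)
```

Testing the match object (or the count returned by subn) separates "did the pattern match" from "did the text change", which are different questions when the replacement is a backreference.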

Thank you Kovid for your tips and for pointing me in the right direction.
Sorry for the inconvenience.