Thread: RegEx & Unicode
11-30-2011, 10:58 PM #4
capnm
Quote:
Originally Posted by Serpentine
Supply sample(s) and expected result(s); make life easy.

Sample: Föô bár
Expected result: Fb

That's fairly irrelevant, though. I'm not looking to debug this particular regex, or to start adding tons of individual Unicode characters to it.

I'm wondering whether calibre's flavor of regex is (or can be made) Unicode-aware. I suspect some regex flavors are, but I've never had occasion to explore the issue before.
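calibre is written in Python, so a reasonable assumption is that its regexes follow the rules of Python's `re` module; whether calibre compiles them with particular flags, or uses the third-party `regex` module instead, is not established here. The sketch below runs in plain Python 3, outside calibre, and uses a hypothetical `initials` helper that abbreviates the sample above ("Föô bár" to "Fb"). It shows that an ASCII class like `[a-zA-Z]` fails on accented letters, while `\w` in a Python 3 `str` pattern is Unicode-aware by default. In Python 2, which calibre used at the time of this thread, `\w` was ASCII-only unless `re.UNICODE` or the inline `(?u)` prefix was given.

```python
import re

# Sample from the thread, written with escapes so the accented letters are
# single precomposed code points (U+00F6, U+00F4, U+00E1).
title = "F\u00f6\u00f4 b\u00e1r"  # "Föô bár"

def initials(pattern, text, flags=0):
    """Hypothetical helper: keep the captured first letter of each word,
    drop the rest of the word and any trailing whitespace."""
    return re.sub(pattern, r"\1", text, flags=flags)

# 1. ASCII-only class: the accented letters never match, so nothing collapses.
ascii_class = initials(r"([a-zA-Z])[a-zA-Z]*\s*", title)

# 2. \w forced to ASCII with re.ASCII: same failure as [a-zA-Z].
ascii_w = initials(r"(\w)\w*\s*", title, re.ASCII)

# 3. Python 3 str patterns are Unicode-aware by default: \w includes ö, ô, á.
unicode_w = initials(r"(\w)\w*\s*", title)

# 4. The inline (?u) flag requests the same thing from inside the pattern text
#    (required in Python 2, redundant but accepted in Python 3).
inline_u = initials(r"(?u)(\w)\w*\s*", title)

# 5. "Letters only": \w minus digits and underscore, still Unicode-aware.
unicode_letters = initials(r"([^\W\d_])[^\W\d_]*\s*", title)

print(ascii_class)      # Föô bár
print(ascii_w)          # Föô bár
print(unicode_w)        # Fb
print(inline_u)         # Fb
print(unicode_letters)  # Fb
```

`[^\W\d_]` is a common idiom for "any letter" with Python's `re`, since that module has no `\p{L}` property class (the third-party `regex` module does). If calibre's pattern fields only accept the pattern text, the inline `(?u)` prefix is the only way to ask for Unicode mode; whether a given calibre version needs it is an assumption worth testing.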

Alternatively I thought there might be some calibre template functions that would transliterate a unicode string (though that would have other side effects).
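For the transliteration route, one common Python technique (swapped in here; it is not a calibre template function, and whether calibre offers one is not established) is to decompose the text with Unicode NFKD normalization and drop the combining marks. The hypothetical `strip_accents` helper below sketches it, and also shows one of the side effects: letters that have no decomposition, such as Ø or ß, pass through unchanged, so an ASCII-only regex still won't match them.

```python
import re
import unicodedata

def strip_accents(text):
    """Hypothetical helper: approximate transliteration by removing accents."""
    # NFKD splits e.g. "ö" into "o" + U+0308 COMBINING DIAERESIS; it also
    # applies compatibility mappings (the "ﬁ" ligature becomes "fi").
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop nonspacing combining marks (category "Mn"), keep the base letters.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

title = "F\u00f6\u00f4 b\u00e1r"  # "Föô bár"
plain = strip_accents(title)
print(plain)  # Foo bar

# After stripping, the ASCII-only abbreviation pattern works again.
abbrev = re.sub(r"([a-zA-Z])[a-zA-Z]*\s*", r"\1", plain)
print(abbrev)  # Fb

# Side effect: letters with no decomposition are left as they are.
print(strip_accents("\u00d8ster \u00df"))  # Øster ß
```

The conversion is lossy and one-way, so it suits throwaway values like abbreviations better than anything written back to the metadata.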

Quote:
Originally Posted by dwanthny
Also, where are you using this, and why?

At the moment: in custom columns and plugboards, to abbreviate long series names.

But again, it's more of a general question: at various times, for various reasons, authors, titles, series, etc. get plugged into regexes, and they all occasionally contain Unicode characters that don't fall into the standard [a-zA-Z] or \w range.

Last edited by capnm; 11-30-2011 at 11:13 PM.