#1 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
|
What is the format to search for Unicode in calibre with regex?
I got some material off the web that has Unicode paragraph separators (U+2029) in it.
I found it very hard to select just this character alone... I ended up having to select the space(s) near it as well. What is the correct regex formulation to select such a Unicode character in general, so I can pull out just the character? I tried several combinations using \x and curly brackets, but nothing seemed to work. As I wrote this, I realized I can get this through special characters, diacritic marks... thanks again for it. But for some of the more obscure ones it might be faster to just find out the number and be able to regex it out. |
#2 |
creator of calibre
Posts: 45,222
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
\u2029
|
#3 |
Color me gone
|
Sorry, Kovid, I copied and pasted and it wasn't found.
|
#4 |
creator of calibre
|
That is the correct syntax; remember to be in regex mode.
Also, the Unicode paragraph separator character is used internally by Qt, so IIRC it is automatically replaced by newlines, which means you may not be able to search for it at all, regardless of syntax. |
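For anyone who wants to try the `\uXXXX` escape outside calibre, here is a minimal sketch in plain Python. It uses the standard `re` module, not calibre's own search code, and the sample string and the `strip_char` helper are made up for the example. Since the text never passes through a Qt widget here, the newline replacement described above should not get in the way.

```python
import re

# Hypothetical sample text: two paragraphs joined by U+2029 (PARAGRAPH SEPARATOR).
sample = "First paragraph.\u2029Second paragraph."

def strip_char(text, codepoint_hex, replacement=""):
    """Replace every occurrence of one code point, given as 4 hex digits."""
    # Build the pattern as literal text, e.g. \u2029 (backslash, u, 4 digits).
    # The re module itself interprets the \uXXXX escape in str patterns.
    pattern = re.compile("\\u" + codepoint_hex)
    return pattern.sub(replacement, text)

# Turn the paragraph separator into an ordinary newline.
cleaned = strip_char(sample, "2029", "\n")

# Only the character itself is matched; neighbouring spaces are left alone.
spaced = strip_char("x \u2029 y", "2029")
```

The same helper works for other four-digit code points, for example `strip_char(text, "202a")`.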
#5 |
Color me gone
|
I will test by inserting a different Unicode character.
Later: Yup, it has a blind spot there and at U+202A, but it works fine for the others. Thanks for the info. There is no need to read out an odd character's Unicode number when it is found in the editor, as I suggested earlier, since the Unicode number is shown on the insert chart, which can be searched by name.
Last edited by mrmikel; 01-15-2014 at 07:25 AM. |
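The number and name lookup mentioned above can also be done outside the editor. This is a small sketch using Python's standard `unicodedata` module, not calibre's insert-character chart; the `describe` helper is a hypothetical name chosen for the example.

```python
import unicodedata

def describe(ch):
    """Return (U+XXXX code point, Unicode character name) for one character."""
    # The second argument to name() is a fallback for unnamed code points.
    return "U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>")

# The two characters discussed in this thread.
para_sep = describe("\u2029")
embedding = describe("\u202a")

# Going the other way: from a character name back to the character.
by_name = unicodedata.lookup("PARAGRAPH SEPARATOR")
```

The four hex digits after `U+` are what goes after `\u` in the regex.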