What is the format for searching for Unicode characters in calibre with regex?
I got some material off the web which has Unicode paragraph breaks (U+2029) in it.
I found it very hard to select just this character alone; I ended up having to select the space(s) near it as well. What is the correct regex formulation to select such a Unicode character generally, so I can pull out just the character? I tried several combinations using \x and curly brackets, but nothing seemed to work. As I wrote this, I realized I can get at it through the special characters and diacritic marks tool... thanks again for it. But for some of the more obscure ones it might be faster to just find out the number and regex it out. |
\u2029
|
Sorry, Kovid, I copied and pasted and it wasn't found.
|
That is the correct syntax; remember to be in regex mode.
Also, the Unicode paragraph separator character is used internally by Qt, so IIRC it is automatically replaced by newlines, which means you may not be able to search for it at all, regardless of syntax. |
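For anyone wanting to try the `\uXXXX` form outside the editor, here is a minimal sketch in plain Python using the standard `re` module. It assumes calibre's regex mode accepts the same Python-style escape, as the `\u2029` answer above suggests. The sample string and variable names are made up for illustration, and since this runs outside calibre, Qt's handling of U+2029 does not come into play.

```python
import re

# Made-up sample text: a Unicode paragraph separator (U+2029)
# sits between two sentences, with ordinary spaces on either side.
text = "First paragraph. \u2029 Second paragraph."

# In a Python regex, \uXXXX (four hex digits) matches exactly one
# code point, so the neighbouring spaces are not part of the match.
pattern = re.compile(r"\u2029")

# Collect every match, then replace each one with a plain newline.
matches = pattern.findall(text)
cleaned = pattern.sub("\n", text)

print(len(matches))
print(repr(cleaned))
```

Only the separator itself is matched and replaced; the spaces around it stay where they were.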
I will test by inserting a different unicode character.
Later: Yup, it has a blind spot there and at U+202A, but works fine for the others. Thanks for the info. There is no need for the editor to read out an odd character's Unicode number when it is found, as I suggested earlier, since the Unicode number is shown on the insert chart and characters can be searched for there by name. |
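If you do want the number for an odd character without going through the insert chart, a small Python sketch along these lines can look it up and build the matching regex escape. The helper names `describe` and `to_regex_escape` are hypothetical, written only for this example; they are not part of calibre.

```python
import unicodedata


def describe(ch: str) -> str:
    # Return the code point and official Unicode name of one character.
    # Characters without a name get a placeholder instead of an error.
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{ord(ch):04X} {name}"


def to_regex_escape(ch: str) -> str:
    # Build a Python-style regex escape: four hex digits after a
    # lowercase u for the Basic Multilingual Plane, eight hex digits
    # after an uppercase U for anything above it.
    cp = ord(ch)
    if cp <= 0xFFFF:
        return f"\\u{cp:04X}"
    return f"\\U{cp:08X}"


print(describe("\u2029"))
print(to_regex_escape("\u2029"))
```

Paste the character in place of `"\u2029"` to get its number, name, and the escape to type into the search box in regex mode.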