MobileRead Forums - View Single Post - What is format to search for unicode in calibre with regex?

mrmikel · 01-14-2014, 07:16 AM

I got some material off the web which has Unicode paragraph breaks (U+2029) in it.

I found it very hard to select just this character alone...I ended up having to select the space(s) near it as well.

What is the correct regex formulation to select such a Unicode character generally, so I can pull out just the character?

I tried several combinations using \x and curly brackets, but nothing seemed to work.

As I wrote this, I realized I can get this through special characters, diacritic marks...thanks again for it.

But it might be faster for some more obscure ones to just find out the number and be able to regex it out.
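
For illustration, here is a minimal Python sketch of the kind of pattern being asked about. It assumes calibre's regex search accepts Python-style escapes, which this post does not confirm; the sample text and variable names are made up. In Python's re module, \u2029 matches that single code point, while the curly-bracket form \x{2029} is not accepted.

```python
import re

# Hypothetical sample text: a Unicode paragraph separator (U+2029)
# with a space on either side, as described in the post.
text = "first paragraph \u2029 second paragraph"

# \uXXXX (exactly four hex digits) matches one code point by number.
# Assumption: the search box uses the same escape syntax as Python's re.
pattern = re.compile(r"\u2029")

# Find just the separator, without the neighbouring spaces.
matches = pattern.findall(text)

# Remove only the separator; the surrounding spaces are left alone.
cleaned = pattern.sub("", text)

print(len(matches))
print(repr(cleaned))
```

The same form should work for other characters in the Basic Multilingual Plane by swapping in their four-digit hex number.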
