Thread: RegEx & Unicode
12-01-2011, 11:57 AM   #11
Serpentine
You can most likely use something like:
Code:
(?i)(?:^|\s+)(\d+\.?\d*?|[\D])
That should grab the interesting first character (or leading number) of each word.
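If it helps to see what each piece of that pattern does, here is the same regex spelled out with `re.VERBOSE`. This is only an annotated sketch in Python 3, where `str` patterns are Unicode-aware by default; the name `FIRST_TOKEN` is made up for illustration and is not from the original post.

```python
import re

# Annotated copy of the pattern above. With re.VERBOSE the layout
# whitespace and the trailing comments inside the pattern are ignored.
# FIRST_TOKEN is a hypothetical name used only for this sketch.
FIRST_TOKEN = re.compile(r"""(?i)   # case-insensitive flag, kept from the post
    (?:^|\s+)       # start of the string, or a run of whitespace
    (               # the captured group is what findall() returns:
      \d+\.?\d*?    #   digits, an optional dot, then as few digits as possible
      |             #   or
      [\D]          #   any single non-digit character
    )
""", re.VERBOSE)

if __name__ == "__main__":
    print(FIRST_TOKEN.findall("Föô bár  šjohka"))
    print(FIRST_TOKEN.findall("3.14 apples"))
```

One thing to watch: the lazy `\d*?` at the end matches as few digits as it can, so `3.14` comes back as `3.` rather than the whole number. Making it greedy (`\d*`) should pick up the full decimal if that is what you want.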

Code:
>>> import re
>>> regex = re.compile(r'(?i)(?:^|\s+)(\d+\.?\d*?|[\D])')
>>> string = 'Föô bár  šjohka'
>>> regex.findall(string)
['F', 'b', 'š']
I'm sure you can work it into a replacement without too much of a problem.
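For what it's worth, here is a rough sketch of one way the replacement could look in Python 3. The post does not say what the replacement should actually do, so upper-casing the first character of each word is just an assumption for the sake of the example, and `WORD_START` and `capitalize_initials` are made-up names. The leading whitespace is captured in its own group so it can be put back unchanged.

```python
import re

# Same idea as the pattern above, but the leading whitespace is captured
# (group 1) so the substitution can put it back unchanged.
# WORD_START and capitalize_initials are hypothetical names for this sketch.
WORD_START = re.compile(r"(^|\s+)(\d+\.?\d*?|\D)")


def capitalize_initials(text):
    """Upper-case the first character of every whitespace-separated word."""
    # group(1) is '' at the start of the string, otherwise the whitespace run;
    # group(2) is the first character (or leading number) of the word.
    return WORD_START.sub(lambda m: m.group(1) + m.group(2).upper(), text)


if __name__ == "__main__":
    print(capitalize_initials("föô bár  šjohka"))
```

Passing a function to `re.sub` keeps the pattern itself simple: the callback decides what to do with the captured character, so swapping the upper-casing for some other transformation only means changing the lambda.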