MobileRead Forums - View Single Post - Is this as it supposed to be? (Regexp issue?)

ldolse · 03-08-2011, 06:54 PM

(?u) should have worked; I just double-checked the docs. Was this what you tried?

Code:

(?u)(\w+), (\w+)
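As a quick check of what the flag changes, here is a minimal sketch. It assumes Python 3, where str patterns are already Unicode-aware and (?u) is kept only for backward compatibility, so the (?a) ASCII flag stands in for the old un-flagged behaviour. The sample name is made up for illustration.

```python
import re

# Hypothetical sample author in "Last, First" form; \u010c is "Č".
name = "\u010capek, Karel"

# The pattern from the post: \w is Unicode-aware.
unicode_pat = re.compile(r"(?u)(\w+), (\w+)")

# Assumption: (?a) restricts \w to ASCII, approximating a run without (?u)
# on an older Python.
ascii_pat = re.compile(r"(?a)(\w+), (\w+)")

unicode_groups = unicode_pat.match(name).groups()
ascii_match = ascii_pat.match(name)  # cannot match the leading "Č"

print(unicode_groups)
print(ascii_match)
```

With match() anchored at the start of the string, the ASCII-only pattern finds nothing, while the Unicode-aware one captures both names.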

I'm not sure I would call \S+ the 'best' solution. It's a good solution for this specific problem, though \S+? might be a bit better if you were dealing with strings that had multiple commas. Mixx is also correct that \S and \w are semantically quite different: \S matches any non-whitespace character, punctuation included, while \w matches only word characters. The unicode flag is probably the most 'accurate' option.
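To make the \S versus \w difference concrete, here is a small sketch using Python's re module. The sample names are invented for illustration and are not from the thread.

```python
import re

# Hypothetical sample with punctuation inside the surname.
name = "O'Brien, Flann"

# \w stops at the apostrophe, so the search only finds the tail of the surname.
word_groups = re.search(r"(\w+), (\w+)", name).groups()

# \S accepts any non-whitespace character, so the apostrophe is kept.
nonspace_groups = re.search(r"(\S+), (\S+)", name).groups()

# The flip side: \S also swallows punctuation you may not want,
# such as the comma after the second field here.
trailing_groups = re.search(r"(\S+), (\S+)", "Smith, John, Jr.").groups()

print(word_groups)      # ('Brien', 'Flann')
print(nonspace_groups)  # ("O'Brien", 'Flann')
print(trailing_groups)  # ('Smith', 'John,')
```

Which behaviour is preferable depends on the data: \S is more forgiving of punctuation in names, while \w gives cleaner captures when the fields are plain words.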

After thinking about it, I can't say I'm a big fan of the Locale option. Based on the Python regex docs it would work, but only for one locale. If you had authors with non-ASCII characters from other locales it wouldn't match them, which is a common scenario for translated works.

Last edited by ldolse; 03-08-2011 at 06:57 PM.