MobileRead Forums - View Single Post

DiapDealer · 03-17-2015, 10:00 PM

The r''' ''' (a raw, triple-quoted string) is probably overkill in this situation, but I've gotten into the habit of using it for all my regular expressions in Python. '[-–—]' or "[-–—]" would achieve the same thing in this particular instance, since the pattern contains no backslashes or quotes. Either way, it's still just a string representation of the regular expression.
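A quick way to see when the raw prefix matters is to compare the literals directly. This is only an illustrative sketch; the variable names and the \bcat\b pattern are made up for the example and are not part of the original code.

```python
# Without backslashes, raw and ordinary literals are the same string.
raw_pattern = r'''[-–—]'''
plain_pattern = '[-–—]'
same_without_backslashes = (raw_pattern == plain_pattern)

# With a backslash they differ unless the backslash is doubled.
raw_word_boundary = r'\bcat\b'        # backslash + 'b': a regex word boundary
plain_word_boundary = '\bcat\b'       # '\b' here is a backspace character
doubled_word_boundary = '\\bcat\\b'   # same string as the raw version

print(same_without_backslashes)
print(raw_word_boundary == doubled_word_boundary)
print(raw_word_boundary == plain_word_boundary)
```

That difference is the main reason for the habit: a raw string never needs its backslashes doubled.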

Code:

text_str = regex.sub(r'''[-–—]''', ' ', match.group(3))

regex.sub('everything matching this expression', 'with this', 'in this string')
Find all occurrences of - or – or — (hyphen, en dash, em dash) and replace each one with a space in the string captured by the 3rd matching group. Store the result in text_str.
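Here is a small standalone version of that step. It assumes the standard library's re module as a stand-in for regex (the sub call is used the same way here), and heading_text is a made-up sample standing in for match.group(3).

```python
import re  # stdlib stand-in for the regex module used in the post

# Hypothetical sample text in place of match.group(3).
heading_text = 'Part 1-Alpha–Beta—Gamma'

# Replace every hyphen, en dash, or em dash with a single space.
text_str = re.sub(r'''[-–—]''', ' ', heading_text)
print(text_str)
```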

Code:

text_str = regex.sub(r''' {2,}''', ' ', text_str)

Find all occurrences of two or more consecutive spaces and replace them with a single space in the text_str string. Store the results in text_str.
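The second step matters because a dash that already had a space on each side leaves three spaces behind. A minimal sketch, again assuming the stdlib re module and a made-up sample string:

```python
import re  # stdlib stand-in for the regex module used in the post

# Hypothetical heading text with a spaced em dash.
original = 'Part 1 — Intro'

# Step 1: the dash becomes a space, leaving a run of three spaces.
dashes_removed = re.sub(r'''[-–—]''', ' ', original)

# Step 2: any run of two or more spaces collapses to one.
text_str = re.sub(r''' {2,}''', ' ', dashes_removed)

print(repr(dashes_removed))
print(repr(text_str))
```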

Code:

return '<{0}{1}>{2}</{0}>'.format(match.group(1), match.group(2), text_str)

String formatting/substitution.

Code:

'Hello {0}'.format('there')

Substitute {0} with 'there'

Code:

'Hello {0} {1} {2}, {0}'.format('there', 'you', 10)

Becomes 'Hello there you 10, there'

You don't even need to use numbers if you're not going to repeat anything:

Code:

'Hello {} {} {}, {}'.format('there', 'you', 10, 'you')
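To compare the two styles side by side, here is a short sketch. The variable names are just for illustration. One thing to watch: numbered and empty fields can't be mixed in the same format string.

```python
# Numbered fields: {0} can be reused as often as needed.
numbered = 'Hello {0} {1} {2}, {0}'.format('there', 'you', 10)

# Empty fields: filled strictly left to right, one argument per field.
automatic = 'Hello {} {} {}, {}'.format('there', 'you', 10, 'you')

# Mixing the two styles in one string raises ValueError.
try:
    'Hello {} {0}'.format('there')
    mixing_allowed = True
except ValueError:
    mixing_allowed = False

print(numbered)
print(automatic)
print(mixing_allowed)
```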

You could also use string concatenation:

Code:

return '<' + match.group(1) + match.group(2) + '>' + text_str + '</' + match.group(1) + '>'

But then you have to make sure everything is already a string before you concatenate it (wrapping anything that isn't in str()). Probably not necessary in this case, but again, it's just a habit I've gotten into to avoid type mismatches (plus I just like it better than the %s %d string substitution method).
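The type mismatch is easy to reproduce. In this sketch, count is a made-up integer value; concatenation refuses it until it is converted, while .format() does the conversion itself.

```python
count = 10  # hypothetical non-string value

# Concatenating a str and an int raises TypeError...
try:
    message = 'Hello ' + count
except TypeError:
    # ...so the value has to be converted explicitly.
    message = 'Hello ' + str(count)

# .format() converts the value for you.
formatted = 'Hello {0}'.format(count)

print(message)
print(formatted)
```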

Code:

return '<%s%s>%s</%s>' % (match.group(1), match.group(2), text_str, match.group(1))

In this particular case:

Code:

return '<{0}{1}>{2}</{0}>'.format(match.group(1), match.group(2), text_str)

match.group(1) will be the tag name (h1, h2, h3, etc.) and gets plugged into both {0}s.
match.group(2) will be any (optional) attributes (class="foo") and gets plugged into {1}.
text_str is our manipulated content from between the h-tags and gets plugged into {2}.
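Putting the pieces together, the whole thing might look something like the sketch below. The HEADING pattern, the clean_heading name, and the sample html are assumptions made up for this example, since the actual search expression isn't shown in this post. The stdlib re module stands in for regex.

```python
import re  # stdlib stand-in for the regex module used in the post

# Hypothetical pattern: group 1 = tag name, group 2 = optional attributes,
# group 3 = the text between the opening and closing h-tags.
HEADING = re.compile(r'<(h[1-6])([^>]*)>(.*?)</\1>', re.DOTALL)

def clean_heading(match):
    # Replace dashes with spaces, then collapse runs of spaces.
    text_str = re.sub(r'''[-–—]''', ' ', match.group(3))
    text_str = re.sub(r''' {2,}''', ' ', text_str)
    # Rebuild the heading around the cleaned text.
    return '<{0}{1}>{2}</{0}>'.format(match.group(1), match.group(2), text_str)

html = '<h1 class="foo">Part 1 — Intro</h1><p>Left-alone</p>'
result = HEADING.sub(clean_heading, html)
print(result)
```

Passing the function itself as the replacement means it gets called once per matching heading, and everything outside the headings is left untouched.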

Last edited by DiapDealer; 03-17-2015 at 10:11 PM.