Thread: 0.5.0 Released
01-27-2012, 08:02 PM, post #85
Ahmad Samir
IIUC, what (?s) does is make "." match any character, including newlines. Without it, "." matches everything except a newline.
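A quick way to see the effect is Python's re module, which accepts the same inline (?s) flag (it is equivalent to passing re.DOTALL there). This is only a sketch in Python, not the engine used by the program this thread is about. The sample string is made up:

```python
import re

text = "first line\nsecond line"

# Without (?s), "." stops at the newline, so ".*" cannot reach "second".
without_s = re.search(r"first.*second", text)

# With (?s) (same as passing re.DOTALL), "." also matches "\n".
with_s = re.search(r"(?s)first.*second", text)

print(without_s)              # None
print(repr(with_s.group(0)))  # 'first line\nsecond'
```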

So, when the two paragraphs are separated by a line break, this will work, because \s matches the newline:
<p class="xxx">&nbsp;</p>\s+<p class="xxx">

but this won't work without (?s):
<p class="xxx">&nbsp;</p>.*<p class="xxx">

Quoting from http://perldoc.perl.org/5.8.9/perlrecharclass.html:

Quote:
White space
\s matches any single character that is considered white space. In the ASCII range, \s matches the horizontal tab (\t), the new line (\n), the form feed (\f), the carriage return (\r), and the space (the vertical tab, \cK is not matched by \s). The exact set of characters matched by \s depends on whether the source string is in UTF-8 format. If it is, \s matches what is considered white space in the Unicode database. Otherwise, if there is a locale in effect, \s matches whatever is considered white space by the current locale. Without a locale, \s matches the five characters mentioned in the beginning of this paragraph. Perhaps the most notable difference is that \s matches a non-breaking space only if the non-breaking space is in a UTF-8 encoded string.
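The quote describes Perl; other engines draw the line a bit differently. As one illustration only (Python's re, not the Perl engine the quote documents), this checks which of the characters mentioned above \s accepts, plus the vertical tab and the non-breaking space (U+00A0), with and without Python's re.ASCII flag:

```python
import re

# Characters named in the quote, plus vertical tab and non-breaking space.
candidates = {
    "tab": "\t",
    "newline": "\n",
    "form feed": "\f",
    "carriage return": "\r",
    "space": " ",
    "vertical tab": "\v",
    "no-break space": "\u00a0",
}

for name, ch in candidates.items():
    # str patterns use Unicode rules by default; re.ASCII restricts \s
    # to [ \t\n\r\f\v].
    unicode_mode = re.fullmatch(r"\s", ch) is not None
    ascii_mode = re.fullmatch(r"\s", ch, re.ASCII) is not None
    print(f"{name:16} unicode: {str(unicode_mode):5}  ascii: {ascii_mode}")
```

Unlike the quoted Perl text, Python's \s matches the vertical tab in both modes, and it matches U+00A0 only in the default Unicode mode, not with re.ASCII.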