Regex to find multiple spaces between HTML tags
Interesting question from another forum.
Is there a Python regex that matches multiple spaces between opening and closing HTML tags, ignoring leading and trailing whitespace and the fact that browsers collapse whitespace on render? For instance, in: Code:
<p><i>blah</i> §111 </p> |
My simple method would be to highlight that space, copy it to the clipboard, and paste it into the Find box. Then replace all with nothing in the Replace box.
Quote:
Code:
\s{3,}</p> ;3 or more before a P |
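For anyone trying this from Python rather than an editor's Find box, here is a minimal sketch of the same pattern. The names `TRAILING_RUN`, `find_trailing_runs` and the sample string are made up for illustration; only the regex itself comes from the post above.

```python
import re

# Three or more whitespace characters directly before a closing </p> tag.
TRAILING_RUN = re.compile(r"\s{3,}</p>")

def find_trailing_runs(html):
    """Return every run of 3+ whitespace characters (plus the </p>) found in html."""
    return TRAILING_RUN.findall(html)

sample = "<p><i>blah</i> §111    </p>"  # hypothetical sample: four spaces before </p>
runs = find_trailing_runs(sample)
```

Note that this only catches runs that end right at `</p>`, and `\s` also counts tabs and newlines.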
Quote:
And what if it was another pair of tags? I.e. is there a generic regex which would find multiple spaces in any position between any opening and closing tags? |
Quote:
Many devices do not render repetitive normal spaces |
Quote:
Find: \s\s+ Replace: [space] Then use "pretty" to clean up. |
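The same Find/Replace can be sketched with `re.sub` in Python. The function name `collapse_whitespace` is just an illustrative choice, and the "pretty" clean-up step is not reproduced here.

```python
import re

def collapse_whitespace(text):
    """Replace every run of two or more whitespace characters with one space."""
    return re.sub(r"\s\s+", " ", text)

result = collapse_whitespace("a  b\n\nc d")
```

Because `\s` matches newlines too, this also joins lines separated by blank lines, which is probably why a reformatting pass is suggested afterwards.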
Quote:
Replace with: \1 \3
This uses a real space rather than \s, as \s also matches newlines. Use [ \t] if you want tabs as well.
The problem is that this will match the first tag containing multiple spaces, which will be <body>...</body>; better to do something like this, listing the tags you want to match:
Find: (<(p|div|blockquote|i)( [^>]*)?>.*?) {2,}(.*?</\2>)
Replace with: \1 \4
The use of ( [^>]*)? instead of [^>]* is to guard against matching e.g. <img> with </i>. |
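A rough Python sketch of the tag-listing regex, for illustration only. The names `TAGGED_RUN` and `collapse_in_tags` are hypothetical, and the loop is an addition: a single pass replaces only one run of spaces per matched element, so the substitution is repeated until nothing changes.

```python
import re

# Same pattern as in the post: group 2 is the tag name, group 3 the optional
# attributes, and the literal " {2,}" is the run of spaces being collapsed.
TAGGED_RUN = re.compile(r"(<(p|div|blockquote|i)( [^>]*)?>.*?) {2,}(.*?</\2>)")

def collapse_in_tags(html):
    """Collapse runs of 2+ spaces inside the listed tags to a single space."""
    while True:
        html, count = TAGGED_RUN.subn(r"\1 \4", html)
        if count == 0:  # no run left inside any listed element
            return html

fixed = collapse_in_tags('<p class="x">one  two   three</p>')
```

Since `.` does not match newlines by default, an element that spans several lines would not be matched by this sketch.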
@ Phssthpok
Excellent, many thanks! <nitpicking> Your solution matches the whole line, everything between the opening and closing tags containing multiple spaces (which in itself is very useful). Could you perhaps come up with a regex that matches such spaces only?</nitpicking> |
Quote:
But I assume that it's the spaces you want to replace, and what I gave you will at least do that. The replacement string replaces them with a single space, which I assumed was what you wanted. |
Quote:
Then find ∀, and do whatever you want at each match. |
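In Python there may be a way to get at the spaces alone without a placeholder character: put the run of spaces in its own capture group and read that group's span. This is only a sketch under that assumption, not the method from the quoted post; `SPACE_RUN` and `first_space_runs` are made-up names, and it reports just the first run in each matched element.

```python
import re

# Group 1 is the tag name, group 2 the optional attributes,
# group 3 the run of two or more spaces itself.
SPACE_RUN = re.compile(r"<(p|div|blockquote|i)( [^>]*)?>.*?( {2,}).*?</\1>")

def first_space_runs(html):
    """Return (start, end) offsets of the first 2+ space run in each listed element."""
    return [m.span(3) for m in SPACE_RUN.finditer(html)]

spans = first_space_runs("<p>a   b</p><div>c  d</div>")
```

Each offset pair can then be used to slice, highlight, or replace just the spaces.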
Quote:
Perfect :thumbsup: Thanks a bunch! |
MobileRead.com is a privately owned, operated and funded community.