11-16-2017, 10:16 AM | #1 |
Perfectionist
Posts: 62
Karma: 12802
Join Date: Apr 2014
Device: none
|
Regex to find multiple spaces between HTML tags
Interesting question from another forum.
Is there a Python regex that matches multiple spaces between opening and closing HTML tags, ignoring leading and trailing whitespace and the fact that browsers collapse whitespace on render? For instance, in:
Code:
<p><i>blah</i> §111 </p> |
11-16-2017, 11:15 AM | #2 |
Book E d i t o r
Posts: 432
Karma: 288184
Join Date: May 2015
Device: Laptop
|
My simple method would be to highlight that space, copy it to the clipboard, and paste it into the Find box. Then replace all with nothing in the Replace box.
|
11-16-2017, 02:43 PM | #3 | |
Well trained by Cats
Posts: 29,659
Karma: 54369090
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
Code:
\s{3,}</p> ; 3 or more whitespace characters before a closing </p> |
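For anyone trying this from Python rather than an editor's Find box, here is a minimal sketch of the same pattern with the re module. The sample string is made up for illustration, and stripping the run entirely (rather than leaving one space) is an assumption about what is wanted.

```python
import re

# Hypothetical sample: several spaces before the closing </p>.
sample = "<p><i>blah</i> §111    </p>"

# Three or more whitespace characters immediately before a closing </p>.
pattern = re.compile(r"\s{3,}</p>")

# Drop the whitespace run and keep the closing tag.
cleaned = pattern.sub("</p>", sample)
print(cleaned)  # <p><i>blah</i> §111</p>
```

Note that \s also matches tabs and newlines, so a line break followed by indentation before </p> would be removed as well.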
|
11-16-2017, 06:38 PM | #4 | |
Perfectionist
Posts: 62
Karma: 12802
Join Date: Apr 2014
Device: none
|
Quote:
And what if it was another pair of tags? I.e. is there a generic regex which would find multiple spaces in any position between any opening and closing tags? |
|
11-16-2017, 08:48 PM | #5 | |
Well trained by Cats
Posts: 29,659
Karma: 54369090
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
Many devices do not render repeated ordinary spaces; they collapse them into one. |
|
11-17-2017, 07:15 AM | #6 | |
Connoisseur
Posts: 67
Karma: 10
Join Date: Apr 2011
Device: Kindle 3, Samsung Tab 4
|
Quote:
Find: \s\s+
Replace: [space]
Then use "pretty" to clean up. |
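A rough Python equivalent of that find and replace, as a sketch only. The sample string is invented, and the second pattern is just a spaces-only variant shown for comparison, since \s also swallows newlines and tabs.

```python
import re

# Hypothetical sample with a run of spaces and a blank line plus indentation.
sample = "<p>one   two\n\n  three</p>"

# \s\s+ matches any run of two or more whitespace characters,
# including newlines, so the line breaks disappear too.
collapsed_all = re.sub(r"\s\s+", " ", sample)
print(repr(collapsed_all))     # '<p>one two three</p>'

# A literal space with {2,} leaves newlines alone.
collapsed_spaces = re.sub(r" {2,}", " ", sample)
print(repr(collapsed_spaces))  # '<p>one two\n\n three</p>'
```

Neither pattern cares about tags, so runs of spaces between elements (indentation, for example) are collapsed as well.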
|
11-17-2017, 12:45 PM | #7 | |
Age improves with wine.
Posts: 558
Karma: 95229
Join Date: Nov 2014
Device: Kindle Oasis, Kobo Libra II
|
Quote:
Replace with: \1 \3
This uses a real space rather than \s, because \s also matches newlines. Use [ \t] if you want to match tabs as well.
The problem is that this will match the first tag containing multiple spaces, which will be <body>...</body>. It is better to list the tags you want to match, something like this:
Find: (<(p|div|blockquote|i)( [^>]*)?>.*?) {2,}(.*?</\2>)
Replace with: \1 \4
( [^>]*)? is used instead of [^>]* to guard against matching e.g. <img> with </i>. |
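Here is how the tag-listing pattern might be driven from Python, as a sketch rather than a tested tool. The function name collapse_spaces and the sample strings are made up for illustration. One thing to be aware of: each match swallows the whole element, so a single pass fixes only one run of spaces per element; the loop below repeats the substitution until nothing changes.

```python
import re

# The Find pattern from the post: group 2 is the tag name,
# group 4 is everything after the run of spaces up to the closing tag.
TAG_RUN = re.compile(r"(<(p|div|blockquote|i)( [^>]*)?>.*?) {2,}(.*?</\2>)")

def collapse_spaces(html: str) -> str:
    """Replace runs of 2+ spaces inside the listed elements with one space."""
    while True:
        # Each substitution shortens the string, so the loop terminates.
        html, count = TAG_RUN.subn(r"\1 \4", html)
        if count == 0:
            return html

print(collapse_spaces('<p class="x">one  two   three</p>'))
# <p class="x">one two three</p>
```

Without re.DOTALL the dot does not match a newline, so this only handles elements whose opening and closing tags sit on the same line.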
|
11-17-2017, 07:17 PM | #8 |
Perfectionist
Posts: 62
Karma: 12802
Join Date: Apr 2014
Device: none
|
@ Phssthpok
Excellent, many thanks! <nitpicking> Your solution matches the whole line, everything between the opening and closing tags containing multiple spaces (which in itself is very useful). Could you perhaps come up with a regex that matches such spaces only?</nitpicking> |
11-18-2017, 06:22 AM | #9 | |
Age improves with wine.
Posts: 558
Karma: 95229
Join Date: Nov 2014
Device: Kindle Oasis, Kobo Libra II
|
Quote:
But I assume that it's the spaces you want to replace, and what I gave you will at least do that. The replacement string replaces them with a single space, which I assumed was what you wanted. |
|
11-18-2017, 07:01 AM | #10 | |
Age improves with wine.
Posts: 558
Karma: 95229
Join Date: Nov 2014
Device: Kindle Oasis, Kobo Libra II
|
Quote:
Then find ∀, and do whatever you want at each match. |
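If this is being done in Python rather than in an editor, one possible alternative to a marker character is a two-stage search: find each listed element first, then look for runs of spaces inside it and report their positions in the original string. This is only a sketch, not the method described above; the names ELEMENT, SPACES, and space_runs are invented for illustration.

```python
import re

# Stage 1: an element from the listed tags; group 3 is its inner content.
ELEMENT = re.compile(r"<(p|div|blockquote|i)( [^>]*)?>(.*?)</\1>")
# Stage 2: a run of two or more literal spaces.
SPACES = re.compile(r" {2,}")

def space_runs(html):
    """Yield (start, end) offsets of each run of 2+ spaces inside listed elements."""
    for element in ELEMENT.finditer(html):
        inner_start = element.start(3)
        for run in SPACES.finditer(element.group(3)):
            yield inner_start + run.start(), inner_start + run.end()

sample = "<p>a  b   c</p>  <p>d</p>"
print(list(space_runs(sample)))  # [(4, 6), (7, 10)]
```

The two spaces between the paragraphs are not reported, because they lie outside any listed element. Runs of spaces inside a nested tag's attributes would be reported, so treat the offsets as candidates to review.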
|
11-18-2017, 07:11 AM | #11 |
Perfectionist
Posts: 62
Karma: 12802
Join Date: Apr 2014
Device: none
|
|