MobileRead Forums

Regex to find multiple spaces between HTML tags (https://www.mobileread.com/forums/showthread.php?t=292204)

mikapanja 11-16-2017 11:16 AM

Regex to find multiple spaces between HTML tags
 
Interesting question from another forum.

Is there a Python regex to match multiple spaces between opening and closing HTML tags,
disregarding leading and trailing whitespace and the fact that browsers collapse whitespace on render?

For instance, in:
Code:

<p><i>blah</i> §111   </p>
how does one capture the three spaces after 111?

deback 11-16-2017 12:15 PM

My simple method would be to highlight that space, copy it to the clipboard, and paste it into the Find box. Then replace all with nothing in the Replace box.

theducks 11-16-2017 03:43 PM

Quote:

Originally Posted by mikapanja (Post 3612714)
Interesting question from another forum.

Is there a Python regex to match multiple spaces between opening and closing HTML tags,
disregarding leading and trailing whitespace and the fact that browsers collapse whitespace on render?

For instance, in:
Code:

<p><i>blah</i> §111   </p>
how does one capture the three spaces after 111?

Code:

\s{3,}</p>
That matches three or more whitespace characters directly before a closing </p>.
\s also matches newlines, hence the requirement for the closing </p> tag.
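
For anyone who wants to try this from Python rather than the editor's Find box, here is a minimal sketch. The sample string (written with three spaces, following the wording of the question) and the names TRAILING_WS, sample, trailing and cleaned are assumptions made up for illustration.

```python
import re

# Three or more whitespace characters directly before a closing </p>,
# exactly as in the Find expression above.
TRAILING_WS = re.compile(r"\s{3,}</p>")

sample = "<p><i>blah</i> §111   </p>"

match = TRAILING_WS.search(sample)
# The match includes the closing tag, so slice it off to get the whitespace alone.
trailing = match.group(0)[: -len("</p>")]

# Drop the whitespace but keep the closing tag.
cleaned = TRAILING_WS.sub("</p>", sample)
```

Because \s{3,} is greedy, the match starts at the single space before §111 only if that space is part of an unbroken run of whitespace; here it is not, so only the three trailing spaces are caught.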

mikapanja 11-16-2017 07:38 PM

Quote:

Originally Posted by theducks (Post 3612880)
Code:

\s{3,}</p>
That matches three or more whitespace characters directly before a closing </p>.
\s also matches newlines, hence the requirement for the closing </p> tag.

Yes, but what if there were spaces non-adjacent to the closing (or opening) tags?
And what if it was another pair of tags?

I.e. is there a generic regex which would find multiple spaces in any position between any opening and closing tags?

theducks 11-16-2017 09:48 PM

Quote:

Originally Posted by mikapanja (Post 3612990)
Yes, but what if there were spaces non-adjacent to the closing (or opening) tags?
And what if it was another pair of tags?

I.e. is there a generic regex which would find multiple spaces in any position between any opening and closing tags?

I think 'pretty' cleans up the code.
Many devices do not render repeated ordinary spaces anyway.

Paulie_D 11-17-2017 08:15 AM

Quote:

I.e. is there a generic regex which would find multiple spaces in any position between any opening and closing tags?
I just use

Find: \s\s+
Replace: [space]

Then use "pretty" to clean up.
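
The same Find/Replace pair could be sketched in Python roughly as follows. The function name collapse_whitespace and the sample string are assumptions for illustration only.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Replace every run of two or more whitespace characters with one space."""
    return re.sub(r"\s\s+", " ", text)


sample = "<p>one  two\n   three</p>"
collapsed = collapse_whitespace(sample)
```

Note that \s also matches newlines and tabs, so line breaks followed by indentation get flattened to a single space as well, which is why a "pretty" pass afterwards is useful.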

Phssthpok 11-17-2017 01:45 PM

Quote:

Originally Posted by mikapanja (Post 3612990)
Yes, but what if there were spaces non-adjacent to the closing (or opening) tags?
And what if it was another pair of tags?

I.e. is there a generic regex which would find multiple spaces in any position between any opening and closing tags?

Find: (<(\w+)[^>]*>.*?) {2,}(.*?</\2>)
Replace with: \1 \3

This uses a real space rather than \s, as \s also matches newlines. Use [ \t] if you want tabs as well. The problem is that this will match the first tag containing multiple spaces, which will be <body>...</body>; better to do something like this, listing the tags you want to match:

Find: (<(p|div|blockquote|i)( [^>]*)?>.*?) {2,}(.*?</\2>)
Replace with: \1 \4

The use of ( [^>]*)? instead of [^>]* is to guard against matching e.g. <img> with </i>.
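
A rough Python equivalent of the second Find/Replace pair, as a sketch only. The names TAGGED_RUN and collapse_spaces_in_tags are made up for illustration. Each pass fixes at most one run of spaces per matched element, so the substitution is repeated until nothing changes.

```python
import re

# Same expression as the Find field above; group 2 is the tag name,
# group 4 is everything from after the run of spaces to the closing tag.
TAGGED_RUN = re.compile(r"(<(p|div|blockquote|i)( [^>]*)?>.*?) {2,}(.*?</\2>)")


def collapse_spaces_in_tags(html: str) -> str:
    """Collapse runs of 2+ plain spaces inside the listed elements."""
    while True:
        # One pass replaces only the first run found in each matched element.
        html, count = TAGGED_RUN.subn(r"\1 \4", html)
        if count == 0:
            return html


sample = "<p><i>blah</i>  §111   </p>"
fixed = collapse_spaces_in_tags(sample)
```

Since . does not match newlines by default, this only catches elements whose opening and closing tags sit on the same line. The loop always ends because every substitution makes the string shorter.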

mikapanja 11-17-2017 08:17 PM

@ Phssthpok

Excellent, many thanks!

<nitpicking> Your solution matches the whole line, everything between the opening and closing tags containing multiple spaces (which in itself is very useful).

Could you perhaps come up with a regex that matches such spaces only?</nitpicking>

Phssthpok 11-18-2017 07:22 AM

Quote:

Originally Posted by mikapanja (Post 3613578)
@ Phssthpok

Excellent, many thanks!

<nitpicking> Your solution matches the whole line, everything between the opening and closing tags containing multiple spaces (which in itself is very useful).

Could you perhaps come up with a regex that matches such spaces only?</nitpicking>

Nope! :)

But I assume that it's the spaces you want to replace, and what I gave you will at least do that. The replacement string replaces them with a single space, which I assumed was what you wanted.
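
Outside a single Find expression, a Python script could get at the spaces alone in two steps: first match each element, then search its content for runs of spaces. This is only a sketch under assumptions; the names ELEMENT, SPACE_RUN and space_run_spans are made up for illustration, and the tag list is the same as above.

```python
import re

# Step 1: an element from the tag list; group 3 is its content.
ELEMENT = re.compile(r"<(p|div|blockquote|i)( [^>]*)?>(.*?)</\1>")
# Step 2: a run of two or more plain spaces.
SPACE_RUN = re.compile(r" {2,}")


def space_run_spans(html: str) -> list:
    """Return (start, end) offsets in html of each run of 2+ spaces
    found inside a matched element."""
    spans = []
    for element in ELEMENT.finditer(html):
        offset = element.start(3)  # where the element's content begins
        for run in SPACE_RUN.finditer(element.group(3)):
            spans.append((offset + run.start(), offset + run.end()))
    return spans


sample = "<p><i>blah</i>  §111   </p>"
spans = space_run_spans(sample)
```

Caveats: elements nested inside another element of the same name end at the first closing tag, and runs of spaces inside the attributes of nested tags are reported too.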

Phssthpok 11-18-2017 08:01 AM

Quote:

Originally Posted by mikapanja (Post 3613578)
<nitpicking> Your solution matches the whole line, everything between the opening and closing tags containing multiple spaces (which in itself is very useful).

Could you perhaps come up with a regex that matches such spaces only?</nitpicking>

Here's a thought: Use what I've already given you to replace all the spaces with something that doesn't occur in the text, e.g. ∀.
Then find ∀, and do whatever you want at each match.
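
In Python the two steps might look something like this. It is a sketch only: MARKER, mark_space_runs and the sample string are assumptions for illustration, and the marker must not already occur in the text.

```python
import re

TAGGED_RUN = re.compile(r"(<(p|div|blockquote|i)( [^>]*)?>.*?) {2,}(.*?</\2>)")
MARKER = "∀"  # assumed not to occur anywhere in the book


def mark_space_runs(html: str) -> str:
    """Replace each run of 2+ spaces inside the listed elements with MARKER."""
    if MARKER in html:
        raise ValueError("marker already present in the text")
    while True:
        # One pass marks only the first run per matched element, so repeat.
        html, count = TAGGED_RUN.subn(
            lambda m: m.group(1) + MARKER + m.group(4), html
        )
        if count == 0:
            return html


sample = "<p><i>blah</i>  §111   </p>"
marked = mark_space_runs(sample)
# Step 2: find every marker and do whatever is wanted at each position.
positions = [m.start() for m in re.finditer(re.escape(MARKER), marked)]
```

Each run becomes a single marker, so the original number of spaces in a run is not kept.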

mikapanja 11-18-2017 08:11 AM

Quote:

Originally Posted by Phssthpok (Post 3613758)
Here's a thought: Use what I've already given you to replace all the spaces with something that doesn't occur in the text, e.g. ∀.
Then find ∀, and do whatever you want at each match.

@ Phssthpok

Perfect!

Thanks a bunch!


All times are GMT -4.
