Old 11-16-2017, 10:16 AM   #1
mikapanja
Regex to find multiple spaces between HTML tags

Interesting question from another forum.

Is there a Python regex that matches multiple spaces between opening and closing HTML tags,
ignoring leading and trailing whitespace and the fact that browsers collapse whitespace on render?

For instance, in:
Code:
<p><i>blah</i> §111   </p>
how does one capture the three spaces after 111?
Old 11-16-2017, 11:15 AM   #2
deback
My simple method would be to highlight that space, copy it to the clipboard, and paste it into the Find box. Then replace all with nothing in the Replace box.
Old 11-16-2017, 02:43 PM   #3
theducks
Quote:
Originally Posted by mikapanja View Post
Interesting question from another forum.

Is there a Python regex to match multiple spaces between the opening and closing HTML tags,
disregarding the leading and trailing whitespaces and the fact that browsers collapse whitespace on render?

For instance, in:
Code:
<p><i>blah</i> §111   </p>
how does one capture the three spaces after 111?
Code:
\s{3,}</p>
This matches three or more whitespace characters immediately before a closing </p> tag. \s also matches newlines, which is why the pattern requires the closing </p> tag.
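As a quick check outside the editor, the pattern can be tried with Python's re module on the sample line from the first post. This is only a sketch: the lookahead variant is an addition for illustration, not part of the post above.

```python
import re

sample = '<p><i>blah</i> §111   </p>'

# The pattern as posted: the match includes the closing tag itself.
as_posted = re.search(r'\s{3,}</p>', sample)

# Assumed variant: a lookahead keeps </p> out of the match,
# so only the whitespace run is matched.
spaces_only = re.search(r'\s{3,}(?=</p>)', sample)

print(repr(as_posted.group(0)))    # '   </p>'
print(repr(spaces_only.group(0)))  # '   '
```

Either form only finds runs that sit directly before </p>, which is the limitation raised in the next post.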
Old 11-16-2017, 06:38 PM   #4
mikapanja
Quote:
Originally Posted by theducks View Post
Code:
\s{3,}</p>  ;3 or more before a P
\s also matches newlines, so the requirement for the closing p tag
Yes, but what if there were spaces non-adjacent to the closing (or opening) tags?
And what if it was another pair of tags?

I.e. is there a generic regex which would find multiple spaces in any position between any opening and closing tags?
Old 11-16-2017, 08:48 PM   #5
theducks
Quote:
Originally Posted by mikapanja View Post
Yes, but what if there were spaces non-adjacent to the closing (or opening) tags?
And what if it was another pair of tags?

I.e. is there a generic regex which would find multiple spaces in any position between any opening and closing tags?
I think 'pretty' cleans up the code.
Many devices do not render repeated normal spaces.
Old 11-17-2017, 07:15 AM   #6
Paulie_D
Quote:
I.e. is there a generic regex which would find multiple spaces in any position between any opening and closing tags?
I just use

Find: \s\s+
Replace: [space]

Then use "pretty" to clean up.
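For anyone running this outside the editor, the same find/replace can be sketched with re.sub. The function name squeeze_whitespace is made up for the example. Note that \s\s+ also swallows newlines and indentation, and it does not know about tags, so it will touch attribute values and <pre> content too.

```python
import re

def squeeze_whitespace(text):
    """Replace every run of 2+ whitespace characters with one space."""
    return re.sub(r'\s\s+', ' ', text)

sample = '<p>a   b</p>\n  <p>c</p>'
print(squeeze_whitespace(sample))  # <p>a b</p> <p>c</p>
```

The newline plus indentation between the two paragraphs counts as one whitespace run, so the lines are joined. That is presumably why a "pretty" pass is suggested afterwards.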
Old 11-17-2017, 12:45 PM   #7
Phssthpok
Quote:
Originally Posted by mikapanja View Post
Yes, but what if there were spaces non-adjacent to the closing (or opening) tags?
And what if it was another pair of tags?

I.e. is there a generic regex which would find multiple spaces in any position between any opening and closing tags?
Find: (<(\w+)[^>]*>.*?) {2,}(.*?</\2>)
Replace with: \1 \3

This uses a literal space rather than \s, because \s also matches newlines. Use [ \t] if you want to match tabs as well. The problem is that this will match the first tag containing multiple spaces, which will be <body>...</body>. It is better to do something like this, listing the tags you want to match:

Find: (<(p|div|blockquote|i)( [^>]*)?>.*?) {2,}(.*?</\2>)
Replace with: \1 \4

The use of ( [^>]*)? instead of [^>]* is to guard against matching e.g. <img> with </i>.
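A minimal Python sketch of the tag-limited version, assuming the text is processed as one string outside the editor. The function name collapse_space_runs is hypothetical. Each match runs from the opening tag to its closing tag, so one pass fixes at most one run of spaces per matched element. The loop simply repeats the substitution until nothing changes.

```python
import re

# Tag-limited pattern from the post; the tag list is just the example set.
PATTERN = re.compile(r'(<(p|div|blockquote|i)( [^>]*)?>.*?) {2,}(.*?</\2>)')

def collapse_space_runs(html):
    """Collapse runs of 2+ spaces inside the listed elements to one space."""
    count = 1
    while count:
        # Each pass fixes at most one run per matched element, so repeat.
        html, count = PATTERN.subn(r'\1 \4', html)
    return html

print(collapse_space_runs('<p><i>blah</i> §111   </p>'))  # <p><i>blah</i> §111 </p>
print(collapse_space_runs('<p>a  b   c</p>'))             # <p>a b c</p>
```

Without the DOTALL flag, . does not match newlines, so an element that spans several lines is not matched by this sketch.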
Old 11-17-2017, 07:17 PM   #8
mikapanja
@ Phssthpok

Excellent, many thanks!

<nitpicking> Your solution matches the whole line, everything between the opening and closing tags containing multiple spaces (which in itself is very useful).

Could you perhaps come up with a regex that matches such spaces only?</nitpicking>
Old 11-18-2017, 06:22 AM   #9
Phssthpok
Quote:
Originally Posted by mikapanja View Post
@ Phssthpok

Excellent, many thanks!

<nitpicking> Your solution matches the whole line, everything between the opening and closing tags containing multiple spaces (which in itself is very useful).

Could you perhaps come up with a regex that matches such spaces only?</nitpicking>
Nope!

But I assume it's the spaces you want to replace, and what I gave you will at least do that. The replacement string replaces them with a single space, which I assumed was what you wanted.
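One possible way to get at the spaces alone, offered as a rough sketch and not as the method from this thread: a negative lookahead that rejects any run of spaces sitting inside a tag. It assumes literal < and > never appear in the text itself (they are escaped as entities). Unlike the patterns above, it is not tied to a particular pair of tags, so it also reports indentation before a tag. The names SPACE_RUN and find_space_runs are made up for the example.

```python
import re

# Runs of 2+ literal spaces that are NOT inside a tag: the negative
# lookahead rejects a run if a '>' comes before the next '<'.
SPACE_RUN = re.compile(r' {2,}(?![^<>]*>)')

def find_space_runs(html):
    """Return the (start, end) offsets of each run of 2+ spaces in text content."""
    return [m.span() for m in SPACE_RUN.finditer(html)]

sample = '<p class="a  b"><i>blah</i> §111   </p>'
for start, end in find_space_runs(sample):
    print(start, end, repr(sample[start:end]))
```

Here the two spaces inside the class attribute are skipped, and only the run after 111 is reported.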
Old 11-18-2017, 07:01 AM   #10
Phssthpok
Quote:
Originally Posted by mikapanja View Post
<nitpicking> Your solution matches the whole line, everything between the opening and closing tags containing multiple spaces (which in itself is very useful).

Could you perhaps come up with a regex that matches such spaces only?</nitpicking>
Here's a thought: Use what I've already given you to replace all the spaces with something that doesn't occur in the text, e.g. ∀.
Then find ∀, and do whatever you want at each match.
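The two-step idea can be sketched in Python as follows, assuming the tag-limited pattern from the earlier post and ∀ as the marker. The names MARKER and mark_space_runs are hypothetical, and the check that the marker is unused is an added safety step.

```python
import re

PATTERN = re.compile(r'(<(p|div|blockquote|i)( [^>]*)?>.*?) {2,}(.*?</\2>)')
MARKER = '\u2200'  # the ∀ character; must not occur in the text already

def mark_space_runs(html):
    """Step 1: replace each run of 2+ spaces in the listed elements with MARKER."""
    if MARKER in html:
        raise ValueError('marker already present in the text')
    count = 1
    while count:
        # One pass marks at most one run per matched element, so repeat.
        html, count = PATTERN.subn(r'\1' + MARKER + r'\4', html)
    return html

marked = mark_space_runs('<p>a  b   c</p>')
print(marked)                       # <p>a∀b∀c</p>

# Step 2: find the marker and do whatever is wanted at each match,
# e.g. list the positions or swap in a single space.
print([m.start() for m in re.finditer(MARKER, marked)])
print(marked.replace(MARKER, ' '))  # <p>a b c</p>
```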
Old 11-18-2017, 07:11 AM   #11
mikapanja
Quote:
Originally Posted by Phssthpok View Post
Here's a thought: Use what I've already given you to replace all the spaces with something that doesn't occur in the text, e.g. ∀.
Then find ∀, and do whatever you want at each match.
@ Phssthpok

Perfect

Thanks a bunch!