#1 |
Connoisseur
Posts: 62
Karma: 10
Join Date: Mar 2024
Device: none
|
Possible RegEx error in Sigil (minimal match being ignored)
Hello,
I'm trying to delete all 'span' tags with no content, i.e. <span blah blah>(nothing here)</span>, by using the following RegEx (with minimal match enabled):
<span .*></span>
Here is some example text which shouldn't match anything:
<hgroup><h2 class="CHAPTER" id="ch1"><span class="CN"><samp class="SANS_Futura_Std_Bold_Condensed_B_11">1</samp></span> <span class="CT"><samp class="SANS_Dogma_OT_Bold_B_11">WINDOWS FOUNDATIONAL CONCEPTS</samp></span></h2></hgroup>
As we can see, there are no span tags without content. Running this RegEx, however, matches the following text (as can be seen in the attached screenshot):
<span class="CN"><samp class="SANS_Futura_Std_Bold_Condensed_B_11">1</samp></span>
Is there anything I have missed, or is this indeed an error?
Rgds
Karl |
#2 |
Grand Sorcerer
Posts: 5,679
Karma: 23983815
Join Date: Dec 2010
Device: Kindle PW2
|
The .* in your regular expression also matches other tags, because . matches any character, including < and >. The following expression should work:
Code:
<span[^>]*></span> |
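To see the difference outside Sigil, here is a minimal sketch using Python's `re` module. Two assumptions: Python's `re` stands in for Sigil's regex engine, and the Minimal Match option is modeled with the lazy quantifier `.*?`. The variable names and the `with_empty` sample are made up for illustration; `sample` is the text from the first post.

```python
import re

# Sample text from the first post: none of these spans is actually empty.
sample = (
    '<hgroup><h2 class="CHAPTER" id="ch1"><span class="CN">'
    '<samp class="SANS_Futura_Std_Bold_Condensed_B_11">1</samp></span> '
    '<span class="CT"><samp class="SANS_Dogma_OT_Bold_B_11">'
    'WINDOWS FOUNDATIONAL CONCEPTS</samp></span></h2></hgroup>'
)

# Assumption: Minimal Match behaves like a lazy quantifier (.*?).
lazy_dot = re.compile(r'<span .*?></span>')
# Suggested fix: [^>]* cannot step over the ">" that ends the opening tag.
negated_class = re.compile(r'<span[^>]*></span>')

lazy_hit = lazy_dot.search(sample)
fixed_hit = negated_class.search(sample)

# The lazy dot still runs through the nested <samp> element.
print(lazy_hit.group(0))
# The negated character class finds nothing in this sample.
print(fixed_hit)

# Hypothetical text that does contain an empty span.
with_empty = '<p><span class="x"></span>text</p>'
cleaned = negated_class.sub('', with_empty)
print(cleaned)
```

With the negated character class, the only thing allowed between the opening tag's `>` and `</span>` is nothing at all, which is what "empty span" means here.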
#3 |
Connoisseur
Posts: 62
Karma: 10
Join Date: Mar 2024
Device: none
|
|
#4 |
Grand Sorcerer
Posts: 28,338
Karma: 203719142
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Nope. Minimal Match can't affect the inherent greediness of *.
At least I don't recall it doing so in the past.
Last edited by DiapDealer; 08-18-2024 at 01:07 PM. |
#5 | |
Connoisseur
Posts: 62
Karma: 10
Join Date: Mar 2024
Device: none
|
Quote:
So what IS 'minimal match' for then, and what's the option (if there is one) for a non-greedy RegEx in Sigil? |
|
#6 |
Grand Sorcerer
Posts: 28,338
Karma: 203719142
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Doitsu's regex was an example of a non-greedy search. But let me do some investigation. I don't really use the Minimal Match option, so there could be a problem with it. I don't want to be too hasty in dismissing what might be a bug.
|
#7 |
Sigil Developer
Posts: 8,437
Karma: 5702578
Join Date: Nov 2009
Device: many
|
And no, the text search will only stop at the first ">" when a match is found there; without a match it will continue to grow the search area until it finds the first match or none at all. That is what the "minimal match" flag means: it finds the minimal-length match if one exists.
Last edited by KevinH; 08-21-2024 at 05:40 PM. |
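The growing behaviour described above can be sketched in Python's `re`, again assuming that a lazy `.*?` models Minimal Match and that Sigil's engine behaves the same way on this point. The `text` string is a made-up example. It also shows that "minimal" is measured from the leftmost place a match can start, so a shorter match further right is not preferred.

```python
import re

# Hypothetical input: a non-empty span followed by an empty one.
text = '<span class="a">one</span> <span class="b"></span>'

# The leftmost "<span " is tried first. The lazy .*? then grows one
# character at a time until "></span>" can follow it.
lazy = re.search(r'<span .*?></span>', text)

# The first ">" (after class="a") is followed by "one", not "</span>",
# so the match keeps growing and ends up covering both spans.
print(lazy.group(0) == text)

# A negated class cannot grow past ">", so only the empty span matches.
strict = re.search(r'<span[^>]*></span>', text)
print(strict.group(0))
```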
#8 |
Bibliophagist
Posts: 44,466
Karma: 167726775
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Personally, I find it easier to use @DiapDealer's TagMechanic plugin for this type of task. It saves me from the issues when my fat fingers cause a typo.
|
#9 |
Enthusiast
Posts: 38
Karma: 10
Join Date: Aug 2018
Device: kobo Nia
|
Hi,
with Sigil 2.4.2 the bug seems gone. `<div .*</div>` grabs
* without MinimalMatch in Regex Options: until the last possible end of pattern,
* with MinimalMatch in Regex Options: until the next possible end of pattern.
Last edited by recook; 03-02-2025 at 08:11 AM. Reason: double sig |
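The two behaviours reported for `<div .*</div>` correspond to greedy and lazy matching. A small Python sketch, assuming the MinimalMatch option is equivalent to writing `.*?` (the sample `text` is invented, and Python's `re` is only a stand-in for Sigil's engine):

```python
import re

# Hypothetical single-line input with two div elements.
text = '<div class="a">first</div><p>between</p><div class="b">second</div>'

# Greedy: .* takes as much as possible, so the match ends at the last </div>.
greedy = re.search(r'<div .*</div>', text).group(0)

# Lazy (assumed equivalent of MinimalMatch): ends at the next </div>.
lazy = re.search(r'<div .*?</div>', text).group(0)

print(greedy == text)
print(lazy)
```

Note that in Python `.` does not match a newline by default, so both patterns stay within one line unless `re.DOTALL` is passed.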
#10 |
Sigil Developer
Posts: 8,437
Karma: 5702578
Join Date: Nov 2009
Device: many
|
There was no bug. See my earlier post about how minimal match actually works.
|