Thread: Regex examples
06-22-2020, 07:34 PM   #651
Mister L
Quote:
Originally Posted by Doitsu View Post
AFAIK, if Minimal Match is selected, Sigil will prefix the search string with (?U).

From the PCRE documentation:

(If you remove the question mark from my regex and select Minimal Match, it works as expected.)
Ok, very interesting, I did not realise it inverted whatever was already present.


My original question was about fixing chapter headings to make it possible to easily regenerate a TOC, as I recently had a file with chapter headings in this format. To get back to that, is it possible to keep the original text as is, but copy the modified text into a title attribute, bearing in mind there can be a variable number of sets of spans in the title?

I know how to do it with only one set of spans, as I said, but unless I do it in multiple passes (3 sets, then 2 sets, then 1 set), I don't think that pattern works.

For instance, take a heading that may or may not have a class and may or may not have an ID, with its text in fake small-caps. The first word is always capitalised; some titles also have one or more capitalised words in the middle, but not all of them do. I want to add the title attribute in sentence case, keeping those capitals.

Find:
<h1 class="chapter" id="id01"><span class="Cap">F</span><span class="SmallCap">IRST WORD OF THE SENTENCE IS ALWAYS CAPITALISED</span></h1>

But also:
<h1 class="chapter" id="id01"><span class="Cap">F</span><span class="SmallCap">IRST WORD OF THE SENTENCE IS ALWAYS CAPITALISED, OTHER</span> <span class="Cap">W</span><span class="SmallCap">ORDS IN THE SENTENCE MAY OR MAY NOT BE CAPITALISED</span></h1>

And also:
<h1 class="chapter" id="id01"><span class="Cap">F</span><span class="SmallCap">IRST WORD OF THE SENTENCE IS ALWAYS CAPITALISED,</span> <span class="Cap">O</span><span class="SmallCap">THER</span> <span class="Cap">W</span><span class="SmallCap">ORDS IN THE SENTENCE MAY OR MAY NOT BE CAPITALISED</span></h1>

etc.

Replace:

<h1 class="chapter" id="id01" title="First word of the sentence is always capitalised, Other Words in the sentence may or may not be capitalised"><span class="Cap">F</span><span class="SmallCap">IRST WORD OF THE SENTENCE IS ALWAYS CAPITALISED,</span> <span class="Cap">O</span><span class="SmallCap">THER</span> <span class="Cap">W</span><span class="SmallCap">ORDS IN THE SENTENCE MAY OR MAY NOT BE CAPITALISED</span></h1>

(and other variations with zero or more capitalised words in the middle of the title)


I usually use some variation of this:
Search:
<h1 class="chapter" id="(.*)"><span class="Cap">(.)</span><span class="SmallCap">(.*)</span></h1>

Replace:
<h1 class="chapter" id="\1" title="\2\L\3\E"><span class="Cap">\2</span><span class="SmallCap">\3</span></h1>

It works if there is only one set of spans. If there is more than one set, I have to add matching capture groups to both the search and the replace so they have the same number of sets, and run a separate version of the search for each number of sets, or else correct the headings by hand one by one as I go along.
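
For comparison, here is a rough sketch of the same idea done outside Find & Replace, in Python, where the replacement can be a function instead of a fixed string. A repeated group in a regex only keeps its last capture, which is why a fixed \1 \2 \3 replacement cannot cope with a variable number of span sets; a callback can loop over however many sets it finds. The names HEADING, PAIR, add_title and add_titles are made up for this example, and it assumes the headings contain nothing but Cap/SmallCap span pairs separated by whitespace, as in the samples above.

```python
import re

# An <h1> (with any attributes, or none) whose content is one or more
# Cap/SmallCap span pairs, optionally separated by whitespace.
HEADING = re.compile(
    r'<h1\b(?P<attrs>[^>]*)>'
    r'(?P<body>(?:\s*<span class="Cap">[^<]*</span>'
    r'<span class="SmallCap">[^<]*</span>)+)'
    r'</h1>'
)

# One Cap/SmallCap pair, plus the whitespace that precedes it.
PAIR = re.compile(
    r'(?P<gap>\s*)<span class="Cap">(?P<cap>[^<]*)</span>'
    r'<span class="SmallCap">(?P<rest>[^<]*)</span>'
)


def add_title(match):
    """Build a title attribute from every span pair in one heading."""
    attrs = match.group("attrs")
    body = match.group("body")
    if re.search(r'\stitle=', attrs):
        return match.group(0)  # already has a title: leave it alone
    # Keep each Cap letter as is and lower-case the SmallCap text after it.
    title = "".join(
        m.group("gap") + m.group("cap") + m.group("rest").lower()
        for m in PAIR.finditer(body)
    )
    title = title.replace('"', "&quot;")  # keep the attribute well-formed
    return f'<h1{attrs} title="{title}">{body}</h1>'


def add_titles(html):
    """Add title attributes to all matching headings; the text is untouched."""
    return HEADING.sub(add_title, html)


if __name__ == "__main__":
    sample = (
        '<h1 class="chapter" id="id01"><span class="Cap">F</span>'
        '<span class="SmallCap">IRST WORD,</span> <span class="Cap">O</span>'
        '<span class="SmallCap">THER WORDS</span></h1>'
    )
    print(add_titles(sample))
```

Headings that already carry a title are skipped, so running it twice on the same file should not add a second attribute. Anything more irregular than the samples (nested tags inside the spans, text outside the spans) would not match and would be left unchanged.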