Thread: Regex examples
08-19-2022, 02:09 PM   #731
Tex2002ans
Quote:
Originally Posted by CubGeek
Cheers, that'll help significantly. Luckily, the few things I'm crafting are small enough, and I'm doing them slow enough, that there isn't much "spaghettification" of the code, or the whole <span>ception of nested <span>s thing that I've seen when I peeked inside a couple of my purchased or calibre-converted books.
It usually happens around footnotes and all sorts of other complicated nesting:

Code:
<p class="normal"><span class="normal">This is an <span class="italics">example</span>.<sup><span class="tiny">1</span></sup></span></p>
Let's say you were trying to correct (or remove) that outside <span class="normal">.

Regular Expressions would get completely confused by the 3 different </span>s, whereas TagMechanic would be able to figure out which </span> closes which <span>.
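
To make that concrete, here's a rough Python sketch. It's my own illustration, not how TagMechanic is actually implemented: `unwrap_span` is a hypothetical helper that counts nesting depth to find the matching </span>, while the naive regex just grabs the nearest one.

```python
import re

html = ('<p class="normal"><span class="normal">This is an '
        '<span class="italics">example</span>.'
        '<sup><span class="tiny">1</span></sup></span></p>')

# Naive regex: the lazy match stops at the FIRST </span>, which belongs
# to the italics span, not to the outer "normal" span.
naive = re.sub(r'<span class="normal">(.*?)</span>', r'\1', html)

# Matches any opening or closing span tag; group 1 is "/" for a closing tag.
SPAN_TAG = re.compile(r'<(/?)span\b[^>]*>')

def unwrap_span(text, opening):
    """Remove the first `opening` tag and its matching </span> (hypothetical helper)."""
    start = text.find(opening)
    if start == -1:
        return text
    depth = 0
    for m in SPAN_TAG.finditer(text, start):
        if m.group(1) == '':
            depth += 1          # another <span ...> opened
        else:
            depth -= 1          # a </span> closed
            if depth == 0:      # this one closes the tag we started from
                inner = text[start + len(opening):m.start()]
                return text[:start] + inner + text[m.end():]
    return text                 # unbalanced markup: leave it alone

fixed = unwrap_span(html, '<span class="normal">')

print(naive)  # the italics now swallow the period and the footnote
print(fixed)  # only the outer span is gone
```

The nasty part is that the naive result still looks balanced, so nothing obviously breaks. The italics just quietly stretch over the period and the footnote marker.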

Of course, with clean code, this wouldn't be a problem, but in real life there are always these crazy examples that creep up... and they come back to bite you in the butt later, when you already accidentally did a "Replace All" 3 hours ago!

Quote:
Originally Posted by CubGeek
Yup! Note my edit above where I learned about Capture Groups and backreferences and... However, I like your explanation better. Much more user friendly.


You can use backreferences in FINDs as well, not just in the Replace!

For example, one of the tricks I use is:

Double Word Check

Find: (\b[a-z]+) (\1\b)
Replace: \1

This grabs a lowercase word, then looks for the exact same word again right after a space:
  • Did you see the reactor reactor?
  • What are you doing in that that area?
  • If only they had had enough power to use the ultrasound machine for each pregnancy, he would have detected the problem earlier and been able to plan the C-section.

How does it work?

It uses a few tricks:
  • \b = a "word boundary". (Beginning of word)
  • [a-z] = lowercase letters 'a' through 'z'.
  • + = ONE OR MORE of the previous thing.

Shove all that in GROUP 1. Then comes a single space, followed by:
  • \1 = Look for GROUP 1 again.
  • \b = a "word boundary". (End of word)

Shove all that in GROUP 2.

Now, when you replace, you're only replacing with GROUP 1, meaning the space and the duplicated word (GROUP 2) get dropped:
  • Did you see the reactor?
  • What are you doing in that area?
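
If you want to play with it outside your editor, here's a minimal Python sketch. I'm assuming Python's `re` module, which accepts this pattern as-is; `fix_double` is just a name I made up for the example.

```python
import re

# Same pattern as the Find above.
DOUBLE_WORD = re.compile(r'(\b[a-z]+) (\1\b)')

def fix_double(sentence):
    # Keep GROUP 1; the space and GROUP 2 are thrown away.
    return DOUBLE_WORD.sub(r'\1', sentence)

print(fix_double("Did you see the reactor reactor?"))
# Did you see the reactor?
print(fix_double("What are you doing in that that area?"))
# What are you doing in that area?

# The closing \b stops "the theme" from being treated as "the the":
print(fix_double("I like the theme."))
# I like the theme.
```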



- - -

Usage Note: You do have to be careful of false positives though (a perfectly legitimate "had had", for example), so NEVER do a "Replace All".

Always do a one-by-one check.

There shouldn't ever be too many "doubles" within your book, but they're an extremely common typo that's very hard to catch. (Usually the human brain just skips right over them.)
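
If you'd rather review the hits in a script, something like this rough Python sketch would list each double with a bit of surrounding text and change nothing. `list_doubles` and its `context` parameter are names I invented for the example.

```python
import re

DOUBLE_WORD = re.compile(r'(\b[a-z]+) (\1\b)')

def list_doubles(text, context=20):
    """Return (word, snippet) pairs for every doubled word, without replacing anything."""
    hits = []
    for m in DOUBLE_WORD.finditer(text):
        lo = max(0, m.start() - context)
        hi = min(len(text), m.end() + context)
        hits.append((m.group(1), text[lo:hi]))
    return hits

sample = ("If only they had had enough power. "
          "Did you see the reactor reactor?")

# Both get flagged; a human decides that "had had" stays.
for word, snippet in list_doubles(sample):
    print(f"{word!r}: ...{snippet}...")
```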

- - -

Quote:
Originally Posted by CubGeek
Oh, I did. *twitch* I'm sure I was mumbling about em's and i's and strong's and b's (oh my!) in my sleep
Me too. Took me many years to finally get it boiled down.

Glad to see someone benefited from all those in-depth discussions.

Last edited by Tex2002ans; 08-19-2022 at 02:12 PM.