MobileRead Forums - View Single Post

Tex2002ans · 08-19-2022, 02:09 PM

Quote:

Originally Posted by CubGeek

Cheers, that'll help significantly. Luckily, the few things I'm crafting are small enough, and I'm doing them slow enough, that there isn't much "spaghettification" of the code, or the whole <span>ception of nested <span>s thing that I've seen when I peeked inside a couple of my purchased or calibre-converted books.

It usually happens around footnotes and all sorts of other complicated nesting:

Code:

<p class="normal"><span class="normal">This is an <span class="italics">example</span>.<sup><span class="tiny">1</span></sup></span></p>

Let's say you were trying to correct (or remove) that outside <span>.

Regular Expressions would get completely confused by the 3 different </span>s, whereas TagMechanic would be able to figure out which closing tag connects with which opening one.

Of course, with clean code, this wouldn't be a problem, but in real life there are always these crazy examples that creep up... and they come back to bite you in the butt later, when you already accidentally did a "Replace All" 3 hours ago!
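To make the nesting problem concrete, here is a rough Python sketch. It is not how TagMechanic works internally (I haven't checked its source); `unwrap_span` is a made-up helper that just counts opening and closing span tags so it can pair the outer <span> with its real partner. The naive regex stops at the first </span> it finds:

```python
import re

html = ('<p class="normal"><span class="normal">This is an '
        '<span class="italics">example</span>.'
        '<sup><span class="tiny">1</span></sup></span></p>')

# Naive approach: lazy match up to the FIRST </span>, which actually
# belongs to the inner "italics" span, not the outer "normal" one.
naive = re.sub(r'<span class="normal">(.*?)</span>', r'\1', html)


def unwrap_span(text, class_name):
    """Hypothetical helper: remove the first <span class="class_name">
    together with its matching </span>, found by counting nesting depth."""
    tag = re.compile(r'<(/?)span\b[^>]*>')
    opener = re.search(r'<span class="%s">' % re.escape(class_name), text)
    if opener is None:
        return text
    depth = 1
    for m in tag.finditer(text, opener.end()):
        depth += -1 if m.group(1) else 1  # closing tag: -1, opening tag: +1
        if depth == 0:
            # m is the closing tag that pairs with the opener.
            return (text[:opener.start()]
                    + text[opener.end():m.start()]
                    + text[m.end():])
    return text  # unbalanced markup: leave it alone


print(naive)
print(unwrap_span(html, "normal"))
```

In the naive result the "italics" span has lost its own closing tag, so it now swallows the period and the footnote marker. The depth-counting version leaves the inner spans exactly as they were.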

Quote:

Originally Posted by CubGeek

Yup! Note my edit above where I learned about Capture Groups and backreferences and...

However, I like your explanation better.

Much more user friendly.

You can use backreferences in FINDs as well, not just in the Replace!

For example, one of the tricks I use is:

Double Word Check

Find: (\b[a-z]+) (\1\b)
Replace: \1

This grabs a lowercase word + looks for it again:

Did you see the reactor reactor?
What are you doing in that that area?
If only they had had enough power to use the ultrasound machine for each pregnancy, he would have detected the problem earlier and been able to plan the C-section.

How does it work?

It uses a few tricks:

\b = a "word boundary". (Beginning of word)
[a-z] = lowercase letters 'a' through 'z'.
+ = ONE OR MORE of previous thing.

Shove all that in GROUP 1.

\1 = Look for GROUP 1 again.
\b = a "word boundary". (End of word)

Shove all that in GROUP 2.
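If it helps to see the pieces laid out, here is the same Find written as a commented pattern. This is a sketch using Python's `re` module with `re.VERBOSE` (my choice for illustration; the search box in your editor takes the compact one-line form). In verbose mode plain whitespace is ignored, so the literal space between the two words is written as `[ ]`:

```python
import re

DOUBLE_WORD = re.compile(r"""
    (          # start GROUP 1
      \b       #   word boundary (beginning of word)
      [a-z]+   #   one or more lowercase letters
    )          # end GROUP 1
    [ ]        # the single space between the two words
    (          # start GROUP 2
      \1       #   the same text GROUP 1 captured
      \b       #   word boundary (end of word)
    )          # end GROUP 2
""", re.VERBOSE)

m = DOUBLE_WORD.search("Did you see the reactor reactor?")
print(m.group(1), "|", m.group(2))
```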

Now, when you replace, you're only replacing with GROUP 1, meaning the duplicated word (GROUP 2) never makes it into the result:

Did you see the reactor?
What are you doing in that area?
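If you want to try the Find/Replace pair outside the editor, a quick Python sketch does the same thing (assuming Python's `re` module, which uses the same `\1` backreference syntax in both the pattern and the replacement):

```python
import re

# Same Find and Replace as above.
DOUBLE_WORD = re.compile(r'(\b[a-z]+) (\1\b)')

samples = [
    "Did you see the reactor reactor?",
    "What are you doing in that that area?",
]

# Replace each match with GROUP 1 only, dropping the repeated word.
fixed = [DOUBLE_WORD.sub(r'\1', s) for s in samples]

for line in fixed:
    print(line)
```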

- - -

Usage Note: You do have to be careful of false positives though, so NEVER do a "Replace All".

Always do a one-by-one check.

There shouldn't ever be too many "doubles" within your book, but they're an extremely common typo that's very hard to catch. (Usually the human brain just skips right over them.)
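For the one-by-one check, one option is to list every hit with a bit of surrounding text and decide on each yourself instead of replacing blindly. A minimal sketch, where `find_doubles` and the `context` parameter are names I made up for this example:

```python
import re

DOUBLE_WORD = re.compile(r'(\b[a-z]+) (\1\b)')


def find_doubles(text, context=20):
    """Hypothetical helper: yield (offset, word, snippet) for each doubled
    word so a human can review it. Nothing is replaced here."""
    for m in DOUBLE_WORD.finditer(text):
        start = max(0, m.start() - context)
        end = min(len(text), m.end() + context)
        yield m.start(), m.group(1), text[start:end]


text = ("What are you doing in that that area? "
        "If only they had had enough power.")

hits = list(find_doubles(text))
for offset, word, snippet in hits:
    print(offset, repr(word), "...", snippet)
```

Both "that that" and "had had" show up in the list. The first is a typo, the second is perfectly good English, which is exactly why a blind "Replace All" is dangerous.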

- - -

Quote:

Originally Posted by CubGeek

Oh, I did. *twitch*

I'm sure I was mumbling about em's and i's and strong's and b's (oh my!) in my sleep

Me too. Took me many years to finally get it boiled down.

Glad to see someone benefited from all those in-depth discussions.