Thread: Regex examples
08-06-2019, 05:31 PM   #603
Tex2002ans
Quote:
Originally Posted by roger64
The above regex works fine.

Maybe it's a little greedy because it also transforms words written in capitals which are included in the head like "DOCTYPE".
Don't use Replace All. You'll have to decide on a case-by-case basis, because there are still valid words written in ALL CAPS, like DNA, FBI, FDA, etc.

Quote:
Originally Posted by lumpynose
What about having Sigil exclude the stuff outside the body tags? Another check box, for example, in the search options. So that search and replace is given only the stuff within the body tags.
epubcheck and other tools will squawk at you because of this code, and usually it's a sign of some serious underlying issue (bad conversion, bad S&R, horribly coded site, etc.).

You might want to do something like:

Search: </p>\s*([^<]+?)\s+
Replace: </p><p class="notag">\1</p>

This'll help point out those problem areas. Then you can do one big pass through all the "notag" classes and fix the underlying issues.
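
If you'd rather try the idea outside Sigil first, here's a rough Python sketch of the same kind of search and replace. It is only an illustration, not what Sigil runs internally. The pattern is a slightly stricter variant of the search above (my own adjustment): the captured text has to start with a non-whitespace character, and it runs up to the next tag. The names `STRAY_TEXT`, `find_stray_text`, and `mark_stray_text` are made up for this example.

```python
import re

# Stricter variant of the search above (an assumption, not the exact Sigil pattern):
# capture text that follows a closing </p>, starts with a non-whitespace
# character, and runs up to (but not including) the next tag.
STRAY_TEXT = re.compile(r'</p>\s*([^<\s][^<]*?)\s*(?=<)')


def find_stray_text(html: str) -> list[str]:
    """Return each run of untagged text found right after a </p>."""
    return STRAY_TEXT.findall(html)


def mark_stray_text(html: str) -> str:
    """Wrap each run of untagged text in <p class="notag"> for later review."""
    return STRAY_TEXT.sub(r'</p><p class="notag">\1</p>', html)


sample = (
    '<p>First paragraph.</p>\n'
    'Stray text with no tag.\n'
    '<p>Second paragraph.</p>'
)

print(find_stray_text(sample))
print(mark_stray_text(sample))
```

Listing the hits with `find_stray_text` first lets you look at each one before changing anything, which fits the case-by-case approach better than a blind replace.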