05-05-2022, 05:47 AM | #721 |
Enthusiast
Posts: 30
Karma: 10
Join Date: Mar 2019
Location: Slovenia
Device: PocketBoot Inkpad 3
|
Any idea on how to capture uppercase words with special diacritic characters, like Ū Ṃ Ḥ Ū etc.?
I tried the following, but it doesn't work. I want to capture uppercase words with 2 or more characters. Code:
([[:upper:]]{2,}) |
05-05-2022, 06:15 AM | #723 |
Enthusiast
Posts: 30
Karma: 10
Join Date: Mar 2019
Location: Slovenia
Device: PocketBoot Inkpad 3
|
@BeckyEbook, thank you!
|
05-05-2022, 06:55 AM | #724 |
Grand Sorcerer
Posts: 27,547
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Also remember that \p{Lu} and \p{Ll} can be used to match any uppercase and lowercase letter, respectively, in any language without requiring the (*UCP) switch (in Sigil's PCRE regex engine).
\p{L} matches any letter (Unicode or otherwise) and \P{L} matches anything NOT a letter. So (\p{Lu}{2,}) should theoretically do the same thing (not near a machine to verify syntax). See the Unicode Categories section of https://www.regular-expressions.info/unicode.html for more categories. |
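For anyone who wants to try the (\p{Lu}{2,}) idea outside Sigil: Python's built-in re module does not support \p{...} classes, so the sketch below swaps in unicodedata.category() to find the same runs of two or more uppercase letters. The function name and the sample text are made up for illustration, and the text is normalized to NFC first on the assumption that the accented letters have precomposed forms.

```python
import unicodedata
from itertools import groupby


def find_upper_runs(text, min_len=2):
    """Return runs of at least min_len consecutive uppercase letters (category Lu)."""
    # NFC folds "U" + combining macron into the single letter "Ū" where possible.
    text = unicodedata.normalize("NFC", text)
    runs = []
    # groupby splits the text into maximal stretches of Lu / non-Lu characters.
    for is_upper, chars in groupby(text, key=lambda ch: unicodedata.category(ch) == "Lu"):
        run = "".join(chars)
        if is_upper and len(run) >= min_len:
            runs.append(run)
    return runs


print(find_upper_runs("The ŪṂḤ sutra, BUDDHA and Ūm"))  # ['ŪṂḤ', 'BUDDHA']
```

Single capitals such as the "T" in "The" or the "Ū" in "Ūm" are skipped because their runs are shorter than two characters.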
08-18-2022, 01:51 PM | #725 |
Connoisseur
Posts: 52
Karma: 10
Join Date: Sep 2021
Location: Upstate NY, USA
Device: iPad Pro, Kindle basic
|
oh.... wow. 49 pages over the course of ten years?! well, this Regex newbie's got a lot of reading homework, it seems.
|
08-18-2022, 02:48 PM | #726 |
Connoisseur
Posts: 52
Karma: 10
Join Date: Sep 2021
Location: Upstate NY, USA
Device: iPad Pro, Kindle basic
|
Okay, after reading the "<i>, <em> or <span> for italics" thread from 2020, and then reading the "Extended <head> chapter: NOT necessary?" 2017 thread linked therein [and paying particular attention to Tex2002ans's posting about the underlying purposes for <em> and <i> <em>therein</em>], I've seen the error of my ways regarding using <span> for setting italics.
I've figured out that Code:
<span class="italics">([^>]+)</span>
I'm happy to do the legwork and the trial-and-error to learn what works. I guess my search skills need an update, too, because the results I am turning up don't seem to work for me. Can someone help point me in the right direction?
[edit] Okay, I THINK I found it, but it was hit-or-miss, because it seemed that everything was for JavaScript/C#/VB.NET/PHP/Ruby/etc. So, it seems that some trial-and-error resulted in me learning about <i>backreferences</i> and <i>capture groups</i>. I've gotten it to work so that Code:
<em>\g<1></em>
Okay, next question: is this a kludge and there's a better way? Or is this correct? Thanks, y'all! [/edit]
Last edited by CubGeek; 08-18-2022 at 03:22 PM. |
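For comparison outside Sigil, Python's re module happens to accept the same \g<1> spelling in replacement strings, alongside the shorter \1. This is a minimal sketch using the Find pattern from the post on a made-up sample paragraph; the variable names are illustrative only, and it says nothing about what Sigil's own engine accepts.

```python
import re

# Find pattern from the post: group 1 captures the text inside the span.
FIND = r'<span class="italics">([^>]+)</span>'

sample = '<p>She read <span class="italics">Dune</span> twice.</p>'

# In a Python replacement string, \g<1> and \1 both refer to capture group 1.
with_g = re.sub(FIND, r'<em>\g<1></em>', sample)
with_backslash = re.sub(FIND, r'<em>\1</em>', sample)

print(with_g)  # <p>She read <em>Dune</em> twice.</p>
```

In Python, at least, the longer \g<1> form mainly earns its keep when a literal digit follows the reference, since \g<1>0 is unambiguous where \10 is not.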
08-18-2022, 04:32 PM | #727 |
A Hairy Wizard
Posts: 3,093
Karma: 18727053
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
That's pretty advanced stuff!
I go pretty easy...and it seems to work so far...
find: <i>(.*?)</i>
replace: <em>\1</em>
or
find: <span class="italics">(.*?)</span>
replace: <em>\1</em>
etc. |
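The ? in (.*?) matters as soon as one paragraph holds more than one italic run. Here is a small Python sketch (Python's re module rather than Sigil's PCRE engine, with an invented sample line) comparing the lazy pattern above with a greedy (.*):

```python
import re

sample = "<p><i>One</i> and <i>two</i>.</p>"

# Lazy: each (.*?) stops at the nearest closing </i>.
lazy = re.sub(r"<i>(.*?)</i>", r"<em>\1</em>", sample)

# Greedy: (.*) runs on to the last </i> on the line, swallowing the tags between.
greedy = re.sub(r"<i>(.*)</i>", r"<em>\1</em>", sample)

print(lazy)    # <p><em>One</em> and <em>two</em>.</p>
print(greedy)  # <p><em>One</i> and <i>two</em>.</p>
```

The greedy result leaves mismatched tags behind, which is why the lazy form is the safer default for this kind of swap.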
08-18-2022, 10:11 PM | #728 |
Connoisseur
Posts: 52
Karma: 10
Join Date: Sep 2021
Location: Upstate NY, USA
Device: iPad Pro, Kindle basic
|
Oh, that's much simpler. Thank you! Since the stuff I'm working on has a combination of <i> for "inside voice," and "named things" as well as <em> for word emphasis, this certainly has been a learning experience!
|
08-18-2022, 11:04 PM | #729 | ||||
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
The easiest way to do it is to use DiapDealer's fantastic "TagMechanic" plugin.
I explained how to install Sigil plugins in this 2021 post. And I gave step-by-step instructions on how to use TagMechanic here:
That will help mass convert your <span class="italics"> -> <i> or <em>. It will be much safer than trying to use Regular Expressions, because regex can't safely handle complicated cases of <span>s inside of <span>s.
Quote:
Replace: <i>\1</i>
You see the parentheses you wrapped around your stuff? That's called a "Capture Group".
Explanation of the Find
Let's break it down into each piece:
It's saying:
Now when you're Replacing, you can use \1 to get "Group #1".
Explanation of the Replace
- - -
Side Note: If you have more complicated regex, you can get up to 9 capture groups! \1, \2, \3, [...], \9
But at that point, it's probably smarter to split your search/replaces into smaller pieces.
- - -
Side Note #2: If you want some more Regex tricks, I just wrote a post a few months ago here: which linked to some of my other posts over the years. I break down + color-coordinate many of the ones I use.
Quote:
Easier/Safer to use Tag Mechanic though. :P
Quote:
where I explained differences between <i> + <em> even further.
Last edited by Tex2002ans; 08-18-2022 at 11:12 PM. |
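As a small illustration of numbered capture groups, here is a Python sketch with two groups in one Find. The pattern, the "bold" class name, and the sample text are all invented for illustration and do not come from the posts above; Python's re module stands in for Sigil's engine.

```python
import re

# Group 1 = the class name, group 2 = the text inside the span (hypothetical markup).
FIND = r'<span class="(italics|bold)">(.*?)</span>'
sample = '<span class="italics">Dune</span> by <span class="bold">Herbert</span>'

# \1 and \2 refer to the groups in the order their opening parentheses appear.
result = re.sub(FIND, r"[\1: \2]", sample)

print(result)  # [italics: Dune] by [bold: Herbert]
```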
||||
08-19-2022, 11:43 AM | #730 | ||||
Connoisseur
Posts: 52
Karma: 10
Join Date: Sep 2021
Location: Upstate NY, USA
Device: iPad Pro, Kindle basic
|
Quote:
Quote:
Quote:
Quote:
|
||||
08-19-2022, 02:09 PM | #731 | |||
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
Code:
<p class="normal"><span class="normal">This is an <span class="italics">example</span>.<sup><span class="tiny">1</span></sup></span></p>
Regular Expressions would get completely confused with the 3 different </span>s, where TagMechanic would be able to figure out which </span> connects with which one.
Of course, with clean code, this wouldn't be a problem, but in real life there are always these crazy examples that creep up... and they come back to bite you in the butt later when you already accidentally did a "Replace All" 3 hours ago!
Quote:
You can also use those in FINDs as well! For example, one of the tricks I use is:
Double Word Check
Find: (\b[a-z]+) (\1\b)
Replace: \1
This grabs a lowercase word + looks for it again:
How does it work? It uses a few tricks:
Shove all that in GROUP 1.
Shove all that in GROUP 2. Now, when you replace, you're only replacing with GROUP 1, meaning that duplicated word never makes it:
- - -
Usage Note: You do have to be careful of false positives though, so NEVER do a "Replace All". Always do a one-by-one check.
There shouldn't ever be too many "doubles" within your book, but they're an extremely common typo that's very hard to catch. (Usually the human brain just skips right over them.)
- - -
Quote:
Glad to see someone benefited from all those in-depth discussions. Last edited by Tex2002ans; 08-19-2022 at 02:12 PM. |
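The nested <span> problem from the example in this post can be reproduced in a few lines of Python (using Python's re module as a stand-in for Sigil's engine). A lazy pattern for the outer span pairs its opening tag with the first </span> it meets, which actually belongs to the inner italics span:

```python
import re

# The sample markup from the post, split across lines only for readability.
html = ('<p class="normal"><span class="normal">This is an '
        '<span class="italics">example</span>.'
        '<sup><span class="tiny">1</span></sup></span></p>')

match = re.search(r'<span class="normal">(.*?)</span>', html)

# The capture stops at the inner span's closing tag, not the outer one.
print(match.group(1))  # This is an <span class="italics">example
```

A regex has no notion of which closing tag belongs to which opening tag, so any replacement built on this match would leave the markup unbalanced.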
|||
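The Double Word Check from the post above behaves the same way in Python's re module, which makes it easy to experiment with. This is a sketch with an invented sample sentence; finditer lists each hit for one-by-one review instead of replacing everything blindly.

```python
import re

# Same Find as in the post: a lowercase word, a space, then that word again.
DOUBLE_WORD = re.compile(r"(\b[a-z]+) (\1\b)")

text = "It was the the best of times, but he had had enough of the theory."

# Review every hit individually; "had had" is a legitimate double.
for m in DOUBLE_WORD.finditer(text):
    print(m.start(), repr(m.group(0)))

# Fix only the first hit. The trailing \b keeps "the theory" from matching at all.
fixed_first = DOUBLE_WORD.sub(r"\1", text, count=1)
print(fixed_first)
```

Here the pattern flags both "the the" (a typo) and "had had" (correct English), which is exactly the false-positive risk the Usage Note warns about.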
08-19-2022, 02:25 PM | #732 |
Resident Curmudgeon
Posts: 73,897
Karma: 128597114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Use <i> and <b> and forget <em> and <strong> ever existed.
|
08-19-2022, 02:35 PM | #733 |
Grand Sorcerer
Posts: 27,547
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Drop it Jon. Your preferences are not really relevant to the conversation at hand.
|
08-19-2022, 04:28 PM | #734 |
Connoisseur
Posts: 52
Karma: 10
Join Date: Sep 2021
Location: Upstate NY, USA
Device: iPad Pro, Kindle basic
|
After reading threads that spanned (ha! <span>ned! ) 5+ years, and seeing you spouting the same thing about <i> and <em> and <b> and <strong> (regardless of being educated better), I'll at least give you credit for consistency. But that's all. Thanks for your input.
|
08-19-2022, 04:30 PM | #735 | |
Connoisseur
Posts: 52
Karma: 10
Join Date: Sep 2021
Location: Upstate NY, USA
Device: iPad Pro, Kindle basic
|
Quote:
So, if my learning how to properly show varying types of emphasis helps convey nuances for someone who's relying on a screen reader or similar (on the very infinitesimal chance they access something that I put together), then it was time well spent. |
|
|