MobileRead Forums > E-Book Software > Sigil

05-05-2022, 05:47 AM   #721
Skydancer
Enthusiast
Skydancer began at the beginning.
Posts: 30
Karma: 10
Join Date: Mar 2019
Location: Slovenia
Device: PocketBook Inkpad 3
Any idea how to capture uppercase words that contain letters with diacritics, like Ū, Ṃ, Ḥ, etc.?
I tried the following, but it doesn't work. I want to capture uppercase words of 2 or more characters.
Code:
([[:upper:]]{2,})
05-05-2022, 06:09 AM   #722
BeckyEbook
Guru
BeckyEbook ought to be getting tired of karma fortunes by now.
Posts: 692
Karma: 2180740
Join Date: Jan 2017
Location: Poland
Device: Misc
(*UCP) enables Unicode properties for the expression that follows, so POSIX classes such as [[:upper:]] also match accented and other non-ASCII letters.

Use:
Code:
(*UCP)([[:upper:]]{2,})
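Python's built-in re module has no (*UCP) verb and no [[:upper:]] class, but its re.ASCII flag flips the same ASCII-versus-Unicode switch for \w, \d, \s and \b, so it can illustrate what (*UCP) changes. A minimal sketch, assuming Python 3.7+; the helper name upper_words is hypothetical, and it approximates an uppercase class by checking str.isupper() on whole letter runs. That is not exactly what [[:upper:]]{2,} does in PCRE, which would also find 2+ capitals inside a mixed-case word such as "McDONALD".

```python
import re

# Runs of two or more letters: [^\W\d_] means "word character, minus digits and underscore".
LETTER_RUN = r"[^\W\d_]{2,}"

def upper_words(text, ascii_only=False):
    """Hypothetical helper: return letter runs of length 2+ that are entirely uppercase.

    ascii_only=True mimics a pattern run WITHOUT (*UCP): the letter class
    only knows A-Z/a-z, so accented capitals are never seen as letters.
    """
    flags = re.ASCII if ascii_only else 0
    return [w for w in re.findall(LETTER_RUN, text, flags) if w.isupper()]

sample = "The \u016a\u1e42\u1e24 chant and the NASA report"  # contains ŪṂḤ

print(upper_words(sample))                   # ['ŪṂḤ', 'NASA']
print(upper_words(sample, ascii_only=True))  # ['NASA']
```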
05-05-2022, 06:15 AM   #723
Skydancer
Enthusiast
Skydancer began at the beginning.
Posts: 30
Karma: 10
Join Date: Mar 2019
Location: Slovenia
Device: PocketBook Inkpad 3
@BeckyEbook, thank you!
05-05-2022, 06:55 AM   #724
DiapDealer
Grand Sorcerer
DiapDealer ought to be getting tired of karma fortunes by now.
Posts: 27,547
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Also remember that \p{Lu} and \p{Ll} match any uppercase and lowercase letter, respectively, in any language, without requiring the (*UCP) switch (in Sigil's PCRE regex engine).

\p{L} matches any letter (Unicode or otherwise) and \P{L} matches anything NOT a letter.

So (\p{Lu}{2,}) should theoretically do the same thing (not near a machine to verify syntax).

See the Unicode Categories section of https://www.regular-expressions.info/unicode.html for more categories.
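Python's built-in re has no \p{...} syntax (the third-party regex module does, but it is outside the standard library), so this sketch builds equivalent character classes from the unicodedata module instead. The helper category_class is hypothetical: prefix "Lu" stands in for \p{Lu}, prefix "L" for \p{L}, and negate=True for \P{...}. The category tables come from Python's own Unicode database, which may be a different Unicode version than the one in Sigil's PCRE.

```python
import re
import sys
import unicodedata

def category_class(prefix, negate=False):
    r"""Hypothetical helper: build a character class for a Unicode general category.

    "Lu" ~ \p{Lu}; "L" ~ \p{L} (Lu, Ll, Lt, Lm, Lo); negate=True ~ \P{...}.
    Consecutive code points are merged into ranges to keep the pattern small.
    """
    ranges = []
    for cp in range(sys.maxunicode + 1):
        if unicodedata.category(chr(cp)).startswith(prefix):
            if ranges and ranges[-1][1] == cp - 1:
                ranges[-1][1] = cp          # extend the current range
            else:
                ranges.append([cp, cp])     # start a new range
    body = "".join(
        re.escape(chr(lo)) if lo == hi else re.escape(chr(lo)) + "-" + re.escape(chr(hi))
        for lo, hi in ranges
    )
    return ("[^" if negate else "[") + body + "]"

UPPER = category_class("Lu")            # like \p{Lu}
NOT_LETTER = category_class("L", True)  # like \P{L}

# Equivalent of (\p{Lu}{2,}): two or more uppercase letters in a row.
upper_run = re.compile("(" + UPPER + "{2,})")

sample = "The \u016a\u1e42\u1e24 chant and the NASA report"  # contains ŪṂḤ
print(upper_run.findall(sample))                   # ['ŪṂḤ', 'NASA']
print(re.sub(NOT_LETTER, "", "\u016a-1 \u1e43!"))  # prints Ūṃ (only letters survive)
```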
08-18-2022, 01:51 PM   #725
CubGeek
Connoisseur
CubGeek began at the beginning.
Posts: 52
Karma: 10
Join Date: Sep 2021
Location: Upstate NY, USA
Device: iPad Pro, Kindle basic
Oh... wow. 49 pages over the course of ten years?! Well, this regex newbie has a lot of reading homework, it seems.
08-18-2022, 02:48 PM   #726
CubGeek
Connoisseur
CubGeek began at the beginning.
Posts: 52
Karma: 10
Join Date: Sep 2021
Location: Upstate NY, USA
Device: iPad Pro, Kindle basic
Okay, after reading the "<i>, <em> or <span> for italics" thread from 2020, and then the 2017 "Extended <head> chapter: NOT necessary?" thread linked therein [and paying particular attention to Tex2002ans's posts about the underlying purposes of <em> and <i> <em>therein</em>], I've seen the error of my ways regarding using <span> for setting italics.
  1. The upside: I've only just started dipping my toes into the waters of converting my documents into ePub format, so I'm learning good things!
  2. The downside: I now need to learn regex to be able to search through the files and correct my earlier <span class="abuse">. Thanks, karma.

I've figured out that
Code:
<span class="italics">([^>]+)</span>
will catch every instance of the offending tags on both sides of the content so affected. However, I can't seem to figure out how to get the REPLACE function to leave the content alone and replace <em>just</em> the tags themselves.

I'm happy to do the legwork and the trial and error to learn what works. I guess my search skills need an update, too, because the results I'm turning up don't seem to work for me. Can someone point me in the right direction?

[edit] Okay, I THINK I found it, but it was hit-or-miss, because everything seemed to be for JavaScript/C#/VB.NET/PHP/Ruby/etc. So, some trial and error resulted in me learning about <i>backreferences</i> and <i>capture groups</i>. I've gotten it to work with this in the Replace field:
Code:
<em>\g<1></em>
Whew.

Okay, next question: is this a kludge, and is there a better way? Or is this correct? Thanks, y'all! [/edit]
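For comparison, here is the same find/replace as a Python sketch with made-up sample markup. In Python's re.sub, both \g<1> and \1 refer to group 1, so neither is a kludge there; the \g<N> form mainly avoids ambiguity when a digit directly follows the reference (Python reads \10 as group 10, while \g<1>0 is group 1 followed by a literal 0). This checks Python's behaviour only; Sigil's replacement syntax is not verified here.

```python
import re

html = ('<p>She said <span class="italics">never</span> '
        'and <span class="italics">again</span>.</p>')

find = r'<span class="italics">([^>]+)</span>'

with_g = re.sub(find, r"<em>\g<1></em>", html)       # \g<N> form of the group reference
with_backslash = re.sub(find, r"<em>\1</em>", html)  # classic backreference

print(with_g)
# <p>She said <em>never</em> and <em>again</em>.</p>
print(with_g == with_backslash)  # True
```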

Last edited by CubGeek; 08-18-2022 at 03:22 PM.
08-18-2022, 04:32 PM   #727
Turtle91
A Hairy Wizard
Turtle91 ought to be getting tired of karma fortunes by now.
Posts: 3,093
Karma: 18727053
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
That's pretty advanced stuff!

I go pretty easy...and it seems to work so far...

find: <i>(.*?)</i>
replace: <em>\1</em>

or

find: <span class="italics">(.*?)</span>
replace: <em>\1</em>


etc.
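The ? in (.*?) is what keeps this safe on a line with more than one italic run: it makes .* lazy, so each match stops at the nearest </i> instead of the last one. A Python sketch with a made-up sample line shows the difference (note that Python's re does not let . cross a line break unless re.DOTALL is set):

```python
import re

line = "<p>I <i>really</i> mean it, <i>truly</i>.</p>"

lazy = re.sub(r"<i>(.*?)</i>", r"<em>\1</em>", line)   # stops at the nearest </i>
greedy = re.sub(r"<i>(.*)</i>", r"<em>\1</em>", line)  # runs to the LAST </i>

print(lazy)
# <p>I <em>really</em> mean it, <em>truly</em>.</p>
print(greedy)
# <p>I <em>really</i> mean it, <i>truly</em>.</p>
```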
08-18-2022, 10:11 PM   #728
CubGeek
Connoisseur
CubGeek began at the beginning.
Posts: 52
Karma: 10
Join Date: Sep 2021
Location: Upstate NY, USA
Device: iPad Pro, Kindle basic
Quote:
Originally Posted by Turtle91 View Post
That's pretty advanced stuff!

I go pretty easy...and it seems to work so far...

find: <i>(.*?)</i>
replace: <em>\1</em>

or

find: <span class="italics">(.*?)</span>
replace: <em>\1</em>


etc.
Oh, that's much simpler. Thank you! Since the stuff I'm working on has a combination of <i> for "inside voice," and "named things" as well as <em> for word emphasis, this certainly has been a learning experience!
08-18-2022, 11:04 PM   #729
Tex2002ans
Wizard
Tex2002ans ought to be getting tired of karma fortunes by now.
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by CubGeek View Post
Okay, after reading the "<i>, <em> or <span> for italics" thread from 2020 [...] [and paying particular attention to Tex2002ans's posts about the underlying purposes of <em> and <i> <em>therein</em>], I've seen the error of my ways regarding using <span> for setting italics.


The easiest way to do it is to use DiapDealer's fantastic "TagMechanic" plugin.

I explained how to install Sigil plugins in this 2021 post.

And I gave step-by-step instructions on how to use TagMechanic here:

That will help mass convert your <span class="italics"> -> <i> or <em>.

It will be much safer than trying to use Regular Expressions, because regex can't safely handle complicated cases of <span>s inside of <span>s.

Quote:
Originally Posted by CubGeek View Post
I've figured out that
Code:
<span class="italics">([^>]+)</span>
Find: <span class="italics">([^<]+)</span>
Replace: <i>\1</i>

You see the parentheses you wrapped around your stuff? That's called a "Capture Group".

Explanation of the Find

Let's break it down into each piece:
  • <span class="italics">
  • (
    • [^<]+
  • )
  • </span>

It's saying:
  • "Hey, find the italics <span>."
  • "You see this open parenthesis? Stick this next stuff into a group!"
    • "Keep grabbing everything that's NOT a '<'."
  • "Closing parenthesis? Everything captured between them goes into GROUP 1!"
  • "Hey, find the closing </span>."

Now when you're Replacing, you can use \1 to get "Group #1".

Explanation of the Replace
  • <i> = "Put the opening <i>."
  • \1 = "Put whatever was captured in GROUP 1 here."
  • </i> = "Put the closing </i>."
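The same capture group can be inspected directly in Python (a sketch with made-up sample markup; Python's re.sub also accepts \1 in the replacement). It also shows why [^<]+ is a cautious choice: it refuses to match when another tag sits inside the span, so such cases are left alone rather than mangled.

```python
import re

pattern = re.compile(r'<span class="italics">([^<]+)</span>')

simple = '<p>A <span class="italics">word</span> here.</p>'
m = pattern.search(simple)
print(m.group(0))  # <span class="italics">word</span>  (the whole match)
print(m.group(1))  # word  (only what GROUP 1 captured)

print(pattern.sub(r"<i>\1</i>", simple))
# <p>A <i>word</i> here.</p>

# [^<]+ cannot step over another tag, so nested markup is simply not matched.
nested = '<p><span class="italics">a <b>bold</b> word</span></p>'
print(pattern.search(nested))  # None
```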

- - -

Side Note: If you have more complicated regex, you can get up to 9 capture groups!

\1, \2, \3, [...], \9

But at that point, it's probably smarter to split your search/replaces into smaller pieces.
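A short Python sketch with made-up text, showing how several groups can be reordered in the replacement:

```python
import re

# Three capture groups: year (\1), month (\2), day (\3).
iso_date = r"(\d{4})-(\d{2})-(\d{2})"

print(re.sub(iso_date, r"\2/\3/\1", "Posted 2022-08-19."))  # Posted 08/19/2022.

# Two groups swapped: "Last, First" becomes "First Last".
print(re.sub(r"(\w+), (\w+)", r"\2 \1", "Austen, Jane"))     # Jane Austen
```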

- - -

Side Note #2: If you want some more Regex tricks, I just wrote a post a few months ago here:

which linked to some of my other posts over the years. I break down + color-coordinate many of the ones I use.

Quote:
Originally Posted by Turtle91 View Post
I go pretty easy...and it seems to work so far...

find: <i>(.*?)</i>
replace: <em>\1</em>

or

find: <span class="italics">(.*?)</span>
replace: <em>\1</em>
Yep, this type of stuff works too.

Easier/safer to use TagMechanic, though. :P

Quote:
Originally Posted by CubGeek View Post
Since the stuff I'm working on has a combination of <i> for "inside voice," and "named things" as well as <em> for word emphasis, this certainly has been a learning experience!
And I don't know if you caught this topic:

where I explained differences between <i> + <em> even further.

Last edited by Tex2002ans; 08-18-2022 at 11:12 PM.
08-19-2022, 11:43 AM   #730
CubGeek
Connoisseur
CubGeek began at the beginning.
Posts: 52
Karma: 10
Join Date: Sep 2021
Location: Upstate NY, USA
Device: iPad Pro, Kindle basic
Quote:
Originally Posted by Tex2002ans View Post
The easiest way to do it is to use DiapDealer's fantastic "TagMechanic" plugin.

I explained how to install Sigil plugins in this 2021 post.

And I gave step-by-step instructions on how to use TagMechanic here:
Cheers, that'll help significantly. Luckily, the few things I'm crafting are small enough, and I'm doing them slowly enough, that there isn't much "spaghettification" of the code, or the whole <span>ception of nested <span>s thing that I've seen when I peeked inside a couple of my purchased or calibre-converted books.

Quote:
You see the parentheses you wrapped around your stuff? That's called a "Capture Group".
Yup! Note my edit above where I learned about Capture Groups and backreferences and... However, I like your explanation better. Much more user friendly.


Quote:
Side Note #2: If you want some more Regex tricks, I just wrote a post a few months ago here:

which linked to some of my other posts over the years. I break down + color-coordinate many of the ones I use.
Bookmarked!

Quote:
And I don't know if you caught this topic:

where I explained differences between <i> + <em> even further.
Oh, I did. *twitch* I'm sure I was mumbling about em's and i's and strong's and b's (oh my!) in my sleep, to the annoyance of my cats.
08-19-2022, 02:09 PM   #731
Tex2002ans
Wizard
Tex2002ans ought to be getting tired of karma fortunes by now.
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by CubGeek View Post
Cheers, that'll help significantly. Luckily, the few things I'm crafting are small enough, and I'm doing them slowly enough, that there isn't much "spaghettification" of the code, or the whole <span>ception of nested <span>s thing that I've seen when I peeked inside a couple of my purchased or calibre-converted books.
It usually happens around footnotes and all sorts of other complicated nesting:

Code:
<p class="normal"><span class="normal">This is an <span class="italics">example</span>.<sup><span class="tiny">1</span></sup></span></p>
Let's say you were trying to correct (or remove) that outside <span class="normal">.

Regular expressions would get completely confused by the 3 different </span>s, whereas TagMechanic would be able to figure out which </span> pairs with which opening <span>.

Of course, with clean code this wouldn't be a problem, but in real life there are always crazy examples that creep up... and they come back to bite you in the butt later, when you already accidentally did a "Replace All" 3 hours ago!
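A Python sketch of that exact example. The lazy regex pairs the outer <span class="normal"> with the first </span> it meets, while the hypothetical helper unwrap_span counts nesting depth to find the matching close. It only illustrates the depth-counting idea (it is not TagMechanic's actual implementation) and assumes well-formed markup where only <span> tags nest.

```python
import re

html = ('<p class="normal"><span class="normal">This is an '
        '<span class="italics">example</span>.<sup><span class="tiny">1</span>'
        '</sup></span></p>')

# Naive approach: the lazy .*? stops at the FIRST </span>, which belongs
# to the italics span, not to <span class="normal">.
naive = re.sub(r'<span class="normal">(.*?)</span>', r"\1", html, count=1)

SPAN_TAG = re.compile(r"<span\b[^>]*>|</span>")

def unwrap_span(markup, cls):
    """Hypothetical helper: drop the first <span class="cls"> and its
    MATCHING </span>, keeping everything in between."""
    opener = f'<span class="{cls}">'
    start = markup.find(opener)
    if start == -1:
        return markup
    depth = 0
    for tag in SPAN_TAG.finditer(markup, start):
        if tag.group().startswith("</"):
            depth -= 1
            if depth == 0:  # back at the outer level: this close is ours
                inner = markup[start + len(opener):tag.start()]
                return markup[:start] + inner + markup[tag.end():]
        else:
            depth += 1
    return markup  # no matching close found: leave the markup untouched

print(naive)
# <p class="normal">This is an <span class="italics">example.<sup><span class="tiny">1</span></sup></span></p>
print(unwrap_span(html, "normal"))
# <p class="normal">This is an <span class="italics">example</span>.<sup><span class="tiny">1</span></sup></p>
```

Note how the naive result silently stretches the italics over ".1", which is exactly the kind of damage a "Replace All" spreads across a whole book.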

Quote:
Originally Posted by CubGeek View Post
Yup! Note my edit above where I learned about Capture Groups and backreferences and... However, I like your explanation better. Much more user friendly.


You can use those in FINDs as well!

For example, one of the tricks I use is:

Double Word Check

Find: (\b[a-z]+) (\1\b)
Replace: \1

This grabs a lowercase word + looks for it again:
  • Did you see the reactor reactor?
  • What are you doing in that that area?
  • If only they had had enough power to use the ultrasound machine for each pregnancy, he would have detected the problem earlier and been able to plan the C-section.

How does it work?

It uses a few tricks:
  • \b = a "word boundary". (Beginning of word)
  • [a-z] = lowercase letters 'a' through 'z'.
  • + = ONE OR MORE of previous thing.

Shove all that in GROUP 1. Then match a single space.
  • \1 = Look for GROUP 1 again.
  • \b = a "word boundary". (End of word)

Shove all that in GROUP 2.

Now, when you replace, you're only replacing with GROUP 1, meaning that duplicated word never makes it:
  • Did you see the reactor?
  • What are you doing in that area?



- - -

Usage Note: You do have to be careful of false positives though, so NEVER do a "Replace All".

Always do a one-by-one check.

There shouldn't ever be too many "doubles" within your book, but they're an extremely common typo that's very hard to catch. (Usually the human brain just skips right over them.)
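One way to do that one-by-one check outside Sigil: a Python sketch that lists every hit with some surrounding context for review, then shows what a blind Replace All would have done. The sample text is the three example sentences from above (the third one lightly shortened); note that "had had" is grammatical, yet it gets collapsed anyway.

```python
import re

DOUBLE_WORD = re.compile(r"(\b[a-z]+) (\1\b)")

text = ("Did you see the reactor reactor? "
        "What are you doing in that that area? "
        "If only they had had enough power to use the ultrasound machine.")

# Step 1: list each hit with some context so a human can judge it.
for m in DOUBLE_WORD.finditer(text):
    left = max(m.start() - 12, 0)
    right = min(m.end() + 12, len(text))
    print(f"{m.group(0)!r}: ...{text[left:right]}...")

# Step 2: what a blind "Replace All" with \1 would produce.
print(DOUBLE_WORD.sub(r"\1", text))
# Did you see the reactor? What are you doing in that area? If only they had enough power to use the ultrasound machine.
```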

- - -

Quote:
Originally Posted by CubGeek View Post
Oh, I did. *twitch* I'm sure I was mumbling about em's and i's and strong's and b's (oh my!) in my sleep
Me too. Took me many years to finally get it boiled down.

Glad to see someone benefited from all those in-depth discussions.

Last edited by Tex2002ans; 08-19-2022 at 02:12 PM.
08-19-2022, 02:25 PM   #732
JSWolf
Resident Curmudgeon
JSWolf ought to be getting tired of karma fortunes by now.
Posts: 73,897
Karma: 128597114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Use <i> and <b> and forget <em> and <strong> ever existed.
08-19-2022, 02:35 PM   #733
DiapDealer
Grand Sorcerer
DiapDealer ought to be getting tired of karma fortunes by now.
Posts: 27,547
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Drop it, Jon. Your preferences are not really relevant to the conversation at hand.
08-19-2022, 04:28 PM   #734
CubGeek
Connoisseur
CubGeek began at the beginning.
Posts: 52
Karma: 10
Join Date: Sep 2021
Location: Upstate NY, USA
Device: iPad Pro, Kindle basic
Quote:
Originally Posted by JSWolf View Post
Use <i> and <b> and forget <em> and <strong> ever existed.
After reading threads that spanned (ha! <span>ned!) 5+ years, and seeing you spout the same thing about <i> and <em> and <b> and <strong> (despite being shown otherwise), I'll at least give you credit for consistency. But that's all. Thanks for your input.
08-19-2022, 04:30 PM   #735
CubGeek
Connoisseur
CubGeek began at the beginning.
Posts: 52
Karma: 10
Join Date: Sep 2021
Location: Upstate NY, USA
Device: iPad Pro, Kindle basic
Quote:
Originally Posted by Tex2002ans View Post
Glad to see someone benefited from all those in-depth discussions.
Having spent 5 years working for a boss who was blind and used a screen reader, I hope I now have better empathy for the difficulties she encountered than I did before my time there: not only with official communications, but with webpage navigation, with poorly implemented accessibility "functions," and also with the simple pleasure of "reading" a book over her lunch break.

So, if learning how to properly show varying types of emphasis helps convey nuance to someone relying on a screen reader or similar (on the infinitesimal chance they access something I put together), then it was time well spent.
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.