#1 | |
Village idiot
Posts: 157
Karma: 519566
Join Date: Mar 2014
Location: Belgium
Device: sony PRS T-1
REGEX can't seem to figure it out
Hi all
Came across this old post: Quote:
I thought this could save me a lot of time cleaning up an epub document. I didn't know about regex, so I gave it a try.
I want to replace <i class="calibre2">words</i> with <em>words</em>.
So I searched for <i class="calibre2">(\w+)</i> and replaced it with <em>\1</em>
That worked fine when there was only one word between > and <. But when there are more, as in <i class="calibre2">two words</i>, my search doesn't find them.
Does \w+ only match one word? How do I get it to match any number of words? Do I change the \1 in the replacement to * or something else?
Thanks in advance!
JLius
#2 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
That would go wrong if there are digits in the original tags. Just a question: why do you want to replace <i> with <em>? Both work perfectly fine, and although <i> is often treated as deprecated, I don't think there is a chance it will disappear any time soon. Not only that, there is a distinct difference between <i> and <em>. The first is a stylistic indication, the second a semantic one. It might be that emphasis is the wrong semantic value for the <i>.
Anyway, back to your question. If you want to retain the <i>, you can just replace <i class="calibre2"> with <i>. If not, you can try the following:
Search: <i class="calibre2">(.*?)</i>
Replace: <em>\1</em>
Explanation: the . matches any character. The * says 0 or more. A question mark on its own says 0 or 1, but directly after the * it makes the match lazy, so it stops at the first </i> instead of the last.
Regex can be confusing in the beginning, but it is worth the time you need to invest. There are various helpers out there. Some here have good results with RegexBuddy. You can also use a cheatsheet like this one: http://www.cheatography.com/davechil...r-expressions/
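A minimal sketch of the difference between the two searches, written in Python on the assumption that its `re` module treats these patterns the same way the editor's search box does. The sample string and the `convert` helper are made up for illustration; only the tag and class name come from the thread.

```python
import re

# Hypothetical sample line; only the tag and class name come from the thread.
SAMPLE = 'A <i class="calibre2">word</i> and <i class="calibre2">two words</i>.'


def convert(pattern, html):
    """Replace every match of `pattern` with its first group wrapped in <em>."""
    return re.sub(pattern, r"<em>\1</em>", html)


# \w+ cannot cross the space, so the two-word span is left untouched.
word_only = convert(r'<i class="calibre2">(\w+)</i>', SAMPLE)

# .*? accepts any characters and stops at the first closing </i>.
any_text = convert(r'<i class="calibre2">(.*?)</i>', SAMPLE)

print(word_only)
print(any_text)
```

The first call converts only the single-word span; the second converts both.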
#3 |
Grand Sorcerer
Posts: 28,352
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
\w does not mean match a "word." So it follows that \w+ does not mean "one or more words."
\w is a shortcut for a word character. Singular. Which (most commonly) translates to [A-Za-z0-9_]. A space is not a word character, which is why your regex stopped matching when it encountered the space between words. There really is no regex shortcut for matching a "word." You can find the boundaries between words, though. Some regex flavors (calibre's, for example) can match the beginnings or ends of words.
Toxaris's suggestion should do what you want (unless there are any complicated, nested i tags, which isn't very likely). You could also search for:
Code:
<(/?)i( class=".*?")?>
and replace with:
Code:
<\1em>
Last edited by DiapDealer; 03-24-2014 at 08:22 AM.
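Both points can be tried out in a few lines of Python. This is only a sketch, assuming Python's `re` module behaves like the search engine in the editor; `swap_tags` and the sample strings are hypothetical names for illustration.

```python
import re

# \w+ matches runs of word characters, so a space splits the text in two.
pieces = re.findall(r"\w+", "two words")

# The tag-only search from the post: group 1 holds "/" for a closing tag.
TAG = re.compile(r'<(/?)i( class=".*?")?>')


def swap_tags(html):
    """Rewrite <i ...> and </i> as <em> and </em>, leaving the text alone."""
    return TAG.sub(r"<\1em>", html)


swapped = swap_tags('<i class="calibre2">two words</i> and <i>plain</i>')

print(pieces)
print(swapped)
```

Because this search only ever matches the tags themselves, it never has to decide which opening tag belongs to which closing tag. Note that it also converts a bare <i> without a class.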
#4 |
Village idiot
Posts: 157
Karma: 519566
Join Date: Mar 2014
Location: Belgium
Device: sony PRS T-1
Super, many thanks to you both!
#5 |
Village idiot
Posts: 157
Karma: 519566
Join Date: Mar 2014
Location: Belgium
Device: sony PRS T-1
By the way, for others who might find their way to this thread:
Toxaris: your suggestion didn't work. For reasons I don't understand, the <em> tags ended up all over my paragraphs in places they don't belong.
DiapDealer: as far as I can tell, your suggestion worked.
#6 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
That should not have happened, unless the structure of the document was not in order or there were nested tags, which is very unlikely. Which program did you use to run the regex?
#7 |
Grand Sorcerer
Posts: 28,352
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
I agree. Your regex should have worked, Toxaris. I guess it could get a bit greedy depending on the flags/checkboxes that were set by default ... but then so could mine, for that matter.
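To show what "greedy" means here, the sketch below compares the lazy pattern with the same pattern minus the question mark. It is written in Python and assumes the `re` module; the sample line is invented, and it says nothing about which settings in a particular editor might produce the greedy behaviour.

```python
import re

# Hypothetical line with two separate italic spans.
LINE = '<i class="calibre2">one</i> plain <i class="calibre2">two</i>'

# Lazy: each match ends at the nearest closing tag.
lazy = re.findall(r'<i class="calibre2">(.*?)</i>', LINE)

# Greedy: one match runs from the first opening tag to the last closing tag.
greedy = re.findall(r'<i class="calibre2">(.*)</i>', LINE)

print(lazy)
print(greedy)
```

The lazy search finds two short spans, while the greedy one swallows the plain text between them as well.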
#8 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
It can be helpful sometimes to do things in stages, taking out an element or two at a time. It can make the search expression less complicated.
It is also helpful to set up the search expression and repeatedly find without replacing, to make sure it will not go berserk. In Sigil, matching what you didn't intend can be catastrophic unless you save just before your Replace All. It can be hard for any normal human to envision everything a search will grab, even for plain characters. In my current book, I have an ATIS. In fiddling with it, I turned satisfaction into s ATIS faction.
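The "find first, replace later" habit can be imitated in a script. The following is a rough Python sketch; the `preview` helper and the sample sentence are hypothetical, and the word-boundary anchor \b is one possible way to keep a search for ATIS out of "satisfaction", not necessarily what was done in the book.

```python
import re

# Hypothetical sentence; the ATIS/satisfaction clash is from the post above.
TEXT = "The ATIS broadcast gave me no satisfaction."


def preview(pattern, text, flags=0):
    """List what a search would hit, with its position, without changing anything."""
    return [(m.start(), m.group(0)) for m in re.finditer(pattern, text, flags)]


# A case-insensitive search for the bare letters also lands inside a word.
loose = preview(r"atis", TEXT, re.IGNORECASE)

# Word boundaries restrict the search to the standalone term.
strict = preview(r"\batis\b", TEXT, re.IGNORECASE)

print(loose)
print(strict)
```

Seeing two hits where one was expected is the cue to tighten the pattern before running Replace All.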
#9 |
Village idiot
Posts: 157
Karma: 519566
Join Date: Mar 2014
Location: Belgium
Device: sony PRS T-1
I used calibre.
I don't know what nested tags are, but I'll give an example:
<p class="calibre3"> text here <i class="calibre2"> text here </i> text here </p>
Toxaris' regex did something like this to various paragraphs:
<p></em>some text that didn't need em <em>text that needed the em</em>regular text <em> </p>
That didn't make sense to me. Maybe my first search, <i class="calibre2">(\w+)</i>, had something to do with it.
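"Nested tags" means one <i> sitting inside another. The Python sketch below, assuming the `re` module, runs the lazy search on the flat example from this post and on an invented nested one. The `to_em` helper and the NESTED string are hypothetical, and the sketch does not reproduce or explain the mangled output described above.

```python
import re

PATTERN = r'<i class="calibre2">(.*?)</i>'


def to_em(html):
    """Apply the lazy search and replace to one string."""
    return re.sub(PATTERN, r"<em>\1</em>", html)


# The flat paragraph quoted in the post.
FLAT = '<p class="calibre3"> text here <i class="calibre2"> text here </i> text here </p>'

# Invented example of one italic span inside another.
NESTED = '<i class="calibre2">outer <i class="calibre2">inner</i> tail</i>'

flat_result = to_em(FLAT)
nested_result = to_em(NESTED)

print(flat_result)
print(nested_result)
```

On the flat paragraph the tags come out paired correctly. On the nested one, the outer opening tag gets paired with the inner closing tag, which leaves an unmatched <i> and </i> behind.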
#10 |
Curmudgeon
Posts: 629
Karma: 1623086
Join Date: Jan 2012
Device: iPad, iPhone, Nook Simple Touch
For a gentle introduction to regular expressions:
https://developer.apple.com/library/...nfettered.html
#11 | |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
It is more general (as opposed to being focused on usage within perl/sed/grep/awk) as well as being heavily geared towards the beginner, with color-coded examples.
Last edited by eschwartz; 03-25-2014 at 12:29 AM.
#12 | |
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Hitch