#1 |
Guru
Posts: 682
Karma: 4568205
Join Date: Jan 2010
Location: Sweden
Device: Kobo Forma
|
Regex help needed.
I have plenty of paragraphs similar to the following:
<p nodeindex="67">There is a great episode of Red Dwarf, where the crew fall prey to the “Despair Squid” on an ocean planet while salvaging from a derelict vessel at the bottom of the ocean. The squid squirted the ship with ink that had hallucinogenic properties that created despair in the crew members. They realise that they have been infected and head back to their own ship incredibly quickly to get a mood stabiliser. On the lift back to the top of the ship Dave Lister says “<em nodeindex="318">I don’t seem to have been effected…. Though it is true to say that no-one ever truly loved me in my entire life…</em>.” and starts crying as despair sets in. It is really funny!</p>
... and I want to remove all nodeindex="xx" entries. I've been trying and googling all I can about regex. I can usually figure out things like this after a while, but something like nodeindex=".*" matches way too much. (I'm using Sigil.)
#2 |
Grand Sorcerer
Posts: 7,059
Karma: 91577715
Join Date: Nov 2011
Location: Charlottesville, VA
Device: Kindles
|
.* is greedy. It will match as many characters as possible.
.*? will match as few characters as possible and will work better here. [0-9]+ is another alternative if the value is always numeric. |
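To make the difference concrete, here is a minimal sketch in Python. Sigil uses a different regex engine, so this assumes greedy and lazy quantifiers behave the same way in Python's `re` module (they do for these basic constructs). The shortened sample string is made up for illustration.

```python
import re

# Shortened, made-up sample with two nodeindex attributes on one line.
html = '<p nodeindex="67">Dave says <em nodeindex="318">hello</em>.</p>'

# Greedy: .* grabs as much as it can, then backs up to the LAST quote on the line.
greedy = re.findall(r'nodeindex=".*"', html)

# Lazy: .*? stops at the FIRST quote that lets the pattern finish.
lazy = re.findall(r'nodeindex=".*?"', html)

# Digits only: cannot run past the closing quote at all.
digits = re.findall(r'nodeindex="[0-9]+"', html)

print(greedy)  # one long match spanning both attributes
print(lazy)    # two short matches
print(digits)  # the same two short matches
```

The greedy version returns a single match that swallows everything between the first and the last attribute, which is the "matches way too much" symptom.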
#3 |
Guru
Posts: 682
Karma: 4568205
Join Date: Jan 2010
Location: Sweden
Device: Kobo Forma
|
Ah, nodeindex="[0-9]+" worked perfectly!
Thank you very much!! The regex sites I find are all very complex. Isn't there some "Regex for dummies" out there?
#4 |
Zealot
Posts: 128
Karma: 500
Join Date: Aug 2011
Device: kindle, boox
|
You can also use \d+ to match any number: \d matches a single digit, and the + means "one or more of them".
Search for <p nodeindex="\d+"> and replace with <p>
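A quick Python sketch of that search and replace, again on a made-up shortened sample. Note that a pattern starting with `<p` only touches `<p>` tags; the second substitution is my own variation (not from the posts above) that removes the attribute from any tag by also matching the whitespace in front of it.

```python
import re

html = '<p nodeindex="67">Dave says <em nodeindex="318">hello</em>.</p>'

# As suggested: only <p> tags are cleaned, the <em> keeps its attribute.
only_p = re.sub(r'<p nodeindex="\d+">', '<p>', html)

# Variation (assumption: the attribute is always preceded by whitespace):
# strip the attribute wherever it appears, whatever the tag name.
any_tag = re.sub(r'\s+nodeindex="\d+"', '', html)

print(only_p)
print(any_tag)
```

In Sigil the equivalent of the second line would be to search for `\s+nodeindex="\d+"` and leave the replacement field empty.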
#5 |
Guru
Posts: 682
Karma: 4568205
Join Date: Jan 2010
Location: Sweden
Device: Kobo Forma
|
Ok, thank you.
|
#6 |
Guru
Posts: 682
Karma: 4568205
Join Date: Jan 2010
Location: Sweden
Device: Kobo Forma
|
If I may...
I want to remove a beginning div. It's like this:
<body> <div lang="en"> (bunch of stuff) </div> </body>
I thought something like <div lang="en">.*</div> would match.
#7 | |
Bibliophagist
Posts: 45,739
Karma: 168959600
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Quote:
and replace with <body>\1</body> As usual, make sure you have a backup before making changes. |
|
#8 |
Guru
Posts: 682
Karma: 4568205
Join Date: Jan 2010
Location: Sweden
Device: Kobo Forma
|
Thanks, that's along what I thought would work, but it doesn't match anything.
This is a bit frustrating....
#9 | |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
Quote:
Dale |
|
#10 | |
Guru
Posts: 682
Karma: 4568205
Join Date: Jan 2010
Location: Sweden
Device: Kobo Forma
|
Quote:
Reading that got me thinking... could the failure to match be because '.*' does not include newlines?
OK, I tried <div lang="en">.*|\n*</div>, which actually seems to find/match. But when I add ( ) to be able to use \1, it doesn't match anymore. Am I doing this wrong: <div lang="en">(.*|\n*)</div> ?
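A sketch of why those two patterns behave so differently, using Python's `re` on a made-up multi-line sample (assuming Sigil's engine treats `|` the same way). Without parentheses, `|` splits the whole pattern into "`<div lang="en">.*`" OR "`\n*</div>`", so it finds the two tags as separate hits. Inside parentheses, each alternative is tried once and neither can cover several lines of mixed text and newlines. Putting the repetition outside the alternation, as in `(?:.|\n)*`, is one way to span lines without DotAll.

```python
import re

text = '<body>\n<div lang="en">\n<p>One</p>\n<p>Two</p>\n</div>\n</body>'

# No group: the alternation applies to the whole pattern, giving two separate matches.
ungrouped = re.findall(r'<div lang="en">.*|\n*</div>', text)

# Grouped: "all non-newlines" OR "all newlines", tried once, then </div> must follow.
grouped = re.search(r'<div lang="en">(.*|\n*)</div>', text)

# Repeat the choice itself: any character OR a newline, as many times as needed.
either = re.search(r'<div lang="en">((?:.|\n)*)</div>', text)

print(ungrouped)         # the opening tag and the closing tag, separately
print(grouped)           # None
print(either.group(1))   # the content between the tags
```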
|
#11 |
Guru
Posts: 682
Karma: 4568205
Join Date: Jan 2010
Location: Sweden
Device: Kobo Forma
|
Hey, enabling DotAll in Sigil made this work: <div lang="en">(.*)</div>
Do I understand correctly that . now also matches newlines, and that's why it works?
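For anyone who wants to see the DotAll effect outside Sigil, here is a minimal Python sketch on a made-up sample. It assumes Sigil's DotAll option corresponds to Python's `re.DOTALL` flag (the inline form `(?s)` at the start of the pattern does the same thing in Python).

```python
import re

text = '<body>\n<div lang="en">\n<p>One</p>\n<p>Two</p>\n</div>\n</body>'
pattern = r'<div lang="en">(.*)</div>'

# Default: . stops at every newline, so the pattern cannot reach </div>.
without_dotall = re.search(pattern, text)

# DOTALL: . also matches newlines, so (.*) captures everything inside the div.
with_dotall = re.sub(pattern, r'\1', text, flags=re.DOTALL)

# Same thing with the inline flag instead of the flags argument.
inline_flag = re.sub(r'(?s)' + pattern, r'\1', text)

print(without_dotall)  # None
print(with_dotall)
```

Because `.*` is greedy, it runs to the last `</div>` in the text. That is what you want when removing an outer wrapper, but it would be too much if the file held several sibling divs.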
#12 | |
Bibliophagist
Posts: 45,739
Karma: 168959600
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Quote:
Code:
<body> <div lang="en">
Code:
</div> </body>
Code:
<body> <div class="_override_2">(.*?)</div> </body>
Code:
<body>\1</body>
Last edited by DNSB; 01-28-2019 at 05:51 PM.
|
#13 | |
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
"For dummies", the "Regex Examples" thread in Sigil's forums is a good place to see a lot of real-life examples. Lots of people have asked questions over the years, and people have answered.
I've written multiple posts over the years breaking down Regex into bite-sized chunks, and I try to explain "in normal English" like:
Or sometimes color coding:
But really, there's not much you can do besides learning the basics. You then just build them up piece by piece, and as long as you know what each little piece does, you can tackle the more complicated patterns you recognize.
Last year, I was training a few people in the basics of ebook creation. I created a presentation, and Regex was one of the sections (10 slides). I showed a handful of common issues that creep up like:
If anyone is interested in those slides, send me a PM. :P
Last edited by Tex2002ans; 02-03-2019 at 10:53 PM.
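As one illustration of the piece-by-piece idea (this is my own sketch, not taken from the slides or posts mentioned above), Python's `re.VERBOSE` flag lets you spread a pattern over several lines and comment each piece. Whitespace in the pattern is ignored in this mode, and `#` starts a comment. The pattern is the nodeindex one from earlier in this thread.

```python
import re

# Each line is one small piece; read them top to bottom.
pattern = re.compile(r'''
    \s+          # one or more whitespace characters before the attribute
    nodeindex=   # the literal attribute name and the equals sign
    "            # the opening quote
    \d+          # one or more digits
    "            # the closing quote
''', re.VERBOSE)

html = '<p nodeindex="67">Dave says <em nodeindex="318">hello</em>.</p>'
cleaned = pattern.sub('', html)

print(cleaned)
```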
|
#14 |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
Sounds like a good addition to our wiki.
Dale |
#15 |
Guru
Posts: 682
Karma: 4568205
Join Date: Jan 2010
Location: Sweden
Device: Kobo Forma
|
Tex, this forum has some really excellent contributors, and you sure are one of them. Thank you!
Will send you a PM.