MobileRead Forums > E-Book Formats > Workshop
Old 01-27-2019, 02:19 PM   #1
patrik
Guru
Posts: 682
Karma: 4568205
Join Date: Jan 2010
Location: Sweden
Device: Kobo Forma
Regex help needed.

I have plenty of paragraphs similar to the following:

<p nodeindex="67">There is a great episode of Red Dwarf, where the crew fall prey to the “Despair Squid” on an ocean planet while salvaging from a derelict vessel at the bottom of the ocean. The squid squirted the ship with ink that had hallucinogenic properties that created despair in the crew members. They realise that they have been infected and head back to their own ship incredibly quickly to get a mood stabiliser. On the lift back to the top of the ship Dave Lister says “<em nodeindex="318">I don’t seem to have been effected…. Though it is true to say that no-one ever truly loved me in my entire life…</em>.” and starts crying as despair sets in. It is really funny!</p>

... and want to remove all the nodeindex="xx" entries. I've been trying and googling all I can about regex. I can usually figure things like this out after a while, but...

Something like nodeindex=".*" matches way too much.

(I'm using Sigil.)
Old 01-27-2019, 02:34 PM   #2
jhowell
Grand Sorcerer
Posts: 7,059
Karma: 91577715
Join Date: Nov 2011
Location: Charlottesville, VA
Device: Kindles
.* is greedy. It will match as many characters as possible.
.*? will match as few characters as possible and will work better here.
[0-9]+ is another alternative if the value is always numeric.
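
A minimal sketch of the difference between the three patterns, using Python's re module as a stand-in for Sigil's regex engine (an assumption: both treat these simple patterns the same way). The sample string is a shortened, hypothetical version of the paragraph from the first post.

```python
import re

# Shortened, hypothetical version of the paragraph in post #1.
html = '<p nodeindex="67">Dave says <em nodeindex="318">I am fine</em> and cries.</p>'

# Greedy: .* runs to the LAST double quote on the line, swallowing text and tags.
greedy = re.findall(r'nodeindex=".*"', html)

# Lazy: .*? stops at the first closing quote, so each attribute is matched separately.
lazy = re.findall(r'nodeindex=".*?"', html)

# Digits only: cannot run past the closing quote, because a quote is not a digit.
digits = re.findall(r'nodeindex="[0-9]+"', html)

# Including the leading whitespace in the match avoids leaving '<p >' behind.
cleaned = re.sub(r'\s+nodeindex="[0-9]+"', '', html)

print(greedy)
print(lazy)
print(cleaned)
```

The leading \s+ in the last pattern is an addition not mentioned above; without it the replacement still works, but each tag keeps a stray space before the closing >.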
Old 01-27-2019, 02:58 PM   #3
patrik
Guru
Ah, nodeindex="[0-9]+" worked perfectly!

Thank you very much!!


The regex-sites I find are all very complex. Isn't there some Regex for dummies out there?
Old 01-28-2019, 05:01 AM   #4
elchamaco
Zealot
Posts: 128
Karma: 500
Join Date: Aug 2011
Device: kindle, boox
You can also use \d+ to match any number: \d matches a single digit, and the + means one or more of them in a row.

<p nodeindex="\d+">

and replace with <p>
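
One caveat, sketched here in Python's re module (an assumption: it stands in for Sigil's engine, and the sample string is hypothetical). A pattern that spells out <p only touches <p> tags, so the <em nodeindex="318"> from the first post would be left alone. Capturing the tag name and putting it back with \1 covers any tag.

```python
import re

html = '<p nodeindex="67">Dave says <em nodeindex="318">I am fine</em>.</p>'

# Tag-specific pattern: only the <p> tag is cleaned, the <em> keeps its attribute.
only_p = re.sub(r'<p nodeindex="\d+">', '<p>', html)

# Capture the tag name in group 1 and re-emit it, so every tag is handled.
any_tag = re.sub(r'<(\w+) nodeindex="\d+">', r'<\1>', html)

print(only_p)
print(any_tag)
```

This sketch assumes nodeindex is the only attribute on the tag; tags carrying other attributes as well would need a looser pattern.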
Old 01-28-2019, 05:16 AM   #5
patrik
Guru
Ok, thank you.
Old 01-28-2019, 08:13 AM   #6
patrik
Guru
If I may...

I want to remove a beginning div.


It's like this:

<body>
<div lang="en">

(bunch of stuff)

</div>
</body>

I thought something like <div lang="en">.*</div> would match.
Old 01-28-2019, 12:44 PM   #7
DNSB
Bibliophagist
Posts: 45,739
Karma: 168959600
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by patrik View Post
If I may...

I want to remove a beginning div.


It's like this:

<body>
<div lang="en">

(bunch of stuff)

</div>
</body>

I thought something like <div lang="en">.*</div> would match.
You might want to make sure that you match the last </div> in the file. So perhaps
<body> <div lang="en">(.*?)</div> </body>
and replace with
<body>\1</body>

As usual, make sure you have a backup before making changes.
Old 01-28-2019, 01:15 PM   #8
patrik
Guru
Thanks, that's along the lines of what I thought would work, but it doesn't match anything.

This is a bit frustrating....
Old 01-28-2019, 01:45 PM   #9
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Quote:
Originally Posted by patrik View Post
Ah, nodeindex="[0-9]+" worked perfectly!

Thank you very much!!


The regex-sites I find are all very complex. Isn't there some Regex for dummies out there?
You could check our wiki Regular expressions

Dale
Old 01-28-2019, 02:01 PM   #10
patrik
Guru
Quote:
Originally Posted by DaleDe View Post
You could check our wiki Regular expressions

Dale
Thank you.

Reading that got me thinking... could it be that it fails to match because '.*' doesn't include newlines?

Ok, I tried with <div lang="en">.*|\n*</div> which actually seems to find/match.

But when I add ( ) to be able to use \1, it doesn't match anymore.

Am I doing this wrong: <div lang="en">(.*|\n*)</div> ?
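
A likely explanation, sketched in Python's re module (an assumption: alternation works the same way in Sigil's engine; the sample document is hypothetical). The | operator has the lowest precedence, so without parentheses it splits the whole pattern into two alternatives, and the "match" is really just the opening tag. Inside parentheses, (.*|\n*) means "either a run of non-newline characters or a run of newlines", never a mix of both. Repeating the alternation as a unit, (.|\n)*, is what allows the mix.

```python
import re

doc = '<body>\n<div lang="en">\n\n<p>One</p>\n<p>Two</p>\n\n</div>\n</body>'

# No parentheses: this is '<div lang="en">.*' OR '\n*</div>'.
# The first alternative wins and matches only the opening tag.
bare = re.search(r'<div lang="en">.*|\n*</div>', doc)

# Parenthesised: the content must be ALL non-newlines or ALL newlines, so it fails.
grouped = re.search(r'<div lang="en">(.*|\n*)</div>', doc)

# Repeating the alternation lets characters and newlines be mixed freely.
mixed = re.search(r'<div lang="en">((?:.|\n)*)</div>', doc)

print(repr(bare.group(0)))
print(grouped)
print(repr(mixed.group(1)))
```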
Old 01-28-2019, 02:34 PM   #11
patrik
Guru
Hey, enabling DotAll in Sigil made this work: <div lang="en">(.*)</div>

Do I understand correctly that . now also matches newlines, and that's why it works?
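
A minimal sketch of that behaviour in Python's re module, assuming Sigil's DotAll option corresponds to the usual dot-all flag (re.DOTALL in Python, or (?s) written at the start of the pattern). The sample document is hypothetical.

```python
import re

doc = '<body>\n<div lang="en">\n<p>One</p>\n</div>\n</body>'
pattern = r'<div lang="en">(.*)</div>'

# Default mode: . stops at a newline, so the pattern cannot reach </div>.
without_flag = re.search(pattern, doc)

# Dot-all mode: . matches newlines too, so .* can span the whole div.
with_flag = re.search(pattern, doc, flags=re.DOTALL)

# The inline form (?s) does the same without a separate option.
inline = re.search(r'(?s)' + pattern, doc)

# Replacing the match with group 1 drops the wrapper and keeps its contents.
unwrapped = re.sub(pattern, r'\1', doc, flags=re.DOTALL)

print(without_flag)
print(repr(with_flag.group(1)))
print(repr(unwrapped))
```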
Old 01-28-2019, 05:40 PM   #12
DNSB
Bibliophagist
Quote:
Originally Posted by patrik View Post
Thanks, that's along the lines of what I thought would work, but it doesn't match anything.

This is a bit frustrating....
Did you copy/paste the first & last bits? In your example, copy/pasting:
Code:
<body>
<div lang="en">
would keep the EOL between the > and <. Ditto for the:
Code:
</div>
</body>
The search line would show a space between the > and < where the EOL was. In the sample ebook I used for testing, the find showed as
Code:
<body> <div class="_override_2">(.*?)</div> </body>
with the replace showing as:
Code:
<body>\1</body>
I have DotAll, Auto-Tokenize and Wrap checked in the options.
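
A sketch of why anchoring on <body> and </body> matters, written with Python's re module as a stand-in for Sigil (an assumption), on a hypothetical document that has a nested div. Writing \s* between the tags instead of a pasted space or line break is a substitution made here, not the pattern shown above; it matches whatever whitespace is actually in the file.

```python
import re

doc = (
    '<body>\n'
    '<div lang="en">\n'
    '<div class="inner"><p>One</p></div>\n'
    '<p>Two</p>\n'
    '</div>\n'
    '</body>'
)

# Lazy match with no anchor: stops at the FIRST </div>, which closes the inner div.
unanchored = re.search(r'<div lang="en">(.*?)</div>', doc, flags=re.DOTALL)

# Anchored on </body>: only the final </div> is directly followed by </body>.
pattern = r'<body>\s*<div lang="en">(.*?)</div>\s*</body>'
unwrapped = re.sub(pattern, r'<body>\1</body>', doc, flags=re.DOTALL)

print(repr(unanchored.group(1)))
print(unwrapped)
```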

Last edited by DNSB; 01-28-2019 at 05:51 PM.
Old 02-03-2019, 05:58 PM   #13
Tex2002ans
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by patrik View Post
The regex-sites I find are all very complex. Isn't there some Regex for dummies out there?
Regular-expressions.info is the go-to place for learning Regular Expressions.

"For dummies", the "Regex Examples" thread in Sigil's forums is good to see a lot of real life examples. Lots of people have asked questions over the years, and people have answered.

I've written multiple posts over the years breaking down Regex into bite-sized chunks, and I try to explain "in normal English" like:

Or sometimes color coding:

But really, there's not much you can do besides learning the basics. You then just build them up piece-by-piece, and as long as you know what each little piece does, you can tackle the more complicated patterns you recognize.

Last year, I was training a few people in the basics of ebook creation. I created a presentation, and Regex was one of the sections (10 slides).

I showed a handful of common issues that crop up, like:
  • How to find Page Numbers (p. 123, Page 234)
  • Finding 3 capital letters in a row (ABCs, FBI, XYZ)
  • Finding a lowercase letter at the beginning of a paragraph
  • Correcting spaces around em dashes
  • Finding all the "1 a.m." and "9:32 p.m."
  • Finding roman numeral chapters (Chapter III)
  • Paragraphs that end with no punctuation
  • Finding/Correcting punctuation OUTSIDE of quotation marks
  • Replacing 98-99 (hyphen) or 98—99 (em dash) with the proper en dash
  • [...]
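
For a rough idea of what a few of these look like, here is a sketch in Python's re module. The patterns are illustrative guesses only, not the ones from the slides, and the sample sentence is made up.

```python
import re

EM_DASH = "\u2014"
EN_DASH = "\u2013"

# Made-up sample sentence covering four of the cases listed above.
text = f"The FBI met at 9:32 p.m. {EM_DASH} see pages 98-99 {EM_DASH} and left at 1 a.m."

# Three capital letters in a row.
caps = re.findall(r"[A-Z]{3}", text)

# Times such as "1 a.m." and "9:32 p.m." (the minutes part is optional).
times = re.findall(r"\b\d{1,2}(?::\d{2})? [ap]\.m\.", text)

# Number ranges joined by a hyphen or an em dash, rewritten with an en dash.
ranges_fixed = re.sub(rf"(\d+)[-{EM_DASH}](\d+)", rf"\1{EN_DASH}\2", text)

# Remove any spaces around em dashes.
dashes_tight = re.sub(rf"\s*{EM_DASH}\s*", EM_DASH, text)

print(caps)
print(times)
print(ranges_fixed)
print(dashes_tight)
```

Real books will need these tightened case by case; for example, the three-capitals pattern also fires inside longer all-caps words.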

If anyone is interested in those slides, send me a PM. :P

Last edited by Tex2002ans; 02-03-2019 at 10:53 PM.
Old 02-04-2019, 12:19 PM   #14
DaleDe
Grand Sorcerer
Sounds like a good addition to our wiki.

Dale
Old 02-04-2019, 12:53 PM   #15
patrik
Guru
Tex, this forum has some really excellent contributors, and you sure are one of them. Thank you!

Will send you a PM.
All times are GMT -4.