Old 03-24-2014, 05:44 AM   #1
JLius
Village idiot
Posts: 157
Karma: 519566
Join Date: Mar 2014
Location: Belgium
Device: sony PRS T-1
REGEX can't seem to figure it out

Hi all

I came across this old post:
Quote:
Originally Posted by Perkin View Post
If all of the chapters contain exactly the same
Code:
<p class="calibre10"><span class="calibre8 bold calibre11 calibre8">
and then the chapter number followed by
Code:
</span></p>
then search for (regex, minimal)
Code:
<p class="calibre10"><span class="calibre8 bold calibre11 calibre8">(\d+)</span></p>
replace
Code:
<h2>Chapter \1</h2>

I thought this could save me a lot of time cleaning up an epub document. Didn't know about regex, so I gave it a try.

I want to replace <i class="calibre2">words</i> with <em>words</em>.
So I searched for <i class="calibre2">(\w+)</i> and replaced it with <em>\1</em>

That worked fine when there was only one word between the tags. But when there are more, as in <i class="calibre2">two words</i>, my search doesn't find them.

Does \w+ only match one word? How do I get it to match any number of words? Do I change the \1 in the replacement to * or something else?

Thanks in advance!

JLius
Old 03-24-2014, 07:01 AM   #2
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
That would go wrong if there are digits in the original tags. Just a question: why do you want to replace <i> with <em>? Both work perfectly fine, and although <i> is often called deprecated (HTML5 still allows it), I don't think there is a chance it will disappear any time soon. Not only that, there is a distinct difference between <i> and <em>. The first is a stylistic indication, the second a semantic one. It might be that emphasis is the wrong semantic value for the <i>.
Anyway, back to your question.

If you want to retain the <i> you can just replace <i class="calibre2"> by <i>. If not, you can try to do the following:
Search: <i class="calibre2">(.*?)</i>
Replace: <em>\1</em>

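Explanation: the . matches any character. The * says 0 or more. The question mark after the * makes the match minimal (lazy), so it takes as few characters as possible and stops at the first </i> it finds.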
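
For anyone who wants to try this outside an editor, here is a minimal sketch using Python's re module (calibre's search is built on Python-style regular expressions, but treat the exact behaviour in your tool as something to verify). The sample line is made up for illustration; only the search and replace expressions come from the post above.

```python
import re

# Made-up sample line; the class names follow the ones mentioned in the thread.
line = ('<p class="calibre3">text <i class="calibre2">two words</i> '
        'more <i class="calibre2">and three more</i></p>')

# Search and replace expressions as suggested above.
# (.*?) captures as few characters as possible before the next </i>.
converted = re.sub(r'<i class="calibre2">(.*?)</i>', r'<em>\1</em>', line)

print(converted)
```

Each <i class="calibre2">...</i> pair is converted separately, whatever number of words it contains.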

Regex can be confusing in the beginning, but is worth the time you need to invest. There are various helpers out there. Some here have good results with RegEx Buddy. You can also use a cheatsheet like this one: http://www.cheatography.com/davechil...r-expressions/
Old 03-24-2014, 07:49 AM   #3
DiapDealer
Grand Sorcerer
Posts: 28,352
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
\w does not mean match a "word." So it follows that \w+ does not mean "one or more words."

\w is a shortcut for a word character. Singular. Which (most commonly) translates to [A-Za-z0-9_]. A space is not a word character--which is why your regex stopped matching when it encountered the space between words. There really is no regex shortcut for matching a "word." You can find the boundaries between words, though. Some regex flavors (calibre's for example) can match the beginnings or ends of words.
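
A small sketch of that point in Python's re module, assuming your tool's flavor treats \w the same way. The two sample strings are hypothetical, modelled on the original question.

```python
import re

# Hypothetical snippets modelled on the question; the text is made up.
one_word = '<i class="calibre2">words</i>'
two_words = '<i class="calibre2">two words</i>'

pattern = re.compile(r'<i class="calibre2">(\w+)</i>')

# \w+ matches a run of word characters, so a single word is found...
single_match = pattern.search(one_word)
# ...but the space in "two words" is not a word character, so nothing matches.
double_match = pattern.search(two_words)

# On its own, \w+ picks out each run of word characters separately.
runs = re.findall(r'\w+', 'two words')

print(single_match.group(1))
print(double_match)
print(runs)
```

The second search returns None because the pattern requires nothing but word characters between the opening and closing tags.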

Toxaris's suggestion should do what you want (unless there are any complicated, nested i tags--which isn't very likely).

You could also search for:
Code:
<(/?)i( class=".*?")?>
And replace with:
Code:
<\1em>
Which will match/alter the tags while ignoring their contents. You may have to uncheck/check any "minimal matching/greedy" option your particular search engine might have.
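
As a rough check of the tag-only approach, here is a sketch in Python's re module. The sample line, including the plain <i> and the <img> tag, is invented to show what the expression does and does not touch.

```python
import re

# Invented sample: one <i> with a class, one plain <i>, and an <img> that must survive.
line = ('<p class="calibre3">text <i class="calibre2">two words</i> '
        'and <i>plain</i> <img src="x.png"/></p>')

# Group 1 captures the optional slash, so opening and closing tags
# are both rewritten by the same replacement.
converted = re.sub(r'<(/?)i( class=".*?")?>', r'<\1em>', line)

print(converted)
```

Because the expression requires either a class attribute or the closing > right after the i, tags such as <img> are left alone in this sample.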

Last edited by DiapDealer; 03-24-2014 at 08:22 AM.
Old 03-24-2014, 08:42 AM   #4
JLius
Village idiot
Posts: 157
Karma: 519566
Join Date: Mar 2014
Location: Belgium
Device: sony PRS T-1
Super, many thanks to you both!
Old 03-24-2014, 10:02 AM   #5
JLius
Village idiot
Posts: 157
Karma: 519566
Join Date: Mar 2014
Location: Belgium
Device: sony PRS T-1
By the way, for others who might find their way to this thread: Toxaris, your suggestion didn't work. For reasons I don't understand, the <em> tags ended up all over my paragraphs in places they don't belong.
DiapDealer: as far as I can tell, your suggestion worked.
Old 03-24-2014, 12:03 PM   #6
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
That should not have happened, unless the structure of the document was not in order or there were nested tags, which is very unlikely. Which program did you use to run the regex?
Old 03-24-2014, 12:16 PM   #7
DiapDealer
Grand Sorcerer
Posts: 28,352
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
I agree. Your regex should have worked, Toxaris. I guess it could get a bit greedy depending on the flags/checkboxes that were set by default ... but then so could mine, for that matter.
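
To illustrate what "getting greedy" could look like, here is a sketch in Python's re module comparing the minimal form with a greedy one. This is only one possible failure mode under the assumption that a flag turned .*? into the equivalent of .*; it is not a confirmed explanation of what happened in this thread. The sample line is made up.

```python
import re

# Made-up line with two separate italic runs.
line = 'a <i class="calibre2">first</i> b <i class="calibre2">second</i> c'

# Minimal: each opening tag is paired with the nearest closing tag.
minimal = re.sub(r'<i class="calibre2">(.*?)</i>', r'<em>\1</em>', line)

# Greedy: the first opening tag is paired with the LAST closing tag on the line,
# so everything in between is swallowed into a single match.
greedy = re.sub(r'<i class="calibre2">(.*)</i>', r'<em>\1</em>', line)

print(minimal)
print(greedy)
```

The greedy result leaves a stray </i> and <i class="calibre2"> inside one long <em>...</em>, which is the kind of damage that is easy to miss in a long file.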
Old 03-24-2014, 12:47 PM   #8
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
It can be helpful sometimes to do things in stages, taking out an element or two at a time. It can make the search expression less complicated.

It is also helpful to set up the search expression and repeatedly find without replacing to make sure it will not go berserk. In Sigil finding what you didn't intend can be catastrophic unless you do a save just before your replace all.
It can be hard for any normal human to envision all possible things any search will grab, even for just plain characters.
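
The same "find first, replace later" habit can be sketched in Python. The preview helper below is hypothetical (it is not part of Sigil or calibre); it only lists what an expression would match, with a little surrounding text, and changes nothing.

```python
import re

def preview(pattern, text, context=15):
    """Return each match with some surrounding text, without changing anything."""
    hits = []
    for m in re.finditer(pattern, text):
        start = max(m.start() - context, 0)
        end = min(m.end() + context, len(text))
        hits.append(text[start:end])
    return hits

# Made-up sample text for illustration.
sample = '<p>plain <i class="calibre2">two words</i> plain</p>'

hits = preview(r'<i class="calibre2">(.*?)</i>', sample)
for hit in hits:
    print(hit)
```

Reading through the list before running the real replacement is a cheap way to spot matches you did not intend.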

In my current book, I have an ATIS. In fiddling with it, I turned satisfaction into s ATIS faction.
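
The exact search used there is not given, so the following Python sketch is a guess at how such an accident can happen: a case-insensitive search for the bare letters also matches inside longer words. Anchoring with \b (word boundary) is one way to avoid it. The sample sentence is invented.

```python
import re

# Invented sentence containing both the acronym and a word that hides it.
text = 'The ATIS gave no satisfaction.'

# Assumed careless search: bare letters, case-insensitive, padded with spaces.
careless = re.sub(r'atis', ' ATIS ', text, flags=re.IGNORECASE)

# \b only matches at the edge of a word, so "satisfaction" is left alone.
careful = re.sub(r'\batis\b', 'ATIS', text, flags=re.IGNORECASE)

print(careless)
print(careful)
```

The careless version turns "satisfaction" into "s ATIS faction", while the word-boundary version only touches the standalone acronym.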
Old 03-24-2014, 12:49 PM   #9
JLius
Village idiot
Posts: 157
Karma: 519566
Join Date: Mar 2014
Location: Belgium
Device: sony PRS T-1
I used calibre.
Don't know what nested tags are, but I'll give an example:

<p class="calibre3"> text here <i class="calibre2"> text here </i> text here </p>

Toxaris' regex did something like this to various paragraphs:
<p></em>some text that didn't need em <em>text that needed the em</em>regular text <em> </p>

Didn't make sense to me. Maybe my first <i class="calibre2">(\w+)</i> had something to do with it.
Old 03-24-2014, 09:25 PM   #10
dgatwood
Curmudgeon
Posts: 629
Karma: 1623086
Join Date: Jan 2012
Device: iPad, iPhone, Nook Simple Touch
For a gentle introduction to regular expressions:

https://developer.apple.com/library/...nfettered.html
Old 03-25-2014, 12:25 AM   #11
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
Originally Posted by dgatwood View Post
For a gentle introduction to regular expressions:

https://developer.apple.com/library/...nfettered.html
I would go with http://www.regular-expressions.info/

It is more general (as opposed to being focused on usage within perl/sed/grep/awk) as well as being heavily geared towards the beginner, with color-coded examples.

Last edited by eschwartz; 03-25-2014 at 12:29 AM.
Old 03-25-2014, 05:23 AM   #12
Hitch
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Originally Posted by eschwartz View Post
I would go with http://www.regular-expressions.info/

It is more general (as opposed to being focused on usage within perl/sed/grep/awk) as well as being heavily geared towards the beginner, with color-coded examples.
And don't forget the Hitch Bible: Regex Buddy. ;-)

Hitch
Old 03-25-2014, 07:40 AM   #13
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
Quote:
Originally Posted by Hitch View Post
And don't forget the Hitch Bible: Regex Buddy. ;-)

Hitch
Already covered you in the beginning Hitch!
Old 03-25-2014, 03:35 PM   #14
Hitch
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Originally Posted by Toxaris View Post
Already covered you in the beginning Hitch!


Sorry, Tox, should have caught that. Well, there you go: recommended twice now. ;-)

Hitch