Old 05-18-2013, 05:45 AM   #1
kakkalla
Zealot
Posts: 146
Karma: 194
Join Date: Jun 2010
Location: Melbourne
Device: iPad
Regex to search at beginning of line

Hi all.

I am searching for this type of code:
Code:
<p class="list1"><span class="list1">2.</span>Mathematics disorder (315.1)</p>
In the Find panel, I have:
Code:
<p class="(list.+)"><span class="(list.*)">(.+)</span>(.+)</p>
and in the Replace field I have:
Code:
<div class="\1"><p class="\1"><span class="\1">\3</span></p><p class="\1">\4</p></div>
As a test, I went back before the first replacement and ran the "find" again. It found the same paragraph inside the div tag, which is not what I want.

I tried using the "^" in front of my search string, but this didn't stop it from finding the same p tag inside the div tag.

Any ideas? Thanks in advance.
Old 05-18-2013, 10:21 AM   #2
RbnJrg
Wizard
Posts: 1,762
Karma: 8700631
Join Date: Mar 2013
Location: Rosario - Santa Fe - Argentina
Device: Kindle 4 NT
Hi kakkalla,

In the "Find" field, try using the following expression:

Code:
<p class="list(.+)"><span class="list(.*)">(.+)</span>(.+)</p>
And in the "Replace" field, use:

Code:
<div class="list\1"><p class="list\1"><span class="list\1">\3</span></p><p class="list\1">\4</p></div>

But it makes little sense to use the same class for the <p> and <span> tags, because both of them will apply the same formatting. In that case the <span> is redundant, and you could use the following expression in the "Replace" field instead:

Code:
<div class="list\1"><p class="list\1">\3</p><p class="list\1">\4</p></div>
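
For anyone who wants to try these expressions outside Sigil, here is a minimal sketch using Python's re module as a stand-in for Sigil's PCRE engine. That substitution is an assumption; the constructs used here behave the same way in both as far as I know. The sample line comes from the first post, and the variable names are made up for illustration. The sketch applies both Replace expressions and then checks whether the Find expression would hit the converted markup a second time.

```python
import re

# Sample line from the first post.
SOURCE = '<p class="list1"><span class="list1">2.</span>Mathematics disorder (315.1)</p>'

# Find expression suggested above; groups 1 and 2 hold whatever follows "list".
FIND = r'<p class="list(.+)"><span class="list(.*)">(.+)</span>(.+)</p>'

# The two Replace expressions suggested above (with and without the <span>).
REPLACE_WITH_SPAN = (
    r'<div class="list\1"><p class="list\1"><span class="list\1">\3</span></p>'
    r'<p class="list\1">\4</p></div>'
)
REPLACE_NO_SPAN = r'<div class="list\1"><p class="list\1">\3</p><p class="list\1">\4</p></div>'

with_span = re.sub(FIND, REPLACE_WITH_SPAN, SOURCE)
no_span = re.sub(FIND, REPLACE_NO_SPAN, SOURCE)

print(with_span)
print(no_span)

# Would a second Find still hit the already-converted markup?
rematches_with_span = re.search(FIND, with_span) is not None
rematches_no_span = re.search(FIND, no_span) is not None
print(rematches_with_span, rematches_no_span)
```

In this sketch the version that keeps the <span> is still found by a second search, while the version without the <span> is not, simply because the Find expression requires a <span> right after the opening <p>.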
Regards
Rubén
Old 05-18-2013, 10:26 AM   #3
kakkalla
Thanks for the info.

I understand what you are telling me to do but I am trying to understand the Regex syntax. Why doesn't the ^ tell the find to begin at the beginning of the line?
Old 05-18-2013, 11:24 AM   #4
DiapDealer
Grand Sorcerer
Posts: 28,568
Karma: 204127028
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by kakkalla
Why doesn't the ^ tell the find to begin at the beginning of the line?
By default, PCRE treats the search subject as one string. For all intents and purposes, the contents of your html file are one big string, so the ^ and $ metacharacters match only the very beginning and the very end of that string, respectively. Putting PCRE in multi-line mode by prefacing your expression with (?m) makes ^ and $ match the beginning and end of each line, in the manner you're expecting.
Code:
(?m)^<p class="(list.+)"><span class="(list.*)">(.+)</span>(.+)</p>
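
A minimal sketch of the same behaviour, assuming Python's re module as a stand-in for PCRE (its default handling of ^ and the (?m) flag match what is described above). The two-line TEXT fragment and the variable names are hypothetical.

```python
import re

# Hypothetical two-line fragment; the second line is the one we want to hit.
TEXT = '<h1>Title</h1>\n<p class="list1">2.</p>'

PLAIN = r'^<p class="list1">'          # ^ means "start of the whole string"
MULTILINE = r'(?m)^<p class="list1">'  # ^ means "start of any line"

plain_hit = re.search(PLAIN, TEXT)
multiline_hit = re.search(MULTILINE, TEXT)

print(plain_hit)              # no match: the string starts with <h1>, not <p
print(multiline_hit.start())  # offset where the second line begins
```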
You could also forget multi-line mode (and ^ or $ entirely) and use a negative lookbehind to achieve a similar effect:
Code:
(?<!\>)<p class="(list.+)"><span class="(list.*)">(.+)</span>(.+)</p>
Meaning don't match the pattern if it's preceded by the closing '>' of another tag.

I highly recommend using something similar to the second method, since the common practice of indenting html code--for readability--could easily mess with your idea of what might constitute "the beginning" of a line.
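
The difference between the two methods on indented markup can be sketched like this. It is only an illustration in Python's re module (assumed here to behave like PCRE for these constructs), and the pretty-printed TEXT fragment is invented: one stand-alone paragraph, plus one paragraph that an earlier replacement already wrapped in a div.

```python
import re

# Hypothetical pretty-printed fragment, indented with two spaces.
TEXT = (
    '<body>\n'
    '  <p class="list1"><span class="list1">1.</span>First</p>\n'
    '  <div class="list1"><p class="list1"><span class="list1">2.</span></p>'
    '<p class="list1">Second</p></div>\n'
    '</body>'
)

# The original Find expression, then the two anchored variants from this post.
TAIL = r'<p class="(list.+)"><span class="(list.*)">(.+)</span>(.+)</p>'
ANCHORED = r'(?m)^' + TAIL
LOOKBEHIND = r'(?<!\>)' + TAIL

unanchored_count = len(re.findall(TAIL, TEXT))
anchored_hits = re.findall(ANCHORED, TEXT)
lookbehind_hits = [m.group(0) for m in re.finditer(LOOKBEHIND, TEXT)]

print(unanchored_count)  # the plain expression also hits the <p> inside the <div>
print(anchored_hits)     # indentation keeps <p from ever sitting at a line start
print(lookbehind_hits)   # only the stand-alone paragraph survives
```

With this input the (?m)^ version finds nothing at all because of the leading spaces, whereas the lookbehind version skips only the paragraph that directly follows the closing '>' of the div tag.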

Last edited by DiapDealer; 05-18-2013 at 11:48 AM.
Old 05-18-2013, 11:27 AM   #5
kakkalla
THANK you so much!

I'll give this a try tomorrow. It is now 1.30 am so I have to sleep!

Thanks again for the details. I really appreciate your time in helping me.

Keep safe,
Old 05-18-2013, 11:48 AM   #6
theducks
Well trained by Cats
Posts: 31,047
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
My 2 cents:
Try to use the most specific (least 'greedy') pattern that matches, rather than trapping with a big net.

Code:
(list\d+)
You want 'list' + digits only, not 'listing' or some other selector that starts with 'list'.
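
A small sketch of what the narrower pattern buys you, using Python's re module (assumed equivalent to PCRE for these constructs). The class names and the TAG line with an extra id attribute are hypothetical examples.

```python
import re

# Hypothetical class names; only the first two are numbered list styles.
CLASS_NAMES = ['list1', 'list12', 'listing', 'list-intro']

BROAD = re.compile(r'list.+')    # "list" followed by anything
NARROW = re.compile(r'list\d+')  # "list" followed by digits only

broad = [name for name in CLASS_NAMES if BROAD.fullmatch(name)]
narrow = [name for name in CLASS_NAMES if NARROW.fullmatch(name)]
print(broad)
print(narrow)

# The broad pattern can also swallow neighbouring attributes on the same line.
TAG = '<p class="list1" id="x">'
broad_capture = re.search(r'class="(list.+)"', TAG).group(1)
narrow_capture = re.search(r'class="(list\d+)"', TAG).group(1)
print(broad_capture)
print(narrow_capture)
```

The second half shows the "big net" effect: .+ runs on to the last double quote on the line, so the captured class name drags the id attribute along with it.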
Old 05-18-2013, 11:51 AM   #7
DiapDealer
An excellent point.
Old 05-18-2013, 01:41 PM   #8
st_albert
Guru
Posts: 697
Karma: 150000
Join Date: Feb 2010
Device: none
It could also be that "pretty print" is adding whitespace to the beginning of the line (e.g. tab?).

I did a quick experiment, and while
Code:
find:
^<p class=
did not find what I wanted, this one did:
Code:
find:
^\W*<p class=
(It also includes the previous end-of-line, though.)
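
One possible explanation for the extra end-of-line, sketched in Python's re module (assumed to match PCRE here): \W means "any character that is not a letter, digit or underscore", so it covers newlines and punctuation as well as spaces and tabs. The TEXT fragment with a blank line is hypothetical, and multi-line mode is switched on with (?m) so the ^ behaves as a line anchor; whether this is exactly what happened in the experiment above is a guess.

```python
import re

# Hypothetical fragment: a blank line, then an indented paragraph.
TEXT = '</div>\n\n  <p class="list1">text</p>'

NON_WORD = r'(?m)^\W*<p class='   # \W: anything that is not a letter, digit or underscore
BLANKS = r'(?m)^[ \t]*<p class='  # spaces and tabs only

non_word_hit = re.search(NON_WORD, TEXT).group(0)
blanks_hit = re.search(BLANKS, TEXT).group(0)
print(repr(non_word_hit))  # starts on the blank line, so it includes a newline
print(repr(blanks_hit))    # starts at the indentation of the <p> line

# \W is broader than whitespace: it also matches characters such as '>'.
gt_is_non_word = re.fullmatch(r'\W', '>') is not None
gt_is_space = re.fullmatch(r'\s', '>') is not None
print(gt_is_non_word, gt_is_space)
```

If only indentation should be skipped, a character class such as [ \t]* (or \s* when line breaks are acceptable) is the narrower choice.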
Old 05-18-2013, 10:49 PM   #9
kakkalla
Thank you all for the information.

I tried (?m)^ and it worked: it didn't find the p tag inside the div tag. But when I removed the div tag from the front and the end of the paragraph and searched again, it did not find the p tag in its original format either.

I took note of what st_albert said and Voila! it worked.

I assume the \W means white space?

Is there a nice, succinct cheat sheet somewhere that I can download, covering all the codes used by Sigil?

I have seen this page: http://docs.python.org/2/library/re.html

but it is a bit daunting for a newbie!

Thanks again to all of you for your help!
Old 05-18-2013, 11:07 PM   #10
DiapDealer
The whitespace preceding indented lines of html is exactly why I suggested using a negative lookbehind assertion instead of the ^ metacharacter.
Old 05-19-2013, 11:00 AM   #11
theducks
Search for 'cheatography regex' and look for the download PDF button.
The site has other useful cheat sheets (CSS2, HTML).
Old 05-19-2013, 11:02 AM   #12
kakkalla
Thanks so much!

I remember seeing this some time back but never kept a bookmark for it. Thank you!
Old 05-31-2013, 04:20 AM   #13
Leonatus
Wizard
Posts: 1,055
Karma: 11391181
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
I had put this question into an old thread, so nobody read it. Please don't mind me repeating it here:
"I have some ebooks where page numbers appear in the running text (probably referring to the original printed version), sometimes even with a hyperlink referring to the original TOC. I would like to delete them, but I have no clue about regex. In one particular book, the numbers appear in square brackets, such as [Pg 4], and have up to three digits. The tags look like this: <span class="pagenum"><a class="pcalibre pcalibre1" id="Page_4">[Pg 4]</a></span>. Is there a way of removing them with a single regex command in Sigil or Calibre?"

By the way: it would be very nice if Sigil had a feature for doing such operations without having to be a regex crack (I tried a lot of things, but nothing worked even approximately). I have no idea whether this is technically possible, but I think many users would be happy to be able to manipulate their books in an easier way.
Old 05-31-2013, 04:58 AM   #14
Perkin
Guru
Posts: 657
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
If each of the pagenums has <span class="pagenum"> and no further embedded spans, then a simple regex would be
Code:
<span class="pagenum".+/span>
Replace with nothing.

You may need to insert a question mark after the plus sign to make it less greedy.
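
Here is a minimal sketch of why the question mark matters, written for Python's re module (assumed to treat greedy and lazy quantifiers the same way PCRE does). The TEXT line is hypothetical: it reuses the tag structure from the question and adds an invented second marker, Page_5, on the same line.

```python
import re

# Hypothetical paragraph with two page markers on a single line.
TEXT = (
    '<p>One <span class="pagenum"><a class="pcalibre pcalibre1" id="Page_4">[Pg 4]</a></span> two '
    '<span class="pagenum"><a class="pcalibre pcalibre1" id="Page_5">[Pg 5]</a></span> three</p>'
)

GREEDY = r'<span class="pagenum".+/span>'  # .+ runs to the LAST /span> on the line
LAZY = r'<span class="pagenum".+?/span>'   # .+? stops at the FIRST /span> it reaches

greedy_result = re.sub(GREEDY, '', TEXT)
lazy_result = re.sub(LAZY, '', TEXT)

print(greedy_result)  # the word "two" between the markers is lost
print(lazy_result)    # only the two markers are removed
```

The greedy version deletes everything from the first marker to the end of the last one, including the text in between, so the lazy form is the safer choice when a line can hold more than one page number.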
Old 05-31-2013, 08:03 AM   #15
Leonatus
Ah, thanks! I always thought that it must be quite simple. I had inserted a d+ after the "pagenum", which led (I don't know why) to nothing. And I thought that I would somehow have to include the [Pg 4] expression.
Well, I'll try it.