#1 |
Zealot
Posts: 146
Karma: 194
Join Date: Jun 2010
Location: Melbourne
Device: iPad
|
Regex to search at beginning of line
Hi all.
I search for this type of code:
Code:
<p class="list1"><span class="list1">2.</span>Mathematics disorder (315.1)</p>
using this Find expression:
Code:
<p class="(list.+)"><span class="(list.*)">(.+)</span>(.+)</p>
and this Replace expression:
Code:
<div class="\1"><p class="\1"><span class="\1">\3</span></p><p class="\1">\4</p></div>
I tried using the "^" in front of my search string, but this didn't stop it from finding the same p tag inside the div tag. Any ideas? Thanks in advance.
#2 | |
Wizard
Posts: 1,762
Karma: 8700631
Join Date: Mar 2013
Location: Rosario - Santa Fe - Argentina
Device: Kindle 4 NT
|
In the "Find" field, try using the following expression:
Code:
<p class="list(.+)"><span class="list(.*)">(.+)</span>(.+)</p>
and in the "Replace" field:
Code:
<div class="list\1"><p class="list\1"><span class="list\1">\3</span></p><p class="list\1">\4</p></div>
But it makes little sense to use the same class for the <p> and <span> tags, because both of them will apply the same format. Without the <span>, the Replace expression becomes:
Code:
<div class="list\1"><p class="list\1">\3</p><p class="list\1">\4</p></div>
Rubén
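For anyone who wants to try this outside Sigil, here is a minimal sketch of the same Find/Replace pair using Python's `re` module. The sample line is the one from post #1; the variable names are only illustrative, and Sigil's own regex engine may differ in details.

```python
import re

# Sample line from post #1.
line = '<p class="list1"><span class="list1">2.</span>Mathematics disorder (315.1)</p>'

# Find/Replace pair from this post; group 1 holds what follows "list" in the class name.
find = r'<p class="list(.+)"><span class="list(.*)">(.+)</span>(.+)</p>'
replace = r'<div class="list\1"><p class="list\1">\3</p><p class="list\1">\4</p></div>'

result = re.sub(find, replace, line)
print(result)
```

Group 2 is captured but never used in the replacement, so the second `list(.*)` could just as well be written without parentheses.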
#3 |
Zealot
Posts: 146
Karma: 194
Join Date: Jun 2010
Location: Melbourne
Device: iPad
|
Thanks for the info.
I understand what you are telling me to do, but I am trying to understand the regex syntax. Why doesn't the ^ tell the find to begin at the beginning of the line?
#4 | |
Grand Sorcerer
Posts: 28,568
Karma: 204127028
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
The ^ metacharacter only matches at the start of each line when multiline mode is switched on, for example with (?m):
Code:
(?m)^<p class="(list.+)"><span class="(list.*)">(.+)</span>(.+)</p>
Alternatively, a negative lookbehind skips any <p> that immediately follows a ">":
Code:
(?<!\>)<p class="(list.+)"><span class="(list.*)">(.+)</span>(.+)</p>
I highly recommend using something similar to the second method, since the common practice of indenting HTML code for readability could easily mess with your idea of what might constitute "the beginning" of a line.
Last edited by DiapDealer; 05-18-2013 at 11:48 AM.
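The difference between the two methods can be checked with a small sketch in Python's `re` module. The sample text is hypothetical (one untouched paragraph, one already wrapped in a div), and Sigil's engine may not behave identically in every detail.

```python
import re

# The Find expression from post #1, without any anchoring.
body = r'<p class="(list.+)"><span class="(list.*)">(.+)</span>(.+)</p>'

# Hypothetical sample: line 1 is untouched, line 2 was already wrapped in a <div>.
text = (
    '<p class="list1"><span class="list1">1.</span>Reading disorder</p>\n'
    '<div class="list1"><p class="list1"><span class="list1">2.</span></p>'
    '<p class="list1">Mathematics disorder</p></div>'
)

unanchored = re.findall(body, text)               # also matches inside the <div>
multiline = re.findall(r'(?m)^' + body, text)     # only a <p> at the start of a line
lookbehind = re.findall(r'(?<!\>)' + body, text)  # only a <p> not preceded by ">"

print(len(unanchored), len(multiline), len(lookbehind))  # 2 1 1

# An indented paragraph: ^ no longer finds the tag, the lookbehind still does.
indented = '    <p class="list1"><span class="list1">3.</span>Disorder of written expression</p>'
anchored_hit = re.search(r'(?m)^' + body, indented)
lookbehind_hit = re.search(r'(?<!\>)' + body, indented)
print(anchored_hit is None, lookbehind_hit is not None)  # True True
```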
#5 |
Zealot
Posts: 146
Karma: 194
Join Date: Jun 2010
Location: Melbourne
Device: iPad
|
THANK you so much!
I'll give this a try tomorrow. It is now 1.30 am so I have to sleep! Thanks again for the details. I really appreciate your time in helping me. Keep safe,
#6 | |
Well trained by Cats
Posts: 31,047
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Try to use the least 'greedy' pattern match, rather than trap with a big net.
Code:
(list\d+)
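To see why the narrower pattern is safer, here is a small sketch in Python's `re` module. The extra `id="a"` attribute is a made-up addition to the sample line, there only to give the greedy pattern something to over-capture.

```python
import re

# Hypothetical line: same markup as post #1 plus an extra id attribute.
line = '<p class="list1" id="a"><span class="list1">2.</span>Mathematics disorder</p>'

# "list.+" runs to the last double quote on the line before backing off.
greedy = re.search(r'class="(list.+)"', line)
# "list\d+" can only take digits, so it stops at the end of the class name.
specific = re.search(r'class="(list\d+)"', line)

print(greedy.group(1))    # list1" id="a"><span class="list1
print(specific.group(1))  # list1
```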
#7 |
Grand Sorcerer
Posts: 28,568
Karma: 204127028
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
An excellent point.
#8 |
Guru
Posts: 697
Karma: 150000
Join Date: Feb 2010
Device: none
|
It could also be that "pretty print" is adding whitespace to the beginning of the line (e.g. a tab?).
I did a quick experiment, and while
Code:
find: ^<p class=
did not find the indented tags,
Code:
find: ^\W*<p class=
did.
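The experiment can be reproduced roughly in Python's `re` module. This is only a sketch: the two-line sample is invented, and `(?m)` is added here so that `^` matches at the start of every line.

```python
import re

# Hypothetical pretty-printed source: the second paragraph is indented with a tab.
text = '<p class="list1">first</p>\n\t<p class="list1">second</p>'

strict = re.findall(r'(?m)^<p class=', text)       # tag must sit at column 0
tolerant = re.findall(r'(?m)^\W*<p class=', text)  # allows leading non-word characters

print(len(strict), len(tolerant))  # 1 2
```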
#9 |
Zealot
Posts: 146
Karma: 194
Join Date: Jun 2010
Location: Melbourne
Device: iPad
|
Thank you all for the information.
I did the (?m)^ and it worked. It didn't find the p tag inside the div tag. When I removed the div tag from the front and the end of the para, I searched again but it did not find the p tag in its original format. I took note of what st_albert said and voilà, it worked. I assume the \W means white space? Is there a nice succinct cheat sheet somewhere I can download that has all the codes used by Sigil? I have seen this page: http://docs.python.org/2/library/re.html but it is a bit daunting for a newbie! Thanks again to all of you for your help!
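On the \W question: in the usual regex syntax \W means "any non-word character", which includes whitespace but also punctuation such as < and ", while \s is the class that means whitespace only. A quick sketch in Python's `re` module shows the difference (the sample characters are arbitrary):

```python
import re

samples = [' ', '\t', '<', '"', 'a', '5', '_']

# \s: whitespace only.
whitespace = [ch for ch in samples if re.fullmatch(r'\s', ch)]
# \W: anything that is not a letter, digit or underscore.
non_word = [ch for ch in samples if re.fullmatch(r'\W', ch)]

print(whitespace)  # [' ', '\t']
print(non_word)    # [' ', '\t', '<', '"']
```

So `^\W*` happens to skip the indentation, but `^\s*` says more precisely what is meant.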
#10 |
Grand Sorcerer
Posts: 28,568
Karma: 204127028
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
The whitespace preceding indented lines of html is exactly why I suggested using a negative lookbehind assertion instead of the ^ metacharacter.
#11 |
Well trained by Cats
Posts: 31,047
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Look for the download PDF button. The site has other useful cheat sheets (CSS2, HTML).
#12 |
Zealot
Posts: 146
Karma: 194
Join Date: Jun 2010
Location: Melbourne
Device: iPad
|
Thanks so much!
I remember seeing this some time back but never kept a bookmark for it. Thank you!
#13 |
Wizard
Posts: 1,055
Karma: 11391181
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
I had put this question into an old thread, so nobody read it. Please don't mind me repeating it here:
"I have some ebooks where page numbers appear in the flow of the text (probably referring to the original printed version), sometimes even with a hyperlink referring to the original TOC. I would like to delete them, but I have no clue about regex. In one particular book, the numbers appear in square brackets, such as [Pg 4]. Those numbers have up to three digits. The tags look like this: <span class="pagenum"><a class="pcalibre pcalibre1" id="Page_4">[Pg 4]</a></span>. Is there a way of removing them with one single regex command in Sigil or Calibre?"
By the way: it would be very nice if there were a feature in Sigil to do such operations without needing to be a regex expert (I tried a lot of things, but nothing worked even approximately). I have no idea whether it is technically possible, but I think many users would be happy to be able to manipulate their books in an easier way.
#14 |
Guru
Posts: 657
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
|
If each of the pagenums has the <span class="pagenum"> and has no further embedded spans, then a simple regex would be
Code:
<span class="pagenum".+/span>
You may need to insert a question mark after the plus sign to make it less greedy.
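The effect of that question mark is easy to see in a sketch with Python's `re` module. The sample paragraph is invented around the tag quoted in post #13, with two page markers on one line; as noted above, it assumes no further spans are nested inside the pagenum span.

```python
import re

# Hypothetical paragraph containing two page markers on the same line.
html = (
    '<p>Some text <span class="pagenum"><a class="pcalibre pcalibre1" id="Page_4">[Pg 4]</a></span>'
    ' more text <span class="pagenum"><a class="pcalibre pcalibre1" id="Page_5">[Pg 5]</a></span> end.</p>'
)

# Greedy: runs from the first pagenum span to the LAST "/span>" on the line.
greedy = re.sub(r'<span class="pagenum".+/span>', '', html)
# Lazy: stops at the first "/span>" after each pagenum span.
lazy = re.sub(r'<span class="pagenum".+?/span>', '', html)

print(greedy)  # the words "more text" are lost along with both markers
print(lazy)    # only the two markers are removed
```

With the replacement left empty, the lazy version keeps all of the surrounding text, although a doubled space remains where each marker used to be.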
#15 |
Wizard
Posts: 1,055
Karma: 11391181
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
Ah, thanks! I always thought that it must be quite simple. I had inserted a d+ after the "pagenum". That led (I don't know why) to nothing. And I thought that I would have to include the [Pg 4] expression somehow.
Well, I'll try it.