Old 12-05-2011, 02:43 PM   #1
ElMiko
Fanatic
Posts: 502
Karma: 65460
Join Date: Jun 2011
Device: Kindle
Reg-ex help...?

Hi, all. I'm back for some additional reg-ex instruction...

I'm trying to remove <div> tags from a document in bulk, but can't seem to figure out what expression I should be using to find them.

Here's a sample of the code I'm working on:

Spoiler:
<div class="calibre1">
<p class="calibre2"><span class="none">Some text that I want to keep.</span></p>
</div>

<div class="calibre1">
<p class="calibre5"><span class="none1">Some DIFFERENT text that I want to keep.</span></p>
</div>


Now, the expression I used (with the intent of replacing it with "\1") was:
Code:
<div class="calibre1">([^<]*)</div>
Naturally, it didn't work ("no results found"). I think I discovered the reason: the "[^<]*" part stops at the first "<" it meets, i.e. the "<" in "<p class...", so the "</div>" that has to follow never matches. The bad news is that knowing the problem hasn't helped me find the solution. I've tried a bunch of other iterations that either match too much (the entire document) or nothing at all.
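
The failure described above can be reproduced outside Sigil. This is a minimal sketch using Python's `re` module, which is an assumption for illustration: Sigil has its own regex engine, although this particular pattern should behave the same way there. The variable names are hypothetical.

```python
import re

# Sample markup from the post above (hypothetical variable name).
sample = (
    '<div class="calibre1">\n'
    '<p class="calibre2"><span class="none">Some text that I want to keep.</span></p>\n'
    '</div>\n'
)

# [^<]* can only consume characters up to the next "<". It stops at the "<"
# of "<p class=...", and the literal "</div>" required next is not there.
attempt = re.search(r'<div class="calibre1">([^<]*)</div>', sample)
print(attempt)  # None
```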

Can someone let me know where exactly my brain is letting me down?
Old 12-05-2011, 03:26 PM   #2
Jabby
Jr. - Junior Member
Posts: 586
Karma: 2000358
Join Date: Aug 2010
Location: Alabama
Device: Archos, Asus, HP, Lenovo, Nexus and Samsung tablets in 7,8 and 10"
Find: <div class="calibre1">(.*)</div>
Replace: \1

Should do it.

Regards, John
Old 12-05-2011, 03:28 PM   #3
Serpentine
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
If you want to get rid of all divs, it's easy enough to just delete all the div tags. This is most likely fine for most uses, since paragraphs and such don't need to be contained in a div, but be careful: title/foreword pages can end up looking a bit strange if there's lots of silly CSS.

Anyway, just use:
Code:
</?div\b[^<>]*>
And replace with nothing.
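
As a rough illustration of what this find-and-replace does, here is a sketch in Python's `re` module (an assumption; Sigil's own engine is not being called here) applied to the sample from the first post. The names `sample`, `div_tag` and `stripped` are hypothetical.

```python
import re

sample = (
    '<div class="calibre1">\n'
    '<p class="calibre2"><span class="none">Some text that I want to keep.</span></p>\n'
    '</div>\n'
)

# Match any opening or closing div tag, with or without attributes,
# and replace it with nothing. The content between the tags is untouched.
div_tag = re.compile(r'</?div\b[^<>]*>')
stripped = div_tag.sub('', sample)
print(stripped)
```

The line breaks that surrounded the tags stay behind as empty lines, which is harmless in XHTML.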
Old 12-05-2011, 03:42 PM   #4
ElMiko
Fanatic
Posts: 502
Karma: 65460
Join Date: Jun 2011
Device: Kindle
Thanks, guys!

@Jabby - Unfortunately, I tried that one, too. It just selects the entire document! Yikes!

@Serpentine - You say it's easy enough to just delete all the div tags. And I noticed that that reg-ex will do just that, but did you mean there's an easy way to delete all div tags without writing reg-ex? As always, if I could impose on you to explain part of your code, too, I'd be most grateful. Specifically: "\b[^<>]*". Thanks
Old 12-05-2011, 04:13 PM   #5
Serpentine
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
Quote:
Originally Posted by ElMiko View Post
did you mean there's an easy way to delete all div tags without writing reg-ex? As always, if I could impose on you to explain part of your code, too, I'd be most grateful. Specifically: "\b[^<>]*". Thanks
Nope, there's no direct XML manipulation like that in Sigil - I just meant that rather than replacing the div tags with their content, just deleting the tags themselves is easier.

\b[^<>]* is just a 'nice' way of dealing with tags and attributes.
\b matches at either end of a word, where a word is generally anything that matches \w+.
The \b stops matches on partial tag names. It's a habit from searching for something like a <p> tag, where you need to be careful to avoid <pre> tags.
Code:
<p([^<>]*)> // will match both <p yup="1"> and <pre something="wat">
<p\b[^<>]*> // will match p's but not pre's.
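
The difference between the two patterns above can be checked with a short sketch in Python's `re` module (an assumption for illustration, not Sigil itself). The capture group from the first pattern is dropped here so that `findall` returns whole tags, and the test string is hypothetical.

```python
import re

tags = '<p yup="1"> <pre something="wat"> <p>'

# Without \b, "<p" is also a prefix of "<pre", so the pre tag matches too.
loose = re.findall(r'<p[^<>]*>', tags)

# With \b, the "p" must be followed by a non-word character (space, ">"),
# so "<pre ...>" is skipped.
strict = re.findall(r'<p\b[^<>]*>', tags)

print(loose)   # ['<p yup="1">', '<pre something="wat">', '<p>']
print(strict)  # ['<p yup="1">', '<p>']
```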
Using [^<>]*> rather than the more common [^>]*> is a measure to avoid destroying badly formatted tags. It's not a huge problem, but if a closing > has been removed by mistake, this stops the pattern from matching the content and the following tag(s).
Code:
Using the sample : <p Some text here</p>
</?p\b[^<>]*> : matches only </p>
</?p\b[^>]*> : matches <p Some text here</p> (the whole string)
Not a very good example, but with nested tags you can run into some pretty nasty stuff. You can always avoid it by validating, though.
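
To make the broken-tag case concrete, the following sketch runs both patterns against the sample string in Python's `re` module (an assumption for illustration; the variable names are hypothetical) and then deletes whatever each one matched.

```python
import re

# A <p> tag whose closing ">" has been lost.
broken = '<p Some text here</p>'

# [^<>]* refuses to run across the "<" of "</p>", so only the intact tag matches.
careful = re.findall(r'</?p\b[^<>]*>', broken)

# [^>]* runs straight through the text and the closing tag.
common = re.findall(r'</?p\b[^>]*>', broken)

print(careful)  # ['</p>']
print(common)   # ['<p Some text here</p>']

# Deleting the matches shows the damage: the second pattern eats the text.
after_careful = re.sub(r'</?p\b[^<>]*>', '', broken)
after_common = re.sub(r'</?p\b[^>]*>', '', broken)
print(repr(after_careful))  # '<p Some text here'
print(repr(after_common))   # ''
```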

Last edited by Serpentine; 12-05-2011 at 04:16 PM. Reason: better example
Old 12-05-2011, 04:28 PM   #6
ElMiko
Fanatic
Posts: 502
Karma: 65460
Join Date: Jun 2011
Device: Kindle
@Serpentine - Thank you for your patience. Sincerely. I make a real effort not to ask questions whose answers I can extrapolate from previous answers to previous questions that I've asked. The only way to reliably do that is to understand why those previous answers worked the way they did. When the more experienced users (such as yourself) break down the reg-ex logic, it's truly invaluable to me. So, again: sincerely grateful.
Old 12-05-2011, 07:40 PM   #7
st_albert
Guru
Posts: 698
Karma: 150000
Join Date: Feb 2010
Device: none
Quote:
Originally Posted by ElMiko View Post
Thanks, guys!

@Jabby - Unfortunately, I tried that one, too. It just selects the entire document! Yikes!
Did you have the "minimal matching" option checked?
Old 12-05-2011, 07:44 PM   #8
ElMiko
Fanatic
Posts: 502
Karma: 65460
Join Date: Jun 2011
Device: Kindle
@ st_albert - nope. I'll try that on the next file I'm messing with. I remember looking up what "minimal matching" meant in the Sigil tutorial, but I guess I didn't/don't really understand. Could you explain it to me?
Old 12-05-2011, 08:30 PM   #9
st_albert
Guru
Posts: 698
Karma: 150000
Join Date: Feb 2010
Device: none
Quote:
Originally Posted by ElMiko View Post
@ st_albert - nope. I'll try that on the next file I'm messing with. I remember looking up what "minimal matching" meant in the Sigil tutorial, but I guess I didn't/don't really understand. Could you explain it to me?
I'll try. Basically, with "minimal matching", the selected string will be the shortest one possible that matches the search pattern. Without "minimal matching" the string will be the LONGEST one possible that matches the pattern. Since the pattern Jabby proposed,

Code:
find:  <div>(.*)</div>
contains a "match any number of characters" part (i.e. .*), without minimal matching the string will match from the first <div> all the way to the LAST </div>. With minimal matching enabled, the selection will stop at the FIRST </div> after the <div>.
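
The two behaviours can be compared side by side in a small sketch. It uses Python's `re` module, which has no "minimal matching" checkbox, so two things are assumed here: the lazy quantifier `.*?` stands in for the ticked box, and `re.DOTALL` is set so that `.` can cross line breaks. The sample text is a shortened, hypothetical version of the markup from the first post.

```python
import re

sample = (
    '<div class="calibre1">\n<p class="calibre2">first</p>\n</div>\n'
    '\n'
    '<div class="calibre1">\n<p class="calibre5">second</p>\n</div>\n'
)

# Greedy .* grabs as much as possible: one match, from the first opening
# tag through to the LAST </div>.
greedy = re.findall(r'<div class="calibre1">(.*)</div>', sample, flags=re.DOTALL)

# Lazy .*? grabs as little as possible: each match stops at the FIRST
# </div> after its opening tag.
minimal = re.findall(r'<div class="calibre1">(.*?)</div>', sample, flags=re.DOTALL)

print(len(greedy))   # 1
print(len(minimal))  # 2

# Replacing each minimal match with \1 keeps the content and drops the wrapper.
unwrapped = re.sub(r'<div class="calibre1">(.*?)</div>', r'\1', sample, flags=re.DOTALL)
print(unwrapped)
```

With the greedy version, replacing with \1 would remove only the outermost pair of tags and leave the inner `</div>` and `<div class="calibre1">` in place, which is why the selection looked like the entire document.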

Probably Serpentine could explain this more succinctly.
Old 12-05-2011, 08:30 PM   #10
theducks
Well trained by Cats
Posts: 31,241
Karma: 61360164
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by ElMiko View Post
@ st_albert - nope. I'll try that on the next file I'm messing with. I remember looking up what "minimal matching" meant in the Sigil tutorial, but I guess I didn't/don't really understand. Could you explain it to me?
KISS, or don't be greedy with the match.
I leave it ticked; Case only gets enabled when I want the exact case to match.
Old 12-05-2011, 10:29 PM   #11
ElMiko
Fanatic
Posts: 502
Karma: 65460
Join Date: Jun 2011
Device: Kindle
iiiiiiiiiiiiiiiiiiiiis

@st_albert - That was perfectly clear. Thank you. I had always simply assumed it was selecting the entire document, but your explanation makes total sense. I'll be sure to keep that box checked from now on.

---

For the record, I am also not agree with topicstarter. Topicstarter excessive political opinionation on regular expressions. I am disappoint with topicstarter.

Last edited by ElMiko; 12-06-2011 at 10:19 AM. Reason: changed grammar in title for the sake of consistency
Old 12-06-2011, 12:41 AM   #12
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
Quote:
Originally Posted by RokkyR View Post
I'm not fully agreed with topicstarter
Very interesting first post. Let me ponder about this...