03-18-2012, 07:20 AM #1
ghostyjack
Guru
Posts: 718
Karma: 1085610
Join Date: Mar 2009
Location: Bristol, England
Device: PRS-T1, 1825PT, Galaxy Tab, One X, TF700T, Aura HD, Nexus 7
RegEx Help

I've got a load of ePubs that have their text loaded with tags written as:
Code:
<a id="p2"></a>
Where "p" followed by a number is the original page number in the book it came from. These numbers range from single digits into the thousands.

I'd like to remove them, but unfortunately I can't figure out the RegEx for this. Previously in Sigil I would have used the wildcard mode, but now that it has been removed, my only recourse is RegEx.

Any ideas on what to use?

Also, looking in the toc.ncx, there is a pageList section referencing all those tags. If I remove all those tags, can I delete the pageList section from the toc.ncx file?
03-18-2012, 08:31 AM #2
Jellby
frumious Bandersnatch
Posts: 7,515
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Something like "p\d+" for the page numbers (\d = any digit, + = 1 or more times)?
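A minimal sketch of that pattern in use, written in Python with the standard `re` module rather than Sigil's PCRE engine (the constructs used here behave the same in both). Wrapping `p\d+` in the literal tag text means only the empty page anchors match; the sample markup is invented for illustration.

```python
import re

# Match only empty page anchors such as <a id="p2"></a> or <a id="p1234"></a>.
# \d+ means "one or more digits", so single digits up to the thousands all match.
PAGE_ANCHOR = re.compile(r'<a id="p\d+"></a>')

sample = '<p>End of one page.<a id="p2"></a> Start of the next.</p>'
cleaned = PAGE_ANCHOR.sub("", sample)
print(cleaned)  # <p>End of one page. Start of the next.</p>
```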

You not only can remove the pagelist, but must, if you have removed all the referenced anchors in the text.
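If you would rather script the toc.ncx change than edit it by hand, here is a rough Python sketch using `xml.etree.ElementTree`. It assumes the NCX uses the standard `http://www.daisy.org/z3986/2005/ncx/` namespace and that `pageList` sits directly under the root `ncx` element; the function name `remove_page_list` and the sample NCX are hypothetical. ElementTree drops the XML declaration and DOCTYPE when it serialises, so inside Sigil, deleting the whole `<pageList>` block by hand is probably the simpler route.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"
# Serialise NCX elements with a default namespace instead of an "ns0:" prefix.
ET.register_namespace("", NCX_NS)


def remove_page_list(ncx_text: str) -> str:
    """Hypothetical helper: drop every <pageList> child of the root <ncx> element."""
    root = ET.fromstring(ncx_text)
    for page_list in root.findall(f"{{{NCX_NS}}}pageList"):
        root.remove(page_list)
    return ET.tostring(root, encoding="unicode")


sample_ncx = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="np1" playOrder="1"><navLabel><text>Chapter 1</text></navLabel><content src="ch1.xhtml"/></navPoint>
  </navMap>
  <pageList>
    <pageTarget id="pt2" type="normal" value="2"><navLabel><text>2</text></navLabel><content src="ch1.xhtml#p2"/></pageTarget>
  </pageList>
</ncx>"""

result = remove_page_list(sample_ncx)
print(result)
```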
03-22-2012, 01:30 AM #3
Faster
Connoisseur
Posts: 61
Karma: 12096
Join Date: Sep 2010
Location: Tasmania
Device: Sony PRS 650
Find: <a[^>]*>
Replace: (leave empty)
Sigil takes care of the leftover </a>.
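A quick Python illustration (standard `re` module, invented markup) of what this pattern does on its own, before any cleanup by Sigil: every opening `<a ...>` tag goes, whether it is an anchor or a link, and the closing `</a>` tags are left behind.

```python
import re

# Faster's pattern: any opening tag that starts with "<a", whatever its attributes.
ANY_A_OPEN = re.compile(r"<a[^>]*>")
# \b after "a" stops the match from catching tags such as <abbr> or <aside>.
A_TAG_ONLY = re.compile(r"<a\b[^>]*>")

html = '<p>Text<a id="p7"></a> see <a href="notes.xhtml#n1">note 1</a>.</p>'

stripped = ANY_A_OPEN.sub("", html)
print(stripped)  # <p>Text</a> see note 1</a>.</p>

print(bool(ANY_A_OPEN.search("<abbr>")))  # True
print(bool(A_TAG_ONLY.search("<abbr>")))  # False
```

So `<a[^>]*>` also catches tags that merely begin with "a"; adding `\b` (or using the more specific page-anchor pattern) narrows it down.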
03-22-2012, 08:08 AM #4
SBT
Fanatic
Posts: 580
Karma: 810184
Join Date: Sep 2010
Location: Norway
Device: prs-t1, tablet, Nook Simple, assorted kindles, iPad
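Find:
<a id="p\d\+"><\/a>
(Depends a bit on which regex flavour you use; you might have to remove a backslash or two. In PCRE, for example, \+ matches a literal plus sign, so write \d+ there.)
Faster's solution should work as well, but it will remove every opening <a> tag, links included, not just the page anchors.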
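The flavour difference can be checked quickly with Python's `re`, which treats these constructs the same way PCRE does. The posted pattern appears to be written for a vim-style flavour, where `\+` is the one-or-more quantifier; in PCRE it means a literal "+". The sample tags are invented.

```python
import re

tag = '<a id="p2"></a>'

# As posted: in vim-style regex, \+ means "one or more".
posted = r'<a id="p\d\+"><\/a>'
# PCRE (and Python's re): + is the quantifier; \/ is simply a literal "/".
pcre = r'<a id="p\d+"><\/a>'

print(re.search(posted, tag))                              # None: \+ demands a literal "+"
print(re.search(posted, '<a id="p2+"></a>') is not None)   # True
print(re.search(pcre, tag) is not None)                    # True
```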
03-22-2012, 09:24 AM #5
ghostyjack
Thanks for all the assistance on this, I'll be having a go at it tonight.

BTW, I'll be doing this in Sigil, so the RegEx engine will be PCRE.
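Since the original question mentions a whole load of ePubs, a batch run outside Sigil may save some clicking. A minimal Python sketch, assuming each book has already been unzipped into a directory of UTF-8 XHTML files; `strip_page_anchors` is a hypothetical helper, and Python's `re` is not PCRE, although this particular pattern means the same in both. It does not touch toc.ncx or re-zip the book (an EPUB's `mimetype` entry must come first and be stored uncompressed), so those steps are still up to you.

```python
import re
import tempfile
from pathlib import Path

# Empty page anchors such as <a id="p2"></a>; the same pattern works in Sigil's regex mode.
PAGE_ANCHOR = re.compile(r'<a id="p\d+"></a>')
CONTENT_SUFFIXES = {".xhtml", ".html", ".htm"}


def strip_page_anchors(book_dir: Path) -> int:
    """Delete page anchors from every content file under book_dir; return how many were removed."""
    total = 0
    for path in sorted(book_dir.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in CONTENT_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8")
        new_text, count = PAGE_ANCHOR.subn("", text)
        if count:
            path.write_text(new_text, encoding="utf-8")
            total += count
    return total


# Demo on a throwaway directory standing in for an unzipped EPUB (hypothetical content).
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "ch1.xhtml").write_text('<p>A<a id="p2"></a>B<a id="p10"></a>C</p>', encoding="utf-8")
removed = strip_page_anchors(demo_dir)
print(removed)  # 2
```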
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.