09-28-2012, 02:16 AM | #1 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
help with hard regex please
This is a regex question, not really a Sigil one, but since this is where the helpful regex experts are, I'm hoping one will step in and advise.
I want to clean up some .xml files and am thinking regex in Notepad++ could do the job if I can figure out the expression. I want to zap all chunks that look like the example below, where the key search term is X-Fi. For anything with X-Fi in that second line, I want to delete the whole block. Ambitious, I know, but can that be done?
<RemoteButton>
<Name>X-Fi 24-bit Wheel Button</Name>
<MidiSignal>0A 41 44 09 76</MidiSignal>
<USBSignal>02 C1 44 89 76</USBSignal>
<ButtonType>btKeyboardEvent</ButtonType>
<KeyCode>173</KeyCode>
</RemoteButton>
Logically, I have to find the start phrase <RemoteButton>\s*<Name>X-Fi , look forward to locate the matching </Remote... and delete everything found in between. I also have to cope with intermediate line feeds, white space and / characters.
Hmm, I seem to be solving my own question as I type... will a simple (.*) suffice, i.e. find <RemoteButton>\s*<Name>X-Fi(.*)</RemoteButton> ?
Well, I can load the XML into Sigil and the above almost works (it leaves some mysterious de> entries), but I can't see how to then save as XML (Sigil v4).
I try the same expression in Notepad++ and it does NOT work. Either Notepad++ cannot do multi-line matching or it uses a different regex syntax?
Last edited by cybmole; 09-28-2012 at 02:33 AM. |
09-28-2012, 03:55 AM | #2 |
frumious Bandersnatch
Posts: 7,515
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Replace <RemoteButton> and </RemoteButton> with two unused characters, like ¬ and |.
Search for ¬\s*<Name>X-Fi[^|]*\| (with multi-line) and replace it with nothing. The final | is escaped because a bare | means alternation in a regex.
Replace ¬ and | back with <RemoteButton> and </RemoteButton>.
If Notepad++ does not have multi-line matching, find another editor that does; it's an essential feature. (In Sigil, you can copy-paste the result instead of saving.) |
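For anyone who would rather script the placeholder trick than do it in an editor, here is a minimal sketch in Python. The function name `strip_xfi_blocks` and the sample text are made up for illustration, and it assumes ¬ and | really do not occur anywhere else in the file.

```python
import re

# Placeholder characters; assumed not to appear anywhere else in the file.
OPEN, CLOSE = "¬", "|"


def strip_xfi_blocks(xml_text: str) -> str:
    """Delete every <RemoteButton> block whose <Name> starts with X-Fi."""
    # Step 1: swap the two tags for single placeholder characters.
    text = xml_text.replace("<RemoteButton>", OPEN).replace("</RemoteButton>", CLOSE)
    # Step 2: [^|]* cannot run past the closing placeholder, so each match
    # stays inside one block. The final \| is escaped because a bare |
    # means alternation. The trailing \s* also removes the line break
    # that follows the deleted block.
    text = re.sub(r"¬\s*<Name>X-Fi[^|]*\|\s*", "", text)
    # Step 3: put the real tags back.
    return text.replace(OPEN, "<RemoteButton>").replace(CLOSE, "</RemoteButton>")


# Hypothetical two-block sample, shortened from the example in post #1.
sample = """<RemoteButton>
<Name>X-Fi 24-bit Wheel Button</Name>
<KeyCode>173</KeyCode>
</RemoteButton>
<RemoteButton>
<Name>foo 24-bit Wheel Button</Name>
<KeyCode>173</KeyCode>
</RemoteButton>
"""

cleaned = strip_xfi_blocks(sample)
print(cleaned)
```

A negated character class like [^|] matches line breaks too, which is why this works without any multi-line or "dot matches newline" option.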
09-28-2012, 04:08 AM | #3 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
In Sigil, it worked 100% when I replaced
<RemoteButton>\s*<Name>X-Fi(.*)</RemoteButton>
with
<RemoteButton>\s*<Name>X-Fi(.*)/RemoteButton>
Previously it worked on Replace once, but not on Replace All: for some obscure reason it left behind the trailing de> from KeyCode> in each instance.
I googled Notepad++: it does not have multi-line regex, and I'm not sure that any free editor does, except for Sigil! Open, edit, copy-paste from Code View into Notepad++ and save may be the way to go. |
09-28-2012, 04:17 AM | #4 |
frumious Bandersnatch
Posts: 7,515
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Be careful with greedy matching if you have something like:
Code:
<RemoteButton>
<Name>X-Fi 24-bit Wheel Button</Name>
<MidiSignal>0A 41 44 09 76</MidiSignal>
<USBSignal>02 C1 44 89 76</USBSignal>
<ButtonType>btKeyboardEvent</ButtonType>
<KeyCode>173</KeyCode>
</RemoteButton>
<RemoteButton>
<Name>foo 24-bit Wheel Button</Name>
<MidiSignal>0A 41 44 09 76</MidiSignal>
<USBSignal>02 C1 44 89 76</USBSignal>
<ButtonType>btKeyboardEvent</ButtonType>
<KeyCode>173</KeyCode>
</RemoteButton> |
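To see why this is risky, here is a small Python sketch of the greedy pattern from post #1 run against two adjacent blocks. The sample is shortened and the variable names are only illustrative; `re.DOTALL` stands in for whatever "dot matches newline" option the editor offers.

```python
import re

# Shortened version of the two-block example: one X-Fi block, one "foo" block.
two_blocks = (
    "<RemoteButton>\n<Name>X-Fi 24-bit Wheel Button</Name>\n"
    "<KeyCode>173</KeyCode>\n</RemoteButton>\n"
    "<RemoteButton>\n<Name>foo 24-bit Wheel Button</Name>\n"
    "<KeyCode>173</KeyCode>\n</RemoteButton>\n"
)

# Greedy (.*) with DOTALL runs to the LAST </RemoteButton> in the text.
greedy = re.compile(r"<RemoteButton>\s*<Name>X-Fi(.*)</RemoteButton>", re.DOTALL)

swallowed = greedy.search(two_blocks).group(0)
print("foo" in swallowed)  # True: the match also covers the foo block

after_greedy = greedy.sub("", two_blocks)
print(repr(after_greedy))  # only the final line break is left
```

So a single "replace all" with the greedy version would delete the foo button as well, even though its name does not contain X-Fi.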
09-28-2012, 04:26 AM | #5 | |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
Quote:
I have not really mastered greedy management; that is why I was concerned about using (.*). Is there a way to write STARTphrase(.*)ENDphrase type searches that makes them non-greedy? This will also be useful for books, for removing redundant SPAN structures. |
|
09-28-2012, 07:07 AM | #6 |
frumious Bandersnatch
Posts: 7,515
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
I believe you can use (.*?) instead to make it ungreedy. (Or use [^|]* after the replacement, as per my suggestion above, which is essentially the same.)
With <span> it may be more complicated, since they can be nested to no end (I assume <RemoteButton>s would not be nested). |
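A minimal Python sketch of the ungreedy version, assuming (as above) that <RemoteButton> blocks are never nested. The helper name `remove_xfi` and the three-block sample are invented for the example, and `re.DOTALL` plays the role of the editor's "dot matches newline" setting.

```python
import re

# .*? stops at the FIRST </RemoteButton> after the X-Fi name, so each match
# covers exactly one block. The trailing \s* removes the following line break.
lazy = re.compile(r"<RemoteButton>\s*<Name>X-Fi.*?</RemoteButton>\s*", re.DOTALL)


def remove_xfi(xml_text: str) -> str:
    """Delete every non-nested <RemoteButton> block whose <Name> starts with X-Fi."""
    return lazy.sub("", xml_text)


# Hypothetical sample: an X-Fi block on either side of a block to keep.
sample = (
    "<RemoteButton>\n<Name>X-Fi 24-bit Wheel Button</Name>\n"
    "<KeyCode>173</KeyCode>\n</RemoteButton>\n"
    "<RemoteButton>\n<Name>foo 24-bit Wheel Button</Name>\n"
    "<KeyCode>173</KeyCode>\n</RemoteButton>\n"
    "<RemoteButton>\n<Name>X-Fi Volume Up</Name>\n"
    "<KeyCode>175</KeyCode>\n</RemoteButton>\n"
)

cleaned = remove_xfi(sample)
print(cleaned)
```

This relies on the closing tag never appearing inside a block. With nestable tags like <span>, the first closing tag found may belong to an inner element, so the same pattern can cut a block short.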
09-29-2012, 08:47 PM | #7 | |
Grand Sorcerer
Posts: 12,155
Karma: 73448616
Join Date: Nov 2007
Location: Toronto
Device: Nexus 7, Clara, Touch, Tolino EPOS
|
I just went looking and came across http://www.editpadlite.com/ which claims
Quote:
|
|
09-30-2012, 06:33 AM | #9 | |
Grand Sorcerer
Posts: 27,546
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
There are some subtle differences between its JGSoft regex engine and Sigil's PCRE engine, but they rarely come up in most situations. JGSoft doesn't support \K, and its commands for changing the case of text are a little different. |
|
|