MobileRead Forums - View Single Post

Timur · 02-05-2012, 07:15 PM

Matches regex inside the body element and inside character data only.
(The first negative lookahead (the character data requirement) works, although the specification does not require the greater-than sign in character data to be escaped. But you have to save the epub at least once in Sigil and then reload it so that all greater-than signs are escaped to &gt; , or else you might miss some matches.)
(The second negative lookahead will not work if your document has more than one body element. Sigil allows this, but the W3C validator reports an error for such documents. I do not know the strict specification for multiple body elements.)

Code:

(?s)regex(?![^<>]*>)(?!.*<body[^>]*>)
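
A minimal sketch of how this pattern behaves, using Python's re module as a stand-in for Sigil's regex engine (an assumption; both lookaheads are supported there). The helper name in_body_text and the sample markup are made up for illustration.

```python
import re

def in_body_text(inner):
    """Wrap a user-supplied regex with the two negative lookaheads from the post."""
    # (?s) lets "." cross newlines so the second lookahead can see a later <body>.
    # (?:...) keeps alternations inside the user's regex grouped together.
    return re.compile(r'(?s)(?:' + inner + r')(?![^<>]*>)(?!.*<body[^>]*>)')

# Hypothetical sample: "cat" occurs in the head, in an attribute, and in body text.
doc = ('<html><head><title>cat</title></head>\n'
       '<body class="cat"><p>A cat sat.</p></body></html>')

pattern = in_body_text('cat')
hits = [m.start() for m in pattern.finditer(doc)]
print(hits == [doc.index('A cat') + 2])  # only the occurrence in the paragraph text

# An unescaped ">" later in the same text run makes the first lookahead reject the match.
print(pattern.search('<body><p>cat > dog</p></body>'))
# With the sign escaped, the same occurrence is found.
print(pattern.search('<body><p>cat &gt; dog</p></body>'))
```

The last two searches illustrate the save-and-reload advice: the match is only missed while a bare greater-than sign follows it in the same run of character data.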

Matches regex only inside attribute values.
(If your document uses single quotes (apostrophes) anywhere as attribute value delimiters instead of double quotes, again, save and reload to change them all to double quotes, so that this regex works reliably. Saving and reloading also escapes all quotes inside attribute values to &quot; , so that your elements stay well-formed. Reloading also escapes all greater-than signs; otherwise you risk matching something inside character data.)

Code:

regex(?=[^<]*>)(?!(?:[^<"]*"[^<"]*")+\s*/?>)
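
A small sketch of the attribute-value pattern, again with Python's re module standing in for Sigil's engine (an assumption). The helper name in_attribute_value and the sample markup are hypothetical.

```python
import re

def in_attribute_value(inner):
    """Wrap a user-supplied regex with the two lookaheads from the post."""
    # (?=[^<]*>) requires a ">" before the next "<", i.e. we are inside a tag.
    # The negative lookahead rejects positions followed by whole pairs of
    # double quotes up to the end of the tag, i.e. positions outside a value.
    return re.compile(r'(?:' + inner + r')(?=[^<]*>)(?!(?:[^<"]*"[^<"]*")+\s*/?>)')

# Hypothetical sample: "cat" occurs in two attribute values and once in text.
doc = '<p class="cat" title="big cat">A cat sat.</p>'
pattern = in_attribute_value('cat')
result = pattern.sub('dog', doc)
print(result)

# Attribute names are skipped when the value is delimited by double quotes ...
attr_name_double = in_attribute_value('class').search('<p class="x">text</p>')
# ... but with single-quote delimiters the quote counting no longer applies.
attr_name_single = in_attribute_value('class').search("<p class='x'>text</p>")
print(attr_name_double, attr_name_single is not None)
```

In this sketch the substitution only touches the two attribute values, and the single-quoted variant shows why saving and reloading matters. One thing I noticed while trying it: a tag with no attributes at all, such as a bare p tag, has no quote pair for the second lookahead to find, so a match inside the tag name itself is not rejected there.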

Edit: Typo.
Edit 2: Added clarification in bold.
Edit 3: Slight simplification in the second code.

Last edited by Timur; 02-05-2012 at 07:35 PM.