01-21-2014, 07:42 AM | #1 |
Klak
Posts: 174
Karma: 150374
Join Date: Sep 2011
Location: Belgrade, Serbia
Device: many
|
regex help
1. p(?!([^<]+)?>) "find character 'p' everywhere except within html tags"
2. p(?!([^&]+)?;) "find character 'p' everywhere except within named entities"
I would like to merge these two searches into one, so that the new search skips both tags and named entities. I could replace all named entities with decimal ones, but that is an indirect solution. |
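For readers who want to see what the two searches do before they are merged, here is a small Python sketch. The sample string and the helper name `match_positions` are made up for illustration and do not come from the thread; the two patterns are copied unchanged from the post, and Python's `re` module is assumed to behave like the editor's regex engine for these constructs.

```python
import re

# Hypothetical sample text, not taken from the thread.
SAMPLE = "<p>up &amp; top</p>"

# The two searches from the post, copied unchanged.
SKIP_TAGS = re.compile(r"p(?!([^<]+)?>)")
SKIP_ENTITIES = re.compile(r"p(?!([^&]+)?;)")


def match_positions(pattern, text):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in pattern.finditer(text)]


tag_safe = match_positions(SKIP_TAGS, SAMPLE)
entity_safe = match_positions(SKIP_ENTITIES, SAMPLE)

# A merged search should find only the positions both searches agree on.
both = sorted(set(tag_safe) & set(entity_safe))

print(tag_safe)     # [4, 9, 14]     skips the 'p' in <p> and </p>, keeps the one in &amp;
print(entity_safe)  # [1, 4, 14, 17] skips the 'p' in &amp;, keeps the ones in the tags
print(both)         # [4, 14]        the 'p' in "up" and in "top"
```

On this sample, each search alone still reports a 'p' that the other one rejects, which is why a single combined condition is needed.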
01-21-2014, 10:08 AM | #2 |
Groupie
Posts: 171
Karma: 86271
Join Date: Feb 2012
Device: iPad, Kindle Touch, Sony PRS-T1
|
--edit, misread the question, can't figure out how to delete this.
Last edited by mzmm; 01-21-2014 at 11:02 AM. |
01-21-2014, 02:06 PM | #3 |
Klak
Posts: 174
Karma: 150374
Join Date: Sep 2011
Location: Belgrade, Serbia
Device: many
|
It looks like a common task: "mark every letter 'p' but skip tags and named entities".
You can "cheat" by converting named entities to numeric ones, which is recommended for ebooks (if I am correct?), but I am looking for an "elegant" solution to this problem. |
01-21-2014, 02:57 PM | #4 |
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Use the following format to check for "either _ or _": (either|or)
|
01-21-2014, 04:01 PM | #5 |
Klak
Posts: 174
Karma: 150374
Join Date: Sep 2011
Location: Belgrade, Serbia
Device: many
|
p((?!([^<]+)?>)|(?!([^&]+)?;)) does not work.
|
01-21-2014, 08:04 PM | #6 |
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
instead of saying "(either not this | or not that)", you need to say "not (either this | or that)". Currently, if either condition is true, a match is made. A "p" in an html tag isn't in an entity, and vice versa, so of course it matches.
... Here you go:
Code:
p(?!([^<&]+)?(>|;))
EDIT: If your sentence ends in a ";" this won't match (you can try adding a negative lookbehind for a "&" and seeing where that leads you), so you're probably better off using calibre to convert all entities to unicode. And tags won't have that problem.
EDIT #2: added regex pipe and parentheses to beginning of answer, to clarify how it works.
Last edited by eschwartz; 01-22-2014 at 01:51 AM. |
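The difference between "(not this | not that)" and "not (this | that)" can be checked with a short Python sketch. Both patterns are copied from posts #5 and #6; the sample strings and the helper name `match_positions` are hypothetical, and Python's `re` module is assumed to treat these lookaheads the same way the editor's engine does.

```python
import re

# Hypothetical sample texts, not taken from the thread.
SAMPLE = "<p>up &amp; top</p>"
SEMICOLON_SAMPLE = "<p>stop; go</p>"

# Post #5: "(not followed by a tag end) OR (not followed by an entity end)".
EITHER_NOT = re.compile(r"p((?!([^<]+)?>)|(?!([^&]+)?;))")
# Post #6: "not followed by (a tag end OR an entity end)".
NOT_EITHER = re.compile(r"p(?!([^<&]+)?(>|;))")


def match_positions(pattern, text):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in pattern.finditer(text)]


# One of the two negative lookaheads always succeeds, so every 'p' matches.
print(match_positions(EITHER_NOT, SAMPLE))            # [1, 4, 9, 14, 17]

# The single lookahead rejects a 'p' inside a tag or an entity.
print(match_positions(NOT_EITHER, SAMPLE))            # [4, 14]

# The caveat from the EDIT: the 'p' in "stop;" is text, but it is skipped.
print(match_positions(NOT_EITHER, SEMICOLON_SAMPLE))  # []
```

The last line shows the false negative described in the EDIT: a literal ";" after a text 'p' looks like the end of an entity to the lookahead.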
01-21-2014, 08:42 PM | #7 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
|
Unless this is for a school project in epubs, the reader will not care or even know what kind of entities are there, so where does elegance enter the picture?
|
01-22-2014, 02:49 AM | #8 |
Klak
Posts: 174
Karma: 150374
Join Date: Sep 2011
Location: Belgrade, Serbia
Device: many
|
@mrmikel. Elegant meaning a single regular expression instead of many; the entities themselves are not important.
@eschwartz. That also does not work; it misses some characters. I am in the process of learning regular expressions and this was just a question of "could this be done?", not very important to me. Big thanks to everybody who tried to find solutions. |
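On the "could this be done?" question, one alternative that nobody in the thread proposed is to let a single pattern consume whole tags and entities first and capture 'p' only in its last alternative. The Python sketch below illustrates that different technique. The names `TOKEN`, `text_p_positions` and `mark_text_p` are invented for the example, and it assumes well-formed markup (no ">" inside attribute values, no stray "<" or "&" in the text).

```python
import re

# One pattern with three alternatives, tried left to right at each position:
#   <[^>]*>    a whole tag      (matched, then ignored)
#   &[#\w]+;   a whole entity   (matched, then ignored)
#   (p)        a 'p' in text    (captured in group 1)
TOKEN = re.compile(r"<[^>]*>|&[#\w]+;|(p)")


def text_p_positions(text):
    """Return the index of every 'p' outside tags and entities."""
    return [m.start(1) for m in TOKEN.finditer(text) if m.group(1) is not None]


def mark_text_p(text, mark="[p]"):
    """Replace each text 'p' with mark; tags and entities are copied as-is."""
    return TOKEN.sub(lambda m: mark if m.group(1) else m.group(0), text)


print(text_p_positions("<p>up &amp; top</p>"))  # [4, 14]
print(text_p_positions("<p>stop; go</p>"))      # [6]
print(mark_text_p("<p>up &amp; top</p>"))       # <p>u[p] &amp; to[p]</p>
```

Because tags and entities are consumed as units, a ";" at the end of a sentence no longer hides a 'p'. The trade-off is that the replacement has to check whether group 1 took part in the match, which needs a callback or an equivalent feature in the tool being used.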
01-22-2014, 11:06 AM | #9 | |
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Quote:
|
|
|