MobileRead Forums > E-Book Formats > Workshop

01-21-2014, 07:42 AM   #1
najgori
Klak
Posts: 174
Karma: 150374
Join Date: Sep 2011
Location: Belgrade, Serbia
Device: many
regex help

1.
p(?!([^<]+)?>) "find character 'p' everywhere except within html tags"

2.
p(?!([^&]+)?;) "find character 'p' everywhere except within named entities"

I would like to merge these two searches into one, so that the new search skips both tags and named entities.

I could replace all named entities with decimal references, but that is an indirect solution.
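To make the goal concrete, here is a small sketch that runs the two searches separately on a made-up sample string. It assumes Python's `re` module (the thread does not say which tool runs these searches), and `SAMPLE` and `positions` are hypothetical names used only for illustration.

```python
import re

# Hypothetical sample text: one tag pair, one named entity, several 'p's.
SAMPLE = "<p>app &amp; pie</p>"

SKIP_TAGS = re.compile(r"p(?!([^<]+)?>)")      # search 1 from the post
SKIP_ENTITIES = re.compile(r"p(?!([^&]+)?;)")  # search 2 from the post


def positions(pattern, text):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in pattern.finditer(text)]


tag_hits = positions(SKIP_TAGS, SAMPLE)        # [4, 5, 10, 13]: still hits the 'p' in &amp;
entity_hits = positions(SKIP_ENTITIES, SAMPLE)  # [1, 4, 5, 13, 18]: still hits 'p' in tags
# The merged search should find only what both searches agree on.
wanted = sorted(set(tag_hits) & set(entity_hits))  # [4, 5, 13]
print(tag_hits, entity_hits, wanted)
```

Each search on its own lets through the matches that the other one is meant to block, so the merged search has to satisfy both conditions at once.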
01-21-2014, 10:08 AM   #2
mzmm
Groupie
Posts: 171
Karma: 86271
Join Date: Feb 2012
Device: iPad, Kindle Touch, Sony PRS-T1
--edit, misread the question, can't figure out how to delete this.

Last edited by mzmm; 01-21-2014 at 11:02 AM.
01-21-2014, 02:06 PM   #3
najgori
it looks like a common task: "mark every letter 'p' but skip tags and named entities".

You can "cheat" by converting named entities to numeric ones, which is recommended for ebooks (if I am correct?), but I am looking for an "elegant" solution to this problem.
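For reference, the "cheat" can be sketched in a few lines of Python. This is only an illustration, assuming the standard library's `html.entities.name2codepoint` table covers the entities in the book; `to_decimal` and the sample string are hypothetical.

```python
import re
from html.entities import name2codepoint

# A named entity: '&', a name made of letters and digits, then ';'.
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_decimal(text):
    """Rewrite known named entities as decimal references; leave others alone."""
    def repl(match):
        name = match.group(1)
        if name in name2codepoint:
            return "&#%d;" % name2codepoint[name]
        return match.group(0)  # unknown name: keep it unchanged
    return NAMED_ENTITY.sub(repl, text)


converted = to_decimal("<p>app &amp; pie &para;</p>")
print(converted)  # <p>app &#38; pie &#182;</p>
```

After this conversion no entity contains a letter, so the tag-skipping search alone is enough to find every remaining 'p'.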
01-21-2014, 02:57 PM   #4
eschwartz
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Use the following format to check for "either _ or _": (either|or)
01-21-2014, 04:01 PM   #5
najgori
p((?!([^<]+)?>)|(?!([^&]+)?;)) does not work.
01-21-2014, 08:04 PM   #6
eschwartz
Instead of saying "(either not this | or not that)", you need to say "not (either this | or that)". As written, a match is made if either condition holds. A "p" inside an HTML tag is not inside an entity, and vice versa, so every "p" matches.
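The difference can be checked with a short sketch. It assumes Python's `re` module and a made-up sample string. The second pattern, which simply chains the two original lookaheads so that both must pass, is an illustration of "not this and not that" and is not a pattern posted in this thread.

```python
import re

SAMPLE = "<p>app &amp; pie</p>"  # hypothetical sample text

# "not in a tag OR not in an entity": true for every 'p' in the sample.
EITHER = re.compile(r"p((?!([^<]+)?>)|(?!([^&]+)?;))")
# "not in a tag AND not in an entity": both lookaheads must pass.
BOTH = re.compile(r"p(?!([^<]+)?>)(?!([^&]+)?;)")

either_hits = [m.start() for m in EITHER.finditer(SAMPLE)]
both_hits = [m.start() for m in BOTH.finditer(SAMPLE)]
print(either_hits)  # [1, 4, 5, 10, 13, 18] -> every 'p', including tags and &amp;
print(both_hits)    # [4, 5, 13] -> only the 'p's in the visible text
```

Consecutive lookaheads are all tested at the same position, so writing them one after another behaves like a logical AND.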

...


Here you go:
Code:
p(?!([^<&]+)?(>|;))
First we match the "p", then a negative lookahead checks that the following does not come next: an optional run of characters containing neither "<" nor "&", followed by either ">" or ";".
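A quick way to see the pattern at work is to mark every match in a sample string. This sketch assumes Python's `re` module; the sample text and the "[p]" marker are made up for illustration.

```python
import re

SAMPLE = "<p>app &amp; pie</p>"  # hypothetical sample text
MERGED = re.compile(r"p(?!([^<&]+)?(>|;))")  # the pattern from this post

hits = [m.start() for m in MERGED.finditer(SAMPLE)]
marked = MERGED.sub("[p]", SAMPLE)  # wrap each matched 'p' in brackets
print(hits)    # [4, 5, 13]
print(marked)  # <p>a[p][p] &amp; [p]ie</p>
```

On this sample the tag names and the entity are left alone and only the 'p's in the visible text are marked.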

EDIT: If a "p" in ordinary text is followed by a ";" before the next "<" or "&" (for example, when a clause ends in a semicolon), that "p" won't match. You can try adding a negative lookbehind for a "&" and see where that leads you, but you're probably better off using calibre to convert all entities to Unicode. Tags won't have that problem.
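The semicolon caveat is easy to reproduce, and it also explains how the pattern can miss characters. The sketch below assumes Python's `re` module and a made-up sample. It then shows a different technique from the lookahead approach in this post: match whole tags and entities first so they are consumed, and capture a 'p' only when neither of those alternatives applies. The names `TOKEN` and `bare_p_positions` are hypothetical, and the sketch assumes tags contain no literal ">" inside attribute values and that every "&name;" run really is an entity.

```python
import re

MERGED = re.compile(r"p(?!([^<&]+)?(>|;))")  # the lookahead pattern from this post
# Alternative: consume tags and entities whole, capture only a bare 'p'.
TOKEN = re.compile(r"<[^>]*>|&#?\w+;|(p)")


def bare_p_positions(text):
    """Indexes of every 'p' that is not part of a tag or an entity."""
    return [m.start(1) for m in TOKEN.finditer(text) if m.group(1) is not None]


SAMPLE = "<p>stop; up &amp; go</p>"  # hypothetical text with a plain ';'
merged_hits = [m.start() for m in MERGED.finditer(SAMPLE)]
token_hits = bare_p_positions(SAMPLE)
print(merged_hits)  # [10]    -> the 'p' in "stop;" is missed
print(token_hits)   # [6, 10] -> both text 'p's are found
```

The trade-off is that the wanted character ends up in a capture group rather than being the whole match, so a replacement has to check which alternative matched.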

EDIT #2: added regex pipe and parentheses to beginning of answer, to clarify how it works.

Last edited by eschwartz; 01-22-2014 at 01:51 AM.
01-21-2014, 08:42 PM   #7
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
Quote:
Originally Posted by najgori View Post

you can "cheat" by converting named entities to numbers which is recommended for ebooks (if i am correct?), but i am looking for "elegant" solution for this problem.
Unless this is for a school project in epubs, the reader will not care or even know what kind of entities are there, so where does elegance enter the picture?
01-22-2014, 02:49 AM   #8
najgori
@mrmikel: "elegant" meaning a single regular expression instead of many; the entities themselves are not important.

@eschwartz: that also does not work. It misses some characters.

I am in the process of learning regular expressions, and this was just a "could this be done?" question, not very important to me. Big thanks to everybody who tried to find a solution.
01-22-2014, 11:06 AM   #9
eschwartz
Like I said, what you really want to do is convert the entities to something else. calibre will change them to Unicode characters, which I think is a very good way to deal with them.