08-14-2014, 12:39 PM   #13
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by Papirus
This is not the situation.

Imagine a text in which you want to find Roman numerals in order to set them in small caps.

([ivxlcdm]+) could be one possibility, with <small>\1</small> as the replacement.

That expression will match Roman numerals, but also ordinary words that are formed from these characters.

E.g. clic, CD, ill, id, lid, livid, isolated letters, etc.

In Spanish there are probably some even more frequent ones: thousand (mil) and, mainly, my (mi).

So if I use the above expression I will of course get Roman numerals, but also hundreds of matches that are not Roman numerals. The only way I know to get around this is a conditional structure (?(condition)then|else). It would be more or less as follows (in fact the expression needs further refinement):

Code:
(?i)(?(?=[clv]?i?d|[cm]m|[dm]i|clic|lcd|(m|ci)?v?il|[mdcl]|ill|livid) |(?<=\PL)([ivxlcdm]+)(?=\PL))
When the condition is satisfied (common words that are not Roman numerals) the yes-pattern is used (we look for nothing: the white space in this example after the condition parenthesis). Otherwise we look for Roman numerals.

Sorry. Perhaps I wasn't clear. I wasn't trying to give you a conditional expression that would do exactly what you want. I only intended to let you know that the if|then|else conditional construct is definitely supported by the editor's regex engine. It's going to be up to you to adjust your existing conditional expressions to work with the new engine.

I think the disconnect here is terminology. The calibre editor's regex module supports if|then|else conditionals: that part is not up for debate. I think where you may be running into problems is that it doesn't support conditionals using lookarounds. So instead of a conditional like:
Code:
(?(?=regex)then|else)
You might need to employ two opposite lookarounds:
Code:
(?=regex)then|(?!regex)else
to achieve the same end.
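
For anyone who wants to try the idea outside the editor, here is a minimal sketch using Python's standard re module, which also lacks lookaround conditionals. The condition and the two branches (COND, THEN, ELSE) are made-up examples for illustration only; they are not taken from the expression under discussion.

```python
import re

# Hypothetical pieces, chosen only to illustrate the technique.
COND = r"\d{4}-"       # condition: a four-digit year and a hyphen come next
THEN = r"\d{4}-\d{2}"  # then-branch: match year-month
ELSE = r"[a-z]+"       # else-branch: match a lowercase word

# Emulates (?(?=COND)THEN|ELSE) with two opposite lookaheads.
pattern = re.compile(rf"(?={COND})(?:{THEN})|(?!{COND})(?:{ELSE})")


def find_all(text):
    """Return every match of the emulated conditional in text."""
    return [m.group(0) for m in pattern.finditer(text)]


print(find_all("2014-08 august 99"))
```

Each branch is guarded by its own lookahead, so exactly one of them can apply at any given position, which is what the conditional would have guaranteed.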

So in essence, your:
Code:
(?i)(?(?=[clv]?i?d|[cm]m|[dm]i|clic|lcd|(m|ci)?v?il|[mdcl]|ill|livid) |(?<=\PL)([ivxlcdm]+)(?=\PL))
becomes:
Code:
(?i)(?=[clv]?i?d|[cm]m|[dm]i|clic|lcd|(m|ci)?v?il|[mdcl]|ill|livid) |(?![clv]?i?d|[cm]m|[dm]i|clic|lcd|(m|ci)?v?il|[mdcl]|ill|livid)(?<=\PL)([ivxlcdm]+)(?=\PL)
Ugly ... but doable.
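
One thing to watch for when duplicating the condition: the capturing group (m|ci) now appears twice, so the group that holds the numeral is no longer \1. The sketch below checks this with Python's standard re module. It is an approximation, not the editor's engine: re has no \PL, so [^A-Za-z] stands in for it here (an ASCII-only assumption), and the sample text is invented. Group numbering by opening parenthesis should work the same way in the editor, but test it there before relying on it.

```python
import re

# The word-list condition from the post, copied verbatim.
COND = r"[clv]?i?d|[cm]m|[dm]i|clic|lcd|(m|ci)?v?il|[mdcl]|ill|livid"
# Assumption: ASCII stand-in for \PL, which the standard re module lacks.
NON_LETTER = r"[^A-Za-z]"


def build(cond):
    """Build the two-lookahead form of the expression around cond."""
    return re.compile(
        rf"(?i)(?={cond}) |(?!{cond})(?<={NON_LETTER})([ivxlcdm]+)(?={NON_LETTER})"
    )


capturing = build(COND)
# Same expression with the inner group made non-capturing.
non_capturing = build(COND.replace("(m|ci)", "(?:m|ci)"))

text = "Chapter xiv and section vii."  # invented sample text

print(capturing.groups)      # number of capturing groups, condition copied twice
print(non_capturing.groups)  # number of capturing groups with (?:m|ci)
print(capturing.sub(r"<small>\3</small>", text))
print(non_capturing.sub(r"<small>\1</small>", text))
```

So either use \3 in the replacement, or change (m|ci) to (?:m|ci) in both copies of the condition and keep \1.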

Last edited by DiapDealer; 08-14-2014 at 02:16 PM.