Old 08-14-2014, 04:34 AM   #7
Papirus
Junior Member
Posts: 3
Karma: 10
Join Date: Aug 2014
Device: Papyre 613
I'm another Sigil user trying to move to the Calibre editor.
I'm testing the regex module because I have more than 300 regexes (some of them quite complex) for correcting Spanish texts.

As mentioned in the initial post, the saved searches window and the warning window have sizing problems when the search string is long.

As I said, I have a lot of regular expressions, so it would be useful to be able to group and nest them in the saved searches window to keep them organized.
Another thing I miss: when I use a saved search, the expression should automatically appear in the find/search area of the main window. Sometimes I need to make a small change to one of them, and right now the only way is to open the saved searches window, edit the expression, copy it, paste it and then modify it.
Would it also be possible to add a "count all" button to this find/search area?

Regarding the regex engine, I've noticed two main differences: \K (which I can work around with variable-length lookbehinds) and the conditional construct (?(condition)then|else). The latter is an important limitation compared with PCRE.
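
For anyone hitting the same two gaps, here is a rough sketch of how both constructs can be rewritten without \K or conditionals. It uses Python's standard re module only to show the idea; the "Sr. garcia" and parenthesised-number patterns are made-up examples, and I haven't checked how the Calibre editor itself treats them.

```python
import re

# PCRE with \K would be:  Sr\.\s+\Kgarcia   (replace only "garcia")
# Emulation: capture the prefix and put it back in the replacement.
def fix_surname(text):
    return re.sub(r'(Sr\.\s+)garcia', r'\1García', text)

# PCRE conditional would be:  ^(\()?\d+(?(1)\))$
# Emulation: spell out both branches as a plain alternation.
BALANCED_NUMBER = re.compile(r'^(?:\(\d+\)|\d+)$')

def is_balanced_number(text):
    return BALANCED_NUMBER.match(text) is not None
```

The capture-group version has one cost compared with \K: the replacement has to repeat \1, so every saved replace string needs editing as well as the search string.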

\p properties are well supported, but \p{Lu} (uppercase letter) and \p{Ll} (lowercase letter) only work correctly if the "case sensitive" option is checked (I don't know if this is the expected behaviour). I've tried (?f) with no success.
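
My guess is that case-insensitive matching is simply folding the case of the class itself. The sketch below shows the same effect with Python's standard re module, which has no \p{Lu}, so [A-Z] stands in for it; this is an analogy and not a test of the Calibre engine. The unicodedata helper is one way to pick out uppercase letters that does not depend on any case flag.

```python
import re
import unicodedata

sample = 'aB'

# Case sensitive: the uppercase class matches only the uppercase letter.
sensitive = re.findall(r'[A-Z]', sample)

# Case insensitive: the same class now matches the lowercase letter too.
insensitive = re.findall(r'[A-Z]', sample, re.IGNORECASE)

def uppercase_letters(text):
    """Return the characters whose Unicode general category is Lu."""
    return [ch for ch in text if unicodedata.category(ch) == 'Lu']
```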

As mentioned in the thread, \U \L \E don't work, but they are Sigil commands, not PCRE. Anyway, a similar option would be very useful in the Calibre editor, because a lower-case letter after a full stop is a very frequent mistake in texts, and right now it can't be fixed with a replacement.
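
To make the request concrete, this is roughly the kind of replacement I mean, written as a small Python sketch with the standard re module and a replacement function. The names SENTENCE_START and capitalize_after_period are mine, and this is not something the editor offers; it only illustrates the behaviour that \U gives in Sigil.

```python
import re

# A sentence-ending mark, some whitespace, then a single letter.
# [^\W\d_] means "a word character that is not a digit or underscore",
# so accented Spanish letters are included.
SENTENCE_START = re.compile(r'([.!?]\s+)([^\W\d_])')

def capitalize_after_period(text):
    """Uppercase the first letter that follows '. ', '! ' or '? '."""
    return SENTENCE_START.sub(
        lambda m: m.group(1) + m.group(2).upper(), text)
```

It would also capitalize after abbreviations such as "etc.", so in practice I would still review each hit one by one.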

On a different topic, scanned text sometimes includes &shy; (soft hyphen). This is a hidden hyphen that you can't see (at least in Sigil), and the only way to remove it is a regex search for \xAD. Here the problem is not with the regex engine but with the file preview panel, where it appears as a dot. The same happens with &#8203; (zero-width space), another hidden character that is also shown as a dot; it is used inside very long words so that the reader can break the line instead of letting the text run past the screen edge. For this one, the regex \x{200B} is not accepted.
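
For reference, this is how the two characters can be counted and stripped outside the editor, as a minimal Python sketch. It assumes the text contains the actual characters U+00AD and U+200B and not the entity spellings; the function names are just for illustration.

```python
import re

# U+00AD is the soft hyphen, U+200B is the zero-width space.
INVISIBLE = re.compile('[\u00ad\u200b]')

def count_invisible(text):
    """Count soft hyphens and zero-width spaces in the text."""
    return len(INVISIBLE.findall(text))

def strip_invisible(text):
    """Remove soft hyphens and zero-width spaces from the text."""
    return INVISIBLE.sub('', text)
```

Stripping the zero-width spaces also removes the break points inside long words, so it may be better to count them first and decide per book.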

Thank you very much for your program and support.

Last edited by Papirus; 08-14-2014 at 06:03 AM.