Regex examples - Page 33

1v4n0 · 09-09-2015, 05:01 AM

Is there a way to search for characters or sequences only outside the html tags? I.E. only text that actually "appears" in the book. I have tried searching within the "book view" of calibre, but the replace doesn't work.

Right now I'm looking to replace "these" quotation marks with “these”.

Doitsu · 09-09-2015, 05:15 AM

Quote:

Originally Posted by 1v4n0

Right now I'm looking to replace "these" quotation marks with “these”.

You could use the Smarten Punctuation option of the Modify ePub Calibre plugin to convert straight quotes to curly ones.

doubleshuffle · 09-09-2015, 06:19 AM

Quote:

Originally Posted by Doitsu

You could use the Smarten Punctuation option of the Modify ePub Calibre plugin to convert straight quotes to curly ones.

Indeed. That plugin has saved me countless hours of the most mind-numbing, tedious work.

1v4n0 · 09-09-2015, 09:04 AM

Wow, cool. It worked

Ty

1v4n0 · 10-02-2015, 12:18 PM

I've counted all the opening and closing quotation marks (“ ”) in an epub, and there is one more closing mark than there are opening marks.

How do I find the unopened one?

Turtle91 · 10-02-2015, 02:11 PM

try:
search: ”([^“]*?)”
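
The search above matches a closing quote followed by another closing quote with no opening quote in between, so the second closing quote of each match is the suspect one. A minimal Python sketch of the same idea, assuming the book text is already in a string (the helper name `find_unopened_quotes` is made up for illustration):

```python
import re

# A closing quote, then any run of characters that contains no opening
# quote, then another closing quote.
UNOPENED = re.compile(r"”([^“]*?)”")

def find_unopened_quotes(text):
    """Return the index of each closing quote that has no opener before it."""
    # m.end() - 1 is the position of the second closing quote in the match.
    return [m.end() - 1 for m in UNOPENED.finditer(text)]

sample = "“One,” he said. Two,” she said. “Three.”"
for pos in find_unopened_quotes(sample):
    print(pos, repr(sample[max(0, pos - 15):pos + 1]))
```

Swapping the two quote characters in the pattern should find the opposite problem, an opening quote that is never closed.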

1v4n0 · 10-02-2015, 06:32 PM

It seems to work. Ty

gipsy · 11-17-2015, 01:58 PM

Code:

#Fixes ώ in words that are misspelled
CorrectText("ώ fixes",r"(\w+)(ιίι|\(ό|ο\)|ίό|ο&gt;|ο'\)|ο'ι|ιό|οί|ιο|οι|&lt;ο|οϊ)(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixO)

Hi,
In the epub tidy plugin I use this code to find a misspelled ώ.
It searches for ιίι, (ό, etc. and, if it's correct, changes it to ώ.

As the code is now, it only works within a word (for example, στιίιμα changes to στρώμα).

It doesn't work at the beginning or the end of a word (for example, ιίιστε [the correct word is ώστε] or αντιπαρατεθιίι [the correct word is αντιπαρατεθώ]).

If I change the first (\w+) to (\w+|\ ), I also get matches at the beginning of a word.
What can I change so that it also matches at the end of a word?

Thanks
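
One possible approach, sketched here in plain Python rather than inside the plugin: `\w+` requires at least one letter on each side, so a match can never touch the start or end of a word, whereas `\w*` also accepts an empty string. The sketch below is an assumption-laden simplification: it keeps only the `ιίι` alternative, drops the two tag lookaheads, and does not reproduce the plugin's `CorrectText` or `IsFixO`, whose behaviour is not shown in the thread.

```python
import re

# \w* instead of \w+ lets either side of the misread sequence be empty,
# so matches at the start or end of a word are found as well.
PATTERN = re.compile(r"(\w*)(ιίι)(\w*)")

def fix_omega(text):
    """Replace the misread sequence with ώ, keeping the rest of the word."""
    return PATTERN.sub(lambda m: m.group(1) + "ώ" + m.group(3), text)

print(fix_omega("ιίιστε"))          # sequence at the start of the word
print(fix_omega("αντιπαρατεθιίι"))  # sequence at the end of the word
```

Because both groups may now be empty, the pattern would also match the bare sequence standing on its own, so the dictionary check matters more than before.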

1v4n0 · 02-28-2016, 12:59 PM

Just found out that the case conversion replacement regex (\L\1\E to make the string lowercase, \U\1\E to make it uppercase) works with Sigil, but not with the calibre editor.

eschwartz · 02-28-2016, 01:21 PM

calibre doesn't use the PCRE library, it uses Matthew Barnett's python regex module -- which doesn't include uppercase/lowercase.

Fortunately, calibre does support function-replace, with pre-supplied functions to uppercase/lowercase text.
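
In Python-based regex engines the usual substitute for `\U\1\E` and `\L\1\E` is to pass a function as the replacement. A minimal sketch with the standard `re` module (the helper names `upper_group` and `lower_group` are made up for illustration; this is not calibre's own code):

```python
import re

def upper_group(pattern, text):
    """Rough equivalent of replacing with \\U\\1\\E."""
    return re.sub(pattern, lambda m: m.group(1).upper(), text)

def lower_group(pattern, text):
    """Rough equivalent of replacing with \\L\\1\\E."""
    return re.sub(pattern, lambda m: m.group(1).lower(), text)

print(upper_group(r"\b(chapter \w+)", "chapter one begins"))
print(lower_group(r"(END)", "THE END"))
```

Note that the whole match is replaced by the converted group 1, so anything matched outside the group is dropped, just as with a replacement string of `\U\1\E`.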

DiapDealer · 02-28-2016, 02:29 PM

Note that Sigil plugins will have the same limitation with regard to regular expressions. Both the standard re and Barnett's regex module are included with the bundled Python, but only the GUI S&R engine makes use of PCRE's case conversion switches (as well as the \K switch).

1v4n0 · 04-14-2016, 03:21 AM

Quote:

Originally Posted by eschwartz

Fortunately, calibre does support function-replace, with pre-supplied functions to uppercase/lowercase text.

So does this mean there is a way to convert cases with Calibre? How?

ty

jbacelar · 04-14-2016, 07:07 AM

Take a look at: http://manual.calibre-ebook.com/function_mode.html
Specifically: "Automatically fixing the case of headings in the document" (one of the builtin functions in the editor).
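
For a rough idea of what such a function looks like: function mode has you write a Python function named `replace` that receives the match object and returns the replacement text. The sketch below is not the builtin from the manual. It swallows the extra arguments calibre passes with `*args, **kwargs` (an assumption made so it also runs outside calibre), uses plain `str.title()` for the case fix, and the `HEADING` pattern and `fix_headings` driver are hypothetical stand-ins for the editor's search box.

```python
import re

def replace(match, *args, **kwargs):
    """Title-case the heading text and leave the tags untouched."""
    open_tag, text, close_tag = match.group(1), match.group(2), match.group(3)
    return open_tag + text.title() + close_tag

# Stand-in for the search expression you would type into the editor.
HEADING = re.compile(r"(<h[1-6][^>]*>)(.+?)(</h[1-6]>)")

def fix_headings(html):
    """Drive replace() the way a search-and-replace run would."""
    return HEADING.sub(replace, html)

print(fix_headings('<h2 class="c">the long NIGHT</h2>'))
```

`str.title()` is crude (it capitalises the letter after an apostrophe, for instance), so the manual's own function is the better choice for real books.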

Psymon · 07-14-2016, 05:21 AM

Hope it's okay for a veritable Regex newbie to post a query in this thread -- I'm only just beginning to learn about this stuff, but with any luck it'll eventually start sinking in.

I seem to have developed an affinity for doing up electronic versions of "ye olde bookes" -- for example, right now I'm doing up several Shakespeare plays in the original Elizabethan English, endeavouring to give it somewhat of the "look and feel" of early typographic styles, complete with use of the long-ess (i.e. "ſ", the character that looks like an "f" but without the crossbar, and is actually an "s"). As with the unusual use of the "u" and "v" characters in early typography, where an "ſ" is used instead of "s" has to do with placement within a word, rather than the "sound" of the character or anything else like that.

Very often when I find digital transcriptions of these early texts, they've kept the "u" and "v" oddities, but for some reason have changed all the long-esses to just "s" instead -- and so I have to change them back. The rule for when this is supposed to occur is actually fairly simple (although not all early printers/typographers followed this, but the vast majority did): virtually every instance of "s" should be changed to "ſ" unless it falls at the end of the word, then it remains as "s."

So to fix my texts up, I've been searching for every instance of "s" and then changing it to "ſ" -- which right away causes all my HTML code to need to be fixed up first, because things like "css," "class," "span," etc. get screwed up in the process -- and then I do another series of searches, looking for instances of "ſ" (long-ess) plus a "." or "," or ":" or ";" or "?" or "!" or ")" or "[space]" or "[apostrophe -- curly or otherwise]", plus "<" should there be a closing </i> or </p> tag or something, i.e. wherever it might occur at the end of a word, and then changing it back to "s" again.

It's not that big a deal, actually, I can "correct" the long-esses in a whole book in, like, 5 or 10 minutes or so, but it would be totally cool to just whiz it off with one, single regex search, of course.

Oh, and it would have to be case-sensitive, of course -- all instances of upper-case "S" remain as "S."
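
The rule as described (lower-case "s" becomes "ſ" unless it ends the word, and nothing inside tags is touched) can be sketched in Python in one pass. This is only an illustration under stated assumptions: the names `long_s`, `TAG_OR_ENTITY` and `MEDIAL_S` are invented here, "end of the word" is taken to mean "not followed by a letter", and tags and entities are skipped with a simple split rather than a real HTML parser.

```python
import re

# Pieces that must be left alone: tags such as <span class="small"> and
# entities such as &nbsp;
TAG_OR_ENTITY = re.compile(r"(<[^>]*>|&[#\w]+;)")

# A lower-case s that is followed by another letter, i.e. not word-final.
# The match is case-sensitive, so upper-case S is never touched.
MEDIAL_S = re.compile(r"s(?=[^\W\d_])")

def long_s(html):
    """Replace non-final lower-case s with the long s, outside tags only."""
    parts = TAG_OR_ENTITY.split(html)
    # With one capturing group, split() alternates text and delimiters:
    # even indexes are visible text, odd indexes are tags or entities.
    for i in range(0, len(parts), 2):
        parts[i] = MEDIAL_S.sub("ſ", parts[i])
    return "".join(parts)

print(long_s('<span class="small">Thus conscience&nbsp;passes</span>'))
```

In an editor's own search box the same lookahead, `s(?=[a-z])`, should cover the "not at the end of a word" half of the rule, though skipping the tags still needs separate handling there.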

ALSO...

A similar S&R could also be done on the "u" and "v" characters, the early rules for which also had to do with placement -- although as I mentioned before, most digital transcriptions of early texts seem to have retained those. It could come in handy, though, if at some point I encounter a text that has "modernized" the typography (but not word-spelling) of something.

For those characters, lower-case "v" was used for both "u" and "v" at the start of a word, while "u" was used for both "u" and "v" elsewhere in the word -- thus, the word we spell as "uvula" (that thing that dangles at the back of your mouth/throat) would be spelled rather oddly as "vuula."

As for upper-case "U" and "V," there was only one character, "V" -- although this is very easy to change with a simple, regular S&R, of course.
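
The lower-case placement rule ("v" at the start of a word, "u" everywhere else) can be sketched the same way. This is a hedged illustration only: the function name `early_uv` is invented, the input is assumed to be plain text with no tags, and upper-case letters are left for the ordinary S&R mentioned above.

```python
import re

# First alternative: u or v at the start of a word (kept in group 1).
# Second alternative: any other u or v.
UV = re.compile(r"(\b[uv])|[uv]")

def early_uv(text):
    """Word-initial u/v becomes v; u/v anywhere else becomes u."""
    return UV.sub(lambda m: "v" if m.group(1) else "u", text)

print(early_uv("uvula"))
print(early_uv("I have loved you upon every occasion"))
```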

(Very often the upper-case "W" character -- and occasionally the lower-case "w," too -- would be written as "VV"/"vv," but most often not, it seems to have been essentially dependent on the font the printer had available and not based on any "rule." This is why, however, we call the "w" character "double-u," actually -- in case you ever wondered.)

Anyway, hope that's not too weird -- or, indeed, too basic -- a Regex question for me to ask here. The long-ess part of my query would certainly be really great to have a Regex expression for, though!

Thanks so much, in advance! And thanks for bearing with me here, too, of course, with my long question/explanation.

EDIT/POSTSCRIPT: I forgot about "i" and "j"! In early typography, there was only one character for both -- "i" -- although once again that's easy enough to fix up with a regular S&R, of course. The only time "j" was used was as a ligature. For example, in this Elizabethan Shakespeare text I'm working on, the word "allies" (in modern English) came up, which was spelled at that time as "alliis" -- and, hence, the "ii" became "ij" ("allijs"). If you look at how it looks, then you can see where we got the character "y" from.

ReaderRabbit · 07-27-2016, 09:46 AM

OK, here is a simple question for ya. In Sigil (0.7.4), I have a book where there is no separation between sentences. I am using this to find them: ([a-z])([\.\,\?\!])([A-Z])
which works perfectly. But what do I use in replace to move the new sentence over one space? There are over 3500 matches and I don't want to insert a space manually for that many errors. Any suggestions?
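
Since the search already captures three groups, the replacement only needs to put them back with a space after the punctuation. A minimal Python sketch of that idea (the function name `add_sentence_space` is made up; in Sigil's Replace field the equivalent would presumably be `\1\2 \3`):

```python
import re

# Same three groups as the search above: last letter of the sentence,
# the punctuation mark, first letter of the next sentence.
MISSING_SPACE = re.compile(r"([a-z])([.,?!])([A-Z])")

def add_sentence_space(text):
    """Put the groups back with one space after the punctuation."""
    return MISSING_SPACE.sub(r"\1\2 \3", text)

print(add_sentence_space("It rained.The dog barked,But why?Nobody knew."))
```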
