09-22-2014, 01:28 AM | #1 |
Zealot
Posts: 148
Karma: 8170
Join Date: Jul 2013
Device: kobo glo
|
Upper to Lower Case Regex - I'm stuck!
Hello,
I've been trying to teach myself some basic regex to help me tweak my epubs in Sigil. I managed to sort out most of it on my own, but have run into trouble with the following. I'm trying to change "CHAPTER TWENTY-TWO" to "Chapter Twenty-Two". Using that as my test case, I can pick up the uppercase text with this in the Find field: Code:
>CHAPTER ([A-Z])([A-Z]+)-([A-Z])([A-Z]+)<
This is what I have in the Replace field: Code:
>Chapter \1\L\2-\3\L\4<
Also, can I add regex to pick up "CHAPTER ONE" as well as the double (or triple) digit numbers like "CHAPTER TWENTY-TWO", all in the same regex? What should go in the Replace field for that to work? Much appreciated. |
09-22-2014, 01:41 AM | #2 |
Zealot
Posts: 148
Karma: 8170
Join Date: Jul 2013
Device: kobo glo
|
Update
I continued trying after posting this and discovered this post:
https://www.mobileread.com/forums/sho...d.php?t=122670
From that I got this for the Find field: Code:
>(CHAPTER )(\w)(.+?\b)((-)(\w)(.+?\b))?<
And this for the Replace field: Code:
>Chapter \u\2\L\3\E\5\u\6\L\7\E< |
09-22-2014, 11:18 AM | #3 |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
I tend to just use a general purpose regex to go from ALL CAPS -> Title Case. NEVER press "Replace All" when using this, only replace one-by-one.
I initially used this one (I probably gathered it from the forums here a long time ago):
Search: (\b)([A-Z])([A-Z]+)
Replace: \1\2\L\3
What this says in English is:
(\b) Grab the word boundary. More info can be found here: http://www.regular-expressions.info/wordboundaries.html
([A-Z]) Grab the first capital letter A through Z and stick it in \2.
([A-Z]+) Grab all the rest of the capital A through Zs in a row, and stick them in \3.
\1 Replace with the word boundary,
\2 place the first capital letter,
\L\3 and change all of the UPPER CASE letters in \3 into their lowercase versions.
Recently though, I upgraded to this version:
Search: (\b)([\p{Lu}])([\p{Lu}]+)
Replace: \1\2\L\3
It looks a little scarier, but it does exactly the same thing, except that it can also handle uppercase/lowercase Unicode letters.
Be careful: this regex "fails" by also getting hits on Roman numerals ("World War II"). Also, in Title Case, words like "to", "from", "in", etc. shouldn't begin with a capital letter, so I fix those manually as I come across them. Last edited by Tex2002ans; 09-22-2014 at 11:21 AM. |
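For anyone who wants to try the same idea outside Sigil, here is a rough Python sketch of the ALL CAPS -> Title Case search above. Python's built-in re module has no \L in replacement strings and no \p{Lu}, so this version does the lowercasing in a replacement function and only handles ASCII A-Z. The name to_title_case is made up for the example.

```python
import re

# Same three groups as the Search field above: boundary, first capital, remaining capitals.
ALL_CAPS_WORD = re.compile(r"(\b)([A-Z])([A-Z]+)")


def to_title_case(text):
    """Keep the first capital of each ALL-CAPS run and lowercase the rest."""
    def fix(match):
        # Equivalent of the Replace field \1\2\L\3
        return match.group(1) + match.group(2) + match.group(3).lower()
    return ALL_CAPS_WORD.sub(fix, text)


print(to_title_case("CHAPTER TWENTY-TWO"))
print(to_title_case("World War II"))
```

The second call shows the Roman numeral problem mentioned above: "II" is an all-caps run too, so it gets changed along with everything else. That is a good reason to step through the hits one by one instead of replacing everything at once.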
09-22-2014, 11:43 AM | #4 |
Grand Sorcerer
Posts: 27,549
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Good call on the upgraded unicode version.
But do you really need to capture/reinsert the word-boundary itself? I was under the impression that it's a zero-length match. |
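A quick way to check the zero-length point is to look at what the boundary group actually captures. This is a small Python sketch (ASCII only, function names invented for the example) comparing the search with and without the captured \b.

```python
import re

CAPTURED = re.compile(r"(\b)([A-Z])([A-Z]+)")    # boundary captured as group 1
UNCAPTURED = re.compile(r"\b([A-Z])([A-Z]+)")    # boundary only asserted


def boundary_text(text):
    """Return whatever the (\\b) group captured in the first match."""
    match = CAPTURED.search(text)
    return match.group(1) if match else None


def title_with_capture(text):
    return CAPTURED.sub(
        lambda m: m.group(1) + m.group(2) + m.group(3).lower(), text)


def title_without_capture(text):
    return UNCAPTURED.sub(lambda m: m.group(1) + m.group(2).lower(), text)


print(repr(boundary_text("CHAPTER ONE")))
print(title_with_capture("CHAPTER ONE") == title_without_capture("CHAPTER ONE"))
```

The captured boundary is an empty string, so putting it back in the replacement adds nothing and both versions give the same result.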
09-22-2014, 11:51 AM | #5 |
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
And why is the match-all-unicode inside a set? A set of one...
|
09-22-2014, 06:23 PM | #6 |
Zealot
Posts: 148
Karma: 8170
Join Date: Jul 2013
Device: kobo glo
|
@Tex2002ans,
That's almost identical to the one I used to use, but I wanted something that picked up more. Does anyone know if the regex below can be changed to pick up single words as well? Current Find field: Code:
>(CHAPTER )(\w)(.+?\b)((-)(\w)(.+?\b))?<
Current Replace field: Code:
>Chapter \u\2\L\3\E\5\u\6\L\7\E< |
09-22-2014, 07:52 PM | #7 |
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Find:
Code:
>(C|P|A)(HAPTER |ROLOGUE|CKNOWLEDGMENTS)((\w)(.+?\b))?((-)(\w)(.+?\b))?<
Replace:
Code:
>\1\L\2\E\u\4\L\5\E\7\u\8\L\9\E<
Last edited by eschwartz; 09-22-2014 at 07:55 PM. |
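To see what each numbered group ends up holding, the Find/Replace pair above can be imitated in Python. This is only a sketch: Python's re module has no \u, \L or \E, so the case changes are done by hand in a function, and groups that did not take part in the match are treated as empty strings. The names group_or_empty and fix_heading are invented for the example.

```python
import re

FIND = re.compile(
    r">(C|P|A)(HAPTER |ROLOGUE|CKNOWLEDGMENTS)((\w)(.+?\b))?((-)(\w)(.+?\b))?<"
)


def group_or_empty(match, number):
    """Unmatched optional groups come back as None in Python; use '' instead."""
    return match.group(number) or ""


def fix_heading(match):
    g = lambda n: group_or_empty(match, n)
    return (
        ">"
        + g(1)            # \1      first letter, left as is
        + g(2).lower()    # \L\2\E  rest of the first word
        + g(4).upper()    # \u\4    first letter of the number word
        + g(5).lower()    # \L\5\E  rest of the number word
        + g(7)            # \7      the hyphen, if any
        + g(8).upper()    # \u\8    first letter after the hyphen
        + g(9).lower()    # \L\9\E  rest of the word after the hyphen
        + "<"
    )


for heading in ("<h2>CHAPTER TWENTY-TWO</h2>", "<h2>CHAPTER ONE</h2>",
                "<h2>PROLOGUE</h2>", "<h2>ACKNOWLEDGMENTS</h2>"):
    print(FIND.sub(fix_heading, heading))
```

Because both optional parts can be missing, the same pattern covers one-word headings, "CHAPTER ONE" and hyphenated numbers, but only for the headings listed in the first two groups.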
09-22-2014, 07:54 PM | #8 |
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
This will be much easier when calibre editor supports macros. Kovid's wishlist includes a function mode that will allow you to (among other things) apply calibre's titlecasing inside a regex.
Last edited by eschwartz; 09-23-2014 at 02:29 AM. |
09-23-2014, 02:25 AM | #9 | |
Zealot
Posts: 148
Karma: 8170
Join Date: Jul 2013
Device: kobo glo
|
I had a quick look at the calibre EPUB editor only recently and am suitably impressed. I'm still using Sigil, which at this point does what I need (I'm still on XP, so I can't get the latest versions of calibre). There's still a lot I have to learn about formatting, though. I'll probably never be great, but at least I can tweak the things I need. Thanks very much for the code you supplied. I'm off to have a play (with the code, that is!) |
|
09-23-2014, 02:33 AM | #10 |
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
My pleasure.
Note that Sigil still has the advantage here as calibre does not support some advanced regex stuff like case changing. But if/when we get macros it will more than compensate. |
09-23-2014, 03:01 AM | #11 |
Zealot
Posts: 148
Karma: 8170
Join Date: Jul 2013
Device: kobo glo
|
@eschwartz
Hey... I was just having a play with your code and can see what goes on. I'm guessing that to pick up all the headings, you'd have to list each one in the Find regex, which means I'd have to know all the headings in advance. Is there a way to do the same thing without knowing them? I started out trying to do it with "([A-Z])([A-Z]+)" in this type of regex, but found that it doesn't work. I was using the original regex as the starting point: Code:
(CHAPTER )(\w)(.+?\b)((-)(\w)(.+?\b))?
It may well be that you can't do this, and I'll have to either use two different regexes or do the rest manually? Much appreciated. |
09-23-2014, 04:31 AM | #12 |
Connoisseur
Posts: 57
Karma: 10
Join Date: Dec 2011
Device: Samsung Tablet
|
I use two different searches to do this.
Code:
Find1 <h2>(?i)Chapter ([[:lower:]])([[:lower:]]{2,})</h2>
Replace1 <h2>Chapter \U\1\E\L\2\E</h2>
Find2 <h2>(?i)Chapter ([[:lower:]])([[:lower:]]{2,})-([[:lower:]])([[:lower:]]{2,})</h2>
Replace2 <h2>Chapter \U\1\E\L\2\E-\U\3\E\L\4\E</h2> |
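If it helps to see the two searches folded into one, here is a hedged Python sketch that makes the hyphenated part optional. It is not a drop-in for Sigil: Python's re module has no POSIX [[:lower:]] class and no \U/\L in replacements, so [a-z] with the IGNORECASE flag stands in for "(?i) ... [[:lower:]]" and a function does the case changes. HEADING and fix_chapter are names made up for the example.

```python
import re

# One pattern for both cases: the "-second" part is optional.
HEADING = re.compile(
    r"<h2>chapter ([a-z])([a-z]{2,})(?:-([a-z])([a-z]{2,}))?</h2>",
    re.IGNORECASE,
)


def fix_chapter(match):
    # Same effect as \U\1\E\L\2\E in Replace1
    words = match.group(1).upper() + match.group(2).lower()
    if match.group(3):
        # Same effect as -\U\3\E\L\4\E in Replace2
        words += "-" + match.group(3).upper() + match.group(4).lower()
    return "<h2>Chapter " + words + "</h2>"


print(HEADING.sub(fix_chapter, "<h2>CHAPTER ONE</h2>"))
print(HEADING.sub(fix_chapter, "<h2>CHAPTER TWENTY-TWO</h2>"))
```

As in the original searches, the {2,} means each word needs at least three letters to match.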
09-23-2014, 03:35 PM | #13 | |
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Find: Code:
>([A-Z])([A-Z]+\s?)((\w)(.+?\b))?((-)(\w)(.+?\b))?<
Replace: Code:
>\1\L\2\E\u\4\L\5\E\7\u\8\L\9\E<
|
09-23-2014, 08:34 PM | #14 |
Zealot
Posts: 148
Karma: 8170
Join Date: Jul 2013
Device: kobo glo
|
Thanks very much for these regexes. I can see they're going to keep me busy for a while. Always so much to learn.
|
11-03-2023, 12:43 AM | #15 |
Member
Posts: 11
Karma: 10
Join Date: Jan 2013
Device: PC
|
I've tried everything above... I need a search and replace for uppercase words at the beginning of paragraphs. Example:
<p class="subsq">SARAH SNUGGLED DEEPER into Kade’s embrace.</p>
And I need it to be:
<p class="subsq">Sarah snuggled deeper into Kade’s embrace.</p>
I've tried:
Find: ([[:upper:]])([[:upper:]]{1,})
Replace: \1\L\2\E
But I realised that this works in Sigil and not in calibre, and this needs to work in calibre, please. Any help and suggestions would be appreciated. Thank you! Last edited by Bozana; 11-03-2023 at 12:46 AM. |
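One possible approach in calibre, offered as a sketch and not a tested recipe: the editor's search panel has a function mode (the macro-style feature anticipated earlier in this thread) where the replacement is a Python function named replace. The signature below follows the one shown in the calibre manual. The default values are an addition so the same function can also be run with plain re.sub for testing. The Find pattern, LEADING_CAPS and sentence_case_openers are assumptions made for this example.

```python
import re

# Find field: an opening <p ...> tag followed by a run of ALL-CAPS words.
LEADING_CAPS = re.compile(r"(<p\b[^>]*>)([A-Z]{2,}(?:\s+[A-Z]{2,})*)")


def replace(match, number=None, file_name=None, metadata=None,
            dictionaries=None, data=None, functions=None, *args, **kwargs):
    # Keep the tag as is; make the caps run "Sentence case".
    return match.group(1) + match.group(2).capitalize()


def sentence_case_openers(html):
    """Stand-alone helper for trying the function outside calibre."""
    return LEADING_CAPS.sub(replace, html)


print(sentence_case_openers(
    '<p class="subsq">SARAH SNUGGLED DEEPER into the blanket.</p>'))
```

Known limits of this sketch: it only sees ASCII capitals, it skips paragraphs that open with a one-letter word such as "A" or "I", and any name inside the caps run after the first word ends up lowercase, so it is still worth checking each hit.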
|