Old 09-22-2014, 01:28 AM   #1
Chris_Snow
Zealot
Posts: 148
Karma: 8170
Join Date: Jul 2013
Device: kobo glo
Upper to Lower Case Regex - I'm stuck!

Hello,

I've been trying to teach myself some basic regex to help me tweak my epubs in Sigil. I managed to sort out most of it on my own, but have run into trouble with the following...

Trying to change "CHAPTER TWENTY-TWO" to "Chapter Twenty-Two"

Using that as my test case, I can match the uppercase text with this in the Find field:
Code:
>CHAPTER ([A-Z])([A-Z]+)-([A-Z])([A-Z]+)<
but my replace field fails. I'm using...

Code:
>Chapter \1\L\2-\3\L\4<
I thought this would give me the result I'm after, but instead I get "Chapter Twenty-two", with the second number in lower case. What am I doing wrong?

Also, can a single regex pick up both "CHAPTER ONE" and the double (or triple) word numbers like "CHAPTER TWENTY-TWO"?

What should go in the Replace field to make this work?

Much appreciated.
Old 09-22-2014, 01:41 AM   #2
Chris_Snow
Update

I kept trying after posting this and found this post:
https://www.mobileread.com/forums/sho...d.php?t=122670

From that I got this for the "Find Field"...
Code:
>(CHAPTER )(\w)(.+?\b)((-)(\w)(.+?\b))?<
And this for the "Replace field"...
Code:
>Chapter \u\2\L\3\E\5\u\6\L\7\E<
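For anyone wondering why the first attempt produced "Twenty-two": judging by the results reported above, `\L` stays in effect until an `\E` ends it, so the `\3` that follows `\L\2-` is lowercased too. Below is a minimal Python sketch of that behaviour. Python's `re` module has no `\L`, `\u` or `\E` in replacements, so `apply_case_template` and `sigil_like_sub` are hypothetical helpers written for illustration only, not Sigil's actual implementation, and they handle nothing beyond `\1` to `\9`, `\L`, `\U`, `\E`, `\u` and `\l`.

```python
import re

def apply_case_template(match, template):
    """Expand a Sigil-style replacement template for one match (illustrative only)."""
    out = []
    mode = None       # 'L' or 'U' while a \L or \U span is open
    one_shot = None   # 'u' or 'l': affects only the next output character
    for token in re.findall(r'\\[1-9LUEul]|[^\\]+', template):
        if token in ('\\L', '\\U'):
            mode = token[1]
        elif token == '\\E':
            mode = None
        elif token in ('\\u', '\\l'):
            one_shot = token[1]
        else:
            if token.startswith('\\'):
                # Backreference; an unmatched optional group becomes ''.
                text = match.group(int(token[1])) or ''
            else:
                text = token  # literal text from the template
            if mode == 'L':
                text = text.lower()
            elif mode == 'U':
                text = text.upper()
            if one_shot and text:
                first = text[0].upper() if one_shot == 'u' else text[0].lower()
                text = first + text[1:]
                one_shot = None
            out.append(text)
    return ''.join(out)

def sigil_like_sub(find, template, text):
    """Run re.sub with the template expanded by the helper above."""
    return re.sub(find, lambda m: apply_case_template(m, template), text)

FIND = r'>CHAPTER ([A-Z])([A-Z]+)-([A-Z])([A-Z]+)<'
heading = '>CHAPTER TWENTY-TWO<'

# First attempt: the first \L is never closed, so \3 is lowercased as well.
first_try = sigil_like_sub(FIND, r'>Chapter \1\L\2-\3\L\4<', heading)
# Same idea with \E closing each lowercase span.
with_end = sigil_like_sub(FIND, r'>Chapter \1\L\2\E-\3\L\4\E<', heading)

# The find/replace pair from the update, which also copes with one-word numbers.
UPDATED_FIND = r'>(CHAPTER )(\w)(.+?\b)((-)(\w)(.+?\b))?<'
UPDATED_REPLACE = r'>Chapter \u\2\L\3\E\5\u\6\L\7\E<'

print(first_try)
print(with_end)
print(sigil_like_sub(UPDATED_FIND, UPDATED_REPLACE, '>CHAPTER ONE<'))
```

If the assumption about `\L` and `\E` holds, adding `\E` before the hyphen in the original Replace field should be the only change the first attempt needs.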
Old 09-22-2014, 11:18 AM   #3
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
I tend to just use a general purpose regex to go from ALL CAPS -> Title Case. NEVER press "Replace All" when using this, only replace one-by-one.

I initially used this one (I probably gathered it from the forums here a long time ago):

Search: (\b)([A-Z])([A-Z]+)
Replace: \1\2\L\3

What this says in English is:

(\b) Grab the word boundary and stick it in \1. More info can be found here: http://www.regular-expressions.info/wordboundaries.html
([A-Z]) Grab the first capital letter A through Z and stick it in \2.
([A-Z]+) Grab all the rest of the capital A through Zs in a row, and stick them in \3.

In the replace, \1 puts back the word boundary, \2 places the first capital letter unchanged, and \L\3 changes all of the UPPER CASE letters in \3 into their lowercase versions.

Recently though, I upgraded to this version:

Search: (\b)([\p{Lu}])([\p{Lu}]+)
Replace: \1\2\L\3

It looks a little scarier, but it does exactly the same thing, except it can also handle UPPERCASE/lowercase versions of Unicode letters.

Be careful: one case where this regex "fails" is words with Roman numerals ("World War II"), which it will also hit. Also, in Title Case, words like "to", "from", "in", etc. shouldn't begin with a capital letter, so I fix those manually as I come across them.
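To see the general-purpose search and its Roman numeral caveat outside Sigil, here is a minimal Python sketch. It uses the ASCII `[A-Z]` form because Python's built-in `re` module does not accept `\p{Lu}`, and a callback stands in for `\L`, which `re` also lacks. The names `ALL_CAPS_WORD` and `title_case_caps` are made up for this example.

```python
import re

# Group 1 is the first capital of a word, group 2 the remaining capitals.
# The word boundary is left uncaptured here.
ALL_CAPS_WORD = re.compile(r'\b([A-Z])([A-Z]+)')

def title_case_caps(text):
    """Keep the first capital of each ALL CAPS word and lowercase the rest."""
    return ALL_CAPS_WORD.sub(lambda m: m.group(1) + m.group(2).lower(), text)

print(title_case_caps('CHAPTER TWENTY-TWO'))    # Chapter Twenty-Two
print(title_case_caps('A JOURNEY TO THE SEA'))  # A Journey To The Sea
print(title_case_caps('World War II'))          # World War Ii
```

The last two lines show why replacing one by one is the safer habit: small words such as "To" and "The" get capitalised, and "II" is mangled.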

Last edited by Tex2002ans; 09-22-2014 at 11:21 AM.
Old 09-22-2014, 11:43 AM   #4
DiapDealer
Grand Sorcerer
Posts: 27,549
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Good call on the upgraded Unicode version.

But do you really need to capture/reinsert the word-boundary itself? I was under the impression that it's a zero-length match.
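A quick way to check the zero-length point is to capture the boundary and look at what the group holds. This is a Python sketch rather than Sigil itself, so it only shows how `\b` behaves in one regex engine; the variable names are illustrative.

```python
import re

sample = 'THE END'

# Capture the word boundary as in the posted search and inspect the group.
m = re.search(r'(\b)([A-Z])([A-Z]+)', sample)
boundary = m.group(1)

# Same conversion with and without the captured boundary.
with_group = re.sub(
    r'(\b)([A-Z])([A-Z]+)',
    lambda m: m.group(1) + m.group(2) + m.group(3).lower(),
    sample,
)
without_group = re.sub(
    r'\b([A-Z])([A-Z]+)',
    lambda m: m.group(1) + m.group(2).lower(),
    sample,
)

print(repr(boundary))   # ''
print(with_group)       # The End
print(without_group)    # The End
```

In this engine the captured boundary is an empty string, so reinserting it changes nothing; dropping the group simply shifts the backreference numbers down by one.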
Old 09-22-2014, 11:51 AM   #5
eschwartz
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
And why is the Unicode uppercase class \p{Lu} inside a set? It's a set of one...
Old 09-22-2014, 06:23 PM   #6
Chris_Snow
@Tex2002ans,

That's almost identical to the one I used to use, but I wanted something that picked up more.

Does anyone know if the above regex can be changed to pick up single words as well?

Current Find Field
Code:
>(CHAPTER )(\w)(.+?\b)((-)(\w)(.+?\b))?<
Current Replace Field
Code:
>Chapter \u\2\L\3\E\5\u\6\L\7\E<
I use it to change Page/Chapter Headings from Uppercase to Title case. The above picks up all the chapter ones but doesn't select things like "PROLOGUE" and "ACKNOWLEDGMENTS." I know I need to change the "(CHAPTER )" part of it, but don't know exactly what should go there - or in the find field. Thx
Old 09-22-2014, 07:52 PM   #7
eschwartz
Find:
Code:
>(C|P|A)(HAPTER |ROLOGUE|CKNOWLEDGMENTS)((\w)(.+?\b))?((-)(\w)(.+?\b))?<
Replace:
Code:
>\1\L\2\E\u\4\L\5\E\7\u\8\L\9\E<

Last edited by eschwartz; 09-22-2014 at 07:55 PM.
Old 09-22-2014, 07:54 PM   #8
eschwartz
This will be much easier when the calibre editor supports macros. Kovid's wishlist includes a function mode that will allow you to (among other things) apply calibre's titlecasing inside a regex.

Last edited by eschwartz; 09-23-2014 at 02:29 AM.
Old 09-23-2014, 02:25 AM   #9
Chris_Snow
Quote:
Originally Posted by eschwartz
This will be much easier when calibre supports macros. Kovid's wishlist includes a function mode that will allow you to apply calibre's titlecasing inside a regex.
The drool icon got me more excited than your words. Not sure what it all means, but it must be good.

I only recently had a quick look at the calibre EPUB editor and am suitably impressed. I'm still using Sigil, which at this point does what I need (I'm still on XP, so I can't get the latest versions of calibre).

Still a lot I have to learn about formatting though - I'll probably never be great, but at least I can tweak the things I need.

Thx very much for the code you supplied. I'm off to have a play (with the code, that is!)
Old 09-23-2014, 02:33 AM   #10
eschwartz
My pleasure.

Note that Sigil still has the advantage here as calibre does not support some advanced regex stuff like case changing. But if/when we get macros it will more than compensate.
Old 09-23-2014, 03:01 AM   #11
Chris_Snow
@eschwartz

Hey...I was just having a play with your code and can see what goes on. I'm guessing that to pick up all the headings, you'd have to have each one listed in the "Find" regex. That would mean I'd have to know all the title headings.

Is there a way to do the same thing without knowing all the headings?

I started out trying to do it with "([A-Z])([A-Z]+)" using this type of regex, but I've found this doesn't work.

Using the original regex

Code:
(CHAPTER )(\w)(.+?\b)((-)(\w)(.+?\b))?
I thought perhaps there was some other code we could replace the "(CHAPTER )" part with, one that would pick up all the other title headings as well as CHAPTER.

It could well be that you can't do this and I'll have to either use two different regexes or do the rest of it manually?

Much appreciated.
Old 09-23-2014, 04:31 AM   #12
Steadyhands
Connoisseur
Posts: 57
Karma: 10
Join Date: Dec 2011
Device: Samsung Tablet
I use two different searches to do this.
Code:
Find1
<h2>(?i)Chapter ([[:lower:]])([[:lower:]]{2,})</h2>
Replace1
<h2>Chapter \U\1\E\L\2\E</h2>

Find2
<h2>(?i)Chapter ([[:lower:]])([[:lower:]]{2,})-([[:lower:]])([[:lower:]]{2,})</h2>
Replace2
<h2>Chapter \U\1\E\L\2\E-\U\3\E\L\4\E</h2>
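The two searches above differ only in the hyphenated second word, so they can probably be folded into one pattern with an optional group. Here is a minimal Python sketch under a few assumptions: `[a-z]` with `re.IGNORECASE` stands in for `(?i)` plus `[[:lower:]]` (Python's `re` has no POSIX classes and expects global flags at the start of the pattern), and the names `CHAPTER` and `fix_chapter` are invented for the example.

```python
import re

# One pattern for both cases: the "-second" part is optional.
CHAPTER = re.compile(
    r'<h2>chapter ([a-z])([a-z]{2,})(?:-([a-z])([a-z]{2,}))?</h2>',
    re.IGNORECASE,
)

def fix_chapter(m):
    """Rebuild the heading with each number word in Title Case."""
    first = m.group(1).upper() + m.group(2).lower()
    if m.group(3) is None:
        # Single-word number such as ONE.
        return '<h2>Chapter ' + first + '</h2>'
    second = m.group(3).upper() + m.group(4).lower()
    return '<h2>Chapter ' + first + '-' + second + '</h2>'

print(CHAPTER.sub(fix_chapter, '<h2>CHAPTER ONE</h2>'))
print(CHAPTER.sub(fix_chapter, '<h2>chapter twenty-two</h2>'))
```

As in the original searches, the `{2,}` means each number word must be at least three letters long, which covers "one", "two", "six" and "ten".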
Old 09-23-2014, 03:35 PM   #13
eschwartz
Quote:
Originally Posted by Chris_Snow
@eschwartz

Hey...I was just having a play with your code and can see what goes on. I'm guessing that to pick up all the headings, you'd have to have each one listed in the "Find" regex. That would mean I'd have to know all the title headings.

Is there a way to do the same thing without knowing all the headings?

I started out trying to do it with "([A-Z])([A-Z]+)" using this type of regex, but I've found this doesn't work.

Using the original regex

Code:
(CHAPTER )(\w)(.+?\b)((-)(\w)(.+?\b))?
I thought perhaps there was some other code we could replace the "(CHAPTER )" part with, one that would pick up all the other title headings as well as CHAPTER.

It could well be that you can't do this and I'll have to either use two different regexes or do the rest of it manually?

Much appreciated.
I don't see why not.

Find:
Code:
>([A-Z])([A-Z]+\s?)((\w)(.+?\b))?((-)(\w)(.+?\b))?<
Replace:
Code:
>\1\L\2\E\u\4\L\5\E\7\u\8\L\9\E<
Perhaps you accidentally left off the space (the \s?), which allows "CHAPTER " to be split off from the number that follows.
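For checking the generalised search against a few headings, here is a minimal Python sketch. The pattern is the Find expression above unchanged; since Python's `re` has no `\L`, `\u` or `\E`, the hypothetical callback `fix_heading` mirrors the Replace field by hand.

```python
import re

# The Find expression from the post, unchanged.
HEADING = re.compile(r'>([A-Z])([A-Z]+\s?)((\w)(.+?\b))?((-)(\w)(.+?\b))?<')

def fix_heading(m):
    """Mirror the replace >\\1\\L\\2\\E\\u\\4\\L\\5\\E\\7\\u\\8\\L\\9\\E< by hand."""
    def g(n):
        # Optional groups that did not take part in the match become ''.
        return m.group(n) or ''
    return ('>' + g(1) + g(2).lower()
            + g(4).upper() + g(5).lower()
            + g(7) + g(8).upper() + g(9).lower() + '<')

for heading in ('<h2>PROLOGUE</h2>',
                '<h2>CHAPTER ONE</h2>',
                '<h2>CHAPTER TWENTY-TWO</h2>',
                '<h2>ABOUT THE AUTHOR</h2>'):
    print(HEADING.sub(fix_heading, heading))
```

The first three headings come out as "Prologue", "Chapter One" and "Chapter Twenty-Two". The last one comes out as "About The author" in this sketch, because everything after the second word's first letter lands in one lowercased group, so headings of three or more words would still need a manual touch.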
Old 09-23-2014, 08:34 PM   #14
Chris_Snow
Thx very much for these regexes. They are going to keep me busy for a while, I can see. Always so much to learn.
Old 11-03-2023, 12:43 AM   #15
Bozana
Member
Posts: 11
Karma: 10
Join Date: Jan 2013
Device: PC
I've tried everything above... I need a search and replace that lowercases the uppercase words at the beginning of a paragraph. Example:

<p class="subsq">SARAH SNUGGLED DEEPER into Kade’s embrace.</p>

And I need it to be:

<p class="subsq">Sarah snuggled deeper into Kade’s embrace.</p>

I've tried:

Find: ([[:upper:]])([[:upper:]]{1,})
Replace: \1\L\2\E

But I realised that this works in Sigil and not in calibre, and this needs to work in calibre, please. Any help and suggestions would be appreciated.
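One way to pin down the logic being asked for, independent of either editor, is a plain Python sketch: match the opening `<p>` tag plus the run of ALL CAPS words that follows it, then turn that run into sentence case. The names `LEADING_CAPS` and `sentence_case_opening` are hypothetical, the pattern is ASCII only, and how it would be wired into calibre's search and replace is left open here.

```python
import re

# Group 1: the opening <p ...> tag. Group 2: the run of ALL CAPS words after it.
LEADING_CAPS = re.compile(r'(<p\b[^>]*>)([A-Z]+\b(?: [A-Z]+\b)*)')

def sentence_case_opening(html):
    """Lowercase the leading ALL CAPS run of each paragraph, keeping the first capital."""
    return LEADING_CAPS.sub(lambda m: m.group(1) + m.group(2).capitalize(), html)

before = '<p class="subsq">SARAH SNUGGLED DEEPER into Kade’s embrace.</p>'
after = sentence_case_opening(before)
print(after)
```

The `\b` after each word keeps capitalised words such as "Kade" out of the run. Names inside the run are lowercased along with everything else, so a paragraph opening with "SARAH AND KADE" would still need checking by hand.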

Thank you!

Last edited by Bozana; 11-03-2023 at 12:46 AM.