MobileRead Forums > E-Book Software > Sigil

02-04-2014, 11:52 PM #1
phossler
Wizard
Posts: 1,071
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
RegEx Question: H1 ALL CAPS to All Caps

I've been reformatting a number of books and the chapter titles are in all caps.

Sometimes there are others, such as section titles. I've got it to the point where the tags are pretty consistent, but converting the text to title case in Book View (BV) is a lot of work.

Is there a good F-R that will convert things like

<h1>CHAPTER ONE</h1>

into

<h1>Chapter One</h1>

I can adapt it for other tags, but I'm stuck on the title-case part.

There is a TitleCase stored clip in Sigil, but I couldn't seem to get it to work right: \u\1


Thanks
02-05-2014, 01:00 AM #2
Doitsu
Grand Sorcerer
Posts: 5,582
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by phossler
Is there a good F-R that will convert things like

<h1>CHAPTER ONE</h1>

into

<h1>Chapter One</h1>
Find:<h1>CHAPTER ([[:upper:]])([[:upper:]]{2,})</h1>
Replace:<h1>Chapter \1\L\2\E</h1>

For more examples see this older post of mine.
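To check what this F&R does outside Sigil, here is a rough Python equivalent. It is only a sketch under two assumptions: Python's `re` module has no POSIX classes such as `[[:upper:]]`, so ASCII `[A-Z]` stands in for them, and it has no `\L...\E` replacement escape, so the lowercasing is done in a callback. The function name `fix_chapter_heading` is made up for illustration.

```python
import re

# Rough Python stand-in for the Sigil F&R:
#   Find:    <h1>CHAPTER ([[:upper:]])([[:upper:]]{2,})</h1>
#   Replace: <h1>Chapter \1\L\2\E</h1>
# Assumption: [A-Z] replaces [[:upper:]] (Python's re has no POSIX classes).
PATTERN = re.compile(r"<h1>CHAPTER ([A-Z])([A-Z]{2,})</h1>")

def fix_chapter_heading(html: str) -> str:
    # Group 1 keeps its capital; group 2 is lowercased, as \L...\E would do.
    return PATTERN.sub(
        lambda m: "<h1>Chapter " + m.group(1) + m.group(2).lower() + "</h1>",
        html,
    )

print(fix_chapter_heading("<h1>CHAPTER ONE</h1>"))         # <h1>Chapter One</h1>
print(fix_chapter_heading("<h1>CHAPTER TWENTY ONE</h1>"))  # unchanged: a space is not [A-Z]
```

As the second call shows, the pattern only handles a literal "CHAPTER" followed by a single word of three or more capitals.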
02-05-2014, 01:42 AM #3
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
I actually use this one, and it suits me well. It fails on accented characters, though, so it could be more robust (I just haven't gotten around to updating it). Also, it isn't limited to <h1>; it works on every word that is in ALL CAPS.

Search: (\b)([A-Z])([A-Z]+)
Replace: \1\2\L\3

It is really helpful when I want to change the text that appears in the Sigil auto-generated TOC, but want to leave the displayed text as ALL CAPS.

(\b): \b is called a "word boundary". For more info on this, see: http://www.regular-expressions.info/wordboundaries.html
([A-Z]): This captures the first capital letter of the word, so it stays capital in the replacement.
([A-Z]+): This captures all the other capital letters, and \L turns them into lowercase.

(NEVER use Replace All with this; always replace one by one.)

Example:

Code:
<h1 title="1. THIS IS CHAPTER ONE">THIS IS CHAPTER ONE</h1>
After:

Code:
<h1 title="1. This Is Chapter One">THIS IS CHAPTER ONE</h1>
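A quick Python sketch of the same Search/Replace shows both what it does to the title text and why Replace All is risky. Python has no `\L` in replacement strings, so group 3 is lowercased in a callback; `title_case_caps` is a hypothetical name.

```python
import re

# Python sketch of the F&R:
#   Search:  (\b)([A-Z])([A-Z]+)
#   Replace: \1\2\L\3
ALL_CAPS_WORD = re.compile(r"(\b)([A-Z])([A-Z]+)")

def title_case_caps(text: str) -> str:
    # \1 is an empty capture (the boundary), \2 keeps its capital, \3 is lowercased.
    return ALL_CAPS_WORD.sub(
        lambda m: m.group(1) + m.group(2) + m.group(3).lower(), text)

print(title_case_caps("1. THIS IS CHAPTER ONE"))  # 1. This Is Chapter One
print(title_case_caps("NASA"))   # Nasa: acronyms are caught as well
print(title_case_caps("CAFÉ"))   # CafÉ: [A-Z] stops at the accented letter
```

The last two calls are why stepping through matches one by one is safer than Replace All: any ALL CAPS word in the file, acronyms included, would be changed.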
As I go along, I just manually lowercase the title-case words that should be completely lowercase (and, for, of, ...):

http://grammar.about.com/od/grammarf...italstitle.htm
02-05-2014, 08:34 AM #4
phossler
Wizard
@Tex2002ans -- thanks. I tried something similar, but the document had a lot of all-caps acronyms that were also being matched. I tried 'Replace/Find Next', but that was taking a lot of time.

@Doitsu -- that looks similar to something I found on Google (which didn't work), except for the [[:upper:]] classes. The one I found used \u, which didn't seem to work after I wrapped it in the H1 tags. Is there a difference?

I'll try your suggestion when I get home

Paul
02-05-2014, 08:47 AM #5
DiapDealer
Grand Sorcerer
Posts: 27,465
Karma: 192992430
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
\u and \l only affect one character.

\U and \L keep changing the case of characters until \E is encountered.
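Python's `re.sub` does not support these case-conversion escapes in replacement strings, so the sketch below emulates them to make the difference between the one-character and the span forms concrete. `expand_case` is a hypothetical helper, not a Python or Sigil API; it assumes single-digit group references and, as in Perl, lets `\u`/`\l` override an active `\U`/`\L` for one character.

```python
import re

def expand_case(template: str, m: re.Match) -> str:
    r"""Hypothetical helper: expand a Sigil/PCRE-style replacement template.
    Supports \1..\9 group references, \u and \l (next character only) and
    \U and \L (everything until \E)."""
    out = []
    span_mode = None  # "upper" or "lower" while a \U or \L span is active
    one_shot = None   # "upper" or "lower" pending from \u or \l

    def emit(text):
        nonlocal one_shot
        for ch in text:
            if span_mode == "upper":
                ch = ch.upper()
            elif span_mode == "lower":
                ch = ch.lower()
            # A pending \u or \l wins over the span for exactly one character.
            if one_shot == "upper":
                ch, one_shot = ch.upper(), None
            elif one_shot == "lower":
                ch, one_shot = ch.lower(), None
            out.append(ch)

    i = 0
    while i < len(template):
        if template[i] == "\\" and i + 1 < len(template):
            code = template[i + 1]
            i += 2
            if code in "0123456789":
                emit(m.group(int(code)) or "")  # unmatched group -> empty
            elif code in ("u", "l"):
                one_shot = "upper" if code == "u" else "lower"
            elif code in ("U", "L"):
                span_mode = "upper" if code == "U" else "lower"
            elif code == "E":
                span_mode = None
            else:
                emit(code)  # any other escaped character is taken literally
        else:
            emit(template[i])
            i += 1
    return "".join(out)

m = re.match(r"(CHAPTER) (ONE)", "CHAPTER ONE")
print(expand_case(r"\u\L\1\E \2", m))  # Chapter ONE
print(expand_case(r"\l\1", m))         # cHAPTER
```

With `re.sub`, it would be plugged in through a callback, e.g. `re.sub(pattern, lambda m: expand_case(template, m), text)`.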

@Tex2002ans:
The \b word boundary is handy, but it can also result in some confusing behavior if you run into typographic quotes, apostrophes, or other Unicode characters in the text. In ebooks, I always try to use the Unicode switch to make \b accommodate such things.
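The boundary behaviour depends on which characters the engine treats as word characters. Python 3 `str` patterns are Unicode-aware by default, and the `re.ASCII` flag restricts `\w` and `\b` to ASCII, which roughly models an engine with its Unicode option off. In PCRE, which Sigil's Find & Replace uses, the "Unicode switch" is most likely the `(*UCP)` option at the very start of the pattern; treat that identification as an assumption.

```python
import re

PATTERN = r"\b[A-Z]+\b"

# ASCII mode: É is not a word character, so \b fires between F and É
# and "CAF" is matched as if it were a whole word.
print(re.findall(PATTERN, "CAFÉ", flags=re.ASCII))  # ['CAF']

# Unicode mode: É is a word character, so there is no boundary after F,
# and the ASCII-only [A-Z]+ cannot reach one: no match at all.
print(re.findall(PATTERN, "CAFÉ"))                  # []

# A typographic apostrophe (U+2019) is punctuation in both modes,
# so it splits one word into two "words".
print(re.findall(PATTERN, "DON\u2019T"))            # ['DON', 'T']
```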

Last edited by DiapDealer; 02-05-2014 at 09:11 AM.
02-05-2014, 10:48 AM #6
PeterT
Grand Sorcerer
Posts: 12,119
Karma: 73448614
Join Date: Nov 2007
Location: Toronto
Device: Nexus 7, Clara, Touch, Tolino EPOS
My minor suggestion would be:
Code:
Find: (<h[1-9]>)([A-Z])([A-Z ]+)
Replace: \1\2\L\3
The main differences are that it includes the opening <h> tag and allows multiple words in the rest of the title by adding the space.

Additional punctuation can be allowed by adding it to the [A-Z ] character class, e.g. [A-Z ,;]. (No | separators are needed inside square brackets; a | there just matches a literal pipe character.)
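The effect of widening the character class can be checked with a short Python sketch. `lower_after_first` and its `char_class` parameter are made up for illustration, and the missing `\L` is again handled with a callback.

```python
import re

# Python sketch of the F&R:
#   Find:    (<h[1-9]>)([A-Z])([A-Z ]+)
#   Replace: \1\2\L\3
def lower_after_first(html: str, char_class: str = "[A-Z ]") -> str:
    pattern = r"(<h[1-9]>)([A-Z])(" + char_class + r"+)"
    return re.sub(pattern,
                  lambda m: m.group(1) + m.group(2) + m.group(3).lower(), html)

heading = "<h2>PART ONE, THE START</h2>"
print(lower_after_first(heading))              # <h2>Part one, THE START</h2>
print(lower_after_first(heading, "[A-Z ,;]"))  # <h2>Part one, the start</h2>
```

Note that every word after the first is lowercased, so the result is sentence case rather than title case.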
02-05-2014, 10:50 AM #7
phossler
Wizard
@Doitsu --

Quote:
Find:<h1>CHAPTER ([[:upper:]])([[:upper:]]{2,})</h1>
Replace:<h1>Chapter \1\L\2\E</h1>

For more examples see this older post of mine.
Thanks. I did look at the older post; interesting.

However, when I tried the suggestion, it didn't quite do what I was hoping for, since there could be anything between the H1 (or other) tags, not literally "CHAPTER".

What I think I'm looking for is a more general-purpose F-R that would take any text between tags and title-case it.

The <h1>CHAPTER ONE</h1> was really just a (poor) example. A general example might be

<tag>TEXT TEXT TEXT</tag>

becoming

<tag>Text Text Text</tag>

since some of the H# headings are "PART 2" and "CHAPTER 3", and sometimes just "ONE", "TWO", ...

In the best of all possible worlds, I'd have 2 or 3 stored clips to title-case H1, H2, and H3. When I had an oddball set of tags, I'd do that F&R manually, but I'd be smarter.

Thanks
02-05-2014, 11:04 AM #8
phossler
Wizard
@PeterT -- thanks, that's sort of close.

I tried it with Minimal Match on and off, just to see:

Code:
Original
<h1>AAAAAAAAAAAAAAAAAA BBBBBBBBBBBBBBBBBBBB BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB</h1>

Minimal match unchecked
<h1>Aaaaaaaaaaaaaaaaaa bbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb</h1>

Minimal match checked
<h1>AaAAAAAAAAAAAAAAAA BBBBBBBBBBBBBBBBBBBB BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB</h1>
With Minimal Match unchecked, the first word is title case, but the rest are all lowercase. With it checked, only the second letter of the first word is lowercased.
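Both results follow from how much `[A-Z ]+` is allowed to match. The Python sketch below assumes Sigil's Minimal Match option makes quantifiers non-greedy, so it models "checked" with the lazy `+?`; that reproduces both outcomes above. `lower_tail` is a hypothetical name.

```python
import re

# Apply \1\2\L\3 to whatever the given pattern matches.
def lower_tail(pattern: str, html: str) -> str:
    return re.sub(pattern,
                  lambda m: m.group(1) + m.group(2) + m.group(3).lower(), html)

html = "<h1>AAAA BBBB CCCC</h1>"

# Greedy (Minimal Match unchecked): [A-Z ]+ swallows every remaining
# letter and space, so the whole tail is lowercased.
print(lower_tail(r"(<h[1-9]>)([A-Z])([A-Z ]+)", html))   # <h1>Aaaa bbbb cccc</h1>

# Lazy (Minimal Match checked): [A-Z ]+? stops after one character,
# so only the second letter changes.
print(lower_tail(r"(<h[1-9]>)([A-Z])([A-Z ]+?)", html))  # <h1>AaAA BBBB CCCC</h1>
```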

Do you or any RegEx gurus have some tweaks?

Hoped-for result:
<h1>Aaaaaaaaaaa Bbbbbbbbbbbb Cccccccccccc</h1>

Thanks again
02-05-2014, 11:30 AM #9
Doitsu
Grand Sorcerer
Quote:
Originally Posted by phossler
What I think I'm looking for is a more general-purpose F-R that would take any text between tags and title-case it.
AFAIK, you'd probably need a script to convert an unknown number of words between two specific tags to title case.
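As an illustration of what such a script could look like, here is a minimal Python sketch, assuming the headings are `<h1>`..`<h6>` elements containing plain text. `SMALL_WORDS`, `title_case_words` and `title_case_headings` are hypothetical; the minor-word list is not from this thread and should be adjusted to your style guide.

```python
import re

# Hypothetical list of minor words kept lowercase in mid-title.
SMALL_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for",
               "in", "nor", "of", "on", "or", "the", "to"}

# <h1>..<h6> with or without attributes; \2 makes the closing tag match.
HEADING = re.compile(r"(<(h[1-6])\b[^>]*>)(.*?)(</\2>)", re.DOTALL)

def title_case_words(text: str) -> str:
    words = text.split(" ")
    last = len(words) - 1
    out = []
    for i, word in enumerate(words):
        if not word.isupper():
            # Leave numbers, mixed-case words and inline markup alone.
            out.append(word)
            continue
        lower = word.lower()
        if 0 < i < last and lower in SMALL_WORDS:
            out.append(lower)  # minor word in the middle of the title
        else:
            out.append(lower[:1].upper() + lower[1:])
    return " ".join(out)

def title_case_headings(html: str) -> str:
    return HEADING.sub(
        lambda m: m.group(1) + title_case_words(m.group(3)) + m.group(4), html)

print(title_case_headings("<h1>CHAPTER ONE</h1><h2>PART 2</h2>"))
# <h1>Chapter One</h1><h2>Part 2</h2>
print(title_case_headings('<h2 class="sub">THE END OF THE ROAD</h2>'))
# <h2 class="sub">The End of the Road</h2>
```

Acronyms such as NASA would still come out as "Nasa", so they need the same manual touch-up mentioned earlier in the thread.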
02-05-2014, 12:00 PM #10
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
If you compromise a little, then:
1. With a single pass you can change everything to lowercase, then
2. with a second pass you can easily capitalise the first letter of the first word.

So you've gone from CHAPTER ONE to Chapter one.

A third (harder to write) pass could then probably find and capitalise the second word...
It comes down to how much time you want to invest in fancy coding compared with just manually retyping the offending titles.
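The first two passes can be sketched in Python as below. The Sigil Find/Replace strings in the comments are our guess at equivalents, using the `\L` and `\u` escapes discussed in this thread, and the function names are hypothetical.

```python
import re

def pass1_lowercase(html: str) -> str:
    # Sigil idea: Find (<h[1-6]>)(.*?)(</h[1-6]>)  Replace \1\L\2\E\3
    return re.sub(r"(<h[1-6]>)(.*?)(</h[1-6]>)",
                  lambda m: m.group(1) + m.group(2).lower() + m.group(3), html)

def pass2_capitalise_first(html: str) -> str:
    # Sigil idea: Find (<h[1-6]>)([a-z])  Replace \1\u\2
    return re.sub(r"(<h[1-6]>)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), html)

print(pass2_capitalise_first(pass1_lowercase("<h1>CHAPTER ONE</h1>")))
# <h1>Chapter one</h1>
```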
02-05-2014, 12:37 PM #11
Doitsu
Grand Sorcerer
Quote:
Originally Posted by cybmole
If you compromise a little, then:
1. With a single pass you can change everything to lowercase, then
2. with a second pass you can easily capitalise the first letter of the first word.

So you've gone from CHAPTER ONE to Chapter one.
In that case, I'd rather use the following expressions:

Find:([[:upper:]])([[:upper:]]+\s*)
Replace:\1\L\2\E

This finds an uppercase letter followed by one or more uppercase letters and zero or more whitespace characters anywhere in the text, so it can be used to convert several uppercase words in a row.
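A Python sketch of these expressions, assuming ASCII `[A-Z]` in place of `[[:upper:]]` (Python's `re` has no POSIX classes) and a callback in place of `\L...\E`. `caps_to_title` is a hypothetical name.

```python
import re

# Find:    ([[:upper:]])([[:upper:]]+\s*)
# Replace: \1\L\2\E
CAPS_RUN = re.compile(r"([A-Z])([A-Z]+\s*)")

def caps_to_title(text: str) -> str:
    # Keep the first capital of each run; lowercase the rest (and any trailing space).
    return CAPS_RUN.sub(lambda m: m.group(1) + m.group(2).lower(), text)

print(caps_to_title("<h1>AAAA BBBB CCCC</h1>"))  # <h1>Aaaa Bbbb Cccc</h1>
print(caps_to_title("THE NASA LAUNCH"))          # The Nasa Launch
```

Because it matches anywhere, it also catches acronyms; single-letter words such as "A" or "I" are left alone, since at least two capitals are required.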
02-05-2014, 01:09 PM #12
cybmole
Wizard
Quote:
Originally Posted by Doitsu
In that case, I'd rather use the following expressions:

Find:([[:upper:]])([[:upper:]]+\s*)
Replace:\1\L\2\E

This finds an uppercase letter followed by one or more uppercase letters and zero or more whitespace characters anywhere in the text, so it can be used to convert several uppercase words in a row.
OK, then the code to capitalise the first letter after the chosen opening tag is easy. Next, within your chosen tags, find any instance of a space followed by a lowercase letter and flip that letter to uppercase. With that last one, rather than try to code an iterative loop, I'd just spam the Replace All button until there are no more hits.

But for fewer than 20 chapter headings, I could probably edit them manually faster than I could write and debug all of the above code.
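The "spam Replace All until no more hits" step can be modelled with a loop in Python. This is a sketch under the assumption that each match is anchored at the opening tag, so one Replace All fixes only one word per heading; the pattern and `capitalise_remaining_words` are hypothetical, and a Sigil-style Replace would presumably be `\1\u\2`.

```python
import re

# Inside <h1>..<h6>: the shortest run from the opening tag up to a space
# that is followed by a lowercase letter.
PASS3 = re.compile(r"(<h[1-6]>[^<]*? )([a-z])")

def capitalise_remaining_words(html: str):
    presses = 0  # how many "Replace All" clicks actually changed something
    while True:
        html, hits = PASS3.subn(lambda m: m.group(1) + m.group(2).upper(), html)
        if hits == 0:
            return html, presses
        presses += 1

print(capitalise_remaining_words("<h1>Chapter one of two</h1>"))
# ('<h1>Chapter One Of Two</h1>', 3)
```

Minor words such as "of" get capitalised too, so they still need a manual pass afterwards.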
02-05-2014, 01:21 PM #13
theducks
Well trained by Cats
Posts: 29,689
Karma: 54369090
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by cybmole
OK, then the code to capitalise the first letter after the chosen opening tag is easy. Next, within your chosen tags, find any instance of a space followed by a lowercase letter and flip that letter to uppercase. With that last one, rather than try to code an iterative loop, I'd just spam the Replace All button until there are no more hits.

But for fewer than 20 chapter headings, I could probably edit them manually faster than I could write and debug all of the above code.
Amen

Search
Look (and maybe replace or edit)
Search ...


If it can't be done with a quick-and-dirty regex AND it won't be a bit of reusable code, it's not normally worth the effort. (Practicing regex-fu would be the exception, but then I would not expect the question here.)
02-05-2014, 03:38 PM #14
phossler
Wizard
@all -- I got some really good ideas.

I added a little CSS (this is only a fragment from my test book):

Code:
h1,h2,h3,h4,h5 {
	text-transform:capitalize;
}
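This only works if the text is lowercase (text-transform: capitalize uppercases the first letter of each word but leaves the other letters as they are), so I first used this F&R:

Find: (<h[1-9]>)(.*)\>

Replace: \1\L\2>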
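A Python sketch of the two steps shows what ends up in the file versus what a reader sees. It assumes `\>` is a literal `>` (PCRE treats a backslash before a non-alphanumeric character as a plain escape), and `css_capitalize` is only a rough model of the CSS rule. Since the CSS only affects rendering, the file text, and presumably an auto-generated TOC built from it, stays lowercase.

```python
import re

# Step 1: Find (<h[1-9]>)(.*)\>  Replace \1\L\2>
# The greedy (.*) runs to the LAST ">" on the line, so the closing tag's
# name is lowercased too (harmless, it is already lowercase).
FIND = re.compile(r"(<h[1-9]>)(.*)>")

def lowercase_headings(html: str) -> str:
    return FIND.sub(lambda m: m.group(1) + m.group(2).lower() + ">", html)

# Step 2: rough model of CSS text-transform: capitalize, which uppercases the
# first letter of each word at display time and leaves other letters alone.
# (Real CSS word detection is more involved than \b\w.)
def css_capitalize(text: str) -> str:
    return re.sub(r"\b(\w)", lambda m: m.group(1).upper(), text)

print(lowercase_headings("<h1>CHAPTER ONE</h1>"))  # <h1>chapter one</h1> (file text)
print(css_capitalize("chapter one"))               # Chapter One (rendered)
```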

The remaining 'tweaking' is making acronyms all uppercase again, but that is a lot less effort.

I'm still very open to ideas and suggestions.
02-05-2014, 06:55 PM #15
Tex2002ans
Wizard
Quote:
Originally Posted by phossler
@Tex2002ans -- thanks. I tried something similar, but the document had a lot of all-caps acronyms that were also being matched. I tried 'Replace/Find Next', but that was taking a lot of time.
Ahhh... what I do is run this regex as one of my "final" cleanup passes.

I already have a toc.ncx generated, and/or I have Preview open on the left half of my screen. I quickly jump to where I need to be using the TOC/Preview, and then Find/Replace what I need in Code View.

[Screenshot: ALLCAPStoTitleCase.png]

I don't use this Find/Replace throughout the entire book (unless this particular book has some odd ALL CAPS floating around in it that need to be checked/fixed).

It isn't the quickest or most automated way to do it, but while I am fixing this, I am also quickly scanning around to see if I can catch any other typos/errors.

Quote:
Originally Posted by DiapDealer
@Tex2002ans:
The \b word boundary is handy, but it can also result in some confusing behavior if you run into typographic quotes, apostrophes, or other Unicode characters in the text.
Thanks for the info. That's why I mentioned where my regex breaks in my first post. Some of these regexes can get downright hairy, and I prefer something that a normal human can still understand!

Last edited by Tex2002ans; 02-05-2014 at 07:19 PM.

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Is there RegEx to <span> ALL CAPS text? | phossler | Sigil | 4 | 03-10-2013 02:43 PM
Help with regex expression for words in all caps | bfollowell | Sigil | 9 | 01-20-2012 05:11 PM
small caps | yuxi_kelly | ePub | 20 | 06-05-2011 12:04 AM
Historical Item Question: Candle Caps? | Steven Lake | Writers' Corner | 3 | 03-19-2011 08:13 AM
Unutterably Silly ANGRY CAPS | Not_A_Crook | Lounge | 56 | 12-10-2009 01:16 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.