06-10-2012, 02:18 PM | #1 |
Addict
Posts: 320
Karma: 56788
Join Date: Jun 2011
Device: Kindle
|
Change uppercase to mixed case
I have a bunch of names in all caps:
Code:
JOHN DOE PEA TEAR GRIFFON AARON A. AARONSON
I'd like to change them to mixed case:
Code:
John Doe Pea Tear Griffon Aaron A. Aaronson
Any ideas?
|
06-10-2012, 02:29 PM | #2 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
Search for: ([A-Z])([A-Z]{1,})
Replace by: \1\L\2 |
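For anyone who wants to try the same idea outside Sigil, here is a minimal Python sketch of that search/replace. Python's `re` module has no `\L` in replacement strings, so a callback lowercases the second group instead. The names `ALL_CAPS_WORD` and `to_mixed_case` are made up for this illustration, and `[A-Z]` only covers unaccented ASCII letters.

```python
import re

# Same pattern as in the post above: one capital, then one or more capitals.
ALL_CAPS_WORD = re.compile(r"([A-Z])([A-Z]{1,})")

def to_mixed_case(text: str) -> str:
    """Keep the first capital of each all-caps word, lowercase the rest."""
    # The callback plays the role of the \1\L\2 replacement.
    return ALL_CAPS_WORD.sub(lambda m: m.group(1) + m.group(2).lower(), text)

print(to_mixed_case("JOHN DOE"))
print(to_mixed_case("AARON A. AARONSON"))
```

A single capital such as the initial "A." is left alone, because the pattern needs at least two capitals in a row.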
06-10-2012, 03:16 PM | #3 |
Grand Sorcerer
Posts: 27,605
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Find: (?<=\b\p{Lu})([^\s]+)(?=[^<>]*</h2>)
Replace: \L\1\E
This will pretty much title case anything between <h2></h2>. It probably won't work if you have embedded <span>, <b>, <i>, or <br /> stuff between the <h2> tags, but other than that, it's fairly fool-proof.
EDIT: Fool-proof for stuff that's ALL CAPS to begin with, that is.
Last edited by DiapDealer; 06-10-2012 at 05:42 PM.
|
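To see how the lookbehind and lookahead in that expression behave, here is a rough Python sketch. It is an approximation, not what Sigil runs: Python's built-in `re` does not understand `\p{Lu}`, so `[A-Z]` is swapped in (ASCII capitals only), and a callback stands in for `\L\1\E`. The names `H2_WORD_TAIL` and `title_case_h2` are invented for the example.

```python
import re

# (?<=\b[A-Z])      the character just before the match is a capital that starts a word
# ([^\s]+)          the rest of the word (any run of non-whitespace)
# (?=[^<>]*</h2>)   only if the next tag that follows is a closing </h2>
H2_WORD_TAIL = re.compile(r"(?<=\b[A-Z])([^\s]+)(?=[^<>]*</h2>)")

def title_case_h2(html: str) -> str:
    """Lowercase everything after the first capital of each word inside <h2>."""
    return H2_WORD_TAIL.sub(lambda m: m.group(1).lower(), html)

print(title_case_h2("<h2>AARON A. AARONSON</h2>"))
print(title_case_h2("<p>JOHN DOE</p>"))
```

The first capital of each word is never part of the match (it is only looked at by the lookbehind), which is why it survives untouched. The paragraph line comes back unchanged because the lookahead cannot find `</h2>` before hitting another tag.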
06-10-2012, 05:16 PM | #4 |
Berti
Posts: 1,197
Karma: 4985964
Join Date: Jan 2012
Location: Zischebattem
Device: Acer Lumiread
|
|
06-10-2012, 05:30 PM | #5 |
Addict
Posts: 320
Karma: 56788
Join Date: Jun 2011
Device: Kindle
|
Thank you both. I'm curious, is \p{L} (and its variants) now considered the preferred syntax for [A-Za-z] (and its variants)?
|
06-10-2012, 05:34 PM | #6 | |
Addict
Posts: 320
Karma: 56788
Join Date: Jun 2011
Device: Kindle
|
Quote:
|
|
06-10-2012, 05:40 PM | #7 |
Grand Sorcerer
Posts: 27,605
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
\p{L} or \p{Letter} or \pL is any unicode letter character. It will match things like Á or ö (as well as the "normal" [A-Za-z]).
\p{Lu} or \p{Uppercase_Letter} is an uppercase unicode letter character.
\p{Ll} or \p{Lowercase_Letter} is a lowercase unicode letter character.
http://www.regular-expressions.info/unicode.html
I use \p{L} and its variants simply because I got tired of screwing up books that contained unicode characters where I least expected them. It's just a personal preference of mine to try and think in terms of unicode as much as possible.
I believe you can also prefix your regular expressions with (*UCP) to make them "unicode aware."
|
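The Lu/Ll categories described above are general Unicode properties, so they can be inspected outside a regex engine too. As a small sketch, Python's standard `unicodedata` module reports the category of a character, which shows why `[A-Z]` misses accented capitals that `\p{Lu}` would catch. The helper name `is_upper_letter` is made up for this example.

```python
import re
import unicodedata

def is_upper_letter(ch: str) -> bool:
    """True if the character is in Unicode category Lu (what \\p{Lu} matches)."""
    return unicodedata.category(ch) == "Lu"

samples = ["A", "\u00c1", "\u00f6", "1"]  # A, A-acute, o-umlaut, digit one

for ch in samples:
    in_ascii_class = re.fullmatch(r"[A-Z]", ch) is not None
    # Prints the character, its Unicode category, and whether [A-Z] matches it.
    print(ch, unicodedata.category(ch), in_ascii_class)
```

The accented capital is reported as Lu but is not matched by `[A-Z]`, which is exactly the kind of character that gets skipped in a book when the ASCII class is used.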
06-10-2012, 05:46 PM | #8 |
Berti
Posts: 1,197
Karma: 4985964
Join Date: Jan 2012
Location: Zischebattem
Device: Acer Lumiread
|
|
06-10-2012, 06:44 PM | #9 |
Grand Sorcerer
Posts: 5,607
Karma: 23185369
Join Date: Dec 2010
Device: Kindle PW2
|
I tested the above RE with both the latest Sigil Version and the Beta and it didn't find:
Code:
<h2>AARON A. AARONSON</h2> |
06-10-2012, 07:06 PM | #10 |
Grand Sorcerer
Posts: 27,605
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I've not tried the beta version, but yes... I use \p{L} and its variants all the time in v0.5.3. The PCRE regex engine definitely supports it.
And I have no idea what might be different, but:
Code:
(?<=\b\p{Lu})([^\s]+)(?=[^<>]*</h2>)
Code:
<h2>AARON A. AARONSON</h2>
Code:
\L\1\E
Code:
<h2>Aaron A. Aaronson</h2>
EDIT: Oh, and it doesn't match the WHOLE of <h2>AARON A. AARONSON</h2>, but rather it matches each capitalized word one at a time.
EDIT2: You are correct that it doesn't work in the beta; at least it doesn't with my built-from-source-on-ubuntu-64-bit version. I hope that's a simple fix (or already a known bug).
Last edited by DiapDealer; 06-10-2012 at 07:23 PM.
|
06-11-2012, 03:54 AM | #11 | |
Grand Sorcerer
Posts: 5,607
Karma: 23185369
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
Hopefully, this will be fixed in the next official Windows release. |
|
06-11-2012, 06:40 AM | #12 | |
Grand Sorcerer
Posts: 27,605
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
|
|
06-11-2012, 07:12 AM | #13 |
Grand Sorcerer
Posts: 5,607
Karma: 23185369
Join Date: Dec 2010
Device: Kindle PW2
|
I guess there must have been something wrong with my Qt or GTK versions or another system library file. I just installed the latest version of GIMP and for some odd reason your regular expression works now. (It still doesn't work in the beta, though.)
|
06-12-2012, 06:45 AM | #14 |
Addict
Posts: 320
Karma: 56788
Join Date: Jun 2011
Device: Kindle
|
I'm back for more sweet, sweet Sigil forum goodness...
Is there a way to apply mixed case to the entire string between the <h2> tags without cycling through each individual word and without clicking "Replace All"? I'm wondering if anyone is familiar with the algorithm that calibre uses to modify titles to Title Case in its Bulk Edit Metadata dialogue, and whether it would be applicable in this circumstance...
|
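One way to handle the whole heading in a single pass, outside Sigil, is to match the complete text between the tags and fix the case in a callback. The Python sketch below does that. It is not calibre's Title Case algorithm (which is not described in this thread), just a plain "first letter up, rest down" rule per word, and it assumes a bare `<h2>` with no attributes and no nested tags. `H2_TEXT`, `mixed_case` and `fix_headings` are hypothetical names.

```python
import re

# Group 1: opening tag, group 2: heading text (no nested tags), group 3: closing tag.
H2_TEXT = re.compile(r"(<h2>)([^<>]*)(</h2>)")

def mixed_case(text: str) -> str:
    """Uppercase the first character of each space-separated word, lowercase the rest."""
    return " ".join(w[:1].upper() + w[1:].lower() for w in text.split(" "))

def fix_headings(html: str) -> str:
    """Rewrite the text of every plain <h2>...</h2> in one call."""
    return H2_TEXT.sub(
        lambda m: m.group(1) + mixed_case(m.group(2)) + m.group(3),
        html,
    )

print(fix_headings("<h2>AARON A. AARONSON</h2>"))
```

Because Python's `str.lower()` and `str.upper()` are Unicode-aware, accented letters are handled as well. Names with internal capitals (McDonald, O'Brien) would still need manual attention with a rule this simple.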
06-12-2012, 07:26 AM | #15 | ||
Grand Sorcerer
Posts: 27,605
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
Quote:
|
||
|