Old 06-10-2012, 02:18 PM   #1
ElMiko
Addict
Posts: 320
Karma: 56788
Join Date: Jun 2011
Device: Kindle
Change uppercase to mixed case

I have a bunch of names in all caps:
Code:
JOHN DOE
PEA TEAR GRIFFON
AARON A. AARONSON
What I'm trying to do is change those to mixed case:
Code:
John Doe
Pea Tear Griffon
Aaron A. Aaronson
These are each wrapped in <h2> tags, so it's easy enough to identify them (<h2>(.*?)</h2>); the problem is making the mixed-case change.

Any ideas?
Old 06-10-2012, 02:29 PM   #2
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
Search for: ([A-Z])([A-Z]{1,})
Replace by: \1\L\2
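
For anyone trying the same idea outside Sigil, here is a minimal Python sketch of this search and replace. Python's built-in re module has no \L escape in replacement strings, so a callback does the lowercasing instead. The names CAPS_WORD, lower_tail, and to_mixed_case are made up for illustration.

```python
import re

# Same pattern as the "Search for" field above:
# one capital letter, followed by one or more capital letters.
CAPS_WORD = re.compile(r"([A-Z])([A-Z]{1,})")

def lower_tail(match):
    # Mirrors the replacement \1\L\2: keep group 1, lowercase group 2.
    return match.group(1) + match.group(2).lower()

def to_mixed_case(text):
    return CAPS_WORD.sub(lower_tail, text)

for name in ["JOHN DOE", "PEA TEAR GRIFFON", "AARON A. AARONSON"]:
    print(to_mixed_case(name))
```

Note that this pattern is not limited to <h2> headings and only covers ASCII letters, so it would also change all-caps words elsewhere in the text and would skip accented capitals.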
Old 06-10-2012, 03:16 PM   #3
DiapDealer
Grand Sorcerer
Posts: 27,465
Karma: 192992430
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Find: (?<=\b\p{Lu})([^\s]+)(?=[^<>]*</h2>)
Replace: \L\1\E

Will pretty much title case anything between <h2></h2>.
It probably won't work if you have embedded <span>, <b>, <i>, or <br /> stuff between the <h2> tags, but other than that, it's fairly fool-proof.

EDIT: Fool-proof for stuff that's ALL CAPS to begin with, that is.

Last edited by DiapDealer; 06-10-2012 at 05:42 PM.
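
To see what the Find pattern above actually matches, here is a rough Python sketch. It is an approximation under one stated assumption: Python's built-in re module does not support \p{Lu}, so the ASCII class [A-Z] stands in for it. The names TAIL_IN_H2 and lowercase_tails are hypothetical.

```python
import re

# ASCII stand-in for the pattern above: [A-Z] replaces \p{Lu},
# because Python's built-in re module has no \p{...} support.
# The lookbehind anchors on a word-initial capital, the group captures
# the rest of the word, and the lookahead requires that </h2> follows
# before any other tag bracket.
TAIL_IN_H2 = re.compile(r"(?<=\b[A-Z])([^\s]+)(?=[^<>]*</h2>)")

def lowercase_tails(html):
    # Equivalent of the replacement \L\1\E: lowercase the captured tail.
    return TAIL_IN_H2.sub(lambda m: m.group(1).lower(), html)

sample = "<h2>AARON A. AARONSON</h2><p>NOT A HEADING</p>"
print(TAIL_IN_H2.findall(sample))   # the tail of each word, one match per word
print(lowercase_tails(sample))
```

The first letter of each word is never part of the match, which is why it keeps its case, and the text in the <p> element is left alone because no </h2> follows it.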
Old 06-10-2012, 05:16 PM   #4
mmat1
Berti
Posts: 1,196
Karma: 4985964
Join Date: Jan 2012
Location: Zischebattem
Device: Acer Lumiread
Quote:
Originally Posted by DiapDealer View Post
Find: (?<=\b\p{Lu})([^\s]+)(?=[^<>]*</h2>)
Replace: \L\1\E
Would you please explain:
\p
{Lu}

I looked in my books and in the Python docs, but I didn't find those two parts of the expression.
Old 06-10-2012, 05:30 PM   #5
ElMiko
Addict
Posts: 320
Karma: 56788
Join Date: Jun 2011
Device: Kindle
Thank you both. I'm curious, is \p{L} (and its variants) now considered the preferred syntax for [A-Za-z] (and its variants)?
Old 06-10-2012, 05:34 PM   #6
ElMiko
Addict
Posts: 320
Karma: 56788
Join Date: Jun 2011
Device: Kindle
Quote:
Originally Posted by mmat1 View Post
Would you please explain:
\p
{Lu}

I looked in my books and in the Python docs, but I didn't find those two parts of the expression.
Sorry for the double post, but from what I was able to gather from here, the \p{} expression matches all Unicode characters of a particular class. In this case, {Lu} matches uppercase letters.
Old 06-10-2012, 05:40 PM   #7
DiapDealer
Grand Sorcerer
Posts: 27,465
Karma: 192992430
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
\p{L} or \p{Letter} or \pL is any Unicode letter character. It will match things like Á or ö (as well as the "normal" [A-Za-z]).

\p{Lu} or \p{Uppercase_Letter} is an uppercase Unicode letter character.
\p{Ll} or \p{Lowercase_Letter} is a lowercase Unicode letter character.

http://www.regular-expressions.info/unicode.html

I use \p{L} and its variants simply because I got tired of screwing up books that contained Unicode characters where I least expected them. It's just a personal preference of mine to try and think in terms of Unicode as much as possible.

I believe you can also prefix your regular expressions with (*UCP) to make them "Unicode aware."
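
The two-letter codes inside \p{...} are Unicode general categories, and they can be inspected directly. Below is a small Python sketch using the standard unicodedata module. As far as Python goes, its built-in re module does not implement \p{...}, which would explain not finding it in the Python docs. The helper names general_category and ascii_only_upper are invented for this example.

```python
import re
import unicodedata

def general_category(ch):
    # Returns the two-letter Unicode general category, e.g. "Lu" or "Ll".
    return unicodedata.category(ch)

def ascii_only_upper(ch):
    # What a plain [A-Z] class considers an uppercase letter.
    return re.fullmatch(r"[A-Z]", ch) is not None

for ch in "AÁöz1":
    print(ch, general_category(ch), ascii_only_upper(ch))
```

Both A and Á report the category Lu, but only A passes the [A-Z] test, which is the kind of mismatch that damages text containing accented capitals.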
Old 06-10-2012, 05:46 PM   #8
mmat1
Berti
Posts: 1,196
Karma: 4985964
Join Date: Jan 2012
Location: Zischebattem
Device: Acer Lumiread
Quote:
Originally Posted by DiapDealer View Post
\p{L} or \p{Letter} or \pL is any unicode letter character. It will match things like Á or ö (as well as the "normal" [A-Za-z]).
Thanks for the info and the links to you and ElMiko
Old 06-10-2012, 06:44 PM   #9
Doitsu
Grand Sorcerer
Posts: 5,582
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by DiapDealer View Post
Find: (?<=\b\p{Lu})([^\s]+)(?=[^<>]*</h2>)
Replace: \L\1\E
I tested the above RE with both the latest Sigil Version and the Beta and it didn't find:

Code:
<h2>AARON A. AARONSON</h2>
Far be it from me to doubt your expertise in this matter, but are you 100% sure that the Sigil PCRE engine supports the \p{Lu} escape? According to the PCRE man pages, support for \P, \p, and \X requires a special compile option, and \X doesn't seem to work either.
Old 06-10-2012, 07:06 PM   #10
DiapDealer
Grand Sorcerer
Posts: 27,465
Karma: 192992430
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
I've not tried the beta version, but yes... I use \p{L} and its variants all the time in v0.5.3. The PCRE regex engine definitely supports it.

And I have no idea what might be different, but:
Code:
(?<=\b\p{Lu})([^\s]+)(?=[^<>]*</h2>)
finds:
Code:
<h2>AARON A. AARONSON</h2>
just fine in my installation of 0.5.3 and:
Code:
\L\1\E
and clicking "Replace All" (or stepping through one word at a time) changes it to:
Code:
<h2>Aaron A. Aaronson</h2>
It worked for all three test cases presented. I just double-checked again, so I'm not sure why it wouldn't work for you.

EDIT: Oh, and it doesn't match the WHOLE of <h2>AARON A. AARONSON</h2>, but rather it matches each capitalized word one at a time.
EDIT2: you are correct that it doesn't work in the beta--at least it doesn't with my built-from-source-on-ubuntu-64-bit version. I hope that's a simple fix (or already a known bug).

Last edited by DiapDealer; 06-10-2012 at 07:23 PM.
Old 06-11-2012, 03:54 AM   #11
Doitsu
Grand Sorcerer
Posts: 5,582
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by DiapDealer View Post
It worked for all three test cases presented. I just double-checked again, so I'm not sure why it wouldn't work for you.
I just tested your regular expression with the current Linux version and it works just fine. Since I'm using a 32-bit Windows machine, most likely the optional PCRE Unicode library wasn't included in the 32-bit Windows builds.
Hopefully, this will be fixed in the next official Windows release.
Old 06-11-2012, 06:40 AM   #12
DiapDealer
Grand Sorcerer
Posts: 27,465
Karma: 192992430
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by Doitsu View Post
I just tested your regular expression with the current Linux version and it works just fine. Since I'm using a 32-bit Windows machine, most likely the optional PCRE Unicode library wasn't included in the 32-bit Windows builds.
Hopefully, this will be fixed in the next official Windows release.
That's very likely in the case of the beta version, but just so we're on the same page... that regex works for me on the current Windows (32-bit) version (0.5.3).
Old 06-11-2012, 07:12 AM   #13
Doitsu
Grand Sorcerer
Posts: 5,582
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by DiapDealer View Post
That's very likely in the case of the beta version, but just so we're on the same page... that regex works for me on the current Windows (32-bit) version (0.5.3).
I guess there must have been something wrong with my Qt or GTK versions or another system library file. I just installed the latest version of GIMP and for some odd reason your regular expression works now. (It still doesn't work in the beta, though.)
Old 06-12-2012, 06:45 AM   #14
ElMiko
Addict
Posts: 320
Karma: 56788
Join Date: Jun 2011
Device: Kindle
I'm back for more sweet, sweet Sigil forum goodness...

Is there a way to apply mixed case to the entire string between the <h2> tags without cycling through each individual word and without clicking "Replace All"? I'm wondering if anyone is familiar with the algorithm that calibre uses to modify titles to Title Case in its Bulk Edit Metadata dialogue, and whether it would be applicable in this circumstance...
Old 06-12-2012, 07:26 AM   #15
DiapDealer
Grand Sorcerer
Posts: 27,465
Karma: 192992430
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Is there a way to apply mixed case to the entire string between the <h2> tags without cycling through each individual word and without clicking "Replace All"?
Oh, you mean like a magic regexp?

Quote:
I'm wondering if anyone is familiar with the algorithm that calibre uses to modify titles to Title Case in its Bulk Edit Metadata dialogue, and whether it would be applicable in this circumstance...
I'm almost certain it works on a word-by-word basis too. It just goes on behind the scenes, so you don't realize it. You'll probably have to wait until Sigil has a third-party plugin system of some kind before the kind of feature you're envisioning would be feasible.
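
Outside Sigil, a short script can treat each heading as a single replacement, although it still works word by word under the hood, as described above. The following Python sketch is only an illustration of that idea. It is not calibre's Title Case algorithm, and the names H2, WORD, mixed_case_heading, and fix_headings are hypothetical.

```python
import re

# Outer pattern: one whole <h2>...</h2> element per match.
H2 = re.compile(r"(<h2>)(.*?)(</h2>)", re.DOTALL)
# Inner pattern: runs of letters (Unicode-aware for str patterns in Python 3).
WORD = re.compile(r"[^\W\d_]+")

def mixed_case_heading(match):
    open_tag, body, close_tag = match.groups()
    # str.capitalize() uppercases the first character and lowercases the rest.
    fixed = WORD.sub(lambda w: w.group(0).capitalize(), body)
    return open_tag + fixed + close_tag

def fix_headings(html):
    return H2.sub(mixed_case_heading, html)

print(fix_headings("<h2>PEA TEAR GRIFFON</h2><p>LEAVE THIS ALONE</p>"))
```

As with the regex earlier in the thread, this sketch assumes plain text inside the heading: nested tags such as <span> or entities such as &amp; would have their names capitalized too, so they would need to be handled separately.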