MobileRead Forums > E-Book Software > Sigil

Old 02-23-2012, 02:57 PM   #1
cybmole
another regex puzzle - detect capitalised phrases

I suspect there is no easy answer to this but I will ask anyway.

Given a book which uses capitalisation in lieu of scene breaks, with all paragraphs sharing the same CSS, e.g.

THIS IS HOW THE 1st paragraphs starts.......blah blah
but not the next paragraph...
Or the one after that......
....
YET SOME TIME LATER THERE is another instance
...

I want to pick out those capitalised starts in order to assign them a unique CSS class, but devising a rule is very hard.

Testing that the 2nd letter of a paragraph is capitalised works most of the time, but it will miss
I CANNOT GET THIS one... and will miss A TOUGH ACT TO follow
and will mis-classify
"I don't want this one"

Any better methods, anyone?
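
For anyone following along, here is a minimal Python sketch of the "2nd letter is capitalised" test described above, run against the examples from this post. It assumes each paragraph starts with a bare <p> tag; the function name is made up for illustration.

```python
import re

# The naive rule from the post: the 2nd character after <p> is an uppercase letter.
SECOND_UPPER = re.compile(r"<p>.[A-Z]")

def second_letter_is_upper(paragraph: str) -> bool:
    """Return True if the 2nd character of the paragraph text is A-Z."""
    return SECOND_UPPER.match(paragraph) is not None

samples = [
    "<p>THIS IS HOW THE 1st paragraph starts</p>",  # wanted, found
    "<p>I CANNOT GET THIS one</p>",                 # wanted, missed (2nd char is a space)
    "<p>A TOUGH ACT TO follow</p>",                 # wanted, missed (2nd char is a space)
    '<p>"I don\'t want this one"</p>',              # not wanted, found (quote mark, then I)
    "<p>but not the next paragraph</p>",            # not wanted, not found
]

for s in samples:
    print(second_letter_is_upper(s), s)
```

The two misses and the one false hit are exactly the cases listed in the post.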
Old 02-23-2012, 03:02 PM   #2
WS64
<p>[^a-z]{4,}

This will find any paragraph whose first 4 characters are all something other than lowercase letters.
Of course, it will still find <p>U.S.A. is the country...
I guess you need to check whether 4 is enough.
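
A rough Python sketch of this pattern, with the run length pulled out as a parameter so it is easy to check whether 4 is enough. The helper name and the min_run parameter are assumptions for illustration, and the value 8 is only an example threshold, not a recommendation.

```python
import re

def starts_with_caps(paragraph: str, min_run: int = 4) -> bool:
    """True if at least min_run characters after <p> are not lowercase letters."""
    pattern = r"<p>[^a-z]{%d,}" % min_run  # e.g. <p>[^a-z]{4,}
    return re.match(pattern, paragraph) is not None

samples = [
    "<p>THIS IS HOW THE 1st paragraph starts</p>",
    "<p>I CANNOT GET THIS one</p>",
    "<p>A TOUGH ACT TO follow</p>",
    '<p>"I don\'t want this one"</p>',
    "<p>U.S.A. is the country</p>",
]

for s in samples:
    # Compare the default run of 4 with a longer run of 8.
    print(starts_with_caps(s), starts_with_caps(s, min_run=8), s)
```

With a run of 4 the U.S.A. paragraph is still picked up; with a run of 8 it is not, because "U.S.A. " is only 7 characters before the first lowercase letter.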
Old 02-23-2012, 03:04 PM   #3
mmat1
Quote:
Originally Posted by cybmole
Phrase 1: THIS IS HOW THE
Phrase 2: A TOUGH ACT TO follow
Code:
<p>([A-Z])([A-Z]|\s[A-Z])
matches both.

Actually
Code:
<p>([A-Z])([\sA-Z])
will work as well
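
A quick Python check of the shorter pattern, as a sketch only. Note that it also accepts a capital followed by a space, so a paragraph starting with "I don't" (no opening quote mark) would be caught. The STRICT pattern below is a hypothetical variation, not something posted in this thread: it requires a second capital after an optional whitespace character.

```python
import re

# The shorter pattern from the post: a capital, then a capital or whitespace.
LOOSE = re.compile(r"<p>([A-Z])([\sA-Z])")

# Hypothetical stricter variation (an assumption, not from the thread):
# a capital, an optional whitespace character, then another capital.
STRICT = re.compile(r"<p>[A-Z]\s?[A-Z]")

samples = [
    "<p>THIS IS HOW THE 1st paragraph starts</p>",
    "<p>A TOUGH ACT TO follow</p>",
    "<p>I CANNOT GET THIS one</p>",
    "<p>I don't want this one</p>",
]

for s in samples:
    print(bool(LOOSE.match(s)), bool(STRICT.match(s)), s)
```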

Last edited by mmat1; 02-23-2012 at 03:13 PM.
Old 02-23-2012, 03:04 PM   #4
theducks
Code:
([A-Z]* ){2,}
Case-sensitive.
Finds 1 or more uppercase letters followed by a space, 2 or more times.
Old 02-24-2012, 02:15 AM   #5
cybmole
Thanks for all the different solutions - you guys make it look so easy!
Old 02-24-2012, 04:18 AM   #6
Perkin
Quote:
Originally Posted by theducks
Code:
([A-Z]* ){2,}
Case sensitive
find 1 or more Upper followed by a space, 2 or more times
Shouldn't that be a + instead of the *? The * means 0 or more times, so the pattern would also match paragraphs that have several spaces at the beginning, which are quite often (badly) used for indents.
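
To illustrate the difference, a small Python sketch comparing the two quantifiers. The sample paragraphs are made up for the test; the first one uses leading spaces as an indent.

```python
import re

STAR = re.compile(r"([A-Z]* ){2,}")  # as posted: 0 or more capitals before each space
PLUS = re.compile(r"([A-Z]+ ){2,}")  # suggested fix: at least 1 capital before each space

indented = "<p>   indented paragraph</p>"
scene_start = "<p>YET SOME TIME LATER THERE is another instance</p>"

# The * version is satisfied by the run of spaces alone.
print(STAR.search(indented) is not None)   # True
print(PLUS.search(indented) is not None)   # False

# Both versions still find the capitalised opening.
print(repr(PLUS.search(scene_start).group(0)))  # 'YET SOME TIME LATER THERE '
```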
Old 02-24-2012, 09:04 AM   #7
theducks
Quote:
Originally Posted by Perkin
Shouldn't that be a + instead of the *? The * means 0 or more times, so the pattern would also match paragraphs that have several spaces at the beginning, which are quite often (badly) used for indents.

Yes
(I so rarely use *, so why would I type it here?)
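
Putting the corrected + version to work, here is a hedged Python sketch of the original goal: tagging the capitalised paragraph starts with their own CSS class. Anchoring on a bare <p> and the class name "scene-start" are assumptions for illustration; the same find/replace idea would need adapting to the real markup of the book.

```python
import re

# <p> followed (lookahead only) by 2 or more "capitals + space" groups.
SCENE_START = re.compile(r"<p>(?=(?:[A-Z]+ ){2,})")

def tag_scene_starts(html: str, css_class: str = "scene-start") -> str:
    """Add a class to every <p> that opens with 2 or more all-caps words."""
    return SCENE_START.sub('<p class="%s">' % css_class, html)

html = "\n".join([
    "<p>THIS IS HOW THE 1st paragraph starts</p>",
    "<p>but not the next paragraph</p>",
    "<p>I CANNOT GET THIS one</p>",
    '<p>"I don\'t want this one"</p>',
])

print(tag_scene_starts(html))
```

Only the first and third paragraphs get the class. A paragraph that opens with a quote mark followed by capitals would be skipped by this version, so the pattern may need loosening for books where dialogue opens a scene.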