MobileRead Forums > E-Book Software > Sigil

01-24-2012, 03:50 PM   #1
Jabby
Jr. - Junior Member
Posts: 586
Karma: 2000358
Join Date: Aug 2010
Location: Alabama
Device: Archos, Asus, HP, Lenovo, Nexus and Samsung tablets in 7,8 and 10"
Yet another regex question

I want to find all instances of <p> followed by a lower case character. Testing just the first character.

Thanks - John
01-24-2012, 04:02 PM   #2
theducks
Well trained by Cats
Posts: 29,809
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by Jabby View Post
I want to find all instances of <p> followed by a lower case character. Testing just the first character.

Thanks - John
Code:
</p>\s+<p.+>([a-z])
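I included the closing </p> because you are probably trying to un-wrap (rejoin) paragraphs that were split in mid-sentence.

Replace:
Code:
 \1
(Note the leading space before \1.)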
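One caveat worth checking before running Replace All: `.+` is greedy and `.` also matches `>`, so `<p.+>` can run past the end of the opening tag to a later `>` on the same line. The sketch below is only an illustration: Python's `re` module stands in for Sigil's regex engine, the sample markup is invented, and `[^>]*` is one common way to keep the match inside the tag.

```python
import re

# Invented sample: the second paragraph starts with an uppercase letter,
# but it contains an inline tag that is followed by a lowercase letter.
html = '<p>First line</p>\n<p class="x">Second <i>line</i> here.</p>'

greedy = re.compile(r'</p>\s+<p.+>([a-z])')      # .+ may cross the end of the tag
bounded = re.compile(r'</p>\s+<p[^>]*>([a-z])')  # [^>]* stops at the first >

greedy_match = greedy.search(html)
bounded_match = bounded.search(html)

# The greedy pattern matches up to the "l" after <i>, in mid-sentence.
print(greedy_match.group(0) if greedy_match else None)
# The bounded pattern finds nothing, because the paragraph starts with "S".
print(bounded_match.group(0) if bounded_match else None)
```

With the greedy form, replacing the match by ` \1` would delete the text between the opening tag and the inline tag, so it is worth stepping through the hits with Find before replacing.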
01-24-2012, 05:27 PM   #3
Serpentine
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
Might be a bit overkill:

If you want to find paragraphs which might be incorrectly split, here's what I've come up with. It sometimes needs a little tweaking, but it is generally rather good. I wouldn't recommend replacing everything unless you first run a search and check the results (I think I have an alternative somewhere that ignores span/[bsiu] tags... mmm).

Code:
(?smi)(?<=[^[:punct:]])</p>\s*<p[^<>]*>(?=[\.-?])|</p>\s*<p[^<>]*>(?!\s*(<[sbui]>|[[:punct:]\s])+[[:upper:]])(?=[[:punct:]\s]+[[:lower:]])|</p>\s*<p[^<>]*>((?=[ \.>]{2,}([[:punct:]]|[[:lower:]]))|(?=[[:lower:]]))|(?<=,)</p>\s*<p[^<>]*>
Replace with a space character; otherwise the last word of one paragraph will be joined to the first word of the next.
01-24-2012, 05:36 PM   #4
Jabby
Thanks ducks,

I still don't know what I did, but I ended up with a space in the middle of a sentence being replaced by </p><p> in a couple of dozen places in my document.

Anyway...... This is what did it.
Code:
</p>\s+<p>([a-z])
How it knew to stop at one character, I don't know. Regex is an acquaintance and not a friend. Maybe one of these days?

Regards = John
01-24-2012, 05:45 PM   #5
Serpentine
Code:
(?s)</p>\s*<p\b[^<>]*>(?=[[:lower:]])
Replace with a space character. This might be slightly better if you're trying to merge paragraphs, or just to find them.
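A small Python sketch of the same idea, for anyone who wants to try it outside Sigil. Two assumptions: Python's `re` module has no POSIX classes, so `[a-z]` stands in for `[[:lower:]]` (ASCII only), and the sample markup is made up. Because the lowercase letter sits in a lookahead, it is checked but not consumed, so the replacement is just a space and no `\1` is needed.

```python
import re

# [a-z] is a stand-in for [[:lower:]], which Python's re does not support.
split_para = re.compile(r'(?s)</p>\s*<p\b[^<>]*>(?=[a-z])')

# Made-up sample: the first break falls in mid-sentence, the second does not.
html = ('<p>The sentence was cut</p>\n'
        '<p class="body">in the middle.</p>\n'
        '<p>Next one.</p>')

# Only the break followed by a lowercase letter is replaced by a space.
joined = split_para.sub(' ', html)
print(joined)
```

The `\b` after `<p` keeps the pattern from matching other tags that merely begin with "p", such as `<pre>`.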

Last edited by Serpentine; 01-24-2012 at 05:48 PM.
01-24-2012, 06:24 PM   #6
theducks
Quote:
Originally Posted by Jabby View Post
Thanks ducks,

I still don't know what I did, but I ended up with a space in the middle of a sentence being replaced by </p><p> in a couple of dozen places in my document.

Anyway...... This is what did it.
Code:
</p>\s+<p>([a-z])
How it knew to stop at one character, I don't know. Regex is an acquaintance and not a friend. Maybe one of these days?

Regards = John
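[a-z] says match any single character from a through z.
[a-zI] says a through z, or I.
The hyphen defines the range, and the brackets as a whole stand for exactly one character unless you add a quantifier such as + after them.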
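To see the "one character" behaviour in isolation, here is a minimal Python check. It uses Python's `re` rather than Sigil's engine (character classes behave the same way for this purpose), and the test strings are arbitrary.

```python
import re

# A bracket expression without a quantifier matches exactly one character.
one_char = re.match(r'[a-z]', 'hello').group(0)      # only the first letter
run_of_chars = re.match(r'[a-z]+', 'hello').group(0)  # + extends it to a run

# Adding I to the class lets a capital I through, but no other capital.
accepts_I = re.fullmatch(r'[a-zI]', 'I') is not None
rejects_J = re.fullmatch(r'[a-zI]', 'J') is None
rejects_two = re.fullmatch(r'[a-z]', 'ab') is None    # two characters are too many

print(one_char, run_of_chars, accepts_I, rejects_J, rejects_two)
```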
01-25-2012, 02:39 AM   #7
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Quote:
Originally Posted by Serpentine View Post
Might be a bit overkill:

If you want to find paragraphs which might be incorrectly split, here's what I've come up with. It sometimes needs a little tweaking, but it is generally rather good. I wouldn't recommend replacing everything unless you first run a search and check the results (I think I have an alternative somewhere that ignores span/[bsiu] tags... mmm).

Code:
(?smi)(?<=[^[:punct:]])</p>\s*<p[^<>]*>(?=[\.-?])|</p>\s*<p[^<>]*>(?!\s*(<[sbui]>|[[:punct:]\s])+[[:upper:]])(?=[[:punct:]\s]+[[:lower:]])|</p>\s*<p[^<>]*>((?=[ \.>]{2,}([[:punct:]]|[[:lower:]]))|(?=[[:lower:]]))|(?<=,)</p>\s*<p[^<>]*>
Replace with a space character; otherwise the last word of one paragraph will be joined to the first word of the next.
Any chance of a breakdown / analysis of what that long line is doing, please?
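One possible reading of the long expression, offered as a sketch and not as its author's explanation: it is four alternatives joined by `|`, each built around the same paragraph boundary `</p>\s*<p[^<>]*>`. The Python below rebuilds them one at a time so each can be tested separately. Assumptions, all mine: Python's `re` has no POSIX classes, so `string.punctuation`, `a-z` and `A-Z` stand in for `[[:punct:]]`, `[[:lower:]]` and `[[:upper:]]` (ASCII only); the `s` and `m` flags are left out because the pattern contains no unescaped dot or line anchor; the `i` flag is left out because in Python it would make `[a-z]` match capitals; and the names `BREAK`, `ALTERNATIVES` and `matching_alternatives` are hypothetical helpers.

```python
import re
import string

# ASCII stand-ins for the POSIX classes, which Python's re module lacks.
PUNCT = re.escape(string.punctuation)   # contents of [[:punct:]]
LOWER = 'a-z'                           # contents of [[:lower:]]
UPPER = 'A-Z'                           # contents of [[:upper:]]

# A paragraph boundary: </p>, optional whitespace, <p ...> with any attributes.
BREAK = r'</p>\s*<p[^<>]*>'

ALTERNATIVES = [
    # 1. The old paragraph ends without punctuation and the new one starts
    #    with a character in the range "." to "?" (this range includes digits).
    r'(?<=[^' + PUNCT + r'])' + BREAK + r'(?=[\.-?])',
    # 2. The new paragraph starts with punctuation or spaces and then a
    #    lowercase letter, and is not punctuation, spaces or <s>/<b>/<u>/<i>
    #    tags leading up to an uppercase letter.
    BREAK + r'(?!\s*(<[sbui]>|[' + PUNCT + r'\s])+[' + UPPER + r'])'
          + r'(?=[' + PUNCT + r'\s]+[' + LOWER + r'])',
    # 3. The new paragraph starts with a lowercase letter, or with two or more
    #    spaces, periods or ">" followed by punctuation or a lowercase letter.
    BREAK + r'((?=[ \.>]{2,}([' + PUNCT + r']|[' + LOWER + r']))|(?=[' + LOWER + r']))',
    # 4. The old paragraph ends with a comma.
    r'(?<=,)' + BREAK,
]
COMBINED = re.compile('|'.join(ALTERNATIVES))


def matching_alternatives(html):
    """Return the 1-based numbers of the alternatives that match in html."""
    return [number for number, pattern in enumerate(ALTERNATIVES, start=1)
            if re.search(pattern, html)]


# Invented samples, one per alternative plus one that should be left alone.
print(matching_alternatives('<p>It cost</p>\n<p>3 dollars.</p>'))
print(matching_alternatives('<p>she said</p>\n<p>"and then"</p>'))
print(matching_alternatives('<p>he said</p>\n<p>and left.</p>'))
print(matching_alternatives('<p>He stopped,</p>\n<p>Then went on.</p>'))
print(matching_alternatives('<p>Chapter One.</p>\n<p>The next day.</p>'))
print(COMBINED.sub(' ', '<p>he said</p>\n<p>and left.</p>'))
```

Every alternative uses lookbehind or lookahead for the surrounding text, so only the tags and the whitespace between them are consumed. That is why the replacement is a single space.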
01-30-2012, 09:11 AM   #8
crutledge
eBook FANatic
Posts: 18,301
Karma: 16071131
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
Quote:
Originally Posted by Serpentine View Post
Might be a bit overkill:

If you want to find paragraphs which might be incorrectly split, here's what I've come up with. It sometimes needs a little tweaking, but it is generally rather good. I wouldn't recommend replacing everything unless you first run a search and check the results (I think I have an alternative somewhere that ignores span/[bsiu] tags... mmm).

Code:
(?smi)(?<=[^[:punct:]])</p>\s*<p[^<>]*>(?=[\.-?])|</p>\s*<p[^<>]*>(?!\s*(<[sbui]>|[[:punct:]\s])+[[:upper:]])(?=[[:punct:]\s]+[[:lower:]])|</p>\s*<p[^<>]*>((?=[ \.>]{2,}([[:punct:]]|[[:lower:]]))|(?=[[:lower:]]))|(?<=,)</p>\s*<p[^<>]*>
Replace with a space character; otherwise the last word of one paragraph will be joined to the first word of the next.
You, sir, have a strange and devious mind.

It works great!
01-30-2012, 08:41 PM   #9
signum
Zealot
Posts: 119
Karma: 64428
Join Date: Aug 2011
Device: none
Quote:
Originally Posted by Jabby View Post
I want to find all instances of <p> followed by a lower case character. Testing just the first character.

Thanks - John
A simple, literal answer is

Code:
<p>[a-z]
Make sure you are in Code View, that the search options Match Case and Minimal Matching are checked, and that the search mode is set to Wildcard. The square brackets mean any single character in that range, i.e., a-z. Works for me and I use it a lot. You should probably also check for paragraphs ending in a lower case letter.

Code:
[a-z]</p>
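The two searches can be tried outside Sigil with a few lines of Python. This is only a sketch: Python's `re` is case-sensitive by default, which plays the role of the Match Case option here, and the sample markup is invented.

```python
import re

# Invented sample: the second paragraph starts with a lowercase letter,
# and the second and third end with a letter instead of punctuation.
html = ('<p>First paragraph ends here.</p>\n'
        '<p>this one starts low</p>\n'
        '<p>And ends without a stop</p>')

starts_lower = re.findall(r'<p>[a-z]', html)   # paragraphs opening in lowercase
ends_lower = re.findall(r'[a-z]</p>', html)    # paragraphs closing on a letter

print(starts_lower)
print(ends_lower)
```

Note that `<p>[a-z]` only finds bare `<p>` tags. Paragraphs with attributes, such as `<p class="...">`, need a pattern that allows for them.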
All times are GMT -4. The time now is 06:57 PM.

