Yet another regex question

Jabby · 01-24-2012, 03:50 PM

I want to find all instances of <p> followed by a lower case character. Testing just the first character.

Thanks - John

theducks · 01-24-2012, 04:02 PM

Quote:

Originally Posted by Jabby

I want to find all instances of <p> followed by a lower case character. Testing just the first character.

Thanks - John

Code:

</p>\s+<p.+>([a-z])

Because you are probably trying to un-wrap

Replace:

Code:

 \1

<< note the leading space: the replacement is a single space followed by \1
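As a rough illustration of the find/replace pair above, here is a small sketch that uses Python's `re` module as a stand-in for the editor's search box. The sample HTML and the variable names are made up for this example. The captured lower-case letter is put back by `\1`, and the leading space keeps the two words from running together.

```python
import re

# Hypothetical sample: one sentence wrongly split across two paragraphs.
html = (
    '<p class="calibre1">The sentence was split</p>\n'
    '<p class="calibre1">right in the middle.</p>'
)

# Pattern as posted above; ([a-z]) captures the first lower-case letter.
pattern = r'</p>\s+<p.+>([a-z])'

# Replacement: a leading space, then the captured letter (\1).
joined = re.sub(pattern, r' \1', html)

print(joined)
```

Note that `.+` needs at least one character between `<p` and `>`, so with a bare `<p>` tag it has to reach past the tag to a later `>`. That may be why the narrower `<p>` version later in the thread behaves more predictably.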

Serpentine · 01-24-2012, 05:27 PM

Might be a bit overkill:

If you want to find paragraphs which might be incorrectly split, here's what I've come up with - it needs a little tweak sometimes, but generally rather good. I wouldn't recommend replacing everything, unless you grep first for results (think I have an alternative with span/[bsiu]'s ignored somewhere... mmm).

Code:

(?smi)(?<=[^[:punct:]])</p>\s*<p[^<>]*>(?=[\.-?])|</p>\s*<p[^<>]*>(?!\s*(<[sbui]>|[[:punct:]\s])+[[:upper:]])(?=[[:punct:]\s]+[[:lower:]])|</p>\s*<p[^<>]*>((?=[ \.>]{2,}([[:punct:]]|[[:lower:]]))|(?=[[:lower:]]))|(?<=,)</p>\s*<p[^<>]*>

Replace with a space character, else it will join the end words.
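The advice to grep for results before replacing can be sketched roughly as below, again with Python's `re` module as a stand-in. The helper `preview_matches` and the sample text are hypothetical. The pattern is only the simplest branch of the long expression above (a paragraph break followed by a lower-case letter), written with `[a-z]` because Python's `re` does not support POSIX classes such as `[[:lower:]]`.

```python
import re

# Simplified stand-in for the long pattern: only its "break followed by a
# lower-case letter" branch, with [a-z] in place of [[:lower:]].
BREAK_BEFORE_LOWER = re.compile(r'</p>\s*<p[^<>]*>(?=[a-z])')


def preview_matches(pattern, text, context=15):
    """Return each match with some surrounding text; nothing is replaced."""
    previews = []
    for m in pattern.finditer(text):
        start = max(m.start() - context, 0)
        end = min(m.end() + context, len(text))
        # Flatten newlines so each preview prints on one line.
        previews.append(text[start:end].replace('\n', ' '))
    return previews


# Hypothetical sample text.
sample = '<p>He walked to the</p>\n<p>door and left.</p>\n<p>Then silence.</p>'

for snippet in preview_matches(BREAK_BEFORE_LOWER, sample):
    print(snippet)
```

Only the first break is listed here, since the second one is followed by a capital letter. Reviewing such a list first makes it easier to spot false positives before doing a replace-all.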

Jabby · 01-24-2012, 05:36 PM

Thanks ducks,

I still don't know what it did, but I ended up with a space, in the middle of a sentence, being replaced by </p><p> in a couple of dozen places in my document.

Anyway...... This is what did it.

Code:

</p>\s+<p>([a-z])

How it knew to stop at one character, I don't know. Regex is an acquaintance and not a friend. Maybe one of these days?

Regards = John

Serpentine · 01-24-2012, 05:45 PM

Code:

(?s)</p>\s*<p\b[^<>]*>(?=[[:lower:]])

Replace with a space character, might be slightly better if you're trying to merge paragraphs - or just find them.
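A minimal sketch of how this shorter pattern behaves, assuming Python's `re` module in place of the editor's engine. `[[:lower:]]` is rewritten as `[a-z]` because Python's `re` has no POSIX classes, and the sample HTML is invented for the example.

```python
import re

# [[:lower:]] is written as [a-z] here because Python's re lacks POSIX classes.
pattern = re.compile(r'</p>\s*<p\b[^<>]*>(?=[a-z])', re.DOTALL)

# Hypothetical sample text.
html = '<p>one two</p>\n<p class="x">three four.</p>\n<p>Five.</p>'

# The lookahead (?=...) checks the next letter without consuming it,
# so the replacement is just a space and no \1 is needed.
merged = pattern.sub(' ', html)

print(merged)
```

Because the lower-case letter is never part of the match, it stays in the text untouched. The break before "Five." is left alone since it is followed by a capital.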

theducks · 01-24-2012, 06:24 PM

Quote:

Originally Posted by Jabby

Thanks ducks,

I still don't know what it did, but I ended up with a space, in the middle of a sentence, being replaced by </p><p> in a couple of dozen places in my document.

Anyway...... This is what did it.

Code:

</p>\s+<p>([a-z])

How it knew to stop at one character, I don't know. Regex is an acquaintance and not a friend. Maybe one of these days?

Regards = John

[a-z] says: match any single character, a through z.
[a-zI] says: a through z, or I.
It is all in the hyphen.
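The point about character classes can be checked with a short sketch in Python's `re` module (the sample string is made up). A bracketed class matches exactly one character, and the hyphen defines a range inside it.

```python
import re

# Hypothetical sample: three paragraphs with different first letters.
text = '<p>apple</p><p>Ice</p><p>Zebra</p>'

# [a-z] matches exactly one lower-case letter after <p>.
only_lower = re.findall(r'<p>([a-z])', text)

# [a-zI] matches one character that is a-z or a capital I.
lower_or_i = re.findall(r'<p>([a-zI])', text)

print(only_lower)   # ['a']
print(lower_or_i)   # ['a', 'I']
```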

cybmole · 01-25-2012, 02:39 AM

Quote:

Originally Posted by Serpentine

Might be a bit overkill:

If you want to find paragraphs which might be incorrectly split, here's what I've come up with - it needs a little tweak sometimes, but generally rather good. I wouldn't recommend replacing everything, unless you grep first for results (think I have an alternative with span/[bsiu]'s ignored somewhere... mmm).

Code:

(?smi)(?<=[^[:punct:]])</p>\s*<p[^<>]*>(?=[\.-?])|</p>\s*<p[^<>]*>(?!\s*(<[sbui]>|[[:punct:]\s])+[[:upper:]])(?=[[:punct:]\s]+[[:lower:]])|</p>\s*<p[^<>]*>((?=[ \.>]{2,}([[:punct:]]|[[:lower:]]))|(?=[[:lower:]]))|(?<=,)</p>\s*<p[^<>]*>

Replace with a space character, else it will join the end words.

any chance of a breakdown / analysis of what that long line is doing please ?

crutledge · 01-30-2012, 09:11 AM

Quote:

Originally Posted by Serpentine

Might be a bit overkill:

If you want to find paragraphs which might be incorrectly split, here's what I've come up with - it needs a little tweak sometimes, but generally rather good. I wouldn't recommend replacing everything, unless you grep first for results (think I have an alternative with span/[bsiu]'s ignored somewhere... mmm).

Code:

(?smi)(?<=[^[:punct:]])</p>\s*<p[^<>]*>(?=[\.-?])|</p>\s*<p[^<>]*>(?!\s*(<[sbui]>|[[:punct:]\s])+[[:upper:]])(?=[[:punct:]\s]+[[:lower:]])|</p>\s*<p[^<>]*>((?=[ \.>]{2,}([[:punct:]]|[[:lower:]]))|(?=[[:lower:]]))|(?<=,)</p>\s*<p[^<>]*>

Replace with a space character, else it will join the end words.

You, sir, have a strange and devious mind.

It works great!

signum · 01-30-2012, 08:41 PM

Quote:

Originally Posted by Jabby

I want to find all instances of <p> followed by a lower case character. Testing just the first character.

Thanks - John

A simple, literal answer is

Code:

<p>[a-z]

Make sure you are in Code View and the search options Match Case and Minimal Matching are checked and that the search mode Wildcard is checked. The square brackets mean any single character in that range, i.e., a-z. Works for me and I use it a lot. You should probably also check for paragraphs ending in a lower case letter.

Code:

[a-z]</p>
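The two checks above could be tried outside the editor with a sketch like the following. It assumes Python's `re` module, which is case-sensitive by default (comparable to having Match Case ticked), and the sample lines are invented.

```python
import re

# Hypothetical sample: the first two lines are one sentence split in two.
html = (
    '<p>It was late, and the</p>\n'
    '<p>rain had not stopped.</p>\n'
    '<p>Nobody spoke.</p>'
)

starts_lower = []  # line numbers where <p> is followed by a lower-case letter
ends_lower = []    # line numbers where a lower-case letter precedes </p>

for number, line in enumerate(html.splitlines(), start=1):
    if re.search(r'<p>[a-z]', line):
        starts_lower.append(number)
    if re.search(r'[a-z]</p>', line):
        ends_lower.append(number)

print('start with lower case:', starts_lower)
print('end with lower case:', ends_lower)
```

A paragraph that ends in a lower-case letter followed by one that starts with a lower-case letter is a likely candidate for a wrongly split sentence.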

