Old 01-24-2012, 04:50 PM   #1
Jabby
Jr. - Junior Member
Posts: 575
Karma: 2000358
Join Date: Aug 2010
Location: East Texas
Device: Archos, Asus, HP, Lenovo, Nexus and Samsung tablets in 7,8 and 10"
Yet another regex question

I want to find all instances of <p> followed by a lower case character. Testing just the first character.

Thanks - John
Old 01-24-2012, 05:02 PM   #2
theducks
Grand Sorcerer
Posts: 15,039
Karma: 5936659
Join Date: Aug 2009
Location: (The original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Quote:
Originally Posted by Jabby View Post
I want to find all instances of <p> followed by a lower case character. Testing just the first character.

Thanks - John
Code:
</p>\s+<p.+>([a-z])
I included the closing </p> because you are probably trying to unwrap paragraphs.

Replace:
Code:
 \1
(Note the leading space before \1.)
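A rough Python sketch of the same find-and-replace, in case it helps to try it outside Sigil. Note the swap: it uses `<p[^>]*>` rather than `<p.+>`, because `.+` needs at least one character after `<p` (so a bare `<p>` is skipped) and is greedy enough to run past the end of the tag. `UNWRAP`, `unwrap` and the sample text are names made up for this illustration.

```python
import re

# Assumption: <p[^>]*> is swapped in for <p.+> so that a bare <p> also matches
# and the match cannot extend beyond the opening tag.
UNWRAP = re.compile(r"</p>\s+<p[^>]*>([a-z])")

def unwrap(html: str) -> str:
    # Replacement is a space followed by the captured lowercase letter (" \1").
    return UNWRAP.sub(r" \1", html)

sample = '<p>The quick brown</p>\n<p class="x">fox jumped.</p>\n<p>Next paragraph.</p>'
print(unwrap(sample))
```

The paragraph that starts with a capital letter is left alone; only the break before the lowercase "fox" is joined.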
Old 01-24-2012, 06:27 PM   #3
Serpentine
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
Might be a bit overkill:

If you want to find paragraphs which might be incorrectly split, here's what I've come up with - it needs a little tweak sometimes, but generally rather good. I wouldn't recommend replacing everything, unless you grep first for results (think I have an alternative with span/[bsiu]'s ignored somewhere... mmm).

Code:
(?smi)(?<=[^[:punct:]])</p>\s*<p[^<>]*>(?=[\.-?])|</p>\s*<p[^<>]*>(?!\s*(<[sbui]>|[[:punct:]\s])+[[:upper:]])(?=[[:punct:]\s]+[[:lower:]])|</p>\s*<p[^<>]*>((?=[ \.>]{2,}([[:punct:]]|[[:lower:]]))|(?=[[:lower:]]))|(?<=,)</p>\s*<p[^<>]*>
Replace with a space character, else it will join the end words.
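For anyone who would rather preview candidates than replace them all, here is a reduced Python sketch. It covers only two of the simpler alternatives from the pattern (a paragraph ending in a comma, and a paragraph starting with a lowercase letter), with `[a-z]` standing in for `[[:lower:]]` since Python's `re` module has no POSIX bracket classes. `CANDIDATE`, `preview` and the sample text are invented for illustration; this is not the full expression.

```python
import re

# Two simplified branches only (an assumption, not the whole pattern):
#   1. (?<=,)</p>\s*<p[^<>]*>        previous paragraph ends with a comma
#   2. </p>\s*<p[^<>]*>(?=[a-z])     next paragraph starts with a lowercase letter
CANDIDATE = re.compile(r"(?<=,)</p>\s*<p[^<>]*>|</p>\s*<p[^<>]*>(?=[a-z])")

def preview(html, width=15):
    """Return 'text before | text after' for each suspected bad paragraph break."""
    found = []
    for m in CANDIDATE.finditer(html):
        before = html[max(0, m.start() - width):m.start()]
        after = html[m.end():m.end() + width]
        found.append(f"{before} | {after}")
    return found

sample = "<p>He said,</p>\n<p>Hello there</p>\n<p>and left.</p>\n<p>Done.</p>"
for line in preview(sample):
    print(line)
```

Once the previewed breaks look right, `CANDIDATE.sub(" ", sample)` performs the actual join with a single space.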
Old 01-24-2012, 06:36 PM   #4
Jabby
Jr. - Junior Member
Thanks ducks,

I still don't know what I did, but I ended up with a space, in the middle of a sentence, being replaced by </p><p> in a couple of dozen places in my document.

Anyway...... This is what did it.
Code:
</p>\s+<p>([a-z])
How it knew to stop at one character, I don't know. Regex is an acquaintance and not a friend. Maybe one of these days?

Regards - John
Old 01-24-2012, 06:45 PM   #5
Serpentine
Evangelist
Code:
(?s)</p>\s*<p\b[^<>]*>(?=[[:lower:]])
Replace with a space character. This might be slightly better if you're trying to merge paragraphs, or just to find them.
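A small Python sketch of this lookahead version, for comparison. Because `(?=...)` only peeks at the next character, the lowercase letter is not part of the match and does not need to be put back in the replacement. The assumption here is that `[a-z]` is an acceptable ASCII-only stand-in for `[[:lower:]]`, which Python's `re` module does not support; `JOIN` and `merge_split_paragraphs` are hypothetical names.

```python
import re

# (?s) is kept from the original; [a-z] replaces [[:lower:]] (ASCII-only assumption).
JOIN = re.compile(r"(?s)</p>\s*<p\b[^<>]*>(?=[a-z])")

def merge_split_paragraphs(html: str) -> str:
    # Replace the paragraph break with one space; the lookahead letter stays in place.
    return JOIN.sub(" ", html)

sample = "<p>It was a dark</p>\n\n<p>and stormy night.</p><p>The end.</p>"
print(merge_split_paragraphs(sample))
```

Accented lowercase letters would not be caught by `[a-z]`, so a book with non-English text would need a wider class.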

Last edited by Serpentine; 01-24-2012 at 06:48 PM.
Old 01-24-2012, 07:24 PM   #6
theducks
Grand Sorcerer
Quote:
Originally Posted by Jabby View Post
Thanks ducks,

I still don't know what I did, but I ended up with a space, in the middle of a sentence, being replaced by </p><p> in a couple of dozen places in my document.

Anyway...... This is what did it.
Code:
</p>\s+<p>([a-z])
How it knew to stop at one character, I don't know. Regex is an acquaintance and not a friend. Maybe one of these days?

Regards - John
[a-z] says match any single character a through z.
[a-zI] says a through z, or I.
The square brackets match exactly one character; the hyphen defines the range.
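To see the single-character behaviour in action, here is a tiny Python check (Python is used only for illustration; the sample text is made up). A bracket expression matches exactly one character unless a quantifier such as `+` follows it.

```python
import re

text = "<p>and</p> <p>I went</p> <p>Then</p>"

# [a-z] matches exactly one lowercase letter after <p>.
one_lower = re.findall(r"<p>([a-z])", text)

# [a-zI] matches one character that is either a-z or a capital I.
lower_or_i = re.findall(r"<p>([a-zI])", text)

# Only with a quantifier (+) does the class consume more than one character.
run_of_lower = re.findall(r"<p>([a-z]+)", text)

print(one_lower, lower_or_i, run_of_lower)
```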
Old 01-25-2012, 03:39 AM   #7
cybmole
Wizard
Posts: 2,959
Karma: 1280000
Join Date: Sep 2010
Device: Kobo aura HD, Kobo Arc, Kindle Fire HDX 8.9 , Kindle for PC
Quote:
Originally Posted by Serpentine View Post
Might be a bit overkill:

If you want to find paragraphs which might be incorrectly split, here's what I've come up with - it needs a little tweak sometimes, but generally rather good. I wouldn't recommend replacing everything, unless you grep first for results (think I have an alternative with span/[bsiu]'s ignored somewhere... mmm).

Code:
(?smi)(?<=[^[:punct:]])</p>\s*<p[^<>]*>(?=[\.-?])|</p>\s*<p[^<>]*>(?!\s*(<[sbui]>|[[:punct:]\s])+[[:upper:]])(?=[[:punct:]\s]+[[:lower:]])|</p>\s*<p[^<>]*>((?=[ \.>]{2,}([[:punct:]]|[[:lower:]]))|(?=[[:lower:]]))|(?<=,)</p>\s*<p[^<>]*>
Replace with a space character, else it will join the end words.
Any chance of a breakdown/analysis of what that long line is doing, please?
Old 01-30-2012, 10:11 AM   #8
crutledge
eBook FANatic
Posts: 15,681
Karma: 13575467
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
Quote:
Originally Posted by Serpentine View Post
Might be a bit overkill:

If you want to find paragraphs which might be incorrectly split, here's what I've come up with - it needs a little tweak sometimes, but generally rather good. I wouldn't recommend replacing everything, unless you grep first for results (think I have an alternative with span/[bsiu]'s ignored somewhere... mmm).

Code:
(?smi)(?<=[^[:punct:]])</p>\s*<p[^<>]*>(?=[\.-?])|</p>\s*<p[^<>]*>(?!\s*(<[sbui]>|[[:punct:]\s])+[[:upper:]])(?=[[:punct:]\s]+[[:lower:]])|</p>\s*<p[^<>]*>((?=[ \.>]{2,}([[:punct:]]|[[:lower:]]))|(?=[[:lower:]]))|(?<=,)</p>\s*<p[^<>]*>
Replace with a space character, else it will join the end words.
You, sir, have a strange and devious mind.

It works great!
Old 01-30-2012, 09:41 PM   #9
signum
Connoisseur
Posts: 63
Karma: 45332
Join Date: Aug 2011
Device: none
Quote:
Originally Posted by Jabby View Post
I want to find all instances of <p> followed by a lower case character. Testing just the first character.

Thanks - John
A simple, literal answer is

Code:
<p>[a-z]
Make sure you are in Code View and the search options Match Case and Minimal Matching are checked and that the search mode Wildcard is checked. The square brackets mean any single character in that range, i.e., a-z. Works for me and I use it a lot. You should probably also check for paragraphs ending in a lower case letter.

Code:
[a-z]</p>
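If you want to run both checks over an exported file rather than in Sigil, something like the following Python sketch would do it. It only reports lines and changes nothing; `suspicious_lines` and the sample text are hypothetical, and it assumes each paragraph sits on its own line.

```python
import re

STARTS_LOWER = re.compile(r"<p>[a-z]")   # paragraph begins with a lowercase letter
ENDS_LOWER = re.compile(r"[a-z]</p>")    # paragraph ends with a lowercase letter

def suspicious_lines(html: str):
    """Return (line number, line) pairs that match either pattern."""
    hits = []
    for number, line in enumerate(html.splitlines(), start=1):
        if STARTS_LOWER.search(line) or ENDS_LOWER.search(line):
            hits.append((number, line))
    return hits

sample = "<p>Chapter One.</p>\n<p>The rain fell on</p>\n<p>the roof all night.</p>"
for number, line in suspicious_lines(sample):
    print(number, line)
```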