03-09-2011, 07:53 AM | #1 |
eBook FANatic
Posts: 18,301
Karma: 16071131
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
|
Regex question and maybe some help
I have built a regex to find paragraphs that do not end with normal terminal punctuation (e.g. . ! etc.), since these may be broken paragraphs.
Code:
[^".;!:']</p>
It allows me to look at each find and determine whether it is truly a broken paragraph. How do I get this thing to ignore the (")? |
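For anyone who wants to see what the pattern in this post does, here is a minimal sketch. It assumes Python's re module as the regex dialect (the editor in use may behave differently), and the sample paragraphs are made up for illustration.

```python
import re

# The pattern from the post: any character except the listed terminals, then </p>.
pattern = re.compile(r"""[^".;!:']</p>""")

# Made-up paragraphs, for illustration only.
samples = [
    "<p>He walked to the</p>",    # no terminal punctuation
    "<p>He walked home.</p>",     # ends with a full stop
    "<p>Did he walk home?</p>",   # ends with ?, which is not in the class
    '<p>"He walked home."</p>',   # ends with a straight double quote
]

# Keep only the paragraphs the pattern finds.
flagged = [s for s in samples if pattern.search(s)]
for s in flagged:
    print(s)
```

Only the first and third samples are found. The third shows up because ? is not listed in the character class.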
03-09-2011, 09:37 AM | #2 |
frumious Bandersnatch
Posts: 7,533
Karma: 19000001
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Depending on the regexp dialect, this might work:
Code:
[^.:!?"']["')]?</p>
Last edited by Jellby; 03-10-2011 at 04:34 AM. |
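A quick way to check this suggestion is to run it over a few test paragraphs. The sketch below assumes Python's re module as the dialect, and the samples are invented.

```python
import re

# A non-terminal character, then an optional closing quote or bracket, then </p>.
pattern = re.compile(r"""[^.:!?"']["')]?</p>""")

# Made-up paragraphs, for illustration only.
samples = [
    '<p>"I was going to</p>',    # broken, no closing quote
    '<p>"I was going to"</p>',   # broken, but ends with a closing quote
    '<p>"I went home."</p>',     # complete: full stop before the quote
    "<p>'I went home!'</p>",     # complete: ! before a single quote
]

# Keep only the paragraphs the pattern finds.
flagged = [s for s in samples if pattern.search(s)]
for s in flagged:
    print(s)
```

The first two samples are found and the last two are skipped, so a closing quote no longer hides a paragraph that lacks terminal punctuation.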
03-09-2011, 11:03 AM | #3 |
Connoisseur
Posts: 61
Karma: 12096
Join Date: Sep 2010
Location: Tasmania
Device: Sony PRS 650
|
Could the problem be that you need to specify both straight quotes and curly closing quotes? If so, you'd need to copy and paste the required type of quote into the expression.
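As a rough sketch of the curly-quote idea, assuming Python's re module: the curly closing quotes are written as the escapes \u201d and \u2019 instead of being pasted in, and the name CLOSING_QUOTES and the sample paragraphs are made up for this example.

```python
import re

# Straight and curly closing quotes, written as escapes so they survive copy and paste.
CLOSING_QUOTES = "\"'\u201d\u2019"

# The same shape of pattern, once with straight quotes only and once with curly ones added.
straight_only = re.compile(r"""[^.:!?"']["')]?</p>""")
with_curly = re.compile("[^.:!?" + CLOSING_QUOTES + "][" + CLOSING_QUOTES + ")]?</p>")

# Made-up paragraphs using curly quotes.
complete = "<p>\u201cI went home.\u201d</p>"
broken = "<p>\u201cI was going to\u201d</p>"

print(bool(straight_only.search(complete)))  # the straight-only pattern finds the complete paragraph
print(bool(with_curly.search(complete)))     # the extended pattern skips it
print(bool(with_curly.search(broken)))       # the extended pattern still finds the broken one
```

With straight quotes only, the complete paragraph is reported as broken, because the curly closing quote counts as an ordinary character.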
|
03-09-2011, 04:53 PM | #4 |
eBook FANatic
Posts: 18,301
Karma: 16071131
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
|
|
03-10-2011, 07:15 AM | #5 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
Every approach has exceptions. My normal fix fails on things like T.V. show or L.A. district, where the full stop is not actually a sentence end but has been followed by a line feed. To be thorough takes 2 passes: one to check for sentence ends, then another to check for paras which begin with a lower-case letter. |
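The two passes could look something like this. It is only a sketch, assuming Python's re module; the first-pass pattern is the one suggested earlier in the thread, the second pass only knows ASCII lower-case letters, and the paragraphs are made up.

```python
import re

# Pass 1: paragraphs that do not end with terminal punctuation.
no_terminal = re.compile(r"""[^.:!?"']["')]?</p>""")
# Pass 2: paragraphs that start with a lower-case letter (ASCII only in this sketch).
lower_start = re.compile(r"<p[^>]*>\s*[a-z]")

# Made-up paragraphs, for illustration only.
paragraphs = [
    "<p>We watched a T.V.</p>",
    "<p>show until midnight.</p>",
    "<p>Then we went to</p>",
    "<p>bed.</p>",
]

# Indexes of the suspect paragraphs found by each pass.
missing_terminal = [i for i, p in enumerate(paragraphs) if no_terminal.search(p)]
lowercase_start = [i for i, p in enumerate(paragraphs) if lower_start.search(p)]

print("pass 1:", missing_terminal)
print("pass 2:", lowercase_start)
```

Pass 1 misses the break after T.V. because the paragraph ends with a full stop. Pass 2 picks it up from the other side, since the next paragraph starts with a lower-case letter.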
03-10-2011, 08:13 AM | #6 |
Guru
Posts: 657
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
|
@cybmole, that still wouldn't get them all.
What about when it's something like Mr. Smith? That wouldn't get caught. |
03-10-2011, 08:28 AM | #7 | |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
What I do nowadays, because these things irritate me, is keep a pad and pencil handy when I'm reading a poor-quality book on Kindle, and make a note of any formatting problems or typos that I want to go back and fix later. Then I use Sigil to find and fix them in the EPUB, then reconvert to MOBI. The T.V. and L.A. examples were real cases. |
|
03-10-2011, 08:33 AM | #8 | ||
frumious Bandersnatch
Posts: 7,533
Karma: 19000001
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Whatever you do, only a careful (proof)reading (or several) of the book will catch some of them.
|
||
03-10-2011, 01:00 PM | #9 |
Connoisseur
Posts: 61
Karma: 12096
Join Date: Sep 2010
Location: Tasmania
Device: Sony PRS 650
|
This is why I like to do most of my reformatting in Word, where a VBA macro takes care of all these possibilities. It lets me string together a series of Find-Replace operations and set Styles. In the case of 'Mr.', yes, I've had that occur, so I added a Find-Replace to correct it:
Code:
With Selection.Find
    .Text = "Mr.^p"              ' "Mr." followed by a paragraph mark
    .Replacement.Text = "Mr. "   ' trailing space keeps "Mr." apart from the next word
    .Execute Replace:=wdReplaceAll
End With |
03-10-2011, 04:37 PM | #10 |
Guru
Posts: 657
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
|
Yes, but then there are lots of possibilities:
Mr. Mrs. Dr. Prof. Ms. Miss. Messrs. ..... and probably lots more. There are also names with initials; P. Smith wouldn't get found either. |
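One way to handle several titles in one go is to put them in a list and build a single pattern from it. This is only a sketch, assuming Python's re module; the TITLES list is illustrative and incomplete (Miss is left out because it is not normally written with a full stop), the sample text is made up, and initials such as P. Smith are not handled.

```python
import re

# Illustrative, incomplete list of titles that should never end a paragraph.
TITLES = ["Mr", "Mrs", "Dr", "Prof", "Ms", "Messrs"]

# A title with its full stop, then a paragraph break: </p>, optional whitespace, <p ...>.
pattern = re.compile(r"\b(" + "|".join(TITLES) + r")\.</p>\s*<p[^>]*>")

# Made-up text with two broken paragraphs.
text = (
    "<p>He spoke to Mr.</p>\n"
    "<p>Smith about it.</p>\n"
    "<p>Then Dr.</p>\n"
    "<p>Jones arrived.</p>"
)

# Join the two halves, keeping the title and adding a space after it.
fixed = pattern.sub(r"\1. ", text)
print(fixed)
```

New titles only need adding to the list, but as said above, it will never cover every case.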
|