01-07-2013, 08:14 AM | #1 |
Connoisseur
Posts: 55
Karma: 10
Join Date: Jan 2011
Device: 7" Tablet - Aldiko Reader Premium
|
Missing Commas & Full Stops
Hi Everyone and Happy New Year,
I wonder if anyone can help? I have an e-book that I am proofreading, and I am finding that there appear to be a lot of missing commas and full stops before closing quotation marks, e.g.
‘Sorry’ I said. - ‘There are ways and means’
should be:
‘Sorry,’ I said. - ‘There are ways and means.’
Is there a regex that could identify/isolate these instances? I have checked for previous threads but couldn't find anything. I have been doing it manually, but if I'm reading late at night I still miss quite a few. Any help/advice greatly appreciated. |
01-07-2013, 08:42 AM | #2 |
Guru
Posts: 655
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
|
One regex which should highlight them is
Code:
([^.,!?])’(\s|<)
It will, however, also highlight 'quoted' words or phrases, which you can just skip. Hope this helps. |
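For anyone who would rather run the check outside an editor, here is a minimal Python sketch of the same search. It uses the regex exactly as posted above; the function name `find_suspects`, the constant `MISSING_PUNCT`, and the sample text are made up for illustration, and it assumes the book's HTML uses the curly closing quote (’) as in the examples.

```python
import re

# The pattern from the post above: a closing single quote (’) that is NOT
# preceded by . , ! or ? and IS followed by whitespace or the start of a tag.
MISSING_PUNCT = re.compile(r"([^.,!?])’(\s|<)")

def find_suspects(html):
    """Return (offset, context) pairs for each suspicious closing quote."""
    hits = []
    for m in MISSING_PUNCT.finditer(html):
        # Keep a little surrounding text so each hit is easy to review by eye.
        start = max(0, m.start() - 20)
        end = min(len(html), m.end() + 10)
        hits.append((m.start(), html[start:end]))
    return hits

# Hypothetical sample: two genuine omissions, one correct line,
# and one 'quoted' word that is a false positive to be skipped.
sample = (
    "<p>‘Sorry’ I said.</p>\n"
    "<p>‘There are ways and means’</p>\n"
    "<p>‘Sorry,’ I said.</p>\n"
    "<p>He called it a ‘gadget’ and left.</p>\n"
)

for offset, context in find_suspects(sample):
    print(offset, repr(context))
```

The sketch only reports candidates; it does not try to insert punctuation, since each hit still needs checking against the original text.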
01-07-2013, 08:54 AM | #3 | |
Connoisseur
Posts: 55
Karma: 10
Join Date: Jan 2011
Device: 7" Tablet - Aldiko Reader Premium
|
It works perfectly - it will save me tons of time and eyestrain. And such a prompt reply - much appreciated. |
|
01-09-2013, 12:18 AM | #4 |
Obsessively Dedicated...
Posts: 3,200
Karma: 34977896
Join Date: May 2011
Location: JAPAN (US expatriate)
Device: Sony PRS-T2, ADE on PC
|
@Paxman53 --- Another thought --- are you comparing this to an original text, and do you know for sure that the missing punctuation marks are commas and full stops?
I've worked with numerous OCR scans that dropped the EMDASH. From reading the text it looked like missing commas/full-stops, but the PDF of the original book revealed the missing emdashes. |
01-09-2013, 04:49 AM | #5 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
Or horizontal ellipses; those are also sometimes missed by OCR.
|
01-09-2013, 12:53 PM | #6 | |
Connoisseur
Posts: 55
Karma: 10
Join Date: Jan 2011
Device: 7" Tablet - Aldiko Reader Premium
|
Yes, I am comparing to an original text and there are definite omissions, but the regex provided by Perkin has definitely got around the problem. Thanks for the input though. |
|
|