MobileRead Forums > E-Book Software > Sigil

Old 01-07-2013, 08:14 AM   #1
Paxman53
Missing Commas & Full Stops

Hi Everyone and Happy New Year,

I wonder if anyone can help?

I have an e-book that I am proofreading, and I am finding that there appear to be a lot of missing commas and full stops before closing quotation marks.

e.g. ‘Sorry’ I said. - ‘There are ways and means’

Should be: ‘Sorry,’ I said. - ‘There are ways and means.’

Is there a regex formula that could identify/isolate these incidents?

I have checked for previous threads but couldn't find anything.

I have been doing it manually, but if I'm reading late at night I still miss quite a few.

Any help/advice greatly appreciated.
Old 01-07-2013, 08:42 AM   #2
Perkin
One regex which should highlight them is
Code:
([^.,!?])’(\s|<)
It searches for a right single quote followed by whitespace or the start of a tag (usually the closing paragraph tag), but only when the character before the quote is not . , ! or ?

It will however also highlight 'quoted' words or phrases, which you can just skip.
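For anyone who wants to try the pattern outside Sigil first, here is a minimal sketch in Python. It assumes Python's re module treats this pattern the same way Sigil's search does, which should hold because it uses only basic syntax. The function name find_suspects and the sample paragraph are made up for illustration, built from the examples in the first post.

```python
import re

# The pattern from this post: a right single quote (U+2019) whose preceding
# character is not . , ! or ? and which is followed by whitespace or "<".
MISSING_PUNCT = re.compile(r"([^.,!?])’(\s|<)")


def find_suspects(text):
    """Return the matched snippets (hypothetical helper, not part of Sigil)."""
    return [m.group(0) for m in MISSING_PUNCT.finditer(text)]


# Hypothetical sample paragraph with both kinds of omission.
sample = "<p>‘Sorry’ I said. ‘There are ways and means’</p>"

for snippet in find_suspects(sample):
    print(repr(snippet))
```

On the sample this flags the quote after Sorry and the one before the closing tag. A corrected paragraph returns nothing, while a quoted word such as ‘expert’ in running text is still flagged and has to be skipped by hand.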

Hope this helps
Old 01-07-2013, 08:54 AM   #3
Paxman53
Thank you Perkin,

It works perfectly - it will save me tons of time and eyestrain.

And such a prompt reply - much appreciated.
Old 01-09-2013, 12:18 AM   #4
GrannyGrump
@Paxman53 --- Another thought --- are you comparing this to an original text, and do you know for sure that the missing punctuation marks are commas and full stops?

I've worked with numerous OCR scans that dropped the em dash. From reading the text it looked like missing commas or full stops, but the PDF of the original book revealed the missing em dashes.
Old 01-09-2013, 04:49 AM   #5
Toxaris
Or a horizontal ellipsis; those are also sometimes missed by OCR.
Old 01-09-2013, 12:53 PM   #6
Paxman53
Sorry for the late reply, I am having major problems with Calibre at the moment.

Yes, I am comparing to an original text and there are definite omissions, but the regex provided by Perkin has definitely got around the problem.

Thanks for the input though.