Old 11-17-2023, 08:39 PM   #1
phossler
Wizard
Posts: 1,071
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
Spell check question

In the sample text

Quote:
From 8:00 a.m. to 6:00 p.m. Monday through Saturday
a.m and p.m get flagged as mis-spelt

Same for i.e. and e.g. (just the i.e and e.g part)

I assume it's because spell check assumes that the last period is the end of a sentence

I was going to just add a.m to the dictionary, but got concerned that sometimes there would be an a.m that really was not correct

Any suggestions or ideas?
Old 11-17-2023, 08:51 PM   #2
kovidgoyal
creator of calibre
Posts: 43,655
Karma: 22446730
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Yeah, that's basically a tokenization issue: the splitting of text into words by ICU doesn't handle these. There's no real solution, I'm afraid; you just ignore them. Or use AM and PM, and "that is" and "for example", instead of the abbreviations.
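To illustrate the behaviour described here, a rough sketch, assuming the word splitter keeps a period that sits between two letters but drops a trailing one. The regex is only a stand-in for illustration, not ICU's actual algorithm, and `split_words` is a hypothetical name.

```python
import re

# Assumption: a period between two letters stays inside the word,
# while a trailing period is treated as punctuation and dropped.
WORD = re.compile(r"[A-Za-z]+(?:\.[A-Za-z]+)*")

def split_words(text):
    """Return the word-like tokens a spell checker might be handed."""
    return WORD.findall(text)

tokens = split_words("From 8:00 a.m. to 6:00 p.m. Monday through Saturday")
print(tokens)
```

Under that assumption the checker is handed `a.m` and `p.m` rather than `a.m.` and `p.m.`, which would fit the "just the i.e and e.g part" observation in the first post.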
Old 11-17-2023, 08:51 PM   #3
DNSB
Bibliophagist
Posts: 33,545
Karma: 142904165
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
You are going to need to decide which one is more important to you. Personally, I tend to add a.m., p.m., etc. to the exceptions since I consider the chance of, for example, having a. followed by a stray m to be fairly low.
Old 11-17-2023, 09:37 PM   #4
BetterRed
null operator
Posts: 20,355
Karma: 25865246
Join Date: Mar 2012
Location: Sydney Australia
Device: none
I accept the limitation, I've seen too many am.s and i.es to try second guessing.

For the UK you could use - "From 8:00am to 6:00pm Monday through Saturday"

IIRC CMS allows closed small caps AM and PM.
Old 11-18-2023, 10:12 AM   #5
Quoth
the rook, bossing Never.
Posts: 10,400
Karma: 82723493
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
Style guides variously have all of the possible options!
AM PM
AM PM in small caps
am pm
A.M. P.M.
A.M. P.M. in small caps
a.m. p.m.

AD CE BC BCE have fewer variations (small caps is common, the . usage rare and lower case very rare)

The am and pm is pretty common in British English, usually with a space (perhaps a small space in print).

Consistency is important.
Old 11-18-2023, 11:31 AM   #6
phossler
Wizard
@Everyone

Thanks for the information. I had thought about a RegEx to replace 'a.m.' with just 'am' etc., but then I thought that if the 'a.m.' was at the end of a sentence, I'd only make things worse.
Old 11-18-2023, 12:13 PM   #7
theducks
Well trained by Cats
Posts: 29,579
Karma: 54344444
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
There are cases, like you mention, that you just never use Replace all.

Search, <eyeball:yes> replace & find
<eyeball:no> Search (a skip)

Chances are that pattern will only appear a dozen or so times. A whole minute to do this way
Old 11-19-2023, 10:38 AM   #8
phossler
Wizard
In the book I'm currently editing, chapter 1 had 20+ "a.m."s

Some in middle of sentence: " at 9 a.m. they ..."
Some at end of sentence: " at 9 a.m. Next they ..."
Some at end of clause: " at 9 a.m., and then they ..."

Replace All would really foul that up
Old 11-19-2023, 11:40 AM   #9
Quoth
the rook, bossing Never.
You'd need to detect a.m.<space> <Capital> and a.m.<end of paragraph> at least.
Quote:
Some at end of clause: " at 9 a.m., and then they
That's
Quote:
9 am, and then they
It's same as a middle of a sentence.

I'd search and then replace manually, as I'd not trust myself to think of all of the combinations and then write a correct regex.
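As a rough aid for that kind of manual pass, here is a minimal sketch that only labels each hit by what follows it and leaves the replacing to a human. The function name `classify` and the label strings are made up for illustration, and the rules cover just the cases listed in this thread.

```python
import re

# Matches a.m. / p.m. in either case, with both periods present.
ABBREV = re.compile(r"\b[ap]\.m\.", re.IGNORECASE)

def classify(paragraph):
    """Label each a.m./p.m. hit by the text that comes right after it."""
    labels = []
    for m in ABBREV.finditer(paragraph):
        rest = paragraph[m.end():]
        if rest.strip() == "":
            labels.append("end of paragraph")
        elif rest[0] in ",;:":
            labels.append("end of clause")
        elif rest[0].isspace() and rest.lstrip()[0].isupper():
            # Only "possible": a capital may also be a proper noun.
            labels.append("possible end of sentence")
        else:
            labels.append("middle of sentence")
    return labels

print(classify("at 9 a.m. Next they"))
```

The capital-letter case is flagged as "possible" on purpose: in the original sample, "6:00 p.m. Monday through Saturday" has a capital after the abbreviation without a sentence break, which is one reason a blind Replace All is risky.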

I had an ebook with messed-up chapter headings (up to 13, but there was no 5), and the final edit needed a search and manual edit. As well as crazy spans, the title of the chapter was on the line above "Chapter <n>", which IMO is the wrong way round. There was also no CSS at all, no system ToC, and the entire ebook was one file. I added a CSS file, replaced styles with classes, and then let a Calibre conversion produce one file per chapter. It also used multiple spaces (deleted them all and did indents etc. with CSS) and multiple empty paragraphs for layout (added CSS to a new class for headings).
Old 11-20-2023, 06:44 PM   #10
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by phossler
Thanks for the information. I had thought about a RegEx to replace 'a.m.' with just 'am' etc. but then i thought that if the 'a.m.' was at the end of a sentence, I'd only make things worse
All the variants of periods/no-periods + lowercase/uppercase AM/PM (and acronyms) were discussed in these topics:

They discussed lots of helpful regexes + edge-cases + tips on how to catch/fix/normalize these types of issues.

- - -

Side Note: And Kovid is right. It's a hard problem with no real solution.

Just Right-Click > Ignore the red squigglies in these few cases + come up with a few regexes to check for the common edge-cases. Like:

Search: \b[APap]\.[Mm],
Search: \b[APap][Mm]\.,

These would catch a.m. and p.m. with one period missing, followed by a comma (for example "a.m," or "am.,").
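For anyone who wants to try the two searches outside the editor, a small sketch using Python's `re` module. The patterns are copied unchanged from the post; the constant names, the `find_suspects` helper and the sample strings are only illustrative.

```python
import re

# The two search patterns from the post, unchanged.
MISSING_LAST_PERIOD = re.compile(r"\b[APap]\.[Mm],")    # e.g. "a.m,"
MISSING_MIDDLE_PERIOD = re.compile(r"\b[APap][Mm]\.,")  # e.g. "am.,"

def find_suspects(text):
    """Return the substrings matched by either pattern."""
    hits = []
    for pattern in (MISSING_LAST_PERIOD, MISSING_MIDDLE_PERIOD):
        hits.extend(m.group(0) for m in pattern.finditer(text))
    return hits

samples = [
    "at 9 a.m., and then they",  # correctly punctuated, no hit expected
    "at 9 a.m, and then they",   # final period missing
    "at 9 am., and then they",   # middle period missing
    "at 9 P.M, and then they",   # uppercase variant
]
for s in samples:
    print(s, "->", find_suspects(s))
```

These only cover the comma case; other edge cases would need their own patterns.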

If you make use of Saved Searches this can be as simple as a single run of a Group.

- - -

Side Note #2: If you want extreme details on "sentence-ending periods" and why you don't want to enable spellchecking periods at end of words... see my discussion in:

Sigil 1.9.10+ made that change, and I was STRONGLY opposed to it. The amount of clutter and mess it introduced into the Spellcheck Lists was immense. Heavily outweighed by the handful of acronyms like "a.m." + "p.m." you'd have to check.

In Post #21, I even showed graphs of "Sentence-Enders" vs. "Acronyms", where 0.2% hits were "corrected", but 99.8% hits were made much worse.

Last edited by Tex2002ans; 11-20-2023 at 07:12 PM.
All times are GMT -4.