#1 | |
Wizard
Posts: 1,068
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
Spell check question
In the sample text
Quote:
Same for i.e. and e.g. (just the i.e and e.g part). I assume it's because spell check assumes that the last period is the end of a sentence. I was going to just add a.m to the dictionary, but got concerned that sometimes there would be an a.m that really was not correct. Any suggestions or ideas? |
|
#2 |
creator of calibre
Posts: 43,292
Karma: 21696336
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Yeah, that's basically a tokenization issue: the splitting of text into words by ICU doesn't handle these. There's no real solution, I'm afraid; you just ignore them. Or use AM and PM, and "that is" and "for example", instead of the abbreviations.
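To illustrate the tokenization point, here is a minimal Python sketch. The regex is only a rough stand-in for a word splitter (an assumption for illustration, not ICU's actual algorithm): it keeps a period between letters but drops a trailing one, which is why the checker ends up looking at "a.m" rather than "a.m.".

```python
import re

# Rough stand-in for a word tokenizer: runs of letters, optionally joined by
# single internal periods. This is NOT ICU's algorithm, only an illustration
# of the behaviour described in this thread.
WORD = re.compile(r"[A-Za-z]+(?:\.[A-Za-z]+)*")

def words(text):
    """Return the word tokens a spell checker would look up."""
    return WORD.findall(text)

tokens = words("Open at 9 a.m. and, e.g., closed by 5 p.m.")
print(tokens)
# ['Open', 'at', 'a.m', 'and', 'e.g', 'closed', 'by', 'p.m']
```

The trailing period never reaches the dictionary lookup, so the token being checked is "a.m", not "a.m.".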
|
#3 |
Bibliophagist
Posts: 30,969
Karma: 135224257
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
You are going to need to decide which one is more important to you. Personally, I tend to add a.m., p.m., etc. to the exceptions since I consider the chance of, for example, having a. followed by a stray m to be fairly low.
|
#4 |
null operator
Posts: 20,015
Karma: 25139362
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
I accept the limitation, I've seen too many am.s and i.es to try second guessing.
For the UK you could use "From 8:00am to 6:00pm Monday through Saturday". IIRC, CMS allows closed small caps AM and PM. |
#5 |
the rook, bossing Never.
Posts: 9,521
Karma: 79379793
Join Date: Jun 2017
Location: Ireland
Device: Both Kinds: epub based makes and Kindle
|
Style guides variously have all of the possible options!
AM PM
AM PM in small caps
am pm
A.M. P.M.
A.M. P.M. in small caps
a.m. p.m.
AD CE BC BCE have fewer variations (small caps is common, the . usage rare and lower case very rare).
The am and pm is pretty common in British English, usually with a space (perhaps a small space in print).
Consistency is important. |
#6 |
Wizard
Posts: 1,068
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
@Everyone
Thanks for the information. I had thought about a RegEx to replace 'a.m.' with just 'am' etc., but then I thought that if the 'a.m.' was at the end of a sentence, I'd only make things worse. |
#7 |
Well trained by Cats
Posts: 29,308
Karma: 53944634
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
There are cases, like you mention, where you just never use Replace All.
Search,
<eyeball:yes> Replace & find
<eyeball:no> Search (a skip)
Chances are that pattern will only appear a dozen or so times. A whole minute to do it this way. |
#8 |
Wizard
Posts: 1,068
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
In my currently being edited book, chapter 1 had 20+ "a.m."s
Some in middle of sentence: " at 9 a.m. they ..."
Some at end of sentence: " at 9 a.m. Next they ..."
Some at end of clause: " at 9 a.m., and then they ..."
Replace All would really foul that up |
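For a quick survey of how many hits fall into each of those three cases, something like the following Python sketch could help. The `classify` helper and its labels are hypothetical, not part of calibre, and the "end of sentence" guess (whitespace plus a capital letter, or end of line) is only a heuristic: "9 a.m. Monday" would be flagged too.

```python
import re

# Hypothetical helper, not part of calibre: label each a.m./p.m. hit by
# what follows it, so the ambiguous ones can be reviewed by hand.
ABBREV = re.compile(
    r"\b[AaPp]\.[Mm]\.(?P<after>,|\s+[A-Z]|\s*$)?",
    re.MULTILINE,
)

def classify(text):
    """Return (offset, label) for every a.m./p.m. found in text."""
    hits = []
    for m in ABBREV.finditer(text):
        after = m.group("after")
        if after is None:
            label = "mid-sentence"
        elif after == ",":
            label = "end of clause"
        else:
            # Whitespace + capital letter, or end of line: only a guess.
            label = "possible end of sentence"
        hits.append((m.start(), label))
    return hits

sample = ("We met at 9 a.m. they said. We left at 9 a.m. Next they "
          "ate at 1 p.m., and then slept.")
for offset, label in classify(sample):
    print(offset, label)
```

This only reports; it changes nothing, so it is safe to run before deciding how to handle each group.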
#9 | ||
the rook, bossing Never.
Posts: 9,521
Karma: 79379793
Join Date: Jun 2017
Location: Ireland
Device: Both Kinds: epub based makes and Kindle
|
You'd need to detect a.m.<space> <Capital> and a.m.<end of paragraph> at least.
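A minimal Python sketch of that detection rule, assuming "end of paragraph" can be approximated by end of line. `strip_periods` is a hypothetical name, and the result would still need eyeballing, since a capitalised word after "a.m." (a name, a weekday) is not always a new sentence.

```python
import re

ABBREV = re.compile(r"\b([AaPp])\.([Mm])\.")
# Assumption: a sentence ends when the abbreviation is followed by
# whitespace + a capital letter, or by the end of the line/paragraph.
SENTENCE_END = re.compile(r"\s+[A-Z]|\s*$", re.MULTILINE)

def strip_periods(text):
    """Turn a.m./p.m. into am/pm, keeping one period at a sentence end."""
    def repl(m):
        bare = m.group(1) + m.group(2)
        if SENTENCE_END.match(m.string, m.end()):
            return bare + "."
        return bare
    return ABBREV.sub(repl, text)

print(strip_periods("at 9 a.m. they left"))   # at 9 am they left
print(strip_periods("at 9 a.m. Next they"))   # at 9 am. Next they
print(strip_periods("at 9 a.m., and then"))   # at 9 am, and then
```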
Quote:
Quote:
I'd search and then manual replace as I'd not trust myself to think of all of the combinations and then do a correct regex.
I had an ebook with messed up chapter headings (up to 13, but there was no 5) and the final edit needed a search and manual edit. As well as crazy spans, the title of the chapter was on the line above "Chapter <n>", which IMO is the wrong way round. Also no CSS at all, no system ToC, and the entire ebook was one file. I added a CSS file, replaced styles with classes and then let a Calibre convert get a file per chapter. It also used multiple spaces (deleted all and did indents etc. with CSS) and multiple empty paragraphs for layout (added CSS to new class for headings). |
||
#10 | |
Wizard
Posts: 2,288
Karma: 12125705
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
They discussed lots of helpful regexes + edge-cases + tips on how to catch/fix/normalize these types of issues.
- - -
Side Note: And Kovid is right. It's a hard problem with no real solution. Just Right-Click > Ignore the red squigglies in these few cases + come up with a few regexes to check for the common edge-cases. Like:
Search: \b[APap]\.[Mm],
Search: \b[APap][Mm]\.,
which would check for a.m. + p.m. missing a period followed by a comma.
If you make use of Saved Searches this can be as simple as a single run of a Group.
- - -
Side Note #2: If you want extreme details on "sentence-ending periods" and why you don't want to enable spellchecking periods at end of words... see my discussion in:
Sigil 1.9.10+ made that change, and I was STRONGLY opposed to it. The amount of clutter and mess it introduced into the Spellcheck Lists was immense. Heavily outweighed by the handful of acronyms like "a.m." + "p.m." you'd have to check. In Post #21, I even showed graphs of "Sentence-Enders" vs. "Acronyms", where 0.2% hits were "corrected", but 99.8% hits were made much worse.
Last edited by Tex2002ans; 11-20-2023 at 07:12 PM. |
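For anyone who wants to try the two searches outside the editor, here is a small Python sketch. It reads the trailing comma as part of each pattern (an assumption based on the "followed by a comma" description), and the dictionary only mimics the idea of a Saved Searches group; the names are made up for illustration.

```python
import re

# The two searches from the post, with the trailing comma read as part of
# the pattern (an assumption). The labels are hypothetical.
SAVED_SEARCHES = {
    "missing second period": re.compile(r"\b[APap]\.[Mm],"),
    "missing first period": re.compile(r"\b[APap][Mm]\.,"),
}

def run_group(text):
    """Run every search in the group and collect (label, matched text)."""
    hits = []
    for name, pattern in SAVED_SEARCHES.items():
        for m in pattern.finditer(text):
            hits.append((name, m.group(0)))
    return hits

sample = "Doors open at 9 a.m, close at 5 pm., and reopen at 7 p.m., sharp."
for name, found in run_group(sample):
    print(name, repr(found))
```

The correctly punctuated "p.m.," is left alone; only the two malformed variants are reported.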
|