Okay, so I woke up today... and after giving it some thought, I needed to put these numbers into context.
I grabbed a 1.9 million word journal I've worked on and ran it through Spellcheck Lists.
There were 4 categories:
- Yes/No Periods
  - Sigil 1.9.10 default vs. Pre-1.9.10.
- Yes/No Numbers
  - "Check Numbers" option on/off.
Here are the results:
Here's the chart of "sentence-enders" vs. "acronyms":
Here's the raw data:
Code:
Total Words = 1921208

Category               Unique   Differ.   % Drop from Prev.
Periods                 75160
No Periods              61666     13494   17.95%
Periods+No Nums         66399
No Periods+No Nums      54334      7332   11.04%

# of "Sentence-End"     13494
# of "Acronyms"           188
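For anyone who wants to re-check the arithmetic, here is a minimal Python sketch that recomputes the drop in unique entries from the counts in the table. The helper name `drop` and the dictionary layout are just illustrative choices, not anything Sigil provides.

```python
# Unique-entry counts copied from the table above.
counts = {
    "Periods": 75160,
    "No Periods": 61666,
    "Periods+No Nums": 66399,
    "No Periods+No Nums": 54334,
}

def drop(before, after):
    """Return (absolute difference, percent drop relative to `before`)."""
    diff = before - after
    return diff, round(100 * diff / before, 2)

# Effect of the trailing-period change, with numbers still checked.
periods_drop = drop(counts["Periods"], counts["No Periods"])

# Effect of the "Check Numbers" option, with periods still attached.
numbers_drop = drop(counts["Periods"], counts["Periods+No Nums"])

print("Periods -> No Periods:", periods_drop)
print("Numbers on -> off:    ", numbers_drop)
```

The first pair reproduces the 13494 / 17.95% row, and the second comes out just under 12%.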
The way I see it:
- Yes, Sigil 1.9.10 made spellchecking of ~200 acronyms (0.2%) more accurate...
- but created ~11–18% more "false positives".
These false positives:
- Create "visual clutter"
- + exacerbate all the problems I mention in the previous posts, making every step of the "proofing" chain slower + less effective.
- - - -
Info (Acronyms)
I considered "acronyms" as all intra-word periods:
- Common Phrases
  - i.e. + e.g.
  - a.m. + p.m.
  - A.D. + B.C.
  - Ph.D.
- First + Middle Initial
  - F.A. (Hayek)
  - W.E.B. (Du Bois)
- Acronyms
  - F.B.I.
  - C.I.A.
  - U.S.A.
  - U.S.S.R.
  - U.C.L.A.
- States
Flaws in Counting Method
I did not include URLs (this journal didn't have any) or many of the categories I listed in:
I considered "sentence-ending" to be "only letters + 1 period at end":
While this included the valid:
- Mr. / Mrs. / Dr.
- St. (Saint / Street)
this is just a tiny fraction (maybe a few dozen); the vast majority are "duplicate word + period" entries.
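The two definitions above can be sketched as regular expressions. This is only an approximation of how I sorted the list by hand: the pattern names and the `classify` helper are hypothetical, and it assumes plain ASCII letters.

```python
import re

# Assumed patterns, written from the two definitions above (ASCII letters only).
# "Sentence-ending": only letters + 1 period at the end, e.g. "Clayton."
SENTENCE_END = re.compile(r"[A-Za-z]+\.")
# "Acronym": at least one period between letters, e.g. "F.B.I." or "U.S.S.R"
ACRONYM = re.compile(r"[A-Za-z]+(?:\.[A-Za-z]+)+\.?")

def classify(token):
    """Sort one Spellcheck List entry into a rough bucket."""
    if ACRONYM.fullmatch(token):
        return "acronym"
    if SENTENCE_END.fullmatch(token):
        return "sentence-end"
    return "other"

for entry in ["F.B.I.", "Ph.D.", "Clayton.", "Mr.", "word"]:
    print(entry, "->", classify(entry))
```

Note that "Mr." and "St." land in the "sentence-end" bucket, which is exactly the small overcount described above.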
- - -
Side Note: Quick Acronyms
In Sigil's Spellcheck Lists, searching for '.' instantly listed nearly all acronyms.
This is "Show All Words" Checked/Unchecked:
Pre-1.9.10:
vs. Sigil 1.9.10:
As you can see, in Sigil 1.9.10 (no matter if "Show All" is on/off), it's still flooded with multiple thousands of extras:
- Checked
  - = Every acronym
  - + every "word + period"
- Unchecked
  - = Every acronym
  - + nearly every "ALL CAPS + period"
  - + every "misspelled + period".
    - Including nearly everyone's last names, like "Clayton."!
Pre-Sigil 1.9.10:
- This took a split second of skimming.
- The list was almost pure true acronyms.
Sigil 1.9.10 default:
- ~200 true acronyms are buried under thousands of sentence-enders.
- Marginally better when toggling "Show All" ON/OFF.
- - -
Acronym Differences (Sigil 1.9.10 vs. Pre-Change)
When I compared between:
I got 14/188 different acronyms:
Sigil 1.9.10 shifted these from misspelled -> correct.
The rest were all the same pre- + post-change.
Acronym Recommendations (Sigil 1.9.10 vs. Pre-change)
Yes, here, I agree, Sigil 1.9.10 handles the acronyms much better:
Code:
Original    1.9.10      Pre
A.C.L.U.    A.C.L.U.    ACOLYTE
A.F.L.      A.F.        AWFUL
C.I.A.      C.I.A.      ACACIA
Y.W.C.A.    Y.W.C.A.    ACADEMY
F.B.I.      B.F.A.      FABIAN
F.B.        FIB         FIB
U.S.A.      U.S.A.      USAGE
U.S.S.R.    U.S.S.R.    SAUSSURE
E.g.        Eg          Eng
Ph.D.       Ph. D.      Ph. D.
but, again, at what cost?
- ~0.2% of cases getting more accurate recommendations.
  - And acronyms are hard to even find in the List now!
- vs. ~10–20% guaranteed "visual clutter".
  - In all use-cases of Spellcheck Lists.
- - -
Thought: Hmmmm.... just spitballing ideas out there.
Perhaps something could be done like:
- If period at end + all letters are capital
  - Consider '.' part of word + use better recommendations.
- If period at end + any letters are lowercase
  - Trim '.' off end + act the old way.
This would still not be good for things like "Ph.D.", but I believe the vast majority of these true acronyms are of the ALL CAPS-type:
- F.B.I.
- C.I.A.
- A.D. / B.C.
This would then remove duplicates like:
- word.
- Clayton.
- Rothbard.
- Jumbled.
and lower the cluttering by a ton (plus keeping accurate word counts for all non-acronym words!).
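A rough Python sketch of that rule, just to make the idea concrete. This is not how Sigil is implemented; `normalize` is a hypothetical helper that decides what string would be handed to the spellchecker.

```python
def normalize(token):
    """Keep a trailing period only when every letter in the token is a capital."""
    if not token.endswith("."):
        return token                    # nothing to decide
    letters = [ch for ch in token if ch.isalpha()]
    if letters and all(ch.isupper() for ch in letters):
        return token                    # "F.B.I." stays whole
    return token[:-1]                   # "Clayton." -> "Clayton" (the old way)

for entry in ["F.B.I.", "A.D.", "Clayton.", "word.", "Ph.D."]:
    print(entry, "->", normalize(entry))
```

"Ph.D." gets trimmed to "Ph.D", which is the weak spot already mentioned. An all-caps word that ends a sentence would also keep its period under this rule, so it trims the clutter rather than removing all of it.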
- - -
Thought #2: I still think a toggle for "Check Periods" would be great.
Again, I can see some use for this.
(It actually helped me catch a few typos where I missed the closing period on a "U.S.S.R"!)
But, just like the Numbers, it creates MANY more "false positives".
Allowing it to be toggled ON/OFF would allow advanced users to use it, if needed.
As you can see in the stats above:
- Periods On adds ~10–20% clutter.
- Numbers On adds ~12%+ clutter.
- - - -
Quote:
Originally Posted by KevinH
And if you want to make accurate counts, I recommend using the new Saved Search Group Counts Report feature and not trying to use SpellCheck for that. It was added for just that purpose.
This is madness!
Why are Spellcheck Lists great?
Because they list all unique words (1-grams) and display them in such a compact form!
To see how/why n-grams are so powerful, see my recent posts in:
Again, I've written about all this stuff since Spellcheck Lists were first introduced back in 2013 (Sigil 0.7.0) based on my recommendation!
You already had near-perfection for all these years. And then you:
And now, in 2022, all Spellcheck Lists needed was a little tweak along the edge (acronyms)!
But this new way... no. In my mind, it's 1 micro-step forward, 2 giant leaps backward!
- - -
Come on, KevinH (and Diap)...
Listen to your bestest buddy Tex. When have I ever led you wrong in all these years?