#1 |
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Suggestion: Spell Check Tool Enhancement
This topic spurred a thought in my brain for an enhancement to the Spell Check Tool:
https://www.mobileread.com/forums/sho...d.php?t=241283

I made a post (#6): https://www.mobileread.com/forums/sho...51&postcount=6

Long story short, it would be a nice feature enhancement to have a checkbox to allow a Case Sensitive SEARCH.

For example, searching for a capital letter 'O' will allow "Octopus" to be shown in the list, but not "octopus". Searching for a capital 'I' will show "McIver", but not "Mciver".

Having the checkbox off will be the default (current) implementation (not case sensitive).

As I mentioned in that topic, I believe a Case Sensitive Search would be extremely helpful for catching many of these hard-to-find OCR errors (capital 'I' instead of lowercase 'l', etc.).
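To make the request concrete, here is a minimal Python sketch of the filtering behaviour described above. It is not calibre's code: `filter_words` and its `case_sensitive` flag are hypothetical names standing in for the search box and the proposed checkbox.

```python
def filter_words(words, query, case_sensitive=False):
    """Return the words that contain `query`.

    With case_sensitive=False (the current default behaviour described
    in the post), letter case is ignored. With case_sensitive=True the
    query must match exactly, so 'I' no longer matches 'i'.
    """
    if case_sensitive:
        return [w for w in words if query in w]
    folded = query.casefold()
    return [w for w in words if folded in w.casefold()]


words = ["Octopus", "octopus", "McIver", "Mciver", "couId", "could"]

# Checkbox on: only words containing a capital 'I' are listed.
print(filter_words(words, "I", case_sensitive=True))  # ['McIver', 'couId']

# Checkbox off: 'I' and 'i' are treated the same.
print(filter_words(words, "I"))  # ['McIver', 'Mciver', 'couId']
```

With the checkbox on, an OCR slip such as "couId" surfaces immediately, because a capital 'I' in the middle of a word is rare in real text.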
#2 |
creator of calibre
Posts: 45,216
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
|
#3 |
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Fantastic work Kovid, thank you for implementing this tweak so quickly.
Just used it to catch a bunch of capital 'I' -> 'l' substitutions, which are a common OCR error:

couId concIusions falI faiI goodwiII piIgramage uItimate welI [...]

EXTREMELY helpful addition.
#4 |
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Hmmm... so today I was fiddling around some more with the Calibre Spell Check tool, and I stumbled across this problem.
The hyphen '-' should be considered a legitimate character for a word.

Example of how it currently works:

The word "non-fiction" is seen as two words, "non" and "fiction".
The word "micro-economics" is seen as two words, "micro" and "economics".
The word "anti-establishment" is seen as two words, "anti" and "establishment".

A few reasons why this fix would be extremely useful:

1. I use this ALL THE TIME in Sigil in order to catch usages of non-hyphenated and hyphenated versions of words. It is QUITE a common OCR error, where you might have mixes of "nonfiction" + "non-fiction", "co-operating" + "cooperating", "counter-clockwise" + "counterclockwise", "short-term" + "shortterm" in the same book. These typically then have to be made consistent/normalized throughout the book.

2. It makes it quite helpful to catch accidental hyphens in authors' first/last names. For example, "Black-well" -> "Blackwell", "How-den" -> "Howden", "Lach-mann" -> "Lachmann", "Lee-son" -> "Leeson".

Last edited by Tex2002ans; 07-06-2014 at 06:59 AM.
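As an illustration of the behaviour being asked for, here is a rough Python sketch that keeps internal hyphens as part of a word and then groups hyphenated and unhyphenated spellings together. It uses a plain regular expression rather than whatever word breaker calibre uses internally; `WORD_RE`, `count_words`, and `hyphen_variants` are made-up names for this example only.

```python
import re
from collections import Counter, defaultdict

# A word is a run of letters, optionally joined by single internal
# hyphens or apostrophes. [^\W\d_] matches any Unicode letter.
WORD_RE = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")


def count_words(text):
    """Count words, treating 'non-fiction' as ONE word, not two."""
    return Counter(WORD_RE.findall(text))


def hyphen_variants(counts):
    """Group spellings that differ only by hyphens (and letter case).

    Only groups with more than one spelling, at least one of which
    contains a hyphen, are returned.
    """
    groups = defaultdict(dict)
    for word, n in counts.items():
        key = word.replace("-", "").casefold()
        groups[key][word] = n
    return {
        key: spellings
        for key, spellings in groups.items()
        if len(spellings) > 1 and any("-" in w for w in spellings)
    }


text = "The non-fiction shelf and the nonfiction shelf. Blackwell, not Black-well."
for key, spellings in hyphen_variants(count_words(text)).items():
    print(key, spellings)
```

Listing the variants side by side, with their counts, is what makes it easy to pick one spelling and normalize the rest.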
#5 |
creator of calibre
Posts: 45,216
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
This is a limitation of ICU: it breaks words on hyphens, even though its documentation claims it shouldn't. It is on my TODO list to see if I can implement an efficient workaround.
|
#6 | |
null operator (he/him)
Posts: 21,640
Karma: 29710510
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
BR
|
#7 | |
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
I am using Calibre's list to point out/narrow down the errors, and then just doing all my fixes in Sigil.

Suggestion: Another odd thing I noticed in Calibre's Spell Check List is numbers. I believe that "words" that are completely made of numbers + periods + commas should not be included in the list at all.

I believe the way that Sigil handles it, a "word" with ANY numbers is removed. But after seeing Calibre's list, I still think it is useful if "words" with SOME numbers are still left there. For example, these can then be caught/stand out like sore thumbs:

Seeing these in list form + the number of times they occur in the book is extremely helpful for spotting inconsistencies.

Perhaps you can safely remove "words" that are FULLY numbers, but still keep the ones that are SOME numbers? Perhaps it can be another toggle? Include numbers, not include numbers? (Or perhaps this would make the UI too cluttered?)

Side Note: I am currently working on digitizing 12 years of a journal (~2 million words). The perfect size to put Calibre's Editor through some serious testing! Now, all we need is the fantastic Reports functionality to come over to Calibre's Editor.

Last edited by Tex2002ans; 07-08-2014 at 07:41 PM.
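The FULLY versus SOME distinction can be sketched in a few lines of Python. This is only an illustration of the rule proposed in this post, not how calibre or Sigil actually classify tokens; `classify` and the sample tokens are invented for the example.

```python
import re

# Tokens made up only of digits, periods and commas.
FULLY_NUMERIC_RE = re.compile(r"[\d.,]+")
HAS_DIGIT_RE = re.compile(r"\d")


def classify(token):
    """Return 'numeric', 'mixed' or 'word' for a spell-check token.

    'numeric': only digits, periods and commas (e.g. a page number);
               the suggestion is to drop these from the list.
    'mixed':   contains some digits plus other characters; the
               suggestion is to keep these, since they are often
               OCR errors.
    'word':    contains no digits at all.
    """
    if not HAS_DIGIT_RE.search(token):
        return "word"
    if FULLY_NUMERIC_RE.fullmatch(token):
        return "numeric"
    return "mixed"


tokens = ["1984", "1,234.56", "l984", "1st", "octopus"]
kept = [t for t in tokens if classify(t) != "numeric"]
print(kept)  # ['l984', '1st', 'octopus']
```

A toggle in the UI would then simply decide whether the "numeric" tokens are filtered out or not.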
|
#8 | |
Grand Sorcerer
Posts: 24,905
Karma: 47303824
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Quote:
And I would like to have a way to ignore the words with numbers.

An alternative to an option is to have the "Ignore" button work on multiple words. Select all the words you want to ignore and press the button once. At the moment it seems to work only on the last word selected.
|
#9 |
null operator (he/him)
Posts: 21,640
Karma: 29710510
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Talking of numbers. If a book has an index, all the page number links are flagged as errors - see attachment. If I ignore all those 'numbers' I get to watch an hourglass for anything up to 10 minutes. Also if a book has a long list of references many of the author names will be flagged as errors.
IMO an index or reference list should be in separate files, and they normally are. So, if one could exclude files from spell checking then one could deal with the index and reference files separately.

Be nice to have the ability to exclude paragraphs too - to avoid checking quotes in the original vernacular - eg Chaucer, Shakespeare etc

BR

Last edited by BetterRed; 07-08-2014 at 10:11 PM.
#10 | |
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
(Typically names spelled wrong, missing accents in names/words, etc.)

Something along the lines of Sigil's "sigil_not_in_toc" might work: you could mark that p or blockquote with a class like "calibre_ignore_spellcheck".
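Here is a minimal sketch of how such a marker class could be honoured when collecting text to spell check. It uses only Python's standard `html.parser`. The class name "calibre_ignore_spellcheck" is just the one floated in this post, not an existing calibre feature, and `SpellcheckTextExtractor` and `extract_checkable_text` are hypothetical names.

```python
from html.parser import HTMLParser

# Hypothetical marker class, as suggested in this thread.
IGNORE_CLASS = "calibre_ignore_spellcheck"

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SpellcheckTextExtractor(HTMLParser):
    """Collect text, skipping everything inside elements marked to ignore."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside an ignored element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._skip_depth or IGNORE_CLASS in classes:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_checkable_text(html):
    parser = SpellcheckTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


html = ('<p>Good text.</p>'
        '<blockquote class="calibre_ignore_spellcheck">'
        'Whan that <i>Aprille</i> with his shoures</blockquote>'
        '<p>More text.</p>')
print(extract_checkable_text(html))  # Good text. More text.
```

The depth counter is what lets nested tags inside the ignored blockquote (the `<i>` here) stay ignored until the blockquote itself closes. The sketch assumes well-formed markup, which is reasonable for XHTML inside an EPUB.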
|
#11 |
creator of calibre
Posts: 45,216
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
@davidfor: You can ignore multiple words by selecting them and right clicking. The buttons only operate on a single word at a time.
|
#12 | ||
null operator (he/him)
Posts: 21,640
Karma: 29710510
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
And then deal with the others in a separate pass(es) and maybe exclude the body of the book. My thinking is that the file exclusions would not persist between sessions.

Quote:
Currently I ignore 'misspellings' in Shakespeare et al quotes; but t'would be most felicitous to do otherwise.

But I repeat, for me it's a nice to have.

BR
||
#13 | |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Quote:
|
#14 | |
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
While we are discussing minor tweaks to Spellcheck:
Suggestion: Would it be possible to ignore spellcheck of text between links?
Quote:
|
|
#15 |
Grand Sorcerer
Posts: 24,905
Karma: 47303824
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
|