08-15-2021, 07:21 AM | #1 |
Connoisseur
Posts: 57
Karma: 10
Join Date: Jul 2021
Device: Abakus
|
Spellcheck filter - upper/lower case
The filter in the spellcheck does not distinguish between upper and lower case; searching for U shows all words with u as well.
Bug or feature? Binchen |
08-15-2021, 07:38 AM | #2 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Not a bug. It works as designed. The code, just like other filters used in Sigil, is purposely made case insensitive via a lower casing.
Code:
void SpellcheckEditor::FilterEditTextChangedSlot(const QString &text)
{
    const QString lowercaseText = text.toLower();
    QModelIndex root_index = m_SpellcheckEditorModel->indexFromItem(m_SpellcheckEditorModel->invisibleRootItem());
    for (int row = 0; row < m_SpellcheckEditorModel->invisibleRootItem()->rowCount(); row++) {
        QStandardItem *item = m_SpellcheckEditorModel->item(row, 0);
        bool hidden = !(text.isEmpty() || item->text().toLower().contains(lowercaseText));
        ui.SpellcheckEditorTree->setRowHidden(item->row(), root_index, hidden);
    }
}
Last edited by KevinH; 08-15-2021 at 07:46 AM. |
08-15-2021, 08:44 AM | #3 |
A Hairy Wizard
Posts: 3,094
Karma: 18727053
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
In the find/replace section there is an option for Case Sensitive searches. Is it possible to have a checkbox to turn that on or off in the spellcheck function?
I don't know of many words in English that would be mis-spelled with a capital that wouldn't also be mis-spelled as lower case, but I am certainly not fluent in all the languages available to Sigil. Also, as a work-around, the user could certainly use the find/replace function to find words with an aberrant caPital. |
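For illustration only, here is a rough Python sketch (not Sigil's C++/Qt code) of what such a checkbox would change in the filter logic. The function name `filter_rows` and the `case_sensitive` parameter are hypothetical; the default branch mirrors the lower-casing approach shown in the code above.

```python
def filter_rows(words, text, case_sensitive=False):
    """Return the words that would stay visible for a given filter text.

    `case_sensitive` is a hypothetical option standing in for the
    proposed checkbox; it does not exist in the filter quoted above.
    """
    if not text:
        # An empty filter hides nothing.
        return list(words)
    if case_sensitive:
        # Proposed behaviour: compare the text exactly as typed.
        return [w for w in words if text in w]
    # Current behaviour: lower-case both sides before comparing.
    needle = text.lower()
    return [w for w in words if needle in w.lower()]


words = ["Ulm", "unter", "Haus", "BLAU"]
print(filter_rows(words, "U"))                       # every word contains u or U
print(filter_rows(words, "U", case_sensitive=True))  # only words with a capital U
```

With the flag off, searching for U behaves as reported in the first post; with it on, only words containing a capital U remain.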
08-15-2021, 09:34 AM | #4 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Actually all spellchecking itself can be case sensitive since many languages like German capitalize nouns (not just proper nouns) and they are incorrect if not properly capitalized. The dictionary used determines if capitalization matters, not Sigil.
|
08-15-2021, 10:03 AM | #5 |
A Hairy Wizard
Posts: 3,094
Karma: 18727053
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
Here's a basic regex that can check for capitals inside a word....although the DOCTYPE statement gives a bunch of matches...
find: (?<=\w)[A-Z] |
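To see why the DOCTYPE line produces so many hits, the regex above can be tried in Python, whose `re` module supports the same lookbehind. The second pattern is only a suggested variant (an assumption, not something from the thread): it requires a lowercase ASCII letter before the capital, so all-caps runs like DOCTYPE are skipped.

```python
import re

# The regex from the post: any capital A-Z preceded by a word character.
INNER_CAP = re.compile(r"(?<=\w)[A-Z]")

# Hypothetical stricter variant: the capital must follow a lowercase letter.
INNER_CAP_STRICT = re.compile(r"(?<=[a-z])[A-Z]")

print(INNER_CAP.findall("an aberrant caPital"))         # the stray P is found
print(INNER_CAP.findall("<!DOCTYPE html>"))             # every letter after the D matches
print(INNER_CAP_STRICT.findall("<!DOCTYPE html>"))      # no hits in the all-caps keyword
print(INNER_CAP_STRICT.findall("an aberrant caPital"))  # the stray P is still found
```

The stricter variant would miss a capital that follows another capital or a digit, so it trades some coverage for fewer false positives.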
10-23-2021, 04:07 PM | #6 |
Member
Posts: 16
Karma: 10
Join Date: Jan 2014
Location: ABQ, NM, USA
Device: Kindle Paperwhite 10G
|
Is there a way to use regex inside the spellcheck search?
Example: Often misspelled words that erroneously end in "l" are supposed to have exclamation points. Using the normal search has too many false positives, but after filtering for spelling, it would speed things up. Full regex find/replace would be nice within the spellcheck, too. Is there any plugin that does this? |
10-23-2021, 05:23 PM | #7 | ||
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
In Tools > Spellcheck > Spellcheck (Alt+Q):
1. I type in lowercase 'l' (or whatever letter I'm looking for). Then I toggle the "Show All Words" checkbox. Very likely those "l instead of !" words will appear in the "misspelled words" spellcheck list.
- - -
Side Note: In Calibre's Spellcheck List, there's also a "Case sensitive search" checkbox. Extremely helpful in this case, because you don't want capital 'L' words clogging up your list.
- - -
2. After I correct, I toggle the checkbox again (so all "correctly spelled words"), then scroll through and see if I can spot any oddities.
3. Then one more pass at the "misspelled" list.
Note: I do similar passes with 'l' -> '1' or 'o' -> '0' OCR errors like:
Code:
l98l
198o
196os
h0wever
Calibre has always supported numbers. In Sigil, you need to enable it in Edit > Preferences > Spellcheck Dictionaries > "Check Numbers" (tiny checkbox in the very upper right corner).
Side Note #2: You can also use a similar trick to catch accidental/inconsistent hyphens. I wrote about it in 2013!
Quote:
So you'd have a Spellcheck List-type menu with:
and 3 sortable columns:
You'd be able to selectively apply Find/Replace ONLY on specific rows. (Currently, you can only Find/Replace one-by-one OR Replace All... Similar to the slowness of spellchecking/grammarchecking documents one-by-one vs. mass checking in list form!) The Technical Details Here's the relevant posts from that thread: Spoiler:
And then over the following days, I discussed even better use-cases + concepts via PMs:
PM #1
You see that "Chapter (\d+)" example I gave? Anyway, when I woke up, I thought of a few other cases where I'd find that type of workflow extremely useful. One search/replace where you do thousands, but want to deny a few exceptions, is EN DASHES:
* * *
Current Method
What I typically do is this:
Search: (\d+)-(\d+)
Replace: \1–\2
but then I have to be very careful with URLs, ISBNs, etc. So what I currently do is split it into separate, smaller steps.
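A small Python sketch of the search/replace just described, to show why care is needed. The pattern and replacement are the ones quoted; the function name `to_en_dash` and the sample strings are made up for illustration, and Python's `re` stands in for the editor's regex engine.

```python
import re

# Search: (\d+)-(\d+)   Replace: \1–\2   (hyphen between numbers -> en dash)
RANGE = re.compile(r"(\d+)-(\d+)")


def to_en_dash(text):
    """Apply the en-dash replacement to every digits-hyphen-digits hit."""
    return RANGE.sub(r"\1–\2", text)


print(to_en_dash("See pp. 123-125."))        # the page range gets an en dash
print(to_en_dash("ISBN 978-0-306-40615-7"))  # the ISBN is changed too, which is unwanted
```

Run blindly as a "Replace All", the same pattern rewrites parts of the ISBN, which is exactly the kind of exception that has to be excluded by hand.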
* * *
Sortable Advanced Find & Replace List
I could do something like:
Search: \b(p+)\.* (\d+)-(\d+)
Replace: pp. \2–\3
Running that, you'd get a giant sortable list of:
Code:
pp. 123-125
pp. 125-127
p 123-125
p. 125-127
pp 130-135
pp. 123-5
pp 125-7
Then I'd be able to do multiple passes:
Filter: \.
Code:
pp. 123-125
pp. 125-127
p. 125-127 <--- Single "p." error
pp. 123-5 <--- Inconsistency
Blank the Filter, and now I'm left with:
Code:
p 123-125
pp 130-135
pp 125-7 <--- Inconsistency
Being able to see some Advanced "Count All", at a glance, in a sortable list... I think this would be some ultimate power move. :P
(Although yes, yes,... what if someone puts in some insane Regex that grabs entire paragraphs... how would that get shoved/displayed in the lists... lol.)
Anyway, I don't currently know of any tool that does this. As I explained in those Calibre posts, I see bits and pieces here and there, but nothing that displays them in easy-to-read lists like the Spellcheck Lists!
* * *
Side Note: Non-Linear Editing
Another fantastic thing I've been doing lately is editing using Regex.
1. A common thing in Fiction is "creative dialogue tags". Instead of saying "said", authors may write things like:
So what I've been doing is similar to this Regex:
Search: ,” \b(Alex|Bob|Joanne|Suzie|s*he|they)\b (\w+)
Replace: ,” \1 said
Code:
Found            | Replace        | Hits
,” Alex opined   | ,” Alex said   | 10
,” Suzie accused | ,” Suzie said  | 9
,” Joanne agreed | ,” Joanne said | 4
,” she beseeched | ,” she said    | 1
Search: ,” (said) \b(Alex|Bob|Joanne|Suzie|s*he|they)\b
Replace: ,” \2 \1
Being able to run a Regex like that across an entire book, see a generated list of all usages... it would be GLORIOUS.
3. Or something like:
Search: ([!\?]”) ([A-Z])(\w+) (\w+)
Replace: \1 \L\2\3 \4
to catch accidentally capitalized letters after '!' or '?' when they should be lowercase!
Example:
* * * * * * * * *
PM #3
[...] Or something similar to that image I showed in Bulk Rename Utility. You'd have Before/After columns. Out of those rows, you select which ones you want to apply to. It highlights those rows differently, so then you can see what the heck it'll actually change it to. If you're satisfied, then you press the button and it mass replaces those.
So let's say I run something like:
Search: (\w+)</p>\s+<p>([a-z])
Replace: \1 \2
you'd get a giant list of:
Code:
Before              | After
____________________|_____________________________
And how</p>         | And how are you?
<p>are you?         |
                    |
And another</p>     | And another one is here.</p>
<p>one is here.</p> |
Last edited by Tex2002ans; 10-23-2021 at 08:53 PM. |
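The "Found | Replace | Hits" list described in the post above can be roughly approximated outside the editor. This is a minimal Python sketch under stated assumptions: `replacement_report` is a hypothetical helper, the sample sentence is invented, and Python's `re` is used in place of Sigil's or Calibre's regex engine (so editor-only replacement features such as `\L` are not covered).

```python
import re
from collections import Counter


def replacement_report(pattern, template, text):
    """Group identical matches and show what each would be replaced with.

    Returns rows of (found, replacement, hits), most frequent first,
    ties broken alphabetically by the found text.
    """
    counts = Counter()
    for match in re.finditer(pattern, text):
        # match.expand() fills \1, \2, ... in the replacement template.
        counts[(match.group(0), match.expand(template))] += 1
    rows = [(found, repl, hits) for (found, repl), hits in counts.items()]
    return sorted(rows, key=lambda row: (-row[2], row[0]))


sample = "see pp. 123-125 and p 123-125; again pp. 123-125, also pp 125-7."
for found, repl, hits in replacement_report(
        r"\b(p+)\.* (\d+)-(\d+)", r"pp. \2–\3", sample):
    print(f"{found} | {repl} | {hits}")
```

Each row could then be reviewed, and only the approved ones applied, which is the selective workflow the post asks for.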
||
10-29-2021, 03:26 AM | #8 |
Member
Posts: 16
Karma: 10
Join Date: Jan 2014
Location: ABQ, NM, USA
Device: Kindle Paperwhite 10G
|
Thanks for that very helpful reply.
Eventually I got around to this (and some variations):
(?!tm)([a-z][a-z])l\s
or maybe a quote. Problem with "html in headers".
\1\2 |
10-29-2021, 09:49 PM | #9 | |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
so this regex:
(For more info on \b, see Regular-Expressions.info: "Word Boundaries".)
Anyway, to tackle the "l exclamation point" error, I would probably handle it this way:
Finding Lowercase L Words
In Calibre:
Method A. Tools > Check Spelling. You can use whatever search criteria you need. ("Show only misspelled words", etc.) Then you can highlight all the words (Ctrl+A) + Right-Click > "Copy Selected Words to Clipboard":
Method B. Tools > Reports > Words. Press the "Save" button in the bottom right. Then you can save a CSV file:
From there, you can export to another program (like Notepad++ or LibreOffice Calc), where you can run regex or do more analysis.
Side Note: I believe Sigil will be getting more CSV/export functionality in the future.
* * *
I ran Method A on a 130k word book:
Code:
Bobbs-Merrill
Bucknell
Jouvenel
Kozol
Kristol
Mandel
Passell
Samual
Shaull
Stargell
Wittfogel
Wohl
al
calculational
eft-liberal
marshall
nonexponential
nonideological
nonrenewal
ntil
pre-Civil
preindustrial
proindustrial
quotal
warall
In an instant, you can tell most of these are just people's names. Then you can see:
This method should catch most of that "l exclamation point" error.
- - - - -
Side Note: Finding Words Ending With Lowercase L
After getting the list of words out of Calibre... This is the regex I use in Notepad++:
Search: ^(.+)(l)$
Replace: #\1\2
In English, this searches for:
replace with a '#' at the beginning of that word:
pre-Civil -> #pre-Civil
calculational -> #calculational
Then I sort alphabetically, and poof, all "words with a #" appear up top.
- - - - -
Usage Note: When I ran Method A on "all words":
Here's a piece: Spoiler:
Still reasonable to look through, but you can see how you'd have to have the perfect storm of:
1. A word that is correctly spelled without an 'l'.
2. The 'l' -> exclamation point error occurring.
3. The word also correctly spelled with an extra 'l'.
You can see how rare it would be to land in that category. Three such examples would be:
Grammarchecker
From there, you may want to run the text through a grammarchecker... This may be able to catch:
Example:
Also a good idea if working in Fiction (or heck, even Non-Fiction). Very likely the "l exclamation point" error will occur before the close quote, so you'd:
Search: l”
Replace: !”
That would catch things like:
Anyway, those methods would get you 99%+ of the way there, very quickly, without having to check ALL thousands of hits one-by-one-by-one.
Last edited by Tex2002ans; 10-29-2021 at 10:25 PM. |
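The Notepad++ step from the post above (tag every word ending in a lowercase 'l' with '#', then sort) can also be sketched in a few lines of Python for anyone working from an exported word list. The regex and replacement are the ones quoted; the function name `tag_l_words` and the sample words that do not end in 'l' are assumptions for the demo.

```python
import re

# Search: ^(.+)(l)$   Replace: #\1\2
ENDS_WITH_L = re.compile(r"^(.+)(l)$")


def tag_l_words(words):
    """Prefix '#' to each word ending in lowercase 'l', then sort.

    '#' sorts before letters in a plain alphabetical sort, so the
    tagged words float to the top of the list.
    """
    tagged = [ENDS_WITH_L.sub(r"#\1\2", word) for word in words]
    return sorted(tagged)


words = ["however", "pre-Civil", "Kozol", "the", "calculational"]
for word in tag_l_words(words):
    print(word)
```

Note that a plain sort is case-sensitive here, so tagged capitalized names come before tagged lowercase words.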