Quote:
Originally Posted by caseym54
Is there a way to use regex inside the spellcheck search?
Example: Often, misspelled words that erroneously end in "l" are supposed to end in exclamation points.
The way that I currently handle this is in multiple passes.
In Tools > Spellcheck > Spellcheck (Alt+Q):
1. I type in lowercase 'l' (or whatever letter I'm looking for).
Then I toggle the "Show All Words" checkbox.
Very likely those "l instead of !" words will appear in the "misspelled words" spellcheck list.
- - -
Side Note: In Calibre's Spellcheck List, there's also a "Case sensitive search" checkbox.
Extremely helpful in this case, because you don't want capital 'L' words clogging up your list.
- - -
2. After I correct those, I toggle the checkbox again (so the list shows all words, including the "correctly spelled" ones), then scroll through and see if I can spot any oddities.
3. Then one more pass at the "misspelled" list.
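For anyone curious what that filter is roughly doing, here's a Python sketch of the idea: collect the unique words, keep the ones the dictionary doesn't know, then keep only those containing the typed letter, optionally case-sensitive. The tiny `DICTIONARY` set and the `misspelled_with` helper are made up for illustration. Sigil and Calibre use real Hunspell dictionaries and their own filtering, so treat this as an approximation, not how either program actually works.

```python
import re

# Hypothetical stand-in for a real spellcheck dictionary (Sigil/Calibre use Hunspell).
DICTIONARY = {"help", "stop", "run", "he", "shouted"}

def misspelled_with(text: str, letter: str, case_sensitive: bool = True) -> list[str]:
    """Unique words not in DICTIONARY that contain `letter`, sorted."""
    words = set(re.findall(r"[A-Za-z0-9']+", text))
    needle = re.compile(re.escape(letter), 0 if case_sensitive else re.IGNORECASE)
    return sorted(w for w in words if w.lower() not in DICTIONARY and needle.search(w))

sample = "Helpl he shouted. Stopl Run, Lucinda"
print(misspelled_with(sample, "l"))                        # case-sensitive search
print(misspelled_with(sample, "l", case_sensitive=False))  # capital L creeps in
```

With case sensitivity on, the name "Lucinda" stays out of the list, which is exactly why that checkbox helps.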
Note: I do similar passes for 'l' -> '1' or 'o' -> '0' OCR errors like:
Code:
l98l
198o
196os
h0wever
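If you want to hunt these outside of the Spellcheck List, one option is a regex that only matches "words" mixing letters and digits. A minimal Python sketch; the lookahead pattern and the `ocr_suspects` helper are my own assumption, not a Sigil/Calibre feature:

```python
import re

# Match whole "words" containing at least one letter AND at least one digit.
# Pure numbers (1985) and pure words (however) are skipped.
MIXED = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b")

def ocr_suspects(text: str) -> list[str]:
    return MIXED.findall(text)

print(ocr_suspects("In l98l and 198o (the 196os), however, h0wever 1985 was fine."))
```

It will also flag legitimate tokens like "21st" or "MP3", so it's a review list, not something to Replace All.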
That's one of the reasons why I requested that Spellcheck Lists support numbers back in 2017.
Calibre has always supported numbers. In Sigil, you need to enable it in Edit > Preferences > Spellcheck Dictionaries > "Check Numbers" (tiny checkbox in the very upper right corner).
Side Note #2: You can also use a similar trick to catch accidental/inconsistent hyphens.
I wrote about it in 2013!
Quote:
Originally Posted by caseym54
Using the normal search has too many false positives, but after filtering for spelling, it would speed up things. Full regex find/replace would be nice within the spellcheck, too.
Is there any plugin that does this?
No, but about a month ago I did discuss/brainstorm an "Advanced Find/Replace" concept in a random Calibre topic.
So you'd have a Spellcheck List-type menu with Find and Replace boxes, and 3 sortable columns: Found, Replace, and Hits.
You'd be able to selectively apply Find/Replace ONLY on specific rows.
(Currently, you can only Find/Replace one-by-one OR Replace All... similar to the slowness of spellchecking/grammar-checking documents one-by-one vs. mass-checking in list form!)
The Technical Details
Here are the relevant posts from that thread:
Quote:
Originally Posted by Tex2002ans
2. A tool like Bulk Rename Utility allows you to mass search/replace filenames:
[Attachment: screenshot of Bulk Rename Utility]
You fill out your parameters below.
Then you select which files you want to apply it to (Ctrl+Click/Shift+Click).
It highlights the files that will actually change in green, and shows you the before/after in 2 columns.
Quote:
Originally Posted by Tex2002ans
I also believe this would be helpful for normal large Find/Replaces (ones with a handful of edge cases).
Like in this thread: a giant Find/Replace to switch all "123" numerals to spelled-out form.
100 replaces were fine:
- Chapter 21 -> Chapter Twenty-One
- I was 2 years old -> I was two years old
[...]
A Sortable/Searchable (List-Based?) Differ (Advanced Find/Replace?)
For when the number of changes is overwhelming (in the hundreds/thousands).
Similar to the Spellcheck List, you'd be able to type in a:
- Find
- Replace
Run this on a book (like pressing "Count All") and generate a list:
- Find: Chapter \d+
You'd get a list of all hits:
Code:
Found | Replace | Hits
Chapter 1 | | 1
Chapter 2 | | 1
Chapter 3 | | 1
Chapter 4 | | 1
[...]
Chapter 100 | | 1
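The list-generating part is basically a per-match "Count All". A minimal Python sketch, assuming the whole book is available as one string (`count_all` is a hypothetical helper, not an existing Sigil/Calibre API):

```python
import re
from collections import Counter

def count_all(text: str, find: str) -> Counter:
    """Tally each distinct string matched by `find` (a per-match "Count All")."""
    return Counter(m.group(0) for m in re.finditer(find, text))

book = "\n".join(f"<h2>Chapter {n}</h2>" for n in range(1, 101))
hits = count_all(book, r"Chapter \d+")
for found, n in hits.items():  # insertion order = order of first appearance
    print(f"{found} | | {n}")
```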
You'd be able to double-click on any entry and jump to its location.
And, similar to the Spellcheck List, you can search/sort through this:
- Search: 1
Code:
Found | Replace | Hits
Chapter 1 | | 1
Chapter 10 | | 1
Chapter 11 | | 1
Chapter 12 | | 1
[...]
Chapter 100 | | 1
- Search: 10
Code:
Found | Replace | Hits
Chapter 10 | | 1
Chapter 100 | | 1
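Searching the list is just a substring filter over the Found column. To get the 1, 10, 11... 100 order shown above (instead of plain alphabetical order, where "Chapter 100" lands right after "Chapter 10"), this sketch assumes a "natural" sort that compares runs of digits as numbers. The helper names are illustrative only:

```python
import re
from collections import Counter

def natural_key(s: str):
    # "Chapter 10" -> ["Chapter ", 10, ""], so 9 < 10 < 100 compare numerically.
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", s)]

def search_list(hits: Counter, query: str) -> list[str]:
    """Keep only Found entries containing `query`, in natural sort order."""
    return sorted((found for found in hits if query in found), key=natural_key)

book = " ".join(f"Chapter {n}." for n in range(1, 101))
hits = Counter(re.findall(r"Chapter \d+", book))
print(search_list(hits, "10"))
```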
You'd also be able to do a Replace:
- Find: Chapter (\d+)
- Replace: Chap. \1
Code:
Found | Replace | Hits
Chapter 1 | Chap. 1 | 1
Chapter 2 | Chap. 2 | 1
Chapter 3 | Chap. 3 | 1
Chapter 4 | Chap. 4 | 1
[...]
Chapter 100 | Chap. 100 | 1
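Previewing the Replace column without touching the text could lean on Python's `Match.expand()`, which fills in a `\1`-style template for a single match. A sketch under the assumption that identical found text always yields the same replacement (true for this pattern; lookarounds could break it). `preview` is a hypothetical name:

```python
import re
from collections import Counter

def preview(text: str, find: str, replace: str) -> list[tuple[str, str, int]]:
    """Build Found | Replace | Hits rows without modifying the text."""
    counts = Counter()
    replacement = {}
    for m in re.finditer(find, text):
        found = m.group(0)
        counts[found] += 1
        # expand() applies the \1-style template to this one match only.
        replacement.setdefault(found, m.expand(replace))
    return [(f, replacement[f], n) for f, n in counts.items()]

book = " ".join(f"Chapter {n}." for n in range(1, 6)) + " See Chapter 5 for more."
for row in preview(book, r"Chapter (\d+)", r"Chap. \1"):
    print(" | ".join(map(str, row)))
```

Sorting those rows by the Hits value is what would surface the odd "Chapter 5" row below.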
Here, I can also scroll through the list and accept/reject certain replaces.
Maybe, sorting by Hits, there would be a:
Code:
Chapter 5 | Chap. 5 | 5
so you scratch your head, take a closer look, and maybe the book has a few:
- See Chapter 5 for more information.
You may want to treat those cross-references differently than the actual "Chapter 5" heading,
so you'd apply the change to all 99 other replaces first, then dig into that oddity in more detail.
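Applying only the accepted rows could work by passing a function to `re.sub` that hands back a match unchanged when its found text was rejected. Another hypothetical sketch; `apply_selected` and the `rejected` set are illustrative names:

```python
import re

def apply_selected(text: str, find: str, replace: str, rejected: set[str]) -> str:
    """Replace every match except those whose found text is in `rejected`."""
    def swap(m: re.Match) -> str:
        if m.group(0) in rejected:
            return m.group(0)  # rejected row: leave it for a closer look later
        return m.expand(replace)
    return re.sub(find, swap, text)

book = "<h2>Chapter 4</h2> <h2>Chapter 5</h2> See Chapter 5 for more information."
print(apply_selected(book, r"Chapter (\d+)", r"Chap. \1", rejected={"Chapter 5"}))
```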
And then, over the following days, I discussed even better use cases and concepts via PMs:
PM #1
You see that "Chapter (\d+)" example I gave?
Anyway, when I woke up, I thought of a few other cases where I'd find that type of workflow extremely useful.
One search/replace where you do thousands of replacements, but want to skip a few exceptions, is EN DASHES:
* * *
Current Method
What I typically do is this:
Search: (\d+)-(\d+)
Replace: \1–\2
but then I have to be very careful with URLs, ISBNs, etc.
So what I currently do is split it into separate, smaller steps.
- Anything with a "pp." or "p." before it? Replace All.
- Open up the Index? "Current File" -> Replace All.
- Then I go through step-by-step and have to manually do the rest.
- (Or I "hack" the Spellcheck List with numbers, then search for a hyphen to see what I'm looking at. :P)
- If I catch any oddities at that step, I make sure to NOT Replace All, and may tackle chapters one-at-a-time with "Current File".
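As a rough stand-in for some of those manual exclusions, one could guard the en dash replace with a heuristic. The Python sketch below assumes that any whitespace-delimited token containing `://` or more than one hyphen (URLs, ISBNs) should be left alone. That rule is my own assumption and won't catch every edge case (e.g. a single-hyphen catalog number), so it doesn't replace a careful look:

```python
import re

RANGE = re.compile(r"(\d+)-(\d+)")

def looks_protected(token: str) -> bool:
    # Assumed heuristic: URLs and multi-hyphen numbers (ISBNs etc.) keep their hyphens.
    return "://" in token or token.count("-") > 1

def en_dash_ranges(text: str) -> str:
    """Swap the hyphen in digit ranges for an en dash (U+2013), skipping protected tokens."""
    parts = re.split(r"(\s+)", text)  # the capture group keeps the whitespace
    return "".join(
        part if looks_protected(part) else RANGE.sub("\\1\u2013\\2", part)
        for part in parts
    )

sample = "See pp. 123-125 and 1939-1945. ISBN 978-0-306-40615-7, https://example.com/2020-01"
print(en_dash_ranges(sample))
```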
* * *
Sortable Advanced Find & Replace List
I could do something like:
Search: \b(p+)\.* (\d+)-(\d+)
Replace: pp. \2–\3
Running that, you'd get a giant sortable list of:
Code:
pp. 123-125
pp. 125-127
p 123-125
p. 125-127
pp 130-135
pp. 123-5
pp 125-7
so, at a glance, I can already see errors in the book (missing periods + some ranges not in 3-digit form).
Then I'd be able to do multiple passes:
Filter: \.
Code:
pp. 123-125
pp. 125-127
p. 125-127 <--- Single "p." error
pp. 123-5 <--- Inconsistency
Great. Replace the first 3. Then double-click on the "pp. 123-5" and/or manually correct it to "pp. 123–125".
Blank the Filter, and now I'm left with:
Code:
p 123-125
pp 130-135
pp 125-7 <--- Inconsistency
Again, the first 2 can be replaced, but the 3rd one needs the 3-digit form.
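Those two filter passes, plus spotting the abbreviated ranges, can be approximated in Python with the same search regex. This is a sketch only: `review_page_ranges` and `full_end` are hypothetical, and the range expansion is naive (it assumes "123-5" means 123 to 125, and would get ranges that cross a hundreds boundary wrong):

```python
import re

FIND = re.compile(r"\b(p+)\.* (\d+)-(\d+)")

def full_end(start: str, end: str) -> str:
    # Naive: ("123", "5") -> "125". Assumes the range stays within the same hundred.
    return start[: len(start) - len(end)] + end

def review_page_ranges(text: str):
    """Split hits like the "\\." filter pass does, and flag abbreviated ranges."""
    with_period, without_period, abbreviated = [], [], []
    for m in FIND.finditer(text):
        found, start, end = m.group(0), m.group(2), m.group(3)
        (with_period if "." in found else without_period).append(found)
        if len(end) < len(start):
            abbreviated.append((found, f"pp. {start}\u2013{full_end(start, end)}"))
    return with_period, without_period, abbreviated

sample = "pp. 123-125, pp. 125-127, p 123-125, p. 125-127, pp 130-135, pp. 123-5, pp 125-7"
dotted, plain, short = review_page_ranges(sample)
print(dotted)
print(plain)
print(short)
```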
Being able to see some Advanced "Count All", at a glance, in a sortable list... I think this would be some ultimate power move. :P
(Although yes, yes,... what if someone puts in some insane Regex that grabs entire paragraphs... how would that get shoved/displayed in the lists... lol.)
Anyway, I don't currently know of any tool that does this. As I explained in those Calibre posts, I see bits and pieces here and there, but nothing that displays them in easy-to-read lists like the Spellcheck Lists!
* * *
Side Note:
Non-Linear Editing
Another fantastic thing I've been doing lately is editing using Regex.
1. A common thing in Fiction is "creative dialogue tags".
Instead of saying "said", authors may write things like:
- opined
- accused
- agreed
- beseeched
So what I've been doing is similar to this Regex:
Search: , \b(Alex|Bob|Joanne|Suzie|s?he|they)\b (\w+)
Replace: , \1 said
Code:
Found | Replace | Hits
, Alex opined | , Alex said | 10
, Suzie accused | , Suzie said | 9
, Joanne agreed | , Joanne said | 4
, she beseeched | , she said | 1
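Generating that table is the same Count All idea applied to this pattern. A Python sketch, assuming `s?he` (so it only matches "he"/"she") and skipping rows whose verb is already "said". The sample sentences are made up, and a real book with a quotation mark between the comma and the name would need the pattern adjusted:

```python
import re
from collections import Counter

TAG = re.compile(r", \b(Alex|Bob|Joanne|Suzie|s?he|they)\b (\w+)")

def dialogue_tag_rows(text: str) -> list[tuple[str, str, int]]:
    """Found | Replace | Hits rows, most frequent first; plain "said" is skipped."""
    rows = Counter()
    for m in TAG.finditer(text):
        if m.group(2) != "said":
            rows[(m.group(0), m.expand(r", \1 said"))] += 1
    return [(found, repl, hits) for (found, repl), hits in rows.most_common()]

sample = ("Fine, Alex opined. You lied, Suzie accused. Yes, Joanne agreed. "
          "Please, she beseeched. No, Alex opined. Okay, she said.")
for row in dialogue_tag_rows(sample):
    print(" | ".join(map(str, row)))
```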
2. Or normalizing "said he" -> "he said":
Search: , (said) \b(Alex|Bob|Joanne|Suzie|s?he|they)\b
Replace: , \2 \1
Being able to run a Regex like that across an entire book, see a generated list of all usages... it would be GLORIOUS.
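The "said he" normalization is a plain backreference swap. A small Python sketch that returns both the new text and the list of changes, so they can be reviewed before anything is committed (again using `s?he`; `normalize_said` is a hypothetical helper):

```python
import re

SAID_FIRST = re.compile(r", (said) \b(Alex|Bob|Joanne|Suzie|s?he|they)\b")

def normalize_said(text: str) -> tuple[str, list[tuple[str, str]]]:
    """Swap ", said X" -> ", X said"; also return each (before, after) pair."""
    changes = [(m.group(0), m.expand(r", \2 \1")) for m in SAID_FIRST.finditer(text)]
    return SAID_FIRST.sub(r", \2 \1", text), changes

new_text, changes = normalize_said("Run, said Alex. Wait, said she. Fine, Bob said.")
print(new_text)
print(changes)
```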
3. Or something like:
Search: ([!\?]) ([A-Z])(\w+) (\w+)
Replace: \1 \L\2\3 \4
to catch accidentally capitalized letters after '!' or '?' when they should be lowercase! Examples:
- ✗ What did you say? He asked.
- ✓ What did you say? he asked.
- ✗ Time to die! He yelled.
- ✓ Time to die! he yelled.
- Attack! Fight for your life! Alex jumped onto the ship, swinging his sword. (✓ Don't change.)
- The vast majority of hits fall into this "don't change" category.
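Python's `re` has no `\L` case-conversion escape, so a Python version of item 3 needs a replacement function instead. This sketch assumes Sigil's `\L` with no closing `\E` lowercases everything after it (including the 4th group), and it only builds a review list rather than replacing, since the last two hits in the sample are the "don't change" kind:

```python
import re

CAP_AFTER = re.compile(r"([!\?]) ([A-Z])(\w+) (\w+)")

def lower_rest(m: re.Match) -> str:
    # Mimics "\1 \L\2\3 \4" with no \E: everything after the punctuation is lowercased.
    return f"{m.group(1)} {(m.group(2) + m.group(3) + ' ' + m.group(4)).lower()}"

def candidates(text: str) -> list[tuple[str, str]]:
    """Each hit paired with its proposed fix, for accept/reject review."""
    return [(m.group(0), lower_rest(m)) for m in CAP_AFTER.finditer(text)]

sample = ("What did you say? He asked. Time to die! He yelled. "
          "Attack! Fight for your life! Alex jumped onto the ship.")
for before, after in candidates(sample):
    print(f"{before} -> {after}")
```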
* * * * * * * * *
PM #3
[...]
Or something similar to that Bulk Rename Utility screenshot I showed.
You'd have Before/After columns.
Out of those rows, you select which ones you want to apply it to. It highlights those rows differently, so you can see what the heck it'll actually change them to.
If you're satisfied, then you press the button and it mass replaces those.
So let's say I run something like:
Search: (\w+)</p>\s+<p>([a-z])
Replace: \1 \2
you'd get a giant list of:
Code:
Before | After
_________________________________|___________________
And how</p> | And how are you?
<p>are you? |
|
And another</p> | And another one is here.</p>
<p>one is here.</p> |
Heh, but kind of like I mentioned in that PM to KevinH... no idea how to display list-forms when the person shoves in huge regex (like capturing entire HTML files or enormous paragraphs).
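One crude answer to that display problem is to show each hit with a fixed window of context and clip long rows. A Python sketch of the Before/After list for the paragraph-joining regex above; the `context` and `limit` defaults and the `preview_rows`/`clip` helpers are arbitrary assumptions, and newlines are collapsed so each row fits on one line:

```python
import re

PARA_JOIN = re.compile(r"(\w+)</p>\s+<p>([a-z])")

def clip(s: str, limit: int) -> str:
    s = " ".join(s.split())  # collapse newlines/indentation to single spaces
    return s if len(s) <= limit else s[: limit - 3] + "..."

def preview_rows(html: str, context: int = 15, limit: int = 60) -> list[tuple[str, str]]:
    """Before/After rows with `context` characters on each side of every match."""
    rows = []
    for m in PARA_JOIN.finditer(html):
        left = html[max(0, m.start() - context):m.start()]
        right = html[m.end():m.end() + context]
        before = left + m.group(0) + right
        after = left + m.expand(r"\1 \2") + right
        rows.append((clip(before, limit), clip(after, limit)))
    return rows

html = "<p>And how</p>\n<p>are you?</p>\n<p>Fine.</p>\n<p>And another</p>\n  <p>one is here.</p>"
for before, after in preview_rows(html):
    print(f"{before}  |  {after}")
```

Even if someone's regex grabs a whole paragraph, each row stays at most `limit` characters wide.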