09-07-2014, 02:04 PM | #1 |
Member
Posts: 17
Karma: 10
Join Date: Aug 2014
Device: Kindle Fire HDX
|
"document-aware" regex
There is a feature that I would love to see added to the regular expressions in Calibre's book editor. It's something I would work on myself if I could find the time, but that doesn't seem to be happening so I thought I'd throw the idea out there to see what people think.
Basically, the idea is to tie the regular expression engine together with the spelling checker, so that a regex replacement can determine whether its result is spelled correctly.
Here's the use case that prompts this idea. A lot of times when I'm editing an ebook, hyphenation has been added directly into the text, so you get things like "enter-ing" or "start-ing". Clearly, with regex, finding a hyphenated word like that isn't a problem. The problem is knowing when it's a good idea to simply remove the hyphen and when the hyphen needs to stay. For example, you wouldn't remove it from "business-like".
So, what if there were a replacement marker that could remove hyphens and then check whether the resulting replacement was marked as "OK" by the spelling checker? For example, you'd use "\-1" instead of "\1". (This is just an example and I'll admit I have no idea if it steps on anything in the existing regex implementation.) Say you searched for "\s(.*?-.*?)\s" and it found "enter-ing". If the replacement marker was "\-1", it would remove the hyphens from the "\1" result to get "entering", and if that passed the spelling check, it would use it as the replacement. If it did not pass the spelling check, it would use the regular replacement text.
There's undoubtedly a lot of potential for additional functionality once you tap into the spelling checker, but just being able to deal with the hyphens would be a great place to start. |
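The proposal above can be sketched outside calibre in plain Python. This is only an illustration of the idea, not how calibre implements anything: the word set `KNOWN_WORDS` and the helper `is_spelled_correctly` are hypothetical stand-ins for a real spelling checker, and the pattern is a simplified version of the one in the post.

```python
import re

# Hypothetical stand-in for a spelling checker: a tiny set of known words.
KNOWN_WORDS = {"entering", "starting", "business", "like"}


def is_spelled_correctly(word):
    """Assumed helper; a real checker would consult a proper dictionary."""
    return word.lower() in KNOWN_WORDS


# Two runs of word characters joined by a single hyphen.
HYPHENATED = re.compile(r"\b(\w+)-(\w+)\b")


def dehyphenate(match):
    """Drop the hyphen only if the joined word passes the spelling check."""
    joined = match.group(1) + match.group(2)
    if is_spelled_correctly(joined):
        return joined
    # Otherwise leave the original text untouched.
    return match.group(0)


def fix_hyphens(text):
    return HYPHENATED.sub(dehyphenate, text)


print(fix_hyphens("He was enter-ing the business-like office, start-ing late."))
# prints: He was entering the business-like office, starting late.
```

With this toy word list, "business-like" keeps its hyphen because "businesslike" is not in the set; a fuller dictionary might accept it, which is exactly the judgment call the spelling checker would be making.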
09-07-2014, 03:24 PM | #2 |
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Shouldn't this be handled by the spellcheck itself?
In any event, Kovid is working on a macro facility for the editor. That should allow invoking regex and running spellcheck on the results. |
09-07-2014, 05:05 PM | #3 |
null operator (he/him)
Posts: 20,550
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
@mikefulton - have you tried this: select All Words in the Spellchecker and filter with a '-'? This was a feature added after the Spellchecker's initial release (filter with '-').
Kovid also added a 'smart' feature that tests whether a hyphen-ated misspelt word can be corrected by removing the hyphen and, if so, bumps that word to the top of the correction list. I don't think there's anything you need to do to have that. I found that with those two enhancements my 'beef' with hyphens was by and large overcome. BR |
09-07-2014, 11:02 PM | #4 |
creator of calibre
Posts: 43,842
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
See my posts in this forum on the function mode for search and replace.
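For readers who have not seen function mode, here is a minimal sketch of what such a function might look like for the hyphen case. The `replace` signature and the `dictionaries.recognized()` call are as I understand them from the calibre manual; treat both as assumptions and check them against your calibre version. The sketch assumes a search expression along the lines of `(\w+)-(\w+)`, which is my choice and not taken from this thread.

```python
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # Join the two halves captured on either side of the hyphen.
    joined = match.group(1) + match.group(2)
    # Use the joined word only if the spelling dictionary recognizes it
    # (dictionaries.recognized is assumed to return True/False).
    if dictionaries.recognized(joined):
        return joined
    # Otherwise return the matched text unchanged, hyphen included.
    return match.group()
```

The function is called once per match, so Replace All handles every occurrence without a manual choice for each word.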
|
09-10-2014, 10:56 PM | #5 | |
Member
Posts: 17
Karma: 10
Join Date: Aug 2014
Device: Kindle Fire HDX
|
Quote:
|
|
09-10-2014, 11:11 PM | #6 | |
null operator (he/him)
Posts: 20,550
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
BR |
|
09-18-2014, 07:53 PM | #7 | |
Member
Posts: 17
Karma: 10
Join Date: Aug 2014
Device: Kindle Fire HDX
|
Quote:
The main issue is that you have to manually select the replacement for every unique occurrence. |
|
09-19-2014, 04:46 AM | #8 | |
null operator (he/him)
Posts: 20,550
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
Re the list of hyphenated words, I use it to spot erroneous hyphenations, e.g. bank-ruptures is not a misspelling but it's probably wrong. I paste erroneous hyphenations into my clipboard. Once I've scanned the hyphenated word list, I deal with the erroneous sublist I created in my clipboard with regular S&R. BR |
|