09-07-2014, 02:04 PM   #1
mikefulton
Member
Posts: 17
Karma: 10
Join Date: Aug 2014
Device: Kindle Fire HDX
"document-aware" regex

There is a feature that I would love to see added to the regular expressions in Calibre's book editor. It's something I would work on myself if I could find the time, but that doesn't seem to be happening so I thought I'd throw the idea out there to see what people think.

Basically, the idea is to tie the regular expression parsing together with the spelling checker, so that the regex stuff can determine if things are spelled correctly or not.

Here's the use case that prompts this idea. Often when I'm editing an ebook, hyphenation has been added directly into the text, so you get things like "enter-ing" or "start-ing".

Clearly, with regex, finding a hyphenated word like that isn't a problem. The problem is knowing when it's a good idea to simply remove the hyphen and knowing when the hyphen needs to stay. For example, you wouldn't remove it from "business-like".

So, what if there were a replacement marker that could remove hyphens and then check whether the resulting replacement was marked as "OK" by the spelling checker?

For example, you'd use "\-1" instead of "\1". (This is just an example, and I'll admit I have no idea if it steps on anything in the existing regex implementation.)

Suppose you searched for "\s(.*?-.*?)\s" and it found "enter-ing". If the replacement marker were "\-1", it would remove the hyphens from the "\1" result to get "entering", and if that passed the spelling check, it would use it as the replacement. If it did not pass the spelling check, it would fall back to the regular replacement text.
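To make the idea concrete, here is a rough Python sketch of that behaviour done with a replacement function instead of a new marker. It is only an illustration under assumptions: `KNOWN_WORDS` and `is_spelled_correctly` are hypothetical stand-ins for a real spelling checker, and the pattern is a stricter variant of the one above that does not allow spaces inside the match.

```python
import re

# Hypothetical stand-in for a real spelling checker: a tiny set of known words.
KNOWN_WORDS = {"entering", "starting", "business", "like"}


def is_spelled_correctly(word):
    """Assumed spell-check hook; a real one would query a dictionary."""
    return word.lower() in KNOWN_WORDS


# One word, a hyphen, and another word, with no whitespace inside the match.
HYPHENATED = re.compile(r"\b(\w+)-(\w+)\b")


def dehyphenate(match):
    """Drop the hyphen only when the joined form passes the spell check."""
    joined = match.group(1) + match.group(2)
    if is_spelled_correctly(joined):
        return joined
    # Otherwise leave the matched text exactly as it was.
    return match.group(0)


def fix_hyphens(text):
    return HYPHENATED.sub(dehyphenate, text)


print(fix_hyphens("He was enter-ing a business-like deal."))
```

With this word list, "enter-ing" becomes "entering" because the joined form is known, while "business-like" keeps its hyphen because "businesslike" is not in the list.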

There's undoubtedly a lot of potential for additional functionality once you tap into the spelling checker, but just being able to deal with the hyphens would be a great place to start.
09-07-2014, 03:24 PM   #2
eschwartz
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Shouldn't this be handled by the spellcheck itself?

In any event, Kovid is working on a macro facility for the editor. That should allow invoking regex and running spellcheck on the results.
09-07-2014, 05:05 PM   #3
BetterRed
null operator (he/him)
Posts: 20,550
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
@mikefulton - have you tried this: select All Words in the Spellchecker and filter with a '-'? This feature was added after the Spellchecker's initial release.

Kovid also added a 'smart' that tests whether a hyphen-ated misspelt word can be corrected by removing the hyphen and, if so, bumps that word to the top of the correction list. I don't think there's anything you need to do to have that.
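As a rough illustration of that kind of 'smart' (not calibre's actual code), the reordering could look something like the Python below. The function name `rank_suggestions` and the `is_known` callback are made up for this sketch; the callback stands in for whatever dictionary lookup the spell checker really uses.

```python
def rank_suggestions(word, suggestions, is_known):
    """Put the hyphen-free form first when it is a known word."""
    joined = word.replace("-", "")
    if "-" in word and is_known(joined):
        # Move (or add) the joined form to the front, keeping the rest in order.
        return [joined] + [s for s in suggestions if s != joined]
    return list(suggestions)


# Hypothetical dictionary containing a single word.
known = {"hyphenated"}

print(rank_suggestions(
    "hyphen-ated",
    ["hyphen", "hyphenate", "hyphenated"],
    lambda w: w in known,
))
# prints ['hyphenated', 'hyphen', 'hyphenate']
```

If the joined form is not a known word, the suggestion list comes back unchanged.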

I found that with those two enhancements my 'beef' with hyphens was by and large overcome.

BR
09-07-2014, 11:02 PM   #4
kovidgoyal
creator of calibre
Posts: 43,842
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
See my posts in this forum on the function mode for search and replace.
09-10-2014, 10:56 PM   #5
mikefulton
Quote:
Originally Posted by BetterRed
@mikefulton - have you tried this: select All Words in the Spellchecker and filter with a '-'? This feature was added after the Spellchecker's initial release.

Kovid also added a 'smart' that tests whether a hyphen-ated misspelt word can be corrected by removing the hyphen and, if so, bumps that word to the top of the correction list. I don't think there's anything you need to do to have that.

I found that with those two enhancements my 'beef' with hyphens was by and large overcome.

BR
If this is something that's been added since 2.0, then no, I probably haven't seen it. I wasn't happy with the spelling checker in previous versions and had found other options. I'll have to take another look at it.
09-10-2014, 11:11 PM   #6
BetterRed
Quote:
Originally Posted by mikefulton
If this is something that's been added since 2.0, then no, I probably haven't seen it. I wasn't happy with the spelling checker in previous versions and had found other options. I'll have to take another look at it.
@mikefulton - no, the features I mentioned predate 2.0 by quite a while. I am not sure if they got a mention in the release notes, but I know they're there, because I was one of those extolling the virtues of having them.

BR
09-18-2014, 07:53 PM   #7
mikefulton
Quote:
Originally Posted by BetterRed
@mikefulton - have you tried this: select All Words in the Spellchecker and filter with a '-'? This feature was added after the Spellchecker's initial release.

Kovid also added a 'smart' that tests whether a hyphen-ated misspelt word can be corrected by removing the hyphen and, if so, bumps that word to the top of the correction list. I don't think there's anything you need to do to have that.

I found that with those two enhancements my 'beef' with hyphens was by and large overcome.

BR
OK, I've tried what you suggested, and while it works, it's horribly, horribly slow. Hardly any better than simply reading through and editing the text.

The main issue is that you have to manually select the replacement for every unique occurrence.
09-19-2014, 04:46 AM   #8
BetterRed
Quote:
Originally Posted by mikefulton
OK, I've tried what you suggested, and while it works, it's horribly, horribly slow. Hardly any better than simply reading through and editing the text.

The main issue is that you have to manually select the replacement for every unique occurrence.
I suspect my hyphenation problems are not as severe as yours. When Kovid added the 'smart' to offer the de-hyphenated word as the first replacement choice (assuming it is not a misspelling), that fixed a huge number of errors in some 'books' I was working on at the time.

Re the list of hyphenated words: I use it to spot erroneous hyphenations, e.g. bank-ruptures is not a misspelling, but it's probably wrong. I paste erroneous hyphenations into my clipboard. Once I've scanned the hyphenated word list, I deal with the erroneous sublist I created in my clipboard with regular S&R.
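For anyone who wants to script that two-step workflow outside the editor, a minimal Python sketch might look like this. The function names and the sample `fixes` mapping are invented for the example; the mapping plays the role of the hand-picked sublist, so nothing is changed unless a person has reviewed it first.

```python
import re
from collections import Counter

# A word made of two or more parts joined by hyphens.
HYPHENATED = re.compile(r"\b\w+(?:-\w+)+\b")


def list_hyphenated(text):
    """Count every hyphenated word so the list can be reviewed by eye."""
    return Counter(m.group(0).lower() for m in HYPHENATED.finditer(text))


def apply_fixes(text, fixes):
    """Replace each reviewed word with its correction (case-sensitive, whole words)."""
    for wrong, right in fixes.items():
        text = re.sub(r"\b" + re.escape(wrong) + r"\b", right, text)
    return text


sample = "The start-ing gun fired; a business-like start-ing."

# Step 1: scan the list of hyphenated words and their counts.
print(list_hyphenated(sample))

# Step 2: apply only the corrections chosen by hand.
fixes = {"start-ing": "starting"}
print(apply_fixes(sample, fixes))
```

Here "business-like" is left alone simply because it was never added to `fixes`.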

BR