07-23-2020, 05:28 AM   #202
CalibUser
Addict
Posts: 203
Karma: 62362
Join Date: Jul 2015
Device: Sony
@democrite:

Quote:
Originally Posted by democrite
I'm sorry for the mixup.
No worries.

Quote:
Originally Posted by democrite
I would guess then the recent changes do not compile a list of unhyphenated terms from the source EPUB and then check for hyphens followed by a space to see if such should be corrected?
Correct - recent changes compile a list of hyphenated words from the source EPUB.

Quote:
Originally Posted by democrite
As for what the plugin primary does, check hyphenated words and remove the hyphen if such a term is in a dictionary, I am not sure why people would want such a thing. OCR apps as far as I know in the years that I've used them do not make such errors.
The plugin is designed to work with the output of as many different OCR programs as possible, and some OCR software hyphenates words that are not normally hyphenated. One feature of the plugin is to examine each hyphenated word and check whether the word formed by removing the hyphen and joining the two parts exists in the Hunspell dictionary. If it does, the plugin assumes that the word should not be hyphenated and replaces it with the non-hyphenated version.

I find this particularly useful with older publications that include hyphens where we would not use them now. For example, 'today' appears as 'to-day' in older books and magazines; the hyphen is not used in modern texts, so my plugin would reduce the word to 'today'. Unfortunately, the plugin can also remove hyphens that need to be retained, so the latest version gives the option of adding words to a list of hyphenated words in which the hyphen must be kept. This also means that if you want to keep the original format (with hyphens) of the scanned text, you can create a file of these words with the latest version of the plugin, so that the hyphen in, for example, 'to-day' is retained for historical reasons.
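
To illustrate the idea (this is only a rough sketch, not the plugin's actual code), the check and the keep-list could look something like this in Python. The names `KNOWN_WORDS`, `KEEP_HYPHENATED` and `dehyphenate` are made up for the example, and a plain set of words stands in for the Hunspell dictionary lookup:

```python
import re

# Assumption: a plain set stands in for the Hunspell dictionary lookup.
KNOWN_WORDS = {"today", "tomorrow", "cooperate"}

# Hypothetical keep-list: hyphenated words whose hyphen must be retained.
KEEP_HYPHENATED = {"to-morrow"}

# Two runs of letters joined by a single hyphen, e.g. "to-day".
HYPHENATED = re.compile(r"\b([A-Za-z]+)-([A-Za-z]+)\b")


def dehyphenate(text, known_words=KNOWN_WORDS, keep=KEEP_HYPHENATED):
    """Join hyphenated words whose joined form is a known word,
    unless the hyphenated form is on the keep-list."""

    def fix(match):
        word = match.group(0)
        if word.lower() in keep:
            return word  # user asked for this hyphen to be kept
        joined = match.group(1) + match.group(2)
        if joined.lower() in known_words:
            return joined  # joined form is in the dictionary: drop the hyphen
        return word  # otherwise leave the word alone

    return HYPHENATED.sub(fix, text)


print(dehyphenate("He left to-day and returns to-morrow with a well-known guide."))
```

With these sample sets, 'to-day' becomes 'today', 'to-morrow' is left alone because it is on the keep-list, and 'well-known' is left alone because 'wellknown' is not in the word set.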

There is a fairly simple way to remove spaces around hyphens using Sigil's own search/replace facility. With the search mode set to Regex, use:

Find: [ ]?-[ ]?

Replace: -

This will remove a single space on either side of a hyphen. You could add this to Sigil's Saved Searches so that you can retrieve it when you need it.
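
If you want to try the expression outside Sigil first, here is a small Python sketch using the same pattern. The function name `close_up_hyphens` is just for illustration, and I am assuming Python's `re` module treats this simple pattern the same way Sigil's regex engine does:

```python
import re

# Same pattern as the Sigil search: an optional single space,
# a hyphen, then another optional single space.
SPACED_HYPHEN = re.compile(r"[ ]?-[ ]?")


def close_up_hyphens(text):
    """Replace each hyphen, plus at most one space on either side, with a bare hyphen."""
    return SPACED_HYPHEN.sub("-", text)


for sample in ["well- known", "twenty -one", "1914 - 1918"]:
    print(repr(sample), "->", repr(close_up_hyphens(sample)))
```

Note that the last sample, where a spaced hyphen is standing in for a dash, gets closed up as well, so it is worth stepping through the matches rather than using Replace All blindly.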

I did not include code to do this in the plugin because some books deliberately use (perhaps incorrectly) the normal hyphen with a space before and after it.