Old 03-10-2023, 06:04 PM   #7
DVdm
Enthusiast
Posts: 31
Karma: 25920
Join Date: Oct 2020
Device: Kobo Aura H2O (mark 5)
Quote:
Originally Posted by phossler View Post
There's a RegEx function called 'SplitWords' that tries to divide using the dictionary

https://www.mobileread.com/forums/sh...ht=split+words
Post #9 by the master

I've never used it so let me know how it goes


Code:
import regex
from calibre import replace_entities, prepare_string_for_xml

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    def fix_word(m):
        word = m.group()
        if dictionaries.recognized(word):
            return word
        for i in xrange(1, len(word) - 1):
            a, b = word[:i], word[i:]
            if dictionaries.recognized(a) and dictionaries.recognized(b):
                return a + ' ' + b
        return word
    text = replace_entities(match.group(1))
    text = regex.sub(r'\b\w+\b', fix_word, text, flags=regex.VERSION1)
    text = prepare_string_for_xml(text)
    return '>' + text + '<'
Of course I have to replace
Code:
return '>' + text + '<'
with
Code:
return text
but, using the search string
Code:
\b(\w+)\b(?![^<>{}]*[>}])
it only works for words that are in the dictionary, i.e. they are left unchanged. When it encounters a word that is not in the dictionary, I get an error:
Code:
NameError: name 'xrange' is not defined
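
The NameError suggests the function is running under Python 3, where xrange no longer exists and range is the lazy equivalent, so changing xrange to range in the loop is the likely fix. Below is a minimal standalone sketch of the same splitting logic with that one change. It is not the calibre function itself: the KNOWN_WORDS set and the recognized() helper are made-up stand-ins for calibre's dictionaries.recognized, and the standard re module is used in place of regex so it runs outside calibre.

```python
import re

# Hypothetical stand-in for calibre's dictionaries.recognized();
# this tiny word list is made up purely for illustration.
KNOWN_WORDS = {"the", "cat", "sat", "on", "mat"}


def recognized(word):
    return word.lower() in KNOWN_WORDS


def fix_word(m):
    word = m.group()
    if recognized(word):
        return word
    # Python 3: range() replaces Python 2's xrange().
    for i in range(1, len(word) - 1):
        a, b = word[:i], word[i:]
        if recognized(a) and recognized(b):
            return a + ' ' + b
    # No split into two known words was found: leave the word alone.
    return word


def split_words(text):
    # Apply fix_word to every whole word in the text.
    return re.sub(r'\b\w+\b', fix_word, text)


print(split_words("thecat sat onthe mat"))
```

With this word list, "thecat" and "onthe" are split and everything else is left unchanged. In the calibre function above, the equivalent edit would be only the xrange to range change on the for line.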

Last edited by DVdm; 03-10-2023 at 06:08 PM. Reason: added used search string