Quote:
Originally Posted by phossler
There's a RegEx function called 'SplitWords' that tries to divide using the dictionary
https://www.mobileread.com/forums/sh...ht=split+words
Post #9 by the master
I've never used it so let me know how it goes
Code:
import regex
from calibre import replace_entities, prepare_string_for_xml

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):

    def fix_word(m):
        word = m.group()
        if dictionaries.recognized(word):
            return word
        for i in xrange(1, len(word) - 1):
            a, b = word[:i], word[i:]
            if dictionaries.recognized(a) and dictionaries.recognized(b):
                return a + ' ' + b
        return word

    text = replace_entities(match.group(1))
    text = regex.sub(r'\b\w+\b', fix_word, text, flags=regex.VERSION1)
    text = prepare_string_for_xml(text)
    return '>' + text + '<'
Of course I have to replace
Code:
return '>' + text + '<'
with
but, using the search string
Code:
\b(\w+)\b(?![^<>{}]*[>}])
it only works for words that are already in the dictionary, i.e. they are left unchanged. When I hit a word that is not in the dictionary, I get an error:
Code:
NameError: name 'xrange' is not defined
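That NameError suggests the function is being run under Python 3, where xrange no longer exists; range is the usual replacement there. Below is a minimal sketch of the same splitting logic with only that change, written so it runs outside calibre. FakeDictionaries, split_word and fix_text are hypothetical names made up for the demo (the real `dictionaries` object comes from the editor), and it uses the standard `re` module instead of `regex`.

```python
import re


class FakeDictionaries:
    """Hypothetical stand-in for the `dictionaries` object calibre passes in."""

    def __init__(self, words):
        self.words = {w.lower() for w in words}

    def recognized(self, word):
        return word.lower() in self.words


def split_word(word, dictionaries):
    # Known words are returned unchanged.
    if dictionaries.recognized(word):
        return word
    # range() replaces Python 2's xrange(); same split points as the original loop.
    for i in range(1, len(word) - 1):
        a, b = word[:i], word[i:]
        if dictionaries.recognized(a) and dictionaries.recognized(b):
            return a + ' ' + b
    # No split into two known words was found.
    return word


def fix_text(text, dictionaries):
    # Apply split_word to every whole word in the text.
    return re.sub(r'\b\w+\b', lambda m: split_word(m.group(), dictionaries), text)


demo = FakeDictionaries(['the', 'cat', 'sat'])
print(fix_text('thecat sat', demo))
```

In the posted function the equivalent change would be swapping xrange for range on the `for` line; that is untested inside the editor itself.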