11-28-2014, 09:39 PM   #9
kovidgoyal
creator of calibre
Posts: 43,977
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Here you go. I haven't really tested it, so you might have to adjust it a little:

Code:
import regex
from calibre import replace_entities, prepare_string_for_xml

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    def fix_word(m):
        word = m.group()
        # Leave words the spell checker already recognizes untouched
        if dictionaries.recognized(word):
            return word
        # Try each split point; use the first one where both halves are known words
        for i in range(1, len(word) - 1):
            a, b = word[:i], word[i:]
            if dictionaries.recognized(a) and dictionaries.recognized(b):
                return a + ' ' + b
        return word
    # Decode entities so the spell checker sees plain text
    text = replace_entities(match.group(1))
    text = regex.sub(r'\b\w+\b', fix_word, text, flags=regex.VERSION1)
    # Re-escape &, < and > before the text goes back into the markup
    text = prepare_string_for_xml(text)
    return '>' + text + '<'

Use it in Regex-function mode with the find expression:

>([^<]+)<
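If you want to try the logic outside calibre first, here is a rough standalone sketch of what the function does when run with that find expression. It is not calibre's code. A small hypothetical word set (`KNOWN_WORDS`) stands in for the editor's `dictionaries.recognized()`. The stdlib `re`, `html.unescape` and `html.escape` stand in for `regex`, `replace_entities` and `prepare_string_for_xml`, and a plain `re.sub` plays the role of the editor's search loop.

```python
import html
import re

# Hypothetical word list standing in for calibre's spell-check dictionaries
KNOWN_WORDS = {"the", "cat", "sat", "on", "mat", "a", "book", "read"}

def recognized(word):
    # Assumption: case-insensitive lookup in the toy word list
    return word.lower() in KNOWN_WORDS

def fix_word(m):
    word = m.group()
    if recognized(word):
        return word
    # Same split range as the calibre function: the second half keeps >= 2 chars
    for i in range(1, len(word) - 1):
        a, b = word[:i], word[i:]
        if recognized(a) and recognized(b):
            return a + ' ' + b
    return word

def replace(match):
    # Decode entities, split run-together words, then re-escape for XML
    text = html.unescape(match.group(1))
    text = re.sub(r'\b\w+\b', fix_word, text)
    text = html.escape(text, quote=False)
    return '>' + text + '<'

sample = '<p>Thecat sat onthe mat &amp; read abook.</p>'
print(re.sub(r'>([^<]+)<', replace, sample))
# prints: <p>The cat sat on the mat &amp; read a book.</p>
```

Because only the first working split is used, a word that needs two splits (for example `readabook`) comes through unchanged, and short dictionary words can cause false splits, so it is worth checking the results.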