For a plain-text file it's as simple as this:
Code:
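import io
import re
from hyphenate import hyphenate_word as hyphenate
# Read as text, not bytes, so the soft hyphen can be inserted; UTF-8 is an assumption.
with io.open('file', encoding='utf-8') as f:
    src = f.read()
# Join the pieces of every whitespace-delimited word with U+00AD (soft hyphen).
result = re.sub(r'\S+', lambda match: u'\u00ad'.join(hyphenate(match.group())), src)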
Actually, if you can come up with a regexp that matches only the text between tags, you can use this technique for HTML as well.
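
One way to get at the text between tags is to split on the tags instead of matching around them. The following is only a rough sketch under some assumptions: tags are matched with the simple pattern `<[^>]*>`, the helper names (`soft_hyphenate_text`, `soft_hyphenate_html`, `toy_hyphenate`) are made up for illustration, and `toy_hyphenate` is a stand-in that cuts words every three characters so the example runs without the hyphenate module. In practice you would pass `hyphenate_word` instead.

```python
import re

SOFT_HYPHEN = u'\u00ad'
# The capturing group makes re.split keep the tags in its result.
TAG = re.compile(r'(<[^>]*>)')

def soft_hyphenate_text(text, hyphenate):
    # Same technique as for a plain-text file: join each word's pieces with U+00AD.
    return re.sub(r'\S+', lambda m: SOFT_HYPHEN.join(hyphenate(m.group())), text)

def soft_hyphenate_html(html, hyphenate):
    parts = TAG.split(html)
    # With one capturing group, even indices are text and odd indices are tags.
    return u''.join(
        part if i % 2 else soft_hyphenate_text(part, hyphenate)
        for i, part in enumerate(parts)
    )

def toy_hyphenate(word):
    # Hypothetical stand-in for hyphenate_word: split every 3 characters.
    return [word[i:i + 3] for i in range(0, len(word), 3)]

print(repr(soft_hyphenate_html(u'<p class="x">hello</p>', toy_hyphenate)))
```

Tags and their attributes pass through untouched; only the text between them gets soft hyphens. This simple pattern does not handle a `>` inside an attribute value, comments, or the contents of script and style elements, and it will also break up entities such as `&nbsp;`, so for anything beyond simple markup a real HTML parser is probably the safer route.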