Old 10-09-2007, 10:11 PM   #15
kovidgoyal
creator of calibre
Posts: 45,435
Karma: 27757438
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
For a txt file it's as simple as

Code:
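import io
import re
from hyphenate import hyphenate_word as hyphenate

# Read the file as text (UTF-8 is an assumption; use the file's real encoding).
src = io.open('file', encoding='utf-8').read()
# Join the pieces of every whitespace-delimited word with a soft hyphen (U+00AD).
result = re.sub(r'\S+', lambda match: u'\u00ad'.join(hyphenate(match.group())), src)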
Actually, if you can come up with a regexp that matches only the text between tags, you can use this technique for HTML as well.
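
One possible way to do that, as a rough sketch rather than a tested recipe: instead of matching the text directly, split the document on the tags and hyphenate only the pieces in between. The names hyphenate_html and toy_split are made up for this example. toy_split is just a stand-in so the snippet runs on its own; in practice you would pass in hyphenate_word from the hyphenate module. The tag regexp is deliberately naive: it assumes no '>' inside attribute values or comments, and it will also touch the contents of script/style elements and may split entities like &amp;.

```python
import re

SOFT_HYPHEN = u'\u00ad'
# Naive tag pattern (assumption: no '>' inside attribute values or comments).
# The capturing group makes re.split keep the tags in its output.
TAG_RE = re.compile(r'(<[^>]*>)')
WORD_RE = re.compile(r'\S+')


def hyphenate_html(html, hyphenate_word):
    """Insert soft hyphens into the text between tags, leaving tags untouched.

    hyphenate_word is any callable that maps a word to a list of its pieces.
    """
    def fix_word(match):
        return SOFT_HYPHEN.join(hyphenate_word(match.group()))

    parts = TAG_RE.split(html)
    # With a capturing group, split() alternates: text, tag, text, tag, ...
    # so the even indices are the text between tags.
    for i in range(0, len(parts), 2):
        parts[i] = WORD_RE.sub(fix_word, parts[i])
    return u''.join(parts)


def toy_split(word):
    # Hypothetical stand-in for a real hyphenator: cuts every 4 characters.
    return [word[i:i + 4] for i in range(0, len(word), 4)]


if __name__ == '__main__':
    print(repr(hyphenate_html(u'<p class="x">hyphenation</p>', toy_split)))
```

With the real module you would call it as hyphenate_html(src, hyphenate). For anything beyond simple, well-formed markup, a proper HTML parser is probably a safer bet than a regexp.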

Last edited by kovidgoyal; 10-09-2007 at 10:15 PM.