MobileRead Forums - View Single Post

kovidgoyal · 10-09-2007, 10:11 PM

For a txt file it's as simple as

Code:

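import re
from hyphenate import hyphenate_word as hyphenate
# Read as text (UTF-8 is an assumption) so the soft hyphen U+00AD can be joined in as str.
with open('file', encoding='utf-8') as f:
    src = f.read()
# Split each run of non-whitespace characters into syllables and rejoin them with soft hyphens.
result = re.sub(r'\S+', lambda match: '\u00ad'.join(hyphenate(match.group())), src)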

Actually, if you can come up with a regexp that matches only the text between tags, you can use this technique for HTML as well.
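
One rough way to sketch the HTML variant is to split the markup on tags and hyphenate only the pieces in between. Everything named here (`soft_hyphenate_html`, `naive_split`, the `TAG` pattern) is made up for illustration; `naive_split` is just a stand-in so the snippet runs without the hyphenate module, and you would pass `hyphenate_word` instead. A regexp like this is not a real HTML parser, so it will also touch text inside script/style blocks, comments, and entities such as `&amp;`.

```python
import re

SOFT_HYPHEN = '\u00ad'
# Capturing group, so re.split() keeps the tags in its result list.
TAG = re.compile(r'(<[^>]*>)')


def naive_split(word):
    """Hypothetical stand-in for hyphenate_word: cut after every 3 characters."""
    return [word[i:i + 3] for i in range(0, len(word), 3)]


def soft_hyphenate_html(html, split_word=naive_split):
    """Insert soft hyphens into the text between tags, leaving the tags untouched."""
    def hyphenate_text(text):
        return re.sub(r'\S+',
                      lambda m: SOFT_HYPHEN.join(split_word(m.group())),
                      text)

    parts = TAG.split(html)
    # With one capturing group, even indices are text and odd indices are tags.
    return ''.join(part if i % 2 else hyphenate_text(part)
                   for i, part in enumerate(parts))
```

With the real module this would be called as `soft_hyphenate_html(src, split_word=hyphenate)`. Attribute values are safe because they sit inside the tag match, but for anything beyond simple markup a proper parser is probably the better choice.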

Last edited by kovidgoyal; 10-09-2007 at 10:15 PM.