10-09-2007, 09:13 PM   #12
Goshzilla
Zealot
Goshzilla has a complete set of Star Wars action figures.
 
Posts: 104
Karma: 346
Join Date: Oct 2007
Device: Rocket Ebook 1150
Quote:
Originally Posted by kovidgoyal View Post
Use BeautifulSoup + python hyphenate; it shouldn't need more than a 100-line script.
How exactly would this work? I want to preserve the original tags, styles, etc., but have all the text within the body of the work hyphenated with soft hyphens.

The way I read hyphenate.py, it merely returns an array containing each substring of the word, split at the points where a hyphen can go. I suppose I would then have to write a for loop over that array to build a string with the soft hyphens inserted, something like:

Say I call a = hyphenate_word('perfect'), which gives
a = ['per', 'fect']
string = a[0]
then I would want a for loop with i going from 1 to len(a)-1
    ## something like string = string + "$$softhyphen$$" + a[i]
(writing it that way so that the soft hyphen is never inserted at the end of a word)
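
Thinking about it more, the loop above might collapse into a single join, since a join only puts the separator between pieces and never after the last one. A rough sketch, untested against the real hyphenate.py: the hyphenate_word below is only a stand-in I made up so the snippet runs on its own, and soft_hyphenate and SOFT_HYPHEN are my own names.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD, the character that &shy; stands for in HTML


def hyphenate_word(word):
    """Stand-in for hyphenate.py's hyphenate_word (an assumption, not the real one).

    It only knows the one example word; anything else comes back unsplit.
    """
    known = {"perfect": ["per", "fect"]}
    return known.get(word, [word])


def soft_hyphenate(word):
    """Join the pieces with soft hyphens; join never adds one at the end."""
    pieces = hyphenate_word(word)
    return SOFT_HYPHEN.join(pieces)


print(repr(soft_hyphenate("perfect")))
```

A word that comes back as a single piece is returned unchanged, so short words would not need a special case.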

Now I just need to figure out how to use BeautifulSoup to get at the text between the <body></body> tags while preserving formatting tags like <p> and <br>. I don't want to drop the formatting completely; I only want the words to be modified and then placed back into the HTML file.
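
To make the idea concrete for myself, here is a rough sketch of "rewrite only the text, echo every tag back untouched". Note that it does not use BeautifulSoup at all: I swapped in the standard library's html.parser so it runs without installing anything, and I assume the BeautifulSoup version would do the same thing by walking the text nodes under body. BodyHyphenator and hyphenate_html are names I made up, and hyphenate_word is again only a stand-in for the real hyphenate.py function.

```python
import re
from html.parser import HTMLParser

SOFT_HYPHEN = "\u00ad"


def hyphenate_word(word):
    """Stand-in for hyphenate.py's hyphenate_word (an assumption)."""
    known = {"perfect": ["per", "fect"]}
    return known.get(word, [word])


class BodyHyphenator(HTMLParser):
    """Echo the document back out, soft-hyphenating only text inside <body>."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_body = False

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # original tag text, attributes intact
        if tag == "body":
            self.in_body = True

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # self-closing tags like <br/>

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if self.in_body:
            # Only runs of ASCII letters are treated as words in this sketch.
            data = re.sub(
                r"[A-Za-z]+",
                lambda m: SOFT_HYPHEN.join(hyphenate_word(m.group())),
                data,
            )
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def hyphenate_html(html):
    parser = BodyHyphenator()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling hyphenate_html on the whole file's contents should give back the same markup with only the body words changed. One thing this sketch does not handle: text inside <script> or <style> within the body would also get hyphenated, so those would need to be skipped.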

Last edited by Goshzilla; 10-09-2007 at 09:15 PM.