Quote:
Originally Posted by kovidgoyal
Use BeautifulSoup + python hyphenate; it shouldn't need more than a 100-line script.
How exactly would this work? I want to preserve the original tags, styles, etc., except that all the text within the body of the work would be hyphenated using soft hyphens.
The way I read hyphenate.py, it merely returns an array of the substrings of a word, split at each point where a hyphen can go. I suppose I would then have to write a for loop over that array to build a string with the soft hyphens inserted, something like:
say I took a = hyphenate_word('perfect'), which gives
a = ['per', 'fect']
string = a[0]
then I would want a for loop with i going from 1 to len(a)-1:
for i in range(1, len(a)):
    string += '\u00ad' + a[i]  # '\u00ad' is the soft hyphen character
(writing it that way so that the soft hyphen is never inserted at the end of a word)
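As far as I can tell, the loop described above is equivalent to joining the pieces with the soft hyphen character, since a join only puts the separator between items and never after the last one. A minimal sketch, assuming the pieces arrive as a plain list of strings like the ['per', 'fect'] example; the name soft_hyphenate is made up for illustration and is not part of hyphenate.py:

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD, the character that &shy; stands for in HTML


def soft_hyphenate(pieces):
    """Join word pieces with soft hyphens (hypothetical helper).

    str.join only places the separator between items, so nothing is
    added before the first piece or after the last one.
    """
    return SOFT_HYPHEN.join(pieces)


# Pieces as in the example above; in practice they would come from
# hyphenate_word('perfect').
pieces = ["per", "fect"]
word = soft_hyphenate(pieces)
print(repr(word))
```

A word that cannot be split comes back as a single-item list, and joining that returns the word unchanged.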
Now I just need to figure out how to use BeautifulSoup to get at the text between the <body></body> tags while preserving formatting tags like <p> and <br>. I don't want to drop the formatting; I only want the words to be modified and then placed back into the HTML file.
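One possible approach, sketched here under some assumptions: walk only the text nodes inside <body>, rewrite the words in each one, and leave every tag and attribute alone. This assumes BeautifulSoup 4 (the bs4 package) with Python's built-in html.parser. The names hyphenate_body and fake_hyphenate_word are made up for illustration; the stand-in hyphenator only knows the word "perfect" and would be replaced by the real hyphenate_word from hyphenate.py.

```python
import re

from bs4 import BeautifulSoup, NavigableString

SOFT_HYPHEN = "\u00ad"
WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters only
SKIP_TAGS = ("script", "style", "pre", "code")


def fake_hyphenate_word(word):
    """Stand-in for hyphenate_word: returns the pieces of a word."""
    return ["per", "fect"] if word == "perfect" else [word]


def hyphenate_body(html, hyphenate_word):
    """Insert soft hyphens into the body text, keeping all markup."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.body is None:
        return str(soup)

    def fix_word(match):
        return SOFT_HYPHEN.join(hyphenate_word(match.group(0)))

    # Take a list first so replacing nodes does not disturb the walk.
    for node in list(soup.body.find_all(string=True)):
        if type(node) is not NavigableString:
            continue  # comments and other special string types
        if node.parent.name in SKIP_TAGS:
            continue
        text = str(node)
        new_text = WORD_RE.sub(fix_word, text)
        if new_text != text:
            node.replace_with(new_text)
    return str(soup)


sample = (
    "<html><head><title>perfect</title></head>"
    '<body><p class="x">A perfect day<br/>indeed</p></body></html>'
)
result = hyphenate_body(sample, fake_hyphenate_word)
print(repr(result))
```

Since only the strings are replaced, the <p>, its class attribute and the <br/> stay where they were, and anything outside <body> (the title here) is not touched.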