So I've been working on this all morning and I've hit a stumbling block.
I can extract the text from anything inside the <p> tags, but I'm having trouble writing a loop to hyphenate a sentence. The problem is that the text contains HTML entities such as `&rdquo;`, which is the entity for a right double quotation mark. I can strip the text out of a paragraph, but entities like that are still left intact.
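A minimal sketch in Python 3 (the post doesn't say which language it uses, so that's an assumption): the standard-library `html.parser` can pull out the text of each <p> element and decode entities such as `&rdquo;` in the same pass, so the hyphenation loop only ever sees plain characters. `ParagraphTextExtractor` is a hypothetical name, not part of any library.

```python
from html.parser import HTMLParser


class ParagraphTextExtractor(HTMLParser):
    """Collect the text of each <p> element, with entities already decoded."""

    def __init__(self):
        # convert_charrefs=True (the default since Python 3.5) makes the
        # parser replace entities such as &rdquo; with the real character
        # before the text reaches handle_data().
        super().__init__(convert_charrefs=True)
        self.in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        # Text outside <p> elements is ignored; inline tags inside a <p>
        # split its text into several calls, which are concatenated here.
        if self.in_paragraph:
            self.paragraphs[-1] += data


parser = ParagraphTextExtractor()
parser.feed("<p>He said &ldquo;hi&rdquo;.</p><div>skip</div><p>Caf&eacute;</p>")
parser.close()
print(parser.paragraphs)  # ['He said “hi”.', 'Café']
```

If you already have the raw paragraph text as a string, `html.unescape()` from the standard library decodes the entities on its own; for example, `html.unescape("&rdquo;")` returns `”`.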
Update:
I finally got it done. It isn't the most efficient code in the world, but it can now scan an entire directory for HTML files and modify them one by one, inserting hyphens. There is one issue I'd still like to get around, though: I hit a stumbling block when I try to edit an HTML file that is encoded in something other than ASCII.
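A minimal Python 3 sketch of one way around the encoding problem, assuming the files are UTF-8, Windows-1252 or Latin-1 (that list is a guess, not something the post states): read each file as bytes, try the candidate encodings in turn, and write the result back in whichever encoding worked. `read_text_any_encoding`, `process_directory` and the `transform` callback are hypothetical names; `transform` stands in for the hyphen-inserting routine, which isn't shown in the post.

```python
from pathlib import Path

# Encodings to try, in order. This list is an assumption; adjust it to the
# encodings your files actually use. latin-1 accepts any byte sequence, so
# it acts as a last-resort fallback.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def read_text_any_encoding(path):
    """Return (text, encoding) using the first candidate that decodes cleanly."""
    raw = Path(path).read_bytes()
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {CANDIDATE_ENCODINGS} could decode {path}")


def process_directory(root, transform):
    """Apply transform(text) -> text to every .html/.htm file under root, in place."""
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in (".html", ".htm"):
            continue
        text, encoding = read_text_any_encoding(path)
        new_text = transform(text)
        # Write back in the encoding the file was read with. Characters that
        # encoding cannot represent become numeric references such as &#8221;.
        path.write_bytes(new_text.encode(encoding, errors="xmlcharrefreplace"))
```

Trial decoding is a heuristic: a file that is really Windows-1252 can occasionally also be valid UTF-8, so if the pages declare their charset in a `<meta charset=...>` tag, reading that declaration first is more reliable.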
Last edited by Goshzilla; 04-08-2010 at 07:24 PM.