04-08-2010, 05:56 PM   #27
Goshzilla
Zealot
So I've been working on this all morning and I have run into a stumbling block.

I can extract the text from anything within the <p> tags, but I'm having trouble writing a loop to hyphenate a sentence. The issue is that the text contains character entities like "&rdquo;", which is the HTML entity for a right double quotation mark. I can strip the text out of a paragraph, but symbols like that are left intact.
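For anyone hitting the same thing, here is a rough sketch of two ways around the entity problem. It assumes Python 3, which this post does not actually specify, and the function names and regex are made up for illustration. One option decodes the entities to real characters before hyphenating; the other splits the text so that entities are passed through untouched.

```python
import html
import re

# Matches named (&rdquo;), decimal (&#8221;) and hex (&#x201D;) entities.
# This pattern is an illustrative assumption, not a full HTML tokenizer.
ENTITY_RE = re.compile(r"&(?:#\d+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def decode_entities(text):
    """Turn entities into the characters they stand for."""
    return html.unescape(text)


def split_entities(text):
    """Split text into (chunk, is_entity) pairs.

    A hyphenation loop can then skip every chunk where is_entity is True
    and only work on the plain-text chunks.
    """
    parts = []
    pos = 0
    for match in ENTITY_RE.finditer(text):
        if match.start() > pos:
            parts.append((text[pos:match.start()], False))
        parts.append((match.group(), True))
        pos = match.end()
    if pos < len(text):
        parts.append((text[pos:], False))
    return parts
```

Decoding first is simpler if the output file can hold the real characters; splitting keeps the original markup byte for byte, which may matter if the files have to stay pure ASCII.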


update***

I got it done, finally. It isn't the most efficient code in the world, but it can now scan through an entire directory for HTML files and modify them one by one, inserting hyphens. There is, however, one issue I'd like to get around: I hit a stumbling block when I try to edit an HTML file that is encoded in something other than ASCII.
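One possible way around the encoding problem, sketched in Python 3 (again an assumption, since the script itself isn't shown here): read each file as raw bytes and try a short list of encodings before giving up. The encoding list, the function names, and the `transform` callback standing in for the hyphenation step are all hypothetical.

```python
from pathlib import Path

# Candidate encodings to try in order; this list is a guess, adjust as needed.
FALLBACK_ENCODINGS = ("utf-8", "cp1252")


def read_html(path):
    """Return (text, encoding) for a file, trying each candidate encoding."""
    raw = Path(path).read_bytes()
    for enc in FALLBACK_ENCODINGS:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this last resort cannot fail.
    return raw.decode("latin-1"), "latin-1"


def process_directory(root, transform):
    """Apply transform(text) to every .html file under root.

    Files are written back in the encoding they were read with. Characters
    that encoding cannot represent are written as numeric entities.
    Returns the names of the files that were changed.
    """
    changed = []
    for path in sorted(Path(root).rglob("*.html")):
        text, enc = read_html(path)
        new_text = transform(text)
        if new_text != text:
            path.write_bytes(new_text.encode(enc, errors="xmlcharrefreplace"))
            changed.append(path.name)
    return changed
```

Guessing by trial decode is crude. If the files declare a charset in a meta tag, reading that first would be more reliable, but the fallback chain at least stops the script from dying on the first non-ASCII byte.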

Last edited by Goshzilla; 04-08-2010 at 07:24 PM.