It might be worth converting the file to plain text with a lightweight markup such as Markdown or Textile enabled. That would eliminate all the HTML tags, which should make the file easier to manipulate. Whether that's a viable route depends on how complex the formatting you need to preserve is.
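
To give a rough idea of what that conversion could look like, here is a minimal sketch using only Python's standard-library `html.parser`. The class and function names (`MarkdownishExtractor`, `html_to_markdownish`) are made up for illustration, and it only handles bold, italic, paragraphs and line breaks. Anything more complex (tables, nested lists, links) would need a proper converter.

```python
from html.parser import HTMLParser


class MarkdownishExtractor(HTMLParser):
    """Hypothetical helper: keeps the text and maps a few tags to Markdown markers."""

    # Assumption: only these inline tags matter; every other tag is simply dropped.
    MARKERS = {"strong": "**", "b": "**", "em": "_", "i": "_"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.MARKERS:
            self.parts.append(self.MARKERS[tag])
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.MARKERS:
            self.parts.append(self.MARKERS[tag])
        elif tag == "p":
            # End of a paragraph becomes a blank line.
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_markdownish(html):
    """Strip HTML tags, keeping bold/italic/paragraph structure as plain-text markup."""
    parser = MarkdownishExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


if __name__ == "__main__":
    sample = "<p>Hello <strong>world</strong></p><p>Second <em>line</em><br>here</p>"
    print(html_to_markdownish(sample))
```

If the output of something like this still carries everything you care about, the plain-text route is probably fine. If it loses formatting you need, that's a sign the HTML is too complex for it.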