Quote:
Originally Posted by Achilles
OK, this is driving me bananas. I've spent the past two days, off and on, trying to clean up some RTF books into HTML and then convert them to LRF for viewing on my PRS-505. I've saved the RTF out to HTML using the Microsoft Word "filtered HTML" option and then run it through HTML Tidy to clean things up.
Just a comment: if you're using Tidy, it seems to work much better if you save as regular HTML (in all its tag-soup glory) rather than as filtered HTML. I was doing something similar last night on OCR'ed text (using the Tidy function built into Notepad++), and I noticed Tidy cleaned up the "messy" HTML much better than it did the filtered HTML.
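If you'd rather run Tidy from a script than from inside an editor, here is a rough sketch of how that could look in Python. It assumes the command-line `tidy` program is installed and on your PATH. The function names and file names are made up for illustration, and the option set is only my guess at a sensible starting point for Word output, not a tested recipe.

```python
import shutil
import subprocess


def build_tidy_command(src, dst):
    """Build the argument list for one Tidy run (hypothetical helper)."""
    return [
        "tidy",
        "-q",                  # quiet: suppress non-essential output
        "-clean",              # replace presentational tags with CSS rules
        "--word-2000", "yes",  # strip the extra markup Word adds when saving as HTML
        "-asxhtml",            # write well-formed XHTML
        "-utf8",               # treat input and output as UTF-8 (an assumption)
        "-o", str(dst),        # write the result to dst instead of stdout
        str(src),
    ]


def run_tidy(src, dst):
    """Run Tidy on src, write dst, and return Tidy's exit status."""
    if shutil.which("tidy") is None:
        raise FileNotFoundError("tidy executable not found on PATH")
    result = subprocess.run(
        build_tidy_command(src, dst),
        capture_output=True,
        text=True,
    )
    # Tidy exits with 0 when clean, 1 when there were warnings, 2 on errors.
    if result.returncode >= 2:
        raise RuntimeError("tidy reported errors:\n" + result.stderr)
    return result.returncode
```

Something like `run_tidy("book.htm", "book_clean.html")` would then process one file saved from Word; check the output by eye before feeding it to the LRF converter, since the right options depend on how messy the input is.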