Quote:
Originally Posted by unboggling
I took a quick look at the markdown but... does it maintain the italics and bold settings? I didn't notice that.
The freebie tool HTML Book Fixer (I couldn't find it available anywhere now, but it still works well) strips the excess spans, BUT it also manages to remove the italics if they are set in a span. Most irritating.
With excessively nested spans it is darn near impossible to use regex to find the matching open/close tags that carry the italics, and it is a royal pain to "eyeball" the italics in the original.
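Since regex cannot pair up nested open/close tags, one way around it is to let a real parser keep a stack of the open spans. Below is a minimal sketch, not a finished tool, using Python's standard html.parser. The names SpanCleaner and strip_spans are made up for illustration. It assumes the italics and bold are set with inline styles (font-style:italic, font-weight:bold); spans that get their italics from a CSS class would need an extra check.

```python
from html.parser import HTMLParser
import re

# Assumption: emphasis is carried by inline styles, not by CSS classes.
ITALIC = re.compile(r"font-style\s*:\s*italic", re.I)
BOLD = re.compile(r"font-weight\s*:\s*(bold|[6-9]00)", re.I)


class SpanCleaner(HTMLParser):
    """Drop every <span>, but turn italic/bold spans into <i>/<b>."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.stack = []  # one entry per open span: the tags that replaced it

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            self.out.append(self.get_starttag_text())  # keep other tags verbatim
            return
        style = dict(attrs).get("style") or ""
        opened = []
        if ITALIC.search(style):
            opened.append("i")
        if BOLD.search(style):
            opened.append("b")
        self.out.extend(f"<{t}>" for t in opened)
        self.stack.append(opened)

    def handle_startendtag(self, tag, attrs):
        if tag != "span":  # an empty <span/> carries nothing worth keeping
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "span":
            self.out.append(f"</{tag}>")
        elif self.stack:
            # Close whatever this particular span opened, innermost first.
            opened = self.stack.pop()
            self.out.extend(f"</{t}>" for t in reversed(opened))

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_spans(html):
    cleaner = SpanCleaner()
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)


sample = ("<p class=MsoNormal><span style='mso-fareast-font-family:\"MS Mincho\"'>"
          "Plain <span style=\"font-style:italic\"><span lang=EN-US>emphasis</span>"
          "</span> text</span></p>")
print(strip_spans(sample))
```

The stack is what the regex approach lacks: each closing span is matched to the span that actually opened it, so the italic wrapper is closed in the right place no matter how many plain spans sit inside or around it.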
I don't know why modern word processors don't offer an option to clean up the underlying code they use to create PDF and HTML files. The main reason I find PDF files so hard to clean up is that most were created in a WYSIWYG program. Judging by the underlying code I get in the HTML, it is usually Word or a Word clone that produces the horrid "<p class=MsoNormal><span style='mso-fareast-font-family:"MS Mincho"'>", often skipping the quotes around the class name. (Note that the font family/name is whatever font the doc used.)
I think all that excess code can lead to problems in conversions when it is nested too deep. I had one problem caused by not cleaning up a file: I had not noticed that one of the nested div tags was class="chapter" and wrapped the entire chapter, while another was class="chapterHead" and wrapped only the "Chapter whatever" heading.
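To spot that kind of wrapper before converting, it can help to print an outline of the div nesting with the class names. This is a rough sketch along the same lines as the span example, again with Python's standard html.parser. DivOutline and div_outline are hypothetical names, and the sample markup is invented to mirror the chapter/chapterHead case described above.

```python
from html.parser import HTMLParser


class DivOutline(HTMLParser):
    """Record (depth, class) for every <div> in document order."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1
            self.rows.append((self.depth, dict(attrs).get("class") or ""))

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


def div_outline(html):
    parser = DivOutline()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample: one div around the whole chapter, one around the heading.
sample = ('<div class="chapter"><div class="chapterHead"><p>Chapter 1</p></div>'
          '<p>Body text</p></div>')
for depth, cls in div_outline(sample):
    print("  " * (depth - 1) + "div." + (cls or "(no class)"))
```

A div at depth 1 that only closes at the end of the chapter stands out right away in the indented listing, which is much easier than hunting for it by eye in the raw file.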