OK, I can see that you have issues with your punctuation. Resolving that is not always easy, by the way, since in some cases no single S&R string fits all. I don't have these issues myself, since the HTML Export macro is my last step in Word. Before it I run several other macros: first a prefix macro that solves a lot of standard issues with ABBYY exports (it fixes line breaks and so on), then a large S&R macro that fixes a lot of formatting and OCR issues and whose rules can be changed rather dynamically, then a macro to fix broken dialogue, and finally one to check all accented words (typical OCR errors).
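To give a rough idea of what a rule-driven S&R step looks like, here is a minimal sketch. It is written in Python rather than Word VBA, and the three rules and the names `RULES` and `apply_rules` are made up for illustration; they are not the rules from the macros I mentioned.

```python
import re

# Hypothetical rules as (pattern, replacement) pairs, applied in order.
# These are illustrative guesses at common OCR cleanups, not the real macro rules.
RULES = [
    (r"[ \t]+([,.;:!?])", r"\1"),  # drop stray spaces before punctuation
    (r"(\w)-\n(\w)", r"\1\2"),     # rejoin words split by a hyphen at a line break
    (r"[ \t]{2,}", " "),           # collapse runs of spaces or tabs into one space
]


def apply_rules(text, rules=RULES):
    """Run every search-and-replace rule over the text, in list order."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text
```

Because the rules live in a plain list, they can be added, removed or reordered per book without touching the function. The hyphen rule also shows why one string does not fit all: it would wrongly join a genuine hyphenated compound that happens to break at the end of a line.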
So, for me, most of the errors you describe are already solved. I haven't shared these macros, since I am only a co-author of them, or in some cases just an advisor.
Now, your issues with italics are a little different. Your first issue is solved by importing the HTML into Sigil. The second and third are quite complex, since there are cases where this is correct. I also feel that this should not be in this macro: its job is to create clean HTML, not to solve formatting issues.
That being said, a macro could be written to solve these kinds of issues. Whether I would have the time is another matter.