Without any example files it's hard to say...
You can try to clean up the file by collapsing the empty lines with search-and-replace, and/or by converting it to HTML with each paragraph block wrapped in <p> tags, since HTML ignores extra whitespace.
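As a rough sketch of that search-and-replace idea (the function name is just for illustration), a regex can treat any run of blank or whitespace-only lines as a paragraph break and wrap each block in <p>. It assumes a blank line is what separates your paragraphs:

```python
import html
import re

def paragraphs_to_html(text: str) -> str:
    """Collapse runs of blank lines and wrap each paragraph block in <p> tags."""
    # Normalise Windows (CRLF) and old Mac (CR) line endings to LF first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A paragraph break is a newline, any whitespace-only lines, then a newline.
    blocks = re.split(r"\n\s*\n", text.strip())
    # Escape &, <, > so the text is safe inside HTML, then wrap each block.
    return "\n".join(
        f"<p>{html.escape(b.strip(), quote=False)}</p>" for b in blocks if b.strip()
    )

print(paragraphs_to_html("First line\nstill first.\n\n\n\nSecond & last."))
```

Single line breaks inside a block are left alone; the browser renders them as spaces anyway.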
The strange characters you see in place of apostrophes are a character encoding problem, e.g. text saved as UTF-8 but read as ANSI (Windows-1252) or a DOS code page, or vice versa.
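To see what that looks like, here is a small Python sketch. It assumes the common case where a UTF-8 curly apostrophe (’) was decoded as Windows-1252, which turns it into "â€™". If that is what happened, re-encoding with the wrong codec and decoding as UTF-8 reverses it:

```python
def fix_mojibake(text: str, wrong_codec: str = "cp1252") -> str:
    """Undo UTF-8 text that was mistakenly decoded with a single-byte code page."""
    # Get the original bytes back, then decode them the way they were written.
    return text.encode(wrong_codec).decode("utf-8")

original = "It’s"                                       # U+2019 RIGHT SINGLE QUOTATION MARK
garbled = original.encode("utf-8").decode("cp1252")     # what a mis-configured reader shows
print(garbled)                # Itâ€™s
print(fix_mojibake(garbled))  # It’s
```

This only works when the garbling was exactly that round trip; the real fix is to open the file with the encoding it was actually saved in (e.g. open(path, encoding="utf-8")).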
I try to always work in HTML and avoid literal extended (non-ASCII) DOS characters, using their equivalent HTML entities instead, e.g. &copy; for ©.
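If you want to automate that, a minimal Python sketch (the function name is illustrative) can swap every non-ASCII character for a named HTML entity where one exists, falling back to a numeric reference otherwise. It uses the standard library's html.entities.codepoint2name table:

```python
from html.entities import codepoint2name

def to_html_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                          # plain ASCII is left as-is
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")   # named entity, e.g. &copy;
        else:
            out.append(f"&#{cp};")                  # numeric fallback, e.g. &#10003;
    return "".join(out)

print(to_html_entities("© It’s ✓"))  # &copy; It&rsquo;s &#10003;
```

This does not escape &, < or >; run html.escape() first if the text may contain them. If numeric references are enough, text.encode("ascii", "xmlcharrefreplace") does the same job in one call.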
Your best tools are a powerful text editor such as textplus or Notepad++ and some knowledge of regexes (regular expression pattern matching)!