Ebooks, even retail ones, are often badly formatted, to the point that it detracts from the reading experience. And while editing programs exist, they fall short of the task. Do you have the time or patience to trawl through thousands of paragraphs and mangled HTML and CSS (div-div-div-p-span-span-br-span...)? A close friend of mine, 'burbleburbleburble', wrote a prototype of this program a while back. While he has more or less retired from it, I have taken up the project, and it has finally reached an acceptable level of sophistication.
As the situation currently stands, eBookCleaner works for me and is highly customized for my needs. Still, I am interested in and willing to work with the community to improve it to suit everyone's needs. But I am looking for this to be a collaborative effort - I personally don't have the time to write and brainstorm every step of the way! Below is the source code, and I look forward to working with anyone who is interested in helping write a calibre plugin interface (again, I tried, but I simply do not have the time to learn how to do it and then properly implement it). [In the meantime, there is an updated standalone version at www.ebookcleaner.com.]
(Note: for anyone who is interested in using the source, there is a 'help.txt' file in the documents subfolder. It is somewhat sketchy, though.)
I often don't make it onto the internet more than a few times a week... please have patience when waiting for my response.