It's optimized for downloading websites for conversion to ebooks. It has link filters, recursion-level control, and a bunch of other features.
Cleanup is done by regexps. I don't remember whether the regexps are passed to web2disk or html2lrf; I think it is web2disk, but there may not be a command-line interface to it.
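
To give a rough idea of what regexp-based cleanup looks like, here is a minimal Python sketch. It is not web2disk's or html2lrf's actual code or interface: the rule list `CLEANUP_RULES` and the function `clean_html` are made-up names for illustration, and the two patterns are just examples of the sort of junk you might strip from a downloaded page.

```python
import re

# Hypothetical cleanup rules as (compiled pattern, replacement) pairs.
# These are illustrative only, not the rules either tool actually uses.
CLEANUP_RULES = [
    # Drop <script> elements together with their contents.
    (re.compile(r"<script\b.*?</script>", re.IGNORECASE | re.DOTALL), ""),
    # Drop HTML comments.
    (re.compile(r"<!--.*?-->", re.DOTALL), ""),
]


def clean_html(html, rules=CLEANUP_RULES):
    """Apply each regexp substitution in order and return the cleaned HTML."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html


if __name__ == "__main__":
    page = '<p>Hi</p><script type="x">alert(1)</script><!-- ad --><p>Bye</p>'
    print(clean_html(page))
```

The rules are applied in order, so a later pattern sees the output of the earlier ones. Regexps are fine for stripping well-delimited junk like this, but they are fragile on nested or malformed markup.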