If you know any programming/scripting languages, I'd just suggest making a tool to do exactly what you want. I can also second what JSWolf said about the Random House books. Those are actually pretty easy to automatically clean up, as every book has the same layout.
Another way I reduce the size is to re-encode the cover images. This usually cuts a significant amount off the file size.
After stripping the unnecessary fonts and re-encoding the cover image, my books usually end up at around 300 KB each.
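For anyone who wants to script the font-stripping step, here is a rough sketch in Python. It assumes the books are EPUB files (plain zip archives) and that fonts can be recognised by file extension; the function name `strip_fonts` and the extension list are made up for illustration, not taken from any existing tool. It only uses the standard library `zipfile` module and does not touch the cover image.

```python
import zipfile
from pathlib import PurePosixPath

# Assumption: embedded fonts are identified purely by these extensions.
FONT_EXTENSIONS = {".ttf", ".otf", ".woff", ".woff2"}


def strip_fonts(src_path, dst_path):
    """Copy an EPUB to dst_path, skipping embedded font files.

    Returns the list of archive names that were left out.
    src_path and dst_path must be different files.
    """
    removed = []
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        for info in src.infolist():
            suffix = PurePosixPath(info.filename).suffix.lower()
            if suffix in FONT_EXTENSIONS:
                removed.append(info.filename)
                continue
            # The EPUB "mimetype" entry has to stay uncompressed;
            # everything else is deflated. Entry order is preserved.
            if info.filename == "mimetype":
                method = zipfile.ZIP_STORED
            else:
                method = zipfile.ZIP_DEFLATED
            dst.writestr(info.filename, src.read(info.filename), compress_type=method)
    return removed
```

Calling `strip_fonts("book.epub", "book-small.epub")` writes a copy without the font files and returns the names it dropped. Note that this sketch leaves the OPF manifest and any CSS `@font-face` rules pointing at the removed files, so a proper tool would probably want to clean those up as well.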