MobileRead Forums - Cleaning ePubs: automatically, fast and with as many generic rules as possible

DiapDealer · 08-07-2013, 09:01 AM

Quote:

Originally Posted by ibu

And there aren't even any tools to help an editor with the manual tasks I listed in my examples?
Right, I'm not looking for regular expressions.

Regular expressions represent the vast bulk of my arsenal for cleaning epub markup. But they're not generic. They always have to be tuned/tweaked for each book.
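To give a rough idea of what "tuned for each book" means, here is a minimal Python sketch of a regex cleanup pass. Everything in it is an assumption made up for illustration: the class name "calibre1", the three rules, and the helper name clean_markup. A different book would need a different rule list.

```python
import re

# Hypothetical rules for ONE particular book; they are not generic.
RULES = [
    # Drop spans that wrap nothing at all.
    (re.compile(r'<span\b[^>]*>\s*</span>'), ''),
    # Unwrap spans carrying a throwaway class (the name "calibre1" is assumed).
    (re.compile(r'<span class="calibre1">(.*?)</span>', re.DOTALL), r'\1'),
    # Collapse runs of non-breaking-space entities into one plain space.
    (re.compile(r'(?:&nbsp;|&#160;){2,}'), ' '),
]

def clean_markup(xhtml: str) -> str:
    """Apply each book-specific rule in order and return the cleaned markup."""
    for pattern, replacement in RULES:
        xhtml = pattern.sub(replacement, xhtml)
    return xhtml

sample = '<p><span class="calibre1">Hello</span><span></span>&nbsp;&nbsp;world</p>'
print(clean_markup(sample))
```

Note that the second rule would mangle a book where those spans are nested inside each other, which is exactly why rules like these have to be checked and adjusted per book.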

It sounds like you're looking for something that parses XHTML/CSS in order to clean it. I know of nothing like that, of either the automatic or the manual-assist variety (though Sigil's "Reports" can help you find unused CSS classes). Most parsers are used to create consistent (albeit usually "cluttered") markup, not to clean it.
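For what it's worth, the "find unused CSS classes" check is the kind of thing a parser can do. The following is only a rough Python sketch of the idea, not how Sigil's Reports are implemented: it assumes a single XHTML string and a single stylesheet string, and the names ClassCollector and unused_classes are made up for the example. The CSS side is a crude regex, so things like @import "style.css" or nested @media blocks may confuse it.

```python
import re
from html.parser import HTMLParser

class ClassCollector(HTMLParser):
    """Collects every class name that appears in a class="..." attribute."""

    def __init__(self):
        super().__init__()
        self.used = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.used.update(value.split())

def unused_classes(xhtml: str, css: str) -> set:
    """Return class names that the CSS defines but the markup never uses."""
    collector = ClassCollector()
    collector.feed(xhtml)
    collector.close()
    # Remove comments, then declaration blocks, so that values such as
    # "1.5em" are not mistaken for class selectors.
    selectors = re.sub(r'/\*.*?\*/', ' ', css, flags=re.DOTALL)
    selectors = re.sub(r'\{[^}]*\}', ' ', selectors)
    defined = set(re.findall(r'\.(-?[A-Za-z_][\w-]*)', selectors))
    return defined - collector.used

css = 'p.indent { margin: 1.5em; } .unused { color: red; } span.bold { font-weight: bold; }'
xhtml = '<p class="indent">Hi <span class="bold big">there</span></p>'
print(unused_classes(xhtml, css))
```

This only reports candidates; deciding whether to delete a rule is still a manual call.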

"Clean" epub code has always been the purview of the ebook's creator. Anybody else will be expected to take time diving in and getting their hands dirty to clean someone else's code, mainly because it's just not important enough to enough people to warrant the development of such a beast.