Quote:
Originally Posted by slowsmile
Using BeautifulSoup, here's a quick way to remove all garbage proprietary data from an html file.
Nice example of deleting attributes from tags with bs4, but why would "id" or "lang" attributes be considered garbage (or proprietary)? Removing "id", for instance, could break a whole bunch of links in the files (html toc and ncx included). Seems a very odd attribute to want to nuke. Likewise, "name" should probably be converted to "id" rather than deleted, to prevent any possible link breakage.
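For what it's worth, here's a rough sketch of what I mean. It is not slowsmile's script, just one way it could be done with bs4. The UNWANTED_ATTRS set and the clean_html name are my own placeholders, so swap in whatever attributes you actually consider junk. "id" and "lang" are left alone, and "name" on anchors is copied to "id" before being dropped.

```python
from bs4 import BeautifulSoup

# Hypothetical list of attributes to strip; adjust to taste.
# "id" and "lang" are deliberately NOT in here.
UNWANTED_ATTRS = {"class", "style"}


def clean_html(html: str) -> str:
    """Strip unwanted attributes but keep link targets intact."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(True):
        # Old-style anchors: turn <a name="x"> into <a id="x">
        # so that links pointing at #x keep working.
        if tag.name == "a" and tag.has_attr("name"):
            if not tag.has_attr("id"):
                tag["id"] = tag["name"]
            del tag["name"]
        # Iterate over a copy of the keys, since we delete while looping.
        for attr in list(tag.attrs):
            if attr in UNWANTED_ATTRS:
                del tag[attr]
    return str(soup)
```

Only "name" on a tags is touched, since "name" on things like meta or input means something else entirely. If an anchor already has an "id" that differs from its "name", this keeps the "id" and any link to the old name would still break, so that case may need handling by hand.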