Is there a way to preprocess a feed's XML?
I am writing a recipe for a newspaper that mixes encodings in its RSS feeds: a single file contains both iso-8859-1 and utf-8 text. Calibre decodes the text correctly most of the time, but not always. I would like to preprocess the feeds and recode the iso-8859-1 parts to utf-8 before they are parsed (I think I know which parts use which encoding).
In calibre/src/calibre/web/feeds/news.py, parse_feeds contains this call:
parsed_feeds.append(feed_from_xml(f.read(),
Basically, I would like to process the data between the read() and the feed_from_xml() call. I could copy-paste parse_feeds into my recipe and change it there, but IMHO that violates the DRY principle. The other way I can think of is monkey-patching or subclassing the browser class and/or related classes, but that is probably even uglier. Is there a better way? Or can I request this as a feature?
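For reference, the recoding step I have in mind would look roughly like the sketch below. It is plain Python with no calibre dependency, and the names `recode_mixed_to_utf8` and `latin1_fallback` are my own, not anything calibre provides. It assumes that any byte sequence that is not valid utf-8 is iso-8859-1, which is only a heuristic: iso-8859-1 bytes that happen to form valid utf-8 would slip through unchanged.

```python
import codecs


def _latin1_fallback(err):
    # Called by the decoder for each byte run that is not valid UTF-8.
    # Assumption: such bytes are ISO-8859-1, so decode them that way
    # and tell the decoder to resume right after them.
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad_bytes = err.object[err.start:err.end]
    return bad_bytes.decode('iso-8859-1'), err.end


# Hypothetical handler name, registered once at import time.
codecs.register_error('latin1_fallback', _latin1_fallback)


def recode_mixed_to_utf8(raw):
    """Return `raw` (bytes mixing UTF-8 and ISO-8859-1) as pure UTF-8 bytes."""
    text = raw.decode('utf-8', errors='latin1_fallback')
    return text.encode('utf-8')
```

What I am missing is a clean place to call something like this on the bytes returned by read() before they reach feed_from_xml().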