12-31-2011, 10:51 AM   #1
pietvo
Is there a way to preprocess a feed's XML?

I am writing a recipe for a newspaper that mixes encodings in its RSS feeds: a single file contains both iso-8859-1 and utf-8 text. Calibre decodes the text correctly most of the time, but not always. I would like to preprocess the feeds and recode the iso-8859-1 parts to utf-8 before they are parsed (I think I know which parts use which encoding).
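To illustrate the recoding step, here is a minimal Python 3 sketch. It assumes a simple heuristic instead of real knowledge of which parts use which encoding: anything that decodes as valid utf-8 is kept, and any byte that does not is treated as iso-8859-1. The function name recode_mixed and the error-handler name latin1_fallback are made up for this example; codecs.register_error and the UnicodeDecodeError attributes come from the standard library.

```python
import codecs


def _latin1_fallback(exc):
    """Error handler: reinterpret bytes that are not valid utf-8 as iso-8859-1."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad_bytes = exc.object[exc.start:exc.end]
    # Return the replacement text and the position where decoding resumes.
    return bad_bytes.decode('iso-8859-1'), exc.end


# 'latin1_fallback' is a hypothetical handler name chosen for this sketch.
codecs.register_error('latin1_fallback', _latin1_fallback)


def recode_mixed(raw: bytes) -> bytes:
    """Return raw re-encoded as utf-8, treating invalid utf-8 bytes as iso-8859-1."""
    text = raw.decode('utf-8', errors='latin1_fallback')
    return text.encode('utf-8')


# The first 'é' is a single iso-8859-1 byte, the second is a utf-8 sequence.
mixed = b'caf\xe9 and caf\xc3\xa9'
recoded = recode_mixed(mixed)
```

The heuristic can misfire when iso-8859-1 bytes happen to form a valid utf-8 sequence, so recoding only the parts known to be iso-8859-1 would be safer. If the feed's XML declaration names iso-8859-1, it would presumably have to be changed to utf-8 as well.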

In calibre/src/calibre/web/feeds/news.py, parse_feeds contains:

parsed_feeds.append(feed_from_xml(f.read(),

Basically, I would like to process the data between the read() and the feed_from_xml() call. I could copy and paste parse_feeds into my recipe and change it there, but IMHO that violates DRY. The other way I can think of is monkey-patching or subclassing the browser class and/or related classes, but that is probably even uglier. Is there a better way? Or can I request this as a feature?
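To make the feature request concrete, here is a rough Python 3 sketch of the kind of hook I have in mind. Everything in it is hypothetical: BaseRecipe, read_feed, parse_raw_feed and preprocess_raw_feed are stand-ins invented for this example and are not calibre's actual classes or methods. The only point is that parse_feeds would call an overridable method between reading and parsing.

```python
class BaseRecipe:
    """Hypothetical stand-in for a recipe base class (not calibre's real code)."""

    def read_feed(self, url: str) -> bytes:
        # Stand-in for the download step; a real class would fetch the URL.
        raise NotImplementedError

    def preprocess_raw_feed(self, raw: bytes, url: str) -> bytes:
        # Hypothetical hook: the default leaves the bytes untouched.
        return raw

    def parse_raw_feed(self, raw: bytes) -> str:
        # Stand-in for the parsing step (feed_from_xml in the real code).
        return raw.decode('utf-8')

    def parse_feeds(self, urls):
        parsed_feeds = []
        for url in urls:
            raw = self.read_feed(url)
            raw = self.preprocess_raw_feed(raw, url)  # the requested extra step
            parsed_feeds.append(self.parse_raw_feed(raw))
        return parsed_feeds


class MyNewspaper(BaseRecipe):
    # Fake in-memory feed so the sketch runs without network access.
    FAKE_FEEDS = {'feed-1': b'<title>caf\xe9</title>'}

    def read_feed(self, url):
        return self.FAKE_FEEDS[url]

    def preprocess_raw_feed(self, raw, url):
        # Assumes this particular feed is entirely iso-8859-1.
        return raw.decode('iso-8859-1').encode('utf-8')


result = MyNewspaper().parse_feeds(['feed-1'])
```

With a hook like this, a recipe would only override the one method and would not have to copy parse_feeds.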