Is there a way to preprocess a feed XML

pietvo · 12-31-2011, 10:51 AM

I am writing a recipe for a newspaper that mixes encodings in its RSS feeds: a single file contains both iso-8859-1 and utf-8 text. Calibre decodes the text correctly most of the time, but not always. I would like to preprocess the feeds and recode the iso-8859-1 parts to utf-8 before they are processed (I think I know which parts have which encoding).

In calibre/src/calibre/web/feeds/news.py, parse_feeds contains:

parsed_feeds.append(feed_from_xml(f.read(),

Basically I would like to hook in between the read() and the feed_from_xml(). I could copy-paste parse_feeds into my recipe and change it there, but IMHO that violates OO principles (DRY). The other way I can think of is monkey-patching or subclassing the browser class and/or related classes, but that is probably even uglier. Is there a better way? Or can I request this as a feature?

kovidgoyal · 12-31-2011, 12:34 PM

One way to do it is to download the feeds yourself in the recipe, fix them, save them to temp files on disk, and return the file:// URLs.
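
A rough sketch of that suggestion, not tested against calibre itself. The helper names `recode_mixed` and `save_as_file_url` are made up for illustration, and the line-by-line "try utf-8, fall back to iso-8859-1" heuristic is only an assumption about how the mixed parts could be told apart; if you know exactly which elements use which encoding, recode those instead. It is written for Python 3.

```python
import os
import re
import tempfile
from pathlib import Path


def recode_mixed(raw: bytes) -> bytes:
    """Return the feed re-encoded as UTF-8 (hypothetical helper).

    Assumption: each line is entirely utf-8 or entirely iso-8859-1.
    Lines that are not valid utf-8 are treated as iso-8859-1.
    """
    fixed = []
    for line in raw.splitlines(keepends=True):
        try:
            text = line.decode('utf-8')
        except UnicodeDecodeError:
            text = line.decode('iso-8859-1')
        fixed.append(text.encode('utf-8'))
    data = b''.join(fixed)
    # Make the XML declaration (if it names an encoding) agree with the new bytes.
    return re.sub(rb'(<\?xml[^>]*?encoding=["\'])[^"\']+',
                  rb'\g<1>utf-8', data, count=1)


def save_as_file_url(data: bytes, suffix: str = '.xml') -> str:
    """Write data to a temp file and return its file:// URL (hypothetical helper)."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, 'wb') as f:
        f.write(data)
    return Path(path).as_uri()


# In a recipe this would presumably go in an overridden get_feeds(), which
# returns (title, url) pairs. Untested outline, names are placeholders:
#
#     def get_feeds(self):
#         raw = self.browser.open(FEED_URL).read()
#         return [(FEED_TITLE, save_as_file_url(recode_mixed(raw)))]
```

The temp files are not cleaned up here; a real recipe would want to remove them once the download finishes.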



