04-20-2011, 04:03 AM | #1 |
Junior Member
Posts: 3
Karma: 10
Join Date: Apr 2011
Device: Kindle3
|
Accessing RSS feed
Hi,
I've created lots of recipes and I think I understand the basics of recipe preparation. However, lately I've stumbled upon two problems:
1) One recipe is based solely on RSS data, with no parsing of the linked article HTML (using use_embedded_content=True). Everything works all right, but I would like to add an image to the article content. Unfortunately, this image is not part of the RSS content HTML but is specified in the RSS <enclosure> tag. I understand that I can use the preprocess_html method to append a custom <img> tag to the content. However, I'm unable to find a way to retrieve additional RSS tags. I understand that feedparser is used to transform regular RSS tags into article data, but I don't know how (and where) to parse additional RSS tags. Any pointers?
2) populate_article_metadata is a great method, but what about the other way? I would like to access article metadata from methods such as preprocess_html/postprocess_html, but I can't find a way to get the article object currently being processed. The reason I need this is that I want to add an author byline to the article content. Even though the author is properly filled in the article metadata, it does not show on the Kindle when reading the article. Is it supposed to be visible there? Where? Below the title? At the end?
I've read many posts and recipes posted here, but I can't find an answer to either of these. Thanks a lot for any help. |
04-21-2011, 07:56 AM | #2 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
|
|
04-21-2011, 10:06 AM | #3 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
You may want to look at the FeedParser page and the feed-image info in combination with reviewing the Calibre code and its implementation of the feed parser: http://www.feedparser.org/docs/refer...eed-image.html Quote:
Code:
def parse_feeds(self):
    feeds = BasicNewsRecipe.parse_feeds(self)
    for a, curfeed in enumerate(feeds):
        for b, curarticle in enumerate(curfeed.articles):
            # grab a, b, curfeed, curarticle and whatever else is needed for later use
            pass
    return feeds
||
04-21-2011, 10:36 AM | #4 |
Junior Member
Posts: 3
Karma: 10
Join Date: Apr 2011
Device: Kindle3
|
Thanks Starson17 for the reply.
Unfortunately, I don't have enough Python experience (I come from a .NET background) to do magic, but I don't think I'm asking for much. It would be very convenient to have some global variables that always hold the context of the article currently being processed. The context could hold the source RSS item tag as well as the already parsed metadata. This way, all would be well. Currently you depend on the parameters passed to each method, and if the required object is not passed in, you're stuck. Having a context at hand would solve this and greatly simplify the recipe code. Basically, each article would have a lifecycle (which it sort of has already), and you could plug custom logic into each step and modify the context instead of receiving and returning objects.
Your solution is a clever workaround for the lack of context, but a more robust way would be nice. The counter decrementing seems to rely too much on articles being processed in the proper order (as returned by the iterator), which I've learned the hard way is not always the case and can't be taken for granted. For example, if parsing of an article fails for whatever reason and the error is caught and handled, the recipe continues with the next article. In that case the code loses consistency because the counter is off.
The reason I want to rely solely on RSS in this example is that I'm doing centralized processing and redistribution of the content. The content owner requires that only the RSS feed be parsed; parsing the articles would be considered copyright infringement.
Anyway, Calibre is a great piece of software and I'm sure I'll find a way to solve this. The worst-case scenario is missing pictures. If Kovid wants to add his 2c, I'm all ears. |
04-21-2011, 10:48 AM | #5 |
creator of calibre
Posts: 43,927
Karma: 22669820
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Use populate_article_metadata to insert the image into the soup. And override the parsing of the feed to insert the image into the Article object (which means implementing parse_feeds in your recipe).
|
04-21-2011, 11:00 AM | #6 |
Junior Member
Posts: 3
Karma: 10
Join Date: Apr 2011
Device: Kindle3
|
Thanks a lot. I'll try to do that. Nice trick, modifying the soup from populate_article_metadata. Now I only have to figure out how to retrieve the feed item in parse_feeds. If I get working code, I'll post it here.
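One way to get at the extra feed tags is to read the raw feed XML directly. The following is a minimal sketch using only the Python standard library, not calibre's own feed handling; the sample feed, the function name enclosures_by_link and the choice to key the result by each item's <link> are all assumptions made for illustration. Keying by link rather than by position avoids depending on the order in which articles are processed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real recipe would fetch the feed text itself.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First article</title>
      <link>http://example.com/a1</link>
      <enclosure url="http://example.com/a1.jpg" type="image/jpeg" length="1234"/>
    </item>
    <item>
      <title>Second article</title>
      <link>http://example.com/a2</link>
    </item>
  </channel>
</rss>"""


def enclosures_by_link(rss_text):
    """Map each item's <link> to the URL of its first image <enclosure>."""
    root = ET.fromstring(rss_text)
    result = {}
    for item in root.iter('item'):
        link = (item.findtext('link') or '').strip()
        for enc in item.findall('enclosure'):
            url = enc.get('url')
            # Only keep enclosures that declare an image MIME type.
            if link and url and enc.get('type', '').startswith('image/'):
                result[link] = url
                break
    return result


ENCLOSURES = enclosures_by_link(SAMPLE_RSS)
```

Items without an image enclosure are simply absent from the mapping, so a later lookup can fall back to "no picture" instead of failing.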
|
04-21-2011, 11:56 AM | #7 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
There's code in another recipe (credit_slips) that did this sort of thing:
Code:
def populate_article_metadata(self, article, soup, first):
    h2 = soup.find('h2')
    h2.replaceWith(h2.prettify() + '<p><em>Posted by ' + article.author + '</em></p>')
Quote:
Kovid: thanks for the help. |
||
04-21-2011, 12:09 PM | #8 |
creator of calibre
Posts: 43,927
Karma: 22669820
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Yes, there's no reason you have to use populate_article_metadata only to "populate article metadata"
|
04-21-2011, 03:30 PM | #9 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
|
||
04-21-2011, 03:38 PM | #10 |
creator of calibre
Posts: 43,927
Karma: 22669820
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I was suggesting you stick them on as an extra attribute. Don't put them into the description. So create an Article as normal and then use
article.my_extra_stuff = whatever
And my_extra_stuff should then be available in the populate method when you need it. |
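The extra-attribute idea can be sketched without calibre installed. In the example below, SimpleNamespace objects stand in for calibre's Feed and Article objects (only the articles and url attributes are assumed), and the names attach_enclosures, image_markup and enclosure_url are hypothetical. It shows the pattern, not calibre's actual implementation.

```python
from html import escape
from types import SimpleNamespace


def attach_enclosures(feeds, enclosures):
    """Store each article's enclosure URL on the article object itself."""
    for feed in feeds:
        for article in feed.articles:
            # Look up by article URL, not by position, so a skipped or
            # failed article cannot shift the mapping.
            article.enclosure_url = enclosures.get(article.url)
    return feeds


def image_markup(article):
    """Return an <img> tag for the article, or '' if it has no enclosure."""
    url = getattr(article, 'enclosure_url', None)
    if not url:
        return ''
    return '<img src="%s" alt=""/>' % escape(url, quote=True)


# Stand-ins for feed and article objects (assumed attributes only).
feeds = [SimpleNamespace(articles=[
    SimpleNamespace(url='http://example.com/a1', title='First'),
    SimpleNamespace(url='http://example.com/a2', title='Second'),
])]
attach_enclosures(
    feeds, {'http://example.com/a1': 'http://example.com/a1.jpg?x=1&y=2'})
first, second = feeds[0].articles
```

In a recipe, attach_enclosures would presumably be called from an overridden parse_feeds on the result of BasicNewsRecipe.parse_feeds(self), and image_markup from populate_article_metadata, where both the article and its soup are available. How the markup is then inserted into the soup depends on the BeautifulSoup version bundled with calibre.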
04-21-2011, 04:04 PM | #11 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
OK, but does this permit an image to be added to the article list page of a recipe-created e-book? I've seen many feeds with images. When I've wanted those images, I've always put them on the article page. I've always wondered whether the e-book page that links to the articles could include the feed images, just as the RSS feed page that links to the articles does.
|
04-21-2011, 04:10 PM | #12 |
creator of calibre
Posts: 43,927
Karma: 22669820
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
For that you have to override the feeds2index method.
|
|