Old 04-20-2011, 04:03 AM   #1
xtepn01
Junior Member
xtepn01 began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Apr 2011
Device: Kindle3
Accessing RSS feed

Hi,
I've created lots of recipes and I think I understand the basics of recipe preparations. However, lately I've stumbled upon two problems:

1) One recipe is based solely on RSS data, with no parsing of the linked article HTML (using use_embedded_content=True).
Everything works all right, but I would like to add an image to the article content. Unfortunately, this image is not part of the RSS content HTML but is specified in the RSS <enclosure> tag. I understand that I can use the preprocess_html method to append a custom <img> tag to the content. However, I'm unable to find a way to retrieve additional RSS tags. I understand that feedparser is used to transform regular RSS tags into article data, but I don't know how (or where) to parse additional RSS tags. Any pointers?

2) populate_article_metadata is a great method, but what about the other way around? I would like to access article metadata from methods such as preprocess_html/postprocess_html, but I can't find a way to get the currently processed article object. The reason I need this is that I want to add an author byline to the article content. Even though the author is properly filled in in the article metadata, it does not show on the Kindle when reading the article. Is it supposed to be visible there? Where? Below the title? At the end?

I've read many posts and recipes posted here, but I can't find an answer to either of these. Thanks a lot for any help.
Old 04-21-2011, 07:56 AM   #2
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
These are excellent questions, well above the run-of-the-mill type questions here. They deserve an answer. Unfortunately, I've only got about 60 seconds now. If I can get a bit of time, I'll give you what info I have, but it's not much. Kovid may have to advise us both and we'll both learn something. I'll try to post later today.
Old 04-21-2011, 10:06 AM   #3
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Starson17
If I can get a bit of time, I'll give you what info I have
Quote:
1) One recipe is based solely on RSS data, with no parsing of the linked article HTML (using use_embedded_content=True).
Everything works all right, but I would like to add an image to the article content. Unfortunately, this image is not part of the RSS content HTML but is specified in the RSS <enclosure> tag. I understand that I can use the preprocess_html method to append a custom <img> tag to the content. However, I'm unable to find a way to retrieve additional RSS tags. I understand that feedparser is used to transform regular RSS tags into article data, but I don't know how (or where) to parse additional RSS tags. Any pointers?
I can't answer this. If I really needed to do it this way, I'd go digging in the code or ask Kovid. I can't recall any recipes with images from the RSS feed page, but I've thought about the issue a couple of times. If you can parse out a link to the image, perhaps you could scrape the RSS page with parse_index, and use the image on the Article page. I realize you're using use_embedded_content=True, but usually it's possible to turn that off and grab the actual Article page. That's my usual approach when facing a nice RSS feed page with images. The images are usually also on the Article page.
You may want to look at the FeedParser page and the feed-image info in combination with reviewing the Calibre code and its implementation of the feed parser:
http://www.feedparser.org/docs/refer...eed-image.html

Quote:
2) populate_article_metadata is a great method, but what about the other way around? I would like to access article metadata from methods such as preprocess_html/postprocess_html, but I can't find a way to get the currently processed article object. The reason I need this is that I want to add an author byline to the article content.
This is one of those questions that should have a simple answer, but again, I don't know it. I suppose I'd try saving the relevant article metadata globally in an enumerated list using parse_feeds and access it with a counter that I decrement each time preprocess/postprocess runs. Something like this?:

Code:
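    def parse_feeds(self):
        # Let calibre build the feed and article objects first
        feeds = BasicNewsRecipe.parse_feeds(self)
        # article_info is a made-up name: saved data keyed by (feed index, article index)
        self.article_info = {}
        for a, curfeed in enumerate(feeds):
            for b, curarticle in enumerate(curfeed.articles):
                # Keep whatever is needed later; here, the feed and article themselves
                self.article_info[(a, b)] = (curfeed, curarticle)
        return feeds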
Perhaps Kovid can point us to better methods to move data to/from articles and the RSS feed page.
Old 04-21-2011, 10:36 AM   #4
xtepn01
Junior Member
xtepn01 began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Apr 2011
Device: Kindle3
Thanks Starson17 for the reply.

Unfortunately, I don't have enough Python experience (I come from a .NET background) to do magic, but I guess I'm not asking for that much. It would be very convenient to have some global variables that always hold the context of the currently processed article. The context could hold the source RSS item tag as well as the already parsed metadata. This way, all would be well.

Currently you depend on the parameters passed into each method, and if the required object is not given to the method, you're stuck. Having a context at hand would solve all this and would greatly simplify the recipe code. Basically, each article would have a lifecycle (which you already have, sort of), and you could plug your custom logic in at each step and modify the context instead of receiving and returning objects.

Your solution is a clever workaround for the lack of context, but a more robust way would be nice. The counter decrementing seems to rely too much on articles being processed in the proper order (as returned by the iterator), which I've learned the hard way is not always the case and can't be taken for granted. For example, if parsing of an article fails for whatever reason and the error is caught and handled, the recipe continues with the next article. In that case your entire code loses consistency because the counter is off.

The reason I want to rely solely on RSS in this example is that I'm doing centralized processing and redistribution of the content. The content owner requires that only the RSS feed be parsed; parsing the article pages would be considered copyright infringement.

Anyway, Calibre is a great piece of software and I'm sure I'll find a way to solve this. The worst-case scenario is missing pictures.

If Kovid wants to add his 2c, I'm all ears.
Old 04-21-2011, 10:48 AM   #5
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 45,596
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Use populate_article_metadata to insert the image into the soup. And override the parsing of the feed to insert the image into the Article object (which means implementing parse_feeds in your recipe).
Old 04-21-2011, 11:00 AM   #6
xtepn01
Junior Member
xtepn01 began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Apr 2011
Device: Kindle3
Thanks a lot. I'll try to do that. Nice trick about modifying the soup from populate_article_metadata. Now I only have to figure out how to retrieve the feed item in parse_feeds. If I get working code, I'll post it here.
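
One way to get at the <enclosure> data is to read it straight from the feed XML. The following is only a minimal sketch using Python's standard xml.etree.ElementTree module rather than calibre's own feed handling. The function name enclosure_urls and the sample feed are made up for illustration, and it assumes a plain RSS 2.0 feed in which each item carries a <link>.

```python
import xml.etree.ElementTree as ET


def enclosure_urls(rss_xml):
    """Map each item's <link> to the URL of its first <enclosure>, if any."""
    root = ET.fromstring(rss_xml)
    urls = {}
    for item in root.iter('item'):
        link = (item.findtext('link') or '').strip()
        enclosure = item.find('enclosure')
        # Skip items that have no link or no usable enclosure
        if link and enclosure is not None and enclosure.get('url'):
            urls[link] = enclosure.get('url')
    return urls


# Made-up sample feed: the first item has an enclosure, the second does not
SAMPLE_RSS = """<rss version="2.0"><channel><title>Demo</title>
<item><title>First</title><link>http://example.com/a1</link>
<enclosure url="http://example.com/a1.jpg" type="image/jpeg" length="1234"/></item>
<item><title>Second</title><link>http://example.com/a2</link></item>
</channel></rss>"""

print(enclosure_urls(SAMPLE_RSS))
```

The resulting dictionary, keyed by article link, could then be consulted from an overridden parse_feeds to match each enclosure to its article. How the raw feed XML is obtained inside the recipe is left open here.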
Old 04-21-2011, 11:56 AM   #7
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by kovidgoyal
Use populate_article_metadata to insert the image into the soup.
Let me make sure I understand this. The soup in populate_article_metadata is the article soup, so it can be changed there, to change the article, while the other feed metadata and/or image is available.

There's code in another recipe (credit_slips) that did this sort of thing:

Code:
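    def populate_article_metadata(self, article, soup, first):
        # The soup here is the article's own HTML, so it can be edited in place
        h2 = soup.find('h2')
        # Guard against pages without an <h2> or articles without an author
        if h2 is not None and article.author:
            h2.replaceWith(h2.prettify() + '<p><em>Posted by ' + article.author + '</em></p>')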
Quote:
And override the parsing of the feed to insert the image into the Article object (which means implementing parse_feeds in your recipe)
And here you simply add the image into the Article object to get it into the index page for this feed. I'm going to have to play with this one.
Kovid: thanks for the help.
Old 04-21-2011, 12:09 PM   #8
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 45,596
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Yes, there's no reason you have to use populate_article_metadata only to "populate article metadata".
Old 04-21-2011, 03:30 PM   #9
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by kovidgoyal
Use populate_article_metadata to insert the image into the soup.
This was easy.

Quote:
And override the parsing of the feed to insert the image into the Article object (which means implementing parse_feeds in your recipe)
I had trouble with this. I couldn't get any HTML or images into the Article object. My tags got changed to &lt; and &gt;. What am I missing?
Old 04-21-2011, 03:38 PM   #10
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 45,596
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I was suggesting you stick them on as an extra attribute. Don't put them into the description. So create an Article as normal and then use

article.my_extra_stuff = whatever

And my_extra_stuff should then be available in the populate method when you need it.
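
The extra-attribute approach described above can be sketched as follows. This is an illustration under assumptions, not calibre code: SimpleNamespace objects stand in for the Feed and Article objects a recipe would work with, and enclosure_url, attach_enclosures and image_markup are hypothetical names. It also assumes each article exposes its link as article.url.

```python
from html import escape
from types import SimpleNamespace


def attach_enclosures(feeds, urls_by_link):
    """Store each article's enclosure URL on the article as an extra attribute."""
    for feed in feeds:
        for article in feed.articles:
            # enclosure_url is a made-up attribute name, not part of calibre
            article.enclosure_url = urls_by_link.get(article.url)
    return feeds


def image_markup(article):
    """Return an <img> snippet for the article, or '' if it has no enclosure."""
    url = getattr(article, 'enclosure_url', None)
    if not url:
        return ''
    return '<p><img src="%s" alt=""/></p>' % escape(url, quote=True)


# Stand-ins for the feed and article objects built during feed parsing
article = SimpleNamespace(url='http://example.com/a1', title='First')
feeds = [SimpleNamespace(articles=[article])]
attach_enclosures(feeds, {'http://example.com/a1': 'http://example.com/a1.jpg'})
print(image_markup(article))
```

In a recipe, something like attach_enclosures would run inside an overridden parse_feeds after the call to BasicNewsRecipe.parse_feeds(self), and populate_article_metadata(self, article, soup, first) would read the attribute back and add the image to the soup.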
Old 04-21-2011, 04:04 PM   #11
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by kovidgoyal
I was suggesting you stick them on as an extra attribute. Don't put them into the description. So create an Article as normal and then use
article.my_extra_stuff = whatever
And my_extra_stuff should then be available in the populate method when you need it.
OK, but does this permit an image to be added to the article list page of a recipe-created e-book? I've seen many feeds with images. When I've wanted those images, I've always put them on the article page. I've always wondered whether the e-book page that links to the articles could include the feed page images, just as the RSS feed page that links to the articles does.
Old 04-21-2011, 04:10 PM   #12
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 45,596
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
For that you have to override the feeds2index method.
All times are GMT -4.