04-21-2011, 10:06 AM   #3
Starson17
Quote:
Originally Posted by Starson17
If I can get a bit of time, I'll give you what info I have
Quote:
1) One recipe is based solely on RSS data, no parsing of linked article HTML (using use_embedded_content=True)
Everything works all right; however, I would like to add an image to the article content. Unfortunately, this image is not part of the RSS content HTML but is specified in the RSS <enclosure> tag. I understand that I can use the preprocess_html method to append a custom <img> tag to the content. However, I'm unable to find a way to retrieve additional RSS tags. I understand that feedparser is used to transform the regular RSS tags into article data, but I don't know how (or where) to parse additional RSS tags. Any pointers?
I can't answer this. If I really needed to do it this way, I'd go digging in the code or ask Kovid. I can't recall any recipes that take images from the RSS feed itself, but I've thought about the issue a couple of times. If you can parse out a link to the image, perhaps you could scrape the RSS page with parse_index and use the image on the article page. I realize you're using use_embedded_content=True, but it's usually possible to turn that off and grab the actual article page. That's my usual approach when facing a nice RSS feed with images, since the images are usually on the article page as well.
You may want to read the FeedParser documentation on feed images alongside the calibre code that implements feed parsing:
http://www.feedparser.org/docs/refer...eed-image.html
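
A minimal sketch of the "parse out a link to the image" idea, working on the raw RSS text with Python's standard library rather than through calibre or feedparser. The function name enclosure_images and the sample feed are made up for illustration; how (or whether) calibre exposes enclosure data from feedparser is not something this sketch assumes. It only shows how an article link could be mapped to its <enclosure> image URL, which a recipe could then use when building the <img> tag.

```python
import xml.etree.ElementTree as ET


def enclosure_images(rss_text):
    """Return {article link: enclosure URL} for items with an image enclosure.

    Hypothetical helper: not part of calibre or feedparser.
    """
    images = {}
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        enclosure = item.find("enclosure")
        if not link or enclosure is None:
            continue  # nothing to map for this item
        # RSS 2.0 enclosures carry url, length and type attributes.
        if enclosure.get("type", "").startswith("image/"):
            images[link] = enclosure.get("url")
    return images


# Made-up sample feed, used only to exercise the helper.
SAMPLE_RSS = """<rss version="2.0"><channel><title>Demo</title>
<item><title>One</title><link>http://example.com/one</link>
<enclosure url="http://example.com/one.jpg" type="image/jpeg" length="1234"/></item>
<item><title>Two</title><link>http://example.com/two</link></item>
</channel></rss>"""

images = enclosure_images(SAMPLE_RSS)
```

Keying the result by article link is a design choice for the sketch: the link is the one value that both the feed item and the downloaded article are likely to share.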

Quote:
2) populate_article_metadata is a great method, but what about the other way? I would like to access article metadata from methods such as preprocess_html/postprocess_html, but I can't find a way to get the currently processed article object. The reason I need this is that I want to add an author byline to the article content.
This is one of those questions that should have a simple answer, but again, I don't know it. I suppose I'd try saving the relevant article metadata globally in an enumerated list using parse_feeds, and access it with a counter that I decrement each time preprocess_html/postprocess_html runs. Something like this:

Code:
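    # Sketch: article_meta is an illustrative attribute name, not part of calibre.
    def parse_feeds(self):
        feeds = BasicNewsRecipe.parse_feeds(self)
        self.article_meta = {}
        for a, curfeed in enumerate(feeds):
            for b, curarticle in enumerate(curfeed.articles):
                # Keep whatever is needed later, keyed by (feed, article) position.
                self.article_meta[(a, b)] = {'title': curarticle.title,
                                             'author': curarticle.author}
        return feeds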
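
To make the lookup idea concrete outside of calibre, here is a standalone sketch. Feed and Article are stand-in namedtuples, not calibre's classes, and collect_metadata and add_byline are hypothetical helpers. It shows saving the metadata by (feed, article) position and then inserting a byline into an HTML string. It does not solve the open part of the question, namely how preprocess_html learns which article it is currently handling. A plain counter also assumes articles are processed in feed order, which may not hold if downloads run in parallel.

```python
from collections import namedtuple
from html import escape

# Stand-ins for calibre's feed and article objects (illustration only).
Article = namedtuple("Article", "title author url")
Feed = namedtuple("Feed", "title articles")


def collect_metadata(feeds):
    """Map (feed index, article index) to the fields needed later."""
    meta = {}
    for a, curfeed in enumerate(feeds):
        for b, curarticle in enumerate(curfeed.articles):
            meta[(a, b)] = {"title": curarticle.title,
                            "author": curarticle.author}
    return meta


def add_byline(html, author):
    """Insert a byline paragraph right after a bare <body> tag."""
    if not author:
        return html  # no author known: leave the page untouched
    byline = '<p class="byline">By %s</p>' % escape(author)
    # Only matches "<body>" without attributes; enough for a sketch.
    head, sep, tail = html.partition("<body>")
    if not sep:
        return byline + html  # no <body> tag found: prepend instead
    return head + sep + byline + tail


feeds = [Feed("News", [Article("First", "A. Writer", "http://example.com/1"),
                       Article("Second", "", "http://example.com/2")])]
meta = collect_metadata(feeds)
page = add_byline("<html><body><p>Text</p></body></html>",
                  meta[(0, 0)]["author"])
```

In a real recipe the HTML arrives as a parsed soup object rather than a string, so the byline would be inserted with the soup's own methods; the string version here only keeps the example self-contained.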
Perhaps Kovid can point us to better methods to move data to/from articles and the RSS feed page.