Quote:
Originally Posted by Starson17
If I can get a bit of time, I'll give you what info I have.
|
Quote:
1) One recipe is based solely on RSS data, with no parsing of the linked article HTML (using use_embedded_content=True).
Everything works, but I would like to add an image to the article content. Unfortunately, this image is not part of the RSS content HTML; it is specified in the RSS <enclosure> tag. I understand that I can use the preprocess_html method to append a custom <img> tag to the content. However, I can't find a way to retrieve additional RSS tags. I understand that feedparser is used to transform the regular RSS tags into article data, but I don't know how (or where) to parse additional RSS tags. Any pointers?
|
I can't answer this. If I really needed to do it this way, I'd go digging in the code or ask Kovid. I can't recall any recipes that take images from the RSS feed page, but I've thought about the issue a couple of times. If you can parse out a link to the image, perhaps you could scrape the RSS page with parse_index and use the image on the article page. I realize you're using use_embedded_content=True, but it's usually possible to turn that off and grab the actual article page. That's my usual approach when facing a nice RSS feed page with images, since the images are usually also on the article page.
You may want to read the FeedParser documentation on feed images alongside the calibre code and its implementation of the feed parser:
http://www.feedparser.org/docs/refer...eed-image.html
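If you do end up reading the raw feed yourself, pulling the enclosure out is not much work. Below is a minimal sketch that swaps in the standard library's ElementTree instead of feedparser, assuming a plain RSS 2.0 feed (no namespaces) whose XML you already have as a string. I believe feedparser itself exposes these as entry.enclosures, but I don't know whether calibre's Article objects keep them. The function names enclosure_images and img_tag are hypothetical, just for illustration: the first maps each item's <link> to its image enclosure URL, the second builds the <img> tag you'd append in preprocess_html.

```python
import xml.etree.ElementTree as ET
from html import escape


def enclosure_images(rss_xml):
    """Return {item link: image URL} for items whose <enclosure> is an image.

    Assumes plain RSS 2.0 (no XML namespaces on <item>/<enclosure>).
    """
    root = ET.fromstring(rss_xml)
    images = {}
    for item in root.iter("item"):
        link = item.findtext("link")
        enc = item.find("enclosure")
        # Only keep enclosures that declare an image MIME type.
        if link and enc is not None and enc.get("type", "").startswith("image/"):
            images[link.strip()] = enc.get("url")
    return images


def img_tag(url):
    """Build an <img> tag for the article body, escaping the URL for HTML."""
    return '<img src="%s" alt="" />' % escape(url, quote=True)


rss = """<rss version="2.0"><channel><title>Example</title>
<item><title>One</title><link>http://example.com/1</link>
<enclosure url="http://example.com/1.jpg" type="image/jpeg" length="1234"/></item>
<item><title>Two</title><link>http://example.com/2</link></item>
</channel></rss>"""

images = enclosure_images(rss)
print(images)
print(img_tag(images["http://example.com/1"]))
```

Keying on the item link means you can look up the image for whichever article you're processing, rather than relying on order.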
Quote:
2) populate_article_metadata is a great method, but what about the other direction? I would like to access article metadata from methods such as preprocess_html/postprocess_html, but I can't find a way to get the article object currently being processed. The reason I need this is that I want to add an author byline to the article content.
|
This is one of those questions that should have a simple answer, but again, I don't know it. I suppose I'd try saving the relevant article metadata globally (e.g. as an attribute on the recipe) in an enumerated list built in parse_feeds, and then step through it with a counter each time preprocess_html/postprocess_html runs. Something like this?
Code:
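from calibre.web.feeds.news import BasicNewsRecipe

# (inside your recipe class)
def parse_feeds(self):
    feeds = BasicNewsRecipe.parse_feeds(self)
    self.saved_articles = []  # hypothetical name: metadata kept for later
    for a, curfeed in enumerate(feeds):
        for b, curarticle in enumerate(curfeed.articles):
            # grab a, b, curfeed, curarticle and whatever is needed later
            self.saved_articles.append((a, b, curfeed.title, curarticle.author))
    return feeds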
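One caveat with a counter: it assumes articles reach preprocess_html in feed order. I'm not sure that holds, since calibre can download several articles at once (see simultaneous_downloads). A sketch of a variant keyed by article URL instead is below. It's standalone Python: the Feed and Article namedtuples are stand-ins I made up for calibre's objects (only title, url and author are modelled), and collect_bylines/add_byline are hypothetical helpers. How you get the current article's URL inside your processing method is an assumption left open here.

```python
from collections import namedtuple
from html import escape

# Hypothetical stand-ins for calibre's Feed/Article objects; the real
# classes carry more attributes, only the ones used here are modelled.
Article = namedtuple("Article", "title url author")
Feed = namedtuple("Feed", "title articles")


def collect_bylines(feeds):
    """Map each article URL to its author, as parse_feeds could do."""
    bylines = {}
    for curfeed in feeds:
        for curarticle in curfeed.articles:
            if curarticle.author:  # skip articles with no author
                bylines[curarticle.url] = curarticle.author
    return bylines


def add_byline(html, author):
    """Insert an escaped byline paragraph right after the first <body> tag."""
    byline = '<p class="byline">By %s</p>' % escape(author)
    return html.replace("<body>", "<body>" + byline, 1)


feeds = [Feed("News", [Article("A", "http://example.com/a", "Jane Doe"),
                       Article("B", "http://example.com/b", None)])]
bylines = collect_bylines(feeds)
page = "<html><body><p>Text</p></body></html>"
print(add_byline(page, bylines["http://example.com/a"]))
```

A dict lookup doesn't care what order the articles are processed in, which is the main reason to prefer it over a shared counter.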
Perhaps Kovid can point us to better methods to move data to/from articles and the RSS feed page.