Article object in preprocess_html

11-18-2010, 07:06 AM
The method preprocess_html() gets a "soup" argument, but I have a situation where the page that comes back for an article is actually a request for authentication. After authenticating, one is expected to re-fetch the URL. Is the URL in the soup object or, better yet, is the article object (with title, URL, description, and date) available in the BasicNewsRecipe object (i.e., self)? I would love to add more attributes (e.g., byline) to the article object and have them available to preprocess_html() so that I can add more stuff to the fetched article. Thanks!

11-18-2010, 11:53 AM
The soup is just the downloaded HTML. Article objects are stored in BasicNewsRecipe (IIRC under self._fetched_articles or something like that). You have access to both the soup and the article object in populate_article_metadata; however, populate_article_metadata is called after postprocess_html.
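
A rough sketch of the populate_article_metadata route, in case it helps. It assumes the hook signature populate_article_metadata(self, article, soup, first) and that the article object has an author attribute. The recipe name and the span class="byline" markup are made up for illustration, so adjust them to the real site. The import fallback is only there so the snippet can be tried outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch can be exercised outside calibre.
    class BasicNewsRecipe(object):
        pass


class BylineRecipe(BasicNewsRecipe):
    title = 'Byline example'  # hypothetical recipe name

    def populate_article_metadata(self, article, soup, first):
        # `first` is True only for the first page of an article,
        # so skip any follow-on pages.
        if not first:
            return
        # Hypothetical markup: <span class="byline">Jane Doe</span>
        tag = soup.find('span', attrs={'class': 'byline'})
        byline = getattr(tag, 'string', None)
        if byline:
            # Assumes the article object exposes an `author` attribute.
            article.author = str(byline).strip()
```

Since this hook runs after postprocess_html, it is a place to record metadata on the article object, not a place to change what preprocess_html sees.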