11-18-2010, 07:00 AM   #3
stuartweinstein
Junior Member
stuartweinstein began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Nov 2010
Device: Kindle
Using get_obfuscated_article is a bit of overkill, I think. I've been calling self.log(soup.prettify()) in preprocess_html() to see the contents. The problem is that I need the URL so I can re-fetch the page after signing in. The advantage of get_obfuscated_article is that it is passed the URL, but I didn't want to deal with the output file. Instead, I overrode fetch_article() to hold onto the URL so I could grab it inside preprocess_html(). I imagine this forces me onto a single thread, but the performance is fine (it's a daily download at 4am). I'm attaching my solution, though I'll continue to tweak it. As for access to the URL and other article attributes, I'm going to start another thread to ask about that. Thanks for the help so far.
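For anyone skimming, here is a rough sketch of the pattern described above. It is not the attached recipe. So that it runs without calibre, it uses a stand-in base class (StandInRecipe, hypothetical) in place of BasicNewsRecipe, and the fetch_article() signature, the refetch() helper, and the "login wall" check are all assumptions made for illustration. fetch_article() is also not part of the documented recipe API as far as I know, so the real signature may differ between calibre versions.

```python
class StandInRecipe:
    """Hypothetical stand-in for calibre's BasicNewsRecipe (illustration only)."""

    def fetch_article(self, url, dir_, f, a, num_of_feeds):
        # Pretend to download the page, then run the preprocessing hook.
        page = "<html><body>login wall for %s</body></html>" % url
        return self.preprocess_html(page)

    def preprocess_html(self, soup):
        return soup


class RememberUrlRecipe(StandInRecipe):
    # In a real recipe this attribute limits parallel downloads. Setting it
    # to 1 is an assumption here: one shared current_url is only safe if
    # articles are fetched one at a time.
    simultaneous_downloads = 1

    current_url = None

    def fetch_article(self, url, dir_, f, a, num_of_feeds):
        # Hold onto the URL before the normal fetch runs.
        self.current_url = url
        return super().fetch_article(url, dir_, f, a, num_of_feeds)

    def refetch(self, url):
        # Hypothetical helper: a real recipe would sign in and download again.
        return "<html><body>full text of %s</body></html>" % url

    def preprocess_html(self, soup):
        # The stand-in passes a plain string; calibre passes a soup object.
        if "login wall" in soup:
            return self.refetch(self.current_url)
        return soup


recipe = RememberUrlRecipe()
result = recipe.fetch_article("http://example.com/story", "/tmp", 0, 0, 1)
print(result)
```

Keeping the URL on the instance is the simplest way to pass it between the two hooks, but it is also the reason for the single-thread limitation: two articles downloading at once would overwrite each other's current_url.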
Attached Files
File Type: txt washpostprint.recipe.txt (4.0 KB, 289 views)