05-07-2009, 03:55 PM   #1
kbfprivate
Junior Member
Posts: 7
Karma: 10
Join Date: May 2009
Device: iPod touch
Downloading previous issues of Newsweek

I have been terribly busy the last few weeks and haven't had a chance to even crack open a Newsweek in the last month. How hard is it to modify the Newsweek download script to download the last 4 issues?

I downloaded this week's issue and it looks great.

Thanks!
Noah
05-07-2009, 04:59 PM   #2
kovidgoyal
creator of calibre
Posts: 45,251
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Shouldn't be too bad if the Newsweek website has links to back issues.
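
(For anyone wanting to automate the "find the back issues" step, here is a rough sketch of what that lookup could look like inside the recipe. The starting page and the '/id/<number>' link pattern are assumptions based only on the back-issue URL that comes up later in this thread, not on Newsweek's actual markup, so treat it as a starting point rather than working code.)

import re

def get_back_issue_urls(self, count=4):
    # Hypothetical helper, not part of the stock Newsweek recipe: collect links
    # that look like issue pages and keep the first `count` of them.
    soup = self.index_to_soup('http://www.newsweek.com')  # assumed starting page
    urls = []
    for a in soup.findAll('a', href=re.compile(r'/id/\d+')):  # assumed URL pattern
        href = a['href']
        if href.startswith('/'):
            href = 'http://www.newsweek.com' + href
        if href not in urls:
            urls.append(href)
        if len(urls) >= count:
            break
    return urls

Each URL collected this way could then be fed to self.index_to_soup() in place of the current-issue lookup, one recipe run per issue.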
05-07-2009, 05:20 PM   #3
kbfprivate
Junior Member
Posts: 7
Karma: 10
Join Date: May 2009
Device: iPod touch
It looks like all I need to do is modify this code:

def get_current_issue(self):
    #from urllib2 import urlopen # For some reason mechanize fails
    #home = urlopen('http://www.newsweek.com').read()
    soup = self.index_to_soup('http://www.newsweek.com')#BeautifulSoup(home)
    img = soup.find('img', alt='Current Magazine')
    if img and img.parent.has_key('href'):
        return self.index_to_soup(img.parent['href'])

Can I change "return self.index_to_soup(img.parent['href'])" to be the URL of a previous issue and then re-run the script?

Thanks!
Noah
05-07-2009, 05:21 PM   #4
kbfprivate
Junior Member
Posts: 7
Karma: 10
Join Date: May 2009
Device: iPod touch
Actually, can't I just comment out everything and just have a return statement? I don't know the comment character.
05-07-2009, 05:23 PM   #5
kovidgoyal
creator of calibre
Posts: 45,251
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
yes you can and the comment character is #
05-07-2009, 11:53 PM   #6
kbfprivate
Junior Member
Posts: 7
Karma: 10
Join Date: May 2009
Device: iPod touch
Quote:
Originally Posted by kovidgoyal View Post
yes you can and the comment character is #
I tried to change it to:

def get_current_issue(self):
    #from urllib2 import urlopen # For some reason mechanize fails
    #home = urlopen('http://www.newsweek.com').read()
    #soup = self.index_to_soup('http://www.newsweek.com/id/195141')#BeautifulSoup(home)
    #img = soup.find('img', alt='Current Magazine')
    #if img and img.parent.has_key('href'):
    return 'http://www.newsweek.com/id/195141'

But it gives me:

Job: **Fetch news from Newsweek20090504**
**tuple**: ('TypeError', u'find() takes no keyword arguments')
**Traceback**:
Traceback (most recent call last):
File "parallel.py", line 958, in worker
File "parallel.py", line 916, in work
File "C:\Program Files\calibre\library.zip\calibre\ebooks\epub\from_feeds.py", line 66, in main
File "C:\Program Files\calibre\library.zip\calibre\ebooks\epub\from_feeds.py", line 37, in convert
File "calibre\web\feeds\main.pyo", line 152, in run_recipe
File "calibre\web\feeds\news.pyo", line 567, in download
File "calibre\web\feeds\news.pyo", line 691, in build_index
File "c:\docume~1\admini~1\locals~1\temp\calibre_0.5.10_l6nhsr_recipes\recipe0.py", line 78, in parse_index
TypeError: find() takes no keyword arguments


Any ideas?
05-07-2009, 11:58 PM   #7
kbfprivate
Junior Member
Posts: 7
Karma: 10
Join Date: May 2009
Device: iPod touch
Never mind, it should have been:

return self.index_to_soup('http://www.newsweek.com/id/195141')

Seems to work, except it gets the current cover, but that is easily fixable. Thanks for a great program!

-Noah
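
(Tying the thread together: a minimal sketch of the edited method. The key point is that get_current_issue() has to return the parsed page, not a bare URL string; returning a string is most likely what triggered the earlier "TypeError: find() takes no keyword arguments", since parse_index() then ended up calling the plain string's find(), which takes no keyword arguments, instead of BeautifulSoup's find(). The issue id below is the one used in this thread; to pull the last four issues you would swap in each back issue's id and re-run the recipe once per issue.)

def get_current_issue(self):
    # Return the parsed back-issue page rather than its URL, so that
    # parse_index() gets a soup object whose find() accepts keyword arguments.
    return self.index_to_soup('http://www.newsweek.com/id/195141')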