Old 01-25-2008, 04:54 PM   #23
Deputy-Dawg
Groupie
Posts: 153
Karma: 799
Join Date: Dec 2007
Device: sony prs505
The Old Man,
You didn't have to wait long; attached is a quick-and-dirty profile that will download the first 10 articles in the following Jerusalem Post feeds:

Front Page
Israel News
International News
Middle East News
Editorials

kovidgoyal
The last bit of code fixed the problem with pubdate in the profile for Agenzia Fides.
I am still having some problems with how the summary is displayed (cosmetic but ugly): various HTML tags are showing up in the text, most notably <b></b> and <br>.

Meanwhile I have started on one for the Christian Science Monitor, and they have one wild way of directing you to the files. The href (and, later on, a <link></link>) points you to:

http://rss.csmonitor.com/~r/feeds/to...4s01-woaf.html

which resolves to

http://www.csmonitor.com/2008/0124/p04s01-woaf.html

with the print version being at

http://www.csmonitor.com/2008/0124/p04s01-woaf.htm

The rub is that if you change the original address to

http://rss.csmonitor.com/~r/feeds/to...04s01-woaf.htm

it too resolves to the .html file.
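Given the behaviour described above, one way around it is to let the feed link resolve to the www.csmonitor.com address first and only then swap the extension. A minimal sketch, assuming the print version always differs from the resolved article URL only by .htm versus .html (true for the one example above, not verified beyond it); the function name print_url is made up for illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def print_url(article_url):
    """Turn a resolved article URL into its assumed print-version URL.

    Assumption: the print page is the same path ending in ".htm"
    instead of ".html", as in the single example from the post.
    """
    parts = urlsplit(article_url)
    path = parts.path
    if path.endswith(".html"):
        path = path[:-1]  # drop the trailing "l": ".html" -> ".htm"
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))


# The resolved address would come from following the redirect first,
# for example urllib.request.urlopen(feed_link).geturl() (not run here).
print(print_url("http://www.csmonitor.com/2008/0124/p04s01-woaf.html"))
# -> http://www.csmonitor.com/2008/0124/p04s01-woaf.htm
```

Doing the swap after the redirect avoids the problem that the rss.csmonitor.com address resolves to the .html page whichever extension you give it.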

At first I thought this was going to be an easy one: the date is in the number 222417173, so all we have to do is convert it to an ASCII date, parse out the /2008/0124/ as '/%Y/%m%d/', and build the required address string. It doesn't work; the number resolves to 1977 01 18. I can fix it by adding 2001 01 07 as an offset (that may have to be 06). Is that likely to be legitimate? Have I overlooked something?
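The arithmetic is easy to check. The sketch below treats the number as seconds since an epoch and formats it with the same '/%Y/%m%d/' pattern. The Unix epoch is standard; the second epoch, 2001-01-01 UTC, is only a guess at where an offset of roughly 31 years might come from, and from_epoch is a made-up helper name:

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Assumption: some systems count seconds from 2001-01-01 UTC instead.
EPOCH_2001 = datetime(2001, 1, 1, tzinfo=timezone.utc)


def from_epoch(seconds, epoch):
    """Interpret `seconds` as an offset from `epoch` (both in UTC)."""
    return epoch + timedelta(seconds=seconds)


n = 222417173  # the number embedded in the feed link
print(from_epoch(n, UNIX_EPOCH).strftime("/%Y/%m%d/"))  # /1977/0118/
print(from_epoch(n, EPOCH_2001).strftime("/%Y/%m%d/"))  # /2008/0119/
```

The Unix reading reproduces the 1977 01 18 result. The 2001 reading lands on 0119 rather than 0124, so a plain epoch shift does not explain the number by itself; either a further correction is needed or the number is not the article date at all.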

The Christian Science Monitor also does not return a valid pubdate, and unless you set use_pubdate = False you get nowhere. However, in examining the source for the feed, there always seem to be two date entries for each article:

articlesortdate="0222880260.000000"
articlelocaldate="0222885964.644872"

which seem to be the epoch dates of the files. Would it not be possible to capture either or both? Can I get at them in my profiles? I am a bit unsure what declarations would have to be made.
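For what it is worth, the two values can be pulled out of the raw feed text with a regular expression, without knowing which element carries them. A rough sketch, assuming the values are seconds counted from 2001-01-01 UTC (a guess, not something the feed documents); extract_dates is a made-up name, and how to hook this into a profile is not shown:

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: the values are seconds since 2001-01-01 UTC.
EPOCH_2001 = datetime(2001, 1, 1, tzinfo=timezone.utc)

# Matches the two attribute names seen in the feed source.
DATE_ATTR = re.compile(r'(articlesortdate|articlelocaldate)="([0-9.]+)"')


def extract_dates(feed_source):
    """Return {attribute name: datetime} for each date attribute found."""
    return {
        name: EPOCH_2001 + timedelta(seconds=float(value))
        for name, value in DATE_ATTR.findall(feed_source)
    }


sample = ('articlesortdate="0222880260.000000"\n'
          'articlelocaldate="0222885964.644872"')
for name, when in extract_dates(sample).items():
    print(name, when.strftime("%Y-%m-%d %H:%M"))
```

Under that assumption both sample values fall on 2008-01-24 (15:11 and 16:46 UTC), which agrees with the /2008/0124/ in the article path, so the guess at least looks plausible for this feed.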
Attached Files
File Type: zip jrpost.py.zip (1.2 KB, 480 views)