#1 |
Member
Posts: 12
Karma: 34
Join Date: Sep 2004
Location: San Francisco, California
Device: Tungsten T3
|
The Nation and MediaChannel.org
Progressive politics news for the small screen!
These are both sites I read daily, and I finally got around to writing Perl scripts to Palm-ize them. I use iSilo and will include my setup notes, but any spidering program (Plucker, etc.) will work.

The Nation. This is a great magazine with quite a lot of content -- a beefy read, but an extremely worthwhile one.
http://aranpura.beevomit.org/palm/parseNation.pl

The News Dissector. Danny Schechter's wonderful media critique blog, a great way to stay on top of both the news and the news outlets.
iSilo settings: Link depth 1, follow off-site links, no images, process tables but include only 1 level up from the innermost, unfold top-level tables, and ignore pixel width specifications.
http://aranpura.beevomit.org/palm/parseMediachannel.pl

Let's remain a well-informed segment of the electorate. Wield your PDAs mercilessly!

iSilo settings: Link depth 0, no images.

--Ashish Ranpura. http://www.ranpura.com |
#2 |
Fully Converged
Posts: 18,175
Karma: 14021202
Join Date: Oct 2002
Location: Switzerland
Device: Too many to count here.
|
aranpura,
Great work. I am happy you discovered the wonderful art of site scraping.

E.g. http://www.thenation.com/doc.mhtml?i=20040927&s=legum becomes http://www.thenation.com/docprint.mh...040927&s=legum (a small regex replacing /doc\.mhtml with /docprint\.mhtml should be sufficient).

Anyways, keep up the great work! |
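For anyone who wants to try the print-version rewrite described in this post, here is a minimal Perl sketch of it. The helper name `to_print_url` is made up for illustration and is not part of any script in this thread; it only assumes that article URLs contain `/doc.mhtml`, as in the example above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): rewrite an article URL
# so that it points at the printer-friendly page instead.
sub to_print_url {
    my ($url) = @_;
    # Copy the URL, then replace the first "/doc.mhtml" with "/docprint.mhtml".
    # The dot is escaped so it matches a literal "." only.
    (my $print = $url) =~ s{/doc\.mhtml}{/docprint.mhtml};
    return $print;
}

my $article = 'http://www.thenation.com/doc.mhtml?i=20040927&s=legum';
print to_print_url($article), "\n";
```

URLs that do not contain `/doc.mhtml` pass through unchanged, so the helper should be safe to apply to every link on a page.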
#3 | |
mechanoholic
Posts: 582
Karma: 1000217
Join Date: Mar 2004
Location: Sarasota, FL
Device: Nook STR/iPhone 4S/EVO 4G
|
Quote:
|
|
#4 |
Drama Queen
Posts: 784
Karma: 11712
Join Date: Nov 2002
Location: United States
Device: Palm Tungsten T|T3
|
Tip: If you have an interest in progressive American politics, you should check out the Progress Report's mobile version. Two of the writers wrote the cover story of The Nation that Alex linked above.
Last edited by sUnShInE; 09-10-2004 at 09:58 AM. |
#5 |
Member
Posts: 12
Karma: 34
Join Date: Sep 2004
Location: San Francisco, California
Device: Tungsten T3
|
suggestions incorporated
I didn't even think to look for a print edition of the articles! Thanks for the suggestion -- I've incorporated it into the script (still at the same link, http://aranpura.beevomit.org/palm/parseNation.pl).
And thanks to sUnShInE for the tip on the Progress Report -- I've added it to my daily reading list. |
#6 |
Member
Posts: 12
Karma: 34
Join Date: Sep 2004
Location: San Francisco, California
Device: Tungsten T3
|
perl code for parseNation.pl
#!/usr/pkg/bin/perl -w
#----------------------------------------------------------------
#--- this script depends on Perl modules installed at Freeshell
#----------------------------------------------------------------
$URL = "http://www.thenation.com/";
use LWP::Simple;
unless (defined ($webPage = get $URL)) {
    die "could not get $URL\n";
}

#--- cut the header information and all the left/top nav
$webPage =~ s/<head>.*?(<div class="tnhphed">)/$1/sg;

#--- cut the middle search fields
$webPage =~ s/(<!-- little ones -->.*?)(<\/table>).*?(<div class="tns2")/<br>$1$2<br><br>$3/s;

#--- cut the middle advertisement
$webPage =~ s/<div align="center".*?(<div class="tns2")/$1/s;

#--- cut everything at the bottom
$webPage =~ s/<td width="1" bgcolor="#cccccc".*(<\/body>)/$1/s;

#--- fix any partial URLs
$webPage =~ s/"doc\.mhtml/"http:\/\/www\.thenation\.com\/doc\.mhtml/g;

#--- link all articles to the print version instead of the online version
$webPage =~ s/doc\.mhtml/docprint\.mhtml/g;

#--- cut all links to full issues
$webPage =~ s/<a href="\/issue.mhtml.*?>(.*?)<\/a>/$1/g;

#--- return the result to a browser CGI query
print "Content-type:text/html\n\n";
print "<font color=\"#CC0000\" face=\"serif\" size=\"5\"><strong>THE NATION</strong></font><br><br>";
print $webPage;
|
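Most of the cuts in this script follow one pattern: match from a start marker to an end marker with a non-greedy `.*?` under the `/s` modifier, then put back only the captured end marker. Here is a minimal sketch of that pattern on a made-up HTML fragment. The helper name `cut_between` and the sample markup are assumptions for illustration and are not part of parseNation.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: delete everything from $start up to (but not
# including) the first $end that follows it.
sub cut_between {
    my ($html, $start, $end) = @_;
    # \Q...\E quotes the markers so they are matched literally.
    # /s lets "." match newlines; ".*?" stops at the nearest $end.
    $html =~ s/\Q$start\E.*?(\Q$end\E)/$1/s;
    return $html;
}

# Made-up page: a head section and a nav block ahead of the story.
my $page = "<head>\n<title>x</title>\n</head>"
         . '<div class="nav">menu</div>'
         . '<div class="story">Headline</div>';

print cut_between($page, '<head>', '<div class="story">'), "\n";
```

Quoting the markers with `\Q...\E` avoids hand-escaping characters such as `.` or `/` when the markers come from variables. If the start marker is missing, the page is returned unchanged.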
#7 |
Member
Posts: 12
Karma: 34
Join Date: Sep 2004
Location: San Francisco, California
Device: Tungsten T3
|
perl code for parseMediachannel.pl
#!/usr/pkg/bin/perl -w
#----------------------------------------------------------------
#--- this script depends on Perl modules installed at Freeshell
#----------------------------------------------------------------
$URL = "http://www.newsdissector.org/weblog/";
use LWP::Simple;
unless (defined ($webPage = get $URL)) {
    die "could not get $URL\n";
}

#--- cut the header information and all the left/top nav
$webPage =~ s/<head>.*?(<font color="#CC0000")/$1/s;

#--- cut the right nav boxes and ads
$webPage =~ s/<table.*?bgcolor="#FFCC00".*(<font.*?color="#000000">)/$1/s;

#--- cut the footer information
$webPage =~ s/(Posted by.*?<\/font>).*(<\/body>)/$1\n$2/s;

#--- return the result to a browser CGI query
print "Content-type:text/html\n\n";
print $webPage;
|
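Note that this script mixes non-greedy `.*?` with greedy `.*`. The greedy form runs to the last possible match, which is what lets the footer cut reach the final `</body>`. Here is a small sketch of the difference, using made-up markup and hypothetical helper names that are not part of parseMediachannel.pl:

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration only.

# Non-greedy: remove from the first "<b>" to the nearest "</b>".
sub strip_lazy {
    my ($text) = @_;
    $text =~ s/<b>.*?<\/b>//s;
    return $text;
}

# Greedy: remove from the first "<b>" to the last "</b>".
sub strip_greedy {
    my ($text) = @_;
    $text =~ s/<b>.*<\/b>//s;
    return $text;
}

my $sample = 'keep <b>one</b> middle <b>two</b> tail';
print "lazy:   [", strip_lazy($sample), "]\n";
print "greedy: [", strip_greedy($sample), "]\n";
```

When a site changes its layout, picking the wrong one of the two is a likely reason for a cut that removes too much or too little.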
#8 |
mechanoholic
Posts: 582
Karma: 1000217
Join Date: Mar 2004
Location: Sarasota, FL
Device: Nook STR/iPhone 4S/EVO 4G
|
Thanks for making the change to the printable pages and thanks for posting the scripts. This is great stuff!
|
#9 |
Fully Converged
Posts: 18,175
Karma: 14021202
Join Date: Oct 2002
Location: Switzerland
Device: Too many to count here.
|
Love that spirit. Thanks for sharing with us, Ashish!
|