MobileRead Forums > Miscellaneous > Archive > Mobile Sites

09-09-2004, 09:40 PM   #1
aranpura
Member
Posts: 12
Karma: 34
Join Date: Sep 2004
Location: San Francisco, California
Device: Tungsten T3
The Nation and MediaChannel.org

Progressive politics news for the small screen!

These are both sites I read daily, and I finally got around to writing Perl scripts to Palm-ize them. I use iSilo and will include my setup notes, but any spidering program (Plucker, etc.) will work.

The Nation. This is a great magazine with quite a lot of content -- a beefy read, but an extremely worthwhile one.
http://aranpura.beevomit.org/palm/parseNation.pl
iSilo settings: link depth 1, follow off-site links, no images, process tables but include only 1 level up from the innermost, unfold top-level tables, and ignore pixel width specifications.

The News Dissector. Danny Schechter's wonderful media critique blog, a great way to stay on top of both the news and the news outlets. Let's remain a well-informed segment of the electorate. Wield your PDAs mercilessly!

--Ashish Ranpura.
http://www.ranpura.com
09-10-2004, 05:57 AM   #2
Alexander Turcic
Fully Converged
Posts: 18,175
Karma: 14021202
Join Date: Oct 2002
Location: Switzerland
Device: Too many to count here.
aranpura,

Great work. I am happy you discovered the wonderful art of site scraping. It can be a lot of fun. One suggestion: The Nation also offers a print version of each article. So when you parse the main content page, simply replace all article links with the print version.

E.g.
http://www.thenation.com/doc.mhtml?i=20040927&s=legum
becomes
http://www.thenation.com/docprint.mh...040927&s=legum

(A small regex replacing /doc\.mhtml with /docprint\.mhtml should be sufficient.)
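A minimal Perl sketch of that substitution, applied to the example link above. The subroutine name `to_print_url` is hypothetical, and it assumes every article link contains the literal path segment `/doc.mhtml`:

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite an online-article link to its print version.
# Assumes article links always contain the literal "/doc.mhtml".
sub to_print_url {
    my ($url) = @_;
    # copy first, then substitute on the copy so $url is left untouched
    (my $print = $url) =~ s{/doc\.mhtml}{/docprint.mhtml};
    return $print;
}

my $online = 'http://www.thenation.com/doc.mhtml?i=20040927&s=legum';
print to_print_url($online), "\n";
```

Links that already point at `docprint.mhtml` pass through unchanged, since `/docprint.mhtml` does not contain `/doc.mhtml`.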

Anyways, keep up the great work!
09-10-2004, 06:07 AM   #3
ignatz
mechanoholic
Posts: 582
Karma: 1000217
Join Date: Mar 2004
Location: Sarasota, FL
Device: Nook STR/iPhone 4S/EVO 4G
Quote:
One suggestion: The Nation also offers a print version of each article. So when you parse the main content page, simply replace all article links with the print version.
I would request the same thing. This would make it much easier for me. Also, could you post your Perl code for these sites so we can all bask in your cleverness? Have you tried playing with Sitescooper, which does a similar job?
09-10-2004, 09:56 AM   #4
sUnShInE
Drama Queen
Posts: 784
Karma: 11712
Join Date: Nov 2002
Location: United States
Device: Palm Tungsten T|T3
Tip: If you have an interest in progressive American politics, you should check out the Progress Report's mobile version. Two of its writers wrote the cover story of The Nation that Alex linked above.

09-10-2004, 01:09 PM   #5
aranpura
Member
Posts: 12
Karma: 34
Join Date: Sep 2004
Location: San Francisco, California
Device: Tungsten T3
suggestions incorporated

I didn't even think to look for a print edition of the articles! Thanks for the suggestion -- I've incorporated it into the script (still at the same link, http://aranpura.beevomit.org/palm/parseNation.pl).

And thanks to sUnShInE for the tip on the Progress Report -- I've added it to my daily reading list.
09-10-2004, 01:10 PM   #6
aranpura
Member
Posts: 12
Karma: 34
Join Date: Sep 2004
Location: San Francisco, California
Device: Tungsten T3
perl code for parseNation.pl

#!/usr/pkg/bin/perl -w

#----------------------------------------------------------------
#--- this script depends on Perl modules installed at Freeshell
#----------------------------------------------------------------

use strict;
use LWP::Simple;    # provides get()

my $URL = "http://www.thenation.com/";

my $webPage = get($URL);
die "could not get $URL\n" unless defined $webPage;

#--- cut the header information and all the left/top nav
$webPage =~ s/<head>.*?(<div class="tnhphed">)/$1/sg;

#--- cut the middle search fields
$webPage =~ s/(<!-- little ones -->.*?)(<\/table>).*?(<div class="tns2")/<br>$1$2<br><br>$3/s;

#--- cut the middle advertisement
$webPage =~ s/<div align="center".*?(<div class="tns2")/$1/s;

#--- cut everything at the bottom
$webPage =~ s/<td width="1" bgcolor="#cccccc".*(<\/body>)/$1/s;

#--- fix any partial URLs (relative "doc.mhtml" or root-relative "/doc.mhtml")
$webPage =~ s{"/?doc\.mhtml}{"http://www.thenation.com/doc.mhtml}g;

#--- link all articles to the print version instead of the online version
$webPage =~ s/doc\.mhtml/docprint.mhtml/g;

#--- cut all links to full issues, keeping the link text
$webPage =~ s{<a href="/issue\.mhtml[^>]*>(.*?)</a>}{$1}gs;

#--- return the result to a browser CGI query
print "Content-type: text/html\n\n";
print '<font color="#CC0000" face="serif" size="5"><strong>THE NATION</strong></font><br><br>';
print $webPage;
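To check the link rewriting without hitting the live site, one option is to move those steps into a subroutine and feed it a saved page or a hand-made snippet. This is only a sketch: `clean_nation` and the sample markup are made up for illustration, only the layout-independent link steps are included, and the partial-URL pattern here assumes links may be either `doc.mhtml` or `/doc.mhtml`.

```perl
use strict;
use warnings;

# Hypothetical subroutine holding the link-rewriting steps of parseNation.pl,
# so they can be exercised on a sample string instead of a live download.
sub clean_nation {
    my ($html) = @_;

    # make relative or root-relative article links absolute
    $html =~ s{"/?doc\.mhtml}{"http://www.thenation.com/doc.mhtml}g;

    # point every article link at the print version
    $html =~ s/doc\.mhtml/docprint.mhtml/g;

    # unlink full-issue links but keep their text
    $html =~ s{<a href="/issue\.mhtml[^>]*>(.*?)</a>}{$1}gs;

    return $html;
}

# Made-up sample markup, not copied from thenation.com.
my $sample = '<a href="doc.mhtml?i=1&s=a">One</a> '
           . '<a href="/issue.mhtml?i=2">Issue</a>';

print clean_nation($sample), "\n";
```

Running the same subroutine over a page saved from a browser is a quick way to see which regex stopped matching after a site redesign.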
09-10-2004, 01:12 PM   #7
aranpura
Member
Posts: 12
Karma: 34
Join Date: Sep 2004
Location: San Francisco, California
Device: Tungsten T3
perl code for parseMediachannel.pl

#!/usr/pkg/bin/perl -w

#----------------------------------------------------------------
#--- this script depends on Perl modules installed at Freeshell
#----------------------------------------------------------------

use strict;
use LWP::Simple;    # provides get()

my $URL = "http://www.newsdissector.org/weblog/";

my $webPage = get($URL);
die "could not get $URL\n" unless defined $webPage;

#--- cut the header information and all the left/top nav
$webPage =~ s/<head>.*?(<font color="#CC0000")/$1/s;

#--- cut the right nav boxes and ads
$webPage =~ s/<table.*?bgcolor="#FFCC00".*(<font.*?color="#000000">)/$1/s;

#--- cut the footer information after the LAST "Posted by" byline
#--- (the leading greedy .* keeps every entry, not just the first one)
$webPage =~ s/(.*Posted by.*?<\/font>).*(<\/body>)/$1\n$2/s;

#--- return the result to a browser CGI query
print "Content-type: text/html\n\n";
print $webPage;
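Whether the footer substitution keeps every blog entry depends on where the match starts. Perl takes the leftmost match, so a pattern that begins with `(Posted by` anchors on the first byline and throws away every later entry; putting a greedy `.*` in front makes the engine backtrack to the last byline instead. A small sketch on made-up markup (the sample page is hypothetical, not taken from newsdissector.org):

```perl
use strict;
use warnings;

# Made-up blog markup with two entries and a footer.
my $page = "<p>Entry one</p><font>Posted by A</font>"
         . "<p>Entry two</p><font>Posted by B</font>"
         . "<p>footer</p></body>";

# Leftmost match starts at the FIRST byline, so entry two is lost.
(my $first = $page) =~ s/(Posted by.*?<\/font>).*(<\/body>)/$1\n$2/s;

# A leading greedy .* backtracks to the LAST byline, keeping both entries.
(my $last = $page) =~ s/(.*Posted by.*?<\/font>).*(<\/body>)/$1\n$2/s;

print "first-byline version:\n$first\n\n";
print "last-byline version:\n$last\n";
```

Both versions drop the footer; only the second keeps the second entry.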
09-10-2004, 01:45 PM   #8
ignatz
mechanoholic
Posts: 582
Karma: 1000217
Join Date: Mar 2004
Location: Sarasota, FL
Device: Nook STR/iPhone 4S/EVO 4G
Thanks for making the change to the printable pages and thanks for posting the scripts. This is great stuff!
09-10-2004, 03:49 PM   #9
Alexander Turcic
Fully Converged
Posts: 18,175
Karma: 14021202
Join Date: Oct 2002
Location: Switzerland
Device: Too many to count here.
Love that spirit. Thanks for sharing with us, Ashish!

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.