The Economist Scoop


Colin Dunstan
03-23-2004, 12:35 PM
URL: http://www.economist.com/
Name: Economist
Description: Economist
AuthorName: Goh Boon Nam
# Version 1.0
# Date updated : 9 Jan 2004

ignatz
03-23-2004, 12:47 PM
This looks great! Economist.com seems to be having some problems just now, but I'm looking forward to adding this to my sites list. Thanks, Morpheus.

ignatz
03-23-2004, 11:16 PM
I've been testing this and I seem to get shut down. I get an HTTP GET error with the message "Automatic downloading forbidden". Now that's just not neighborly. Anyone solved this?

Alexander Turcic
03-24-2004, 02:59 AM
Hm, I've been studying Sitescooper the past few days (on your advice, ignatz!), but I don't see any option to slow down the HTML download process. I think Economist.com has some kind of mod_bandwidth module running.
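
To illustrate what slowing down the download process would mean in practice, here is a minimal Python sketch that pauses between consecutive requests. It is not a Sitescooper feature: `fetch_politely`, `delay_seconds`, and the injected `fetch` and `sleep` callables are hypothetical names chosen for this example, and whether a delay would satisfy the Economist's server is only a guess.

```python
import time
from typing import Callable, Iterable, List


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in turn, pausing between consecutive requests."""
    pages: List[str] = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            sleep(delay_seconds)
        pages.append(fetch(url))
    return pages
```

Passing `fetch` and `sleep` in as arguments keeps the sketch independent of any particular HTTP library and makes it easy to try out without touching the network.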

stobs
03-28-2004, 05:49 AM
Is it possible to set the user-agent ID to something like:
Mozilla/3.0 (compatible; AvantGo 5.2; FreeBSD)
Perhaps that is the problem?

-S.

Alexander Turcic
03-31-2004, 12:33 PM
stobs, it is possible to change the user-agent when you edit a Sitescooper file. You can find more info here (http://www.mobileread.com/forums/showpost.php?p=6523&postcount=7).
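
For anyone unsure what changing the user-agent amounts to, here is a small Python sketch using the standard `urllib.request` module. It only shows the HTTP side of the idea and is not the Sitescooper setting itself (see the linked post for that); `build_request` and `AVANTGO_UA` are names invented for this example.

```python
import urllib.request

# User-agent string suggested earlier in this thread.
AVANTGO_UA = "Mozilla/3.0 (compatible; AvantGo 5.2; FreeBSD)"


def build_request(url: str, user_agent: str = AVANTGO_UA) -> urllib.request.Request:
    """Return a request object that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    request = build_request("http://www.economist.com/")
    # urllib stores header names capitalized, so the key is "User-agent".
    print(request.get_header("User-agent"))
```

The request is only built here, not sent. Sending it would be a call to `urllib.request.urlopen(request)`, and there is no guarantee the server treats this user-agent any differently.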

Alexander Turcic
04-03-2004, 02:42 AM
I've improved the original Economist scoop somewhat to exclude unwanted content.

URL: http://www.economist.com/
Name: Economist
Description: Economist
AuthorName: Goh Boon Nam

# General Settings
Active: 1
SizeLimit: 2000
Levels: 2

# Image Settings
ImageURL: http://www.economist.com/images/dingbats/e5.gif
ImageURL: http://www.economist.com/images/\d+/.*
UseAltTagForURL: 0

# Content Settings
ContentsStart: <td colspan="7" width="447" valign="top">
ContentsEnd: <a href="/diversions/quiz/">
ContentsUseTableSmarts: 0

# Story Settings
StoryToPrintableSub: s!displayStory.cfm!PrinterFriendly.cfm!
StoryURL: http://www.economist.com/(.*?)/PrinterFriendly.cfm(.*?)

# PreProcess Settings
ContentsHTMLPreProcess: {
# remove ads; this may break if the page layout changes
s,<div align="center">[^<]<a href="/printedition/">.*<td width="209" valign="top" height="1700">,</font>,gim;
# remove the 'More from...' Links
s,<div align="right"><b><a href="[^"]*"><font[^>]*>[^/]*</font></a></b></div><br>,,gim;
# remove the 'More reviews...' Links
s,<div align="right"><font[^>]*><b><a href="[^"]*"><font color="[^"]*">More reviews</font></a></b></div>,,gim;
# gfx -> txt headers
s,<a href="[^"]*"><img src="/images/sections/(\w+)\.gif"[^>]*></a><br><br>,<hr>Section: $1<br>,gim;
# gfx -> txt header "markets"
s,<p><a href="[^"]*"><img alt="MARKETS" border="0" src="/images/sections/m-d\.gif" width="207" height="19"></a></p>,<hr>Section: markets<br>,gim;
# remove links to pay-content
s,<a href="[^"]*">([^<]*)</a></b>\s<img alt="E\+" width="17" height="10" border="0" src="/images/dingbats/e5\.gif">,$1<font size=1>(pay-content)</font></b><img width="17" height="10" border="0" src="/images/dingbats/e5\.gif">,gim;
# remove links to pay-content (alternate image markup)
s,<a href="[^"]*">([^<]*)</a></b>\s<img src="/images/dingbats/e5\.gif" alt="" />,$1<font size=1>(pay-content)</font></b><img width="17" height="10" border="0" src="/images/dingbats/e5\.gif">,gim;
# remove Also-on-the-site... column
s,<img alt="also on the site ...".*</td></tr></table><br>,,gim;
}

StoryHTMLPreProcess: {
# remove 'get article background...'
s,<p>.*<a target="background"[^>]*">.*background</b></font></a></font></p><!--back-->,,gim;
s/align="right"//gim;
s/align="center"//gim;
s/align=right//gim;
s/align=center//gim;
}
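
To try a few of these substitutions outside Sitescooper, the rough Python equivalents below may help. This is only a sketch: the function names are made up for the example, Python's `re` flags stand in for Perl's `/i` and `/m`, and the sample URL and HTML in the usage part are hypothetical rather than taken from the live site.

```python
import re

FLAGS = re.IGNORECASE | re.MULTILINE  # rough equivalent of Perl's /i and /m


def to_printable(url: str) -> str:
    """Mimic StoryToPrintableSub: swap in the printer-friendly script name."""
    # The Perl substitution has no /g flag, so only the first match is replaced.
    return re.sub(r"displayStory.cfm", "PrinterFriendly.cfm", url, count=1)


def section_image_to_text(html: str) -> str:
    """Mimic the 'gfx -> txt headers' rule: replace a section image with text."""
    return re.sub(
        r'<a href="[^"]*"><img src="/images/sections/(\w+)\.gif"[^>]*></a><br><br>',
        r"<hr>Section: \1<br>",
        html,
        flags=FLAGS,
    )


def strip_alignment(html: str) -> str:
    """Mimic the four align-stripping rules from StoryHTMLPreProcess."""
    for pattern in (r'align="right"', r'align="center"', r"align=right", r"align=center"):
        html = re.sub(pattern, "", html, flags=FLAGS)
    return html


if __name__ == "__main__":
    # Hypothetical story URL and markup, for illustration only.
    print(to_printable("http://www.economist.com/world/displayStory.cfm?story_id=123"))
    print(section_image_to_text(
        '<a href="/world/"><img src="/images/sections/world.gif" border="0"></a><br><br>'
    ))
    print(strip_alignment('<p align="center">Hi</p>'))
```

Running the patterns against a saved copy of a page is a quick way to check whether a layout change has broken a rule before doing a full scoop.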


Greets

Alex