#1
Is papyrophobic!
Posts: 1,926
Karma: 1009999
Join Date: Aug 2003
Location: USA
Device: Dell Axim

The Economist Scoop
URL: http://www.economist.com/
Name: Economist
Description: Economist
AuthorName: Goh Boon Nam
# Version 1.0
# Date updated : 9 Jan 2004
#2
mechanoholic
Posts: 582
Karma: 1000217
Join Date: Mar 2004
Location: Sarasota, FL
Device: Nook STR/iPhone 4S/EVO 4G

This looks great! Economist.com seems to be having some problems just now, but I'm looking forward to adding this to my sites list. Thanks Morpheus.
#3
mechanoholic
Posts: 582
Karma: 1000217
Join Date: Mar 2004
Location: Sarasota, FL
Device: Nook STR/iPhone 4S/EVO 4G

I've been testing this and I seem to get shut down. I get an HTTP GET error with the message "Automatic downloading forbidden". Now that's just not neighborly. Anyone solved this?
#4
Fully Converged
Posts: 18,175
Karma: 14021202
Join Date: Oct 2002
Location: Switzerland
Device: Too many to count here.

Hm, I've been studying Sitescooper for the past few days (on your advice, ignatz!), but I don't see any option to slow down the HTML download process. I think the Economist site has some kind of mod_bandwidth module running.
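The thread doesn't show a Sitescooper throttle setting, so as a rough illustration of what "slowing down the download" would mean, here is a minimal Python sketch (not Sitescooper code) that spaces out requests with a fixed pause. The function name fetch_politely and the 2-second default are hypothetical choices, and whether the Economist's block is actually rate-based is only the guess made in the post above.

```python
import time
import urllib.request


def fetch_politely(urls, delay=2.0, fetch=None):
    """Download each URL in order, pausing `delay` seconds between requests.

    `delay` and this helper are hypothetical; they are not Sitescooper options.
    `fetch` defaults to a plain urllib GET and can be replaced by any callable.
    """
    if fetch is None:
        def fetch(url):
            # Plain GET returning the raw response body.
            with urllib.request.urlopen(url, timeout=30) as resp:
                return resp.read()

    pages = {}
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay)  # wait before every request except the first
        pages[url] = fetch(url)
    return pages
```

The fetch parameter lets the pacing logic be tried without touching the network. A fixed sleep is the simplest policy; if the server is refusing requests on some other basis (for example the User-Agent), spacing them out will not help.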
#5
Connoisseur
Posts: 62
Karma: 72
Join Date: Oct 2002
Location: Germany
Device: nook

Is it possible to set the user-agent string to something like this:
Mozilla/3.0 (compatible; AvantGo 5.2; FreeBSD)
Perhaps the user agent is the problem?
-S.
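At the HTTP level, this suggestion amounts to sending a different User-Agent request header. Here is a minimal sketch using Python's standard urllib rather than Sitescooper, whose setting for this (if any) isn't shown in the thread. The helper names build_request and fetch are hypothetical, and whether the Economist filters on this header is the poster's guess, not something established here.

```python
import urllib.request

# User-Agent string proposed in the post above.
AVANTGO_UA = "Mozilla/3.0 (compatible; AvantGo 5.2; FreeBSD)"


def build_request(url, user_agent=AVANTGO_UA):
    """Build a GET request that identifies itself with `user_agent`
    instead of urllib's default "Python-urllib/3.x"."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=AVANTGO_UA):
    """Send the request and return the raw response body."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=30) as resp:
        return resp.read()


if __name__ == "__main__":
    req = build_request("http://www.economist.com/")
    # urllib stores header names via str.capitalize(), hence "User-agent".
    print(req.get_header("User-agent"))
```

Building the request separately from sending it makes it easy to check which headers would go out before hitting the site.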
#6
Fully Converged
Posts: 18,175
Karma: 14021202
Join Date: Oct 2002
Location: Switzerland
Device: Too many to count here.

Quote:
#7
Fully Converged
Posts: 18,175
Karma: 14021202
Join Date: Oct 2002
Location: Switzerland
Device: Too many to count here.

I've improved the original Economist scoop somewhat to exclude unwanted content.
Code:
URL: http://www.economist.com/
Name: Economist
Description: Economist
AuthorName: Goh Boon Nam
# General Settings
Active: 1
SizeLimit: 2000
Levels: 2
# Image Settings
ImageURL: http://www.economist.com/images/dingbats/e5.gif
ImageURL: http://www.economist.com/images/\d+/.*
UseAltTagForURL: 0
# Content Settings
ContentsStart: <td colspan="7" width="447" valign="top">
ContentsEnd: <a href="/diversions/quiz/">
ContentsUseTableSmarts: 0
# Story Settings
StoryToPrintableSub: s!displayStory.cfm!PrinterFriendly.cfm!
StoryURL: http://www.economist.com/(.*?)/PrinterFriendly.cfm(.*?)
# PreProcess Settings
ContentsHTMLPreProcess: {
# remove ads...hope that's not killing it when layout changes
s,<div align="center">[^<]<a href="/printedition/">.*<td width="209" valign="top" height="1700">,</font>,gim;
# remove the 'More from...' Links
s,<div align="right"><b><a href="[^"]*"><font[^>]*>[^/]*</font></a></b></div><br>,,gim;
# remove the 'More reviews...' Links
s,<div align="right"><font[^>]*><b><a href="[^"]*"><font color="[^"]*">More reviews</font></a></b></div>,,gim;
# gfx -> txt headers
s,<a href="[^"]*"><img src="/images/sections/(\w+)\.gif"[^>]*></a><br><br>,<hr>Section: $1<br>,gim;
# gfx -> txt header for the "markets" section
s,<p><a href="[^"]*"><img alt="MARKETS" border="0" src="/images/sections/m-d\.gif" width="207" height="19"></a></p>,<hr>Section: markets<br>,gim;
# remove links to pay-content
s,<a href="[^"]*">([^<]*)</a></b>\s<img alt="E\+" width="17" height="10" border="0" src="/images/dingbats/e5\.gif">,$1<font size=1>(pay-content)</font></b><img width="17" height="10" border="0" src="/images/dingbats/e5\.gif">,gim;
# remove links to pay-content (variant markup with an empty alt attribute)
s,<a href="[^"]*">([^<]*)</a></b>\s<img src="/images/dingbats/e5\.gif" alt="" />,$1<font size=1>(pay-content)</font></b><img width="17" height="10" border="0" src="/images/dingbats/e5\.gif">,gim;
# remove Also-on-the-site... column
s,<img alt="also on the site ...".*</td></tr></table><br>,,gim;
}
StoryHTMLPreProcess: {
# remove 'get article background...'
s,<p>.*<a target="background"[^>]*">.*background</b></font></a></font></p><!--back-->,,gim;
s/align="right"//gim;
s/align="center"//gim;
s/align=right//gim;
s/align=center//gim;
}
Alex
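For anyone more at home in Python than Perl, here is a rough sketch of what a few of the settings above do, translated to Python's re module. It assumes, as the setting names suggest, that StoryToPrintableSub rewrites each story link to its printer-friendly URL and that StoryURL selects which links are treated as stories. The sample URL and HTML fragments are hypothetical, and only a subset of the substitutions is ported.

```python
import re

# StoryToPrintableSub: s!displayStory.cfm!PrinterFriendly.cfm!
# Without /g, Perl's s/// replaces only the first match, hence count=1.
def to_printable(url):
    return re.sub(r"displayStory.cfm", "PrinterFriendly.cfm", url, count=1)

# StoryURL pattern, copied as-is (unescaped dots match any character).
STORY_URL = re.compile(r"http://www.economist.com/(.*?)/PrinterFriendly.cfm(.*?)")

# ContentsHTMLPreProcess, "gfx -> txt headers": swap a section banner image
# for a text heading. /gim -> re.sub (replace all) + IGNORECASE + MULTILINE.
SECTION_IMG = re.compile(
    r'<a href="[^"]*"><img src="/images/sections/(\w+)\.gif"[^>]*></a><br><br>',
    re.IGNORECASE | re.MULTILINE,
)

# StoryHTMLPreProcess: the four align-stripping substitutions in one pattern.
ALIGN_ATTR = re.compile(r'align=(?:"right"|"center"|right|center)', re.IGNORECASE)


def preprocess_contents(html):
    # Perl's $1 in the replacement becomes \1 in Python.
    return SECTION_IMG.sub(r"<hr>Section: \1<br>", html)


def preprocess_story(html):
    return ALIGN_ATTR.sub("", html)


if __name__ == "__main__":
    # Hypothetical story URL, for illustration only.
    url = to_printable("http://www.economist.com/world/displayStory.cfm?story_id=123")
    print(url)
    print(STORY_URL.match(url).group(1))
    print(preprocess_contents(
        '<a href="/x/"><img src="/images/sections/business.gif" border="0"></a><br><br>'))
    print(preprocess_story('<td align="center" width="5">'))
```

The /m flag changes nothing in these patterns because none of them use ^ or $. In both Perl and Python, . does not match a newline unless /s (re.DOTALL) is given, so the greedy .* patterns in the site file only work when the markup they span sits on one line.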
Similar Threads

| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| Sony Reader Daily Edition - SCOOP! | Nate the great | News | 303 | 11-03-2009 04:57 PM |
| eReader SCOOP!!! on TeleRead | Robotech_Master | News | 29 | 12-10-2008 09:19 AM |
| E-books on cellphones: what's the scoop? | mreames | Alternative Devices | 3 | 01-08-2007 04:23 AM |