Old 03-23-2004, 02:35 PM   #1
Colin Dunstan
Is papyrophobic!
Posts: 1,926
Karma: 1009999
Join Date: Aug 2003
Location: USA
Device: Dell Axim
The Economist Scoop

URL: http://www.economist.com/
Name: Economist
Description: Economist
AuthorName: Goh Boon Nam
# Version 1.0
# Date updated : 9 Jan 2004
Attached Files
File Type: site economist.site (817 Bytes, 490 views)
Old 03-23-2004, 02:47 PM   #2
ignatz
mechanoholic
Posts: 582
Karma: 1000217
Join Date: Mar 2004
Location: Sarasota, FL
Device: Nook STR/iPhone 4S/EVO 4G
This looks great! Economist.com seems to be having some problems just now, but I'm looking forward to adding this to my sites list. Thanks Morpheus.
Old 03-24-2004, 01:16 AM   #3
ignatz
I've been testing this and I seem to get shut down. I get an HTTP GET error with the message "Automatic downloading forbidden". Now that's just not neighborly. Anyone solved this?
Old 03-24-2004, 04:59 AM   #4
Alexander Turcic
Fully Converged
Posts: 17,455
Karma: 10995944
Join Date: Oct 2002
Location: Switzerland
Device: Sony PRS-650 / Nexus 7 / Kindle PW
Hm, I've been studying Sitescooper for the past few days (on your advice, ignatz!), but I don't see any option to slow down the HTML download process. I think the Economist has some kind of mod_bandwidth module running.
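To illustrate what "slowing down" the download would mean in practice, here is a minimal Python sketch of a fixed pause between requests. It is not a Sitescooper option (none was found in this thread); the helper name `throttled` and its parameters are hypothetical, and whether a delay would avoid the "Automatic downloading forbidden" response is untested.

```python
import time
from typing import Callable, Iterable, Iterator


def throttled(urls: Iterable[str], delay_seconds: float,
              sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Yield each URL in order, pausing before every URL after the first.

    `sleep` is injectable so the pacing can be checked without waiting.
    """
    first = True
    for url in urls:
        if not first:
            sleep(delay_seconds)  # wait between consecutive requests
        first = False
        yield url
```

A caller would loop with something like `for url in throttled(story_urls, 5.0): ...` and fetch each page inside the loop, so at most one request goes out per delay interval.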
Old 03-28-2004, 07:49 AM   #5
stobs
Connoisseur
Posts: 62
Karma: 72
Join Date: Oct 2002
Location: Germany
Device: nook
Is it possible to set up the user-agent ID to something like:
Mozilla/3.0 (compatible; AvantGo 5.2; FreeBSD)
perhaps that is the problem?

-S.
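For readers unfamiliar with the idea, this is roughly what sending a custom user-agent looks like at the HTTP level, sketched in Python with the standard library. It is not Sitescooper code; the function name `build_request` is hypothetical, and the user-agent string is the one suggested above. Whether the site would accept it is an open question.

```python
import urllib.request

# User-agent string proposed in this post
AVANTGO_UA = "Mozilla/3.0 (compatible; AvantGo 5.2; FreeBSD)"


def build_request(url: str, user_agent: str = AVANTGO_UA) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying a custom User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("http://www.economist.com/")
# urllib stores header names capitalized as "User-agent"
print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen` would then send the request with that header instead of Python's default one.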

Quote:
Originally Posted by Alexander
Hm, I've been studying Sitescooper for the past few days (on your advice, ignatz!), but I don't see any option to slow down the HTML download process. I think the Economist has some kind of mod_bandwidth module running.
Old 03-31-2004, 02:33 PM   #6
Alexander Turcic
Quote:
Originally Posted by stobs
Is it possible to set up the user-agent ID to something like:
Mozilla/3.0 (compatible; AvantGo 5.2; FreeBSD)
perhaps that is the problem?

-S.
stobs, it is possible to change the user-agent when you edit a sitescooper file. You can find more info here.
Old 04-03-2004, 04:42 AM   #7
Alexander Turcic
I've improved the original Economist scoop somewhat to exclude unwanted content.

Code:
URL: http://www.economist.com/
Name: Economist
Description: Economist
AuthorName: Goh Boon Nam

# General Settings
Active: 1
SizeLimit: 2000
Levels: 2

# Image Settings
ImageURL: http://www.economist.com/images/dingbats/e5.gif
ImageURL: http://www.economist.com/images/\d+/.*
UseAltTagForURL: 0

# Content Settings
ContentsStart: <td colspan="7" width="447" valign="top">
ContentsEnd: <a href="/diversions/quiz/">
ContentsUseTableSmarts: 0 

# Story Settings
StoryToPrintableSub: s!displayStory.cfm!PrinterFriendly.cfm!
StoryURL: http://www.economist.com/(.*?)/PrinterFriendly.cfm(.*?)

# PreProcess Settings
ContentsHTMLPreProcess: {
	# remove ads (this may break if the page layout changes)
	s,<div align="center">[^<]<a href="/printedition/">.*<td width="209" valign="top" height="1700">,</font>,gim;
	# remove the 'More from...' Links
	s,<div align="right"><b><a href="[^"]*"><font[^>]*>[^/]*</font></a></b></div><br>,,gim;
	# remove the 'More reviews...' Links
	s,<div align="right"><font[^>]*><b><a href="[^"]*"><font color="[^"]*">More reviews</font></a></b></div>,,gim;
	# gfx -> txt headers
	s,<a href="[^"]*"><img src="/images/sections/(\w+)\.gif"[^>]*></a><br><br>,<hr>Section: $1<br>,gim;
	# gfx -> txt header "markets"
	s,<p><a href="[^"]*"><img alt="MARKETS" border="0" src="/images/sections/m-d\.gif" width="207" height="19"></a></p>,<hr>Section: markets<br>,gim;
	# remove links to pay-content
	s,<a href="[^"]*">([^<]*)</a></b>\s<img alt="E\+" width="17" height="10" border="0" src="/images/dingbats/e5\.gif">,$1<font size=1>(pay-content)</font></b><img width="17" height="10" border="0" src="/images/dingbats/e5\.gif">,gim;
	# remove links to pay-content (alternate image markup)
	s,<a href="[^"]*">([^<]*)</a></b>\s<img src="/images/dingbats/e5\.gif" alt="" />,$1<font size=1>(pay-content)</font></b><img width="17" height="10" border="0" src="/images/dingbats/e5\.gif">,gim;
	# remove Also-on-the-site... column
	s,<img alt="also on the site ...".*</td></tr></table><br>,,gim;
}

StoryHTMLPreProcess: {
	# remove 'get article background...'
	s,<p>.*<a target="background"[^>]*">.*background</b></font></a></font></p><!--back-->,,gim;
	# strip right/center alignment attributes
	s/align="right"//gim;
	s/align="center"//gim;
	s/align=right//gim;
	s/align=center//gim;
}
Greets

Alex
Attached Files
File Type: site economist.site (2.2 KB, 451 views)
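To make two of the substitutions in the site file above easier to follow, here is a rough Python equivalent of the `StoryToPrintableSub` rule and the "gfx -> txt headers" rule. This is only a sketch: Sitescooper itself applies these as Perl substitutions, the function names are hypothetical, and the sample URL and HTML snippet are made up for illustration. One deliberate difference is that the dot in `displayStory.cfm` is escaped here, whereas the Perl pattern leaves it as a wildcard.

```python
import re

# StoryURL pattern copied from the site file
STORY_URL = re.compile(r"http://www.economist.com/(.*?)/PrinterFriendly.cfm(.*?)")

# Pattern from the "gfx -> txt headers" rule; the Perl /i and /m flags are kept
SECTION_IMG = re.compile(
    r'<a href="[^"]*"><img src="/images/sections/(\w+)\.gif"[^>]*></a><br><br>',
    re.IGNORECASE | re.MULTILINE,
)


def to_printable(url: str) -> str:
    """Rewrite a story link to its printer-friendly form (first match only)."""
    return re.sub(r"displayStory\.cfm", "PrinterFriendly.cfm", url, count=1)


def section_images_to_text(html: str) -> str:
    """Replace section header images with a plain-text 'Section: name' line."""
    return SECTION_IMG.sub(r"<hr>Section: \1<br>", html)


# Hypothetical inputs, for illustration only
url = to_printable("http://www.economist.com/world/displayStory.cfm?story_id=123")
print(url)
print(bool(STORY_URL.match(url)))
print(section_images_to_text(
    '<a href="/business/"><img src="/images/sections/business.gif" border="0"></a><br><br>'
))
```

The rewritten link is what the `StoryURL` pattern is meant to match, which is why only printer-friendly pages end up being scooped as stories.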