Old 05-14-2005, 07:43 PM   #1
stobs
Connoisseur
Posts: 62
Karma: 72
Join Date: Oct 2002
Location: Germany
Device: nook
universal RSS-feed creator

Hi all,

I just found http://www.wotzwot.com/
It converts a site to an RSS feed - great!

My long missing site http://www.freewarepalm.com/moresoftware.shtml converts to:
wotzwot

-stobs.
Old 05-16-2005, 06:01 AM   #2
Colin Dunstan
Is papyrophobic!
Posts: 1,926
Karma: 1009999
Join Date: Aug 2003
Location: USA
Device: Dell Axim
So wotzwot is something like a universal page scraper? My biggest concern is how they handle caching. Are scraped sites being cached? Or are they re-scraped every time someone polls the feed? I am sure some webmasters would get seriously pissed if someone suddenly started scraping their sites constantly.
Old 05-17-2005, 03:54 AM   #3
stobs
I think there is no reason to cache the content. You define the rules and they fetch the content when you request it.

I'd like to invite everyone to collect some sites here. Mine:

German BSI
http://www.wotzwot.com/rssxl.php?pag...=%3C%2Fspan%3E

German Plock Magazine (Golf-Sport)
http://www.wotzwot.com/rssxl.php?pag...ble%3E&sd=&ed=

-Stobs.

Quote:
Originally Posted by Morpheus
Are scraped sites being cached?
Old 05-17-2005, 10:55 AM   #4
Alexander Turcic
Fully Converged
Posts: 18,163
Karma: 14021202
Join Date: Oct 2002
Location: Switzerland
Device: Too many to count here.
But imagine someone requesting the feed every 5 minutes. I had a few users who were polling the PalmGear feeds from our site every 5 seconds (!!!). Even worse, their feed client didn't support conditional GETs, so it fetched the entire feed each time. Nor did it support gzip compression. So every 5 seconds the user polled both PalmGear feeds, which are each around 20 KB uncompressed. In other words, every minute he pulled almost half a MB of bandwidth from us.
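
For anyone writing their own feed client, here is a rough sketch of what "conditional GET plus gzip" looks like in practice, together with the bandwidth arithmetic from the paragraph above. It uses only Python's standard library. The function names, the `state` dict, and the injectable `opener` argument are my own assumptions for illustration, not taken from any particular feed reader.

```python
import gzip
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers for a bandwidth-friendly feed poll."""
    headers = {"Accept-Encoding": "gzip"}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def poll_feed(url, state, opener=urllib.request.urlopen):
    """Fetch url; return the body bytes, or None if the server answers 304.

    `state` is a plain dict (hypothetical helper) that remembers the
    validators the server sent on the previous poll.
    """
    request = urllib.request.Request(
        url,
        headers=conditional_headers(state.get("etag"), state.get("last_modified")),
    )
    try:
        with opener(request) as response:
            body = response.read()
            if response.headers.get("Content-Encoding") == "gzip":
                body = gzip.decompress(body)
            state["etag"] = response.headers.get("ETag")
            state["last_modified"] = response.headers.get("Last-Modified")
            return body
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: nothing new to download
            return None
        raise


def bytes_per_minute(feed_bytes, feeds, poll_interval_seconds):
    """Traffic from one client that re-downloads every feed on every poll."""
    polls_per_minute = 60 / poll_interval_seconds
    return feed_bytes * feeds * polls_per_minute
```

With the numbers above, `bytes_per_minute(20_000, 2, 5)` works out to 480,000 bytes, which is the "almost half a MB" per minute. A client that sends the validators back gets a short 304 response instead whenever the feed has not changed.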

This is, I think, Colin's major fear with scraping sites like wotzwot. Do they re-scrape the target site every time a feed is polled (perhaps even by several people), or do they locally cache the reformatted feeds? When I have some more time, I will give it a try and use it on some MobileRead resource, so that I can check our log files later to see how wotzwot handles this issue.
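
To make the "locally cache the reformatted feeds" option concrete, here is a minimal sketch of a time-based cache that a scraping service could put in front of its scraper. This is purely an illustration of the idea and says nothing about how wotzwot actually works. `FeedCache`, the `scrape` callable, and the 15-minute default are all made-up assumptions.

```python
import time


class FeedCache:
    """Serve a stored feed until it is older than ttl_seconds."""

    def __init__(self, scrape, ttl_seconds=900, clock=time.monotonic):
        self._scrape = scrape      # callable: url -> feed (hypothetical scraper)
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the cache can be tested
        self._entries = {}         # url -> (fetched_at, feed)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]        # still fresh: the target site is not touched
        feed = self._scrape(url)   # stale or missing: scrape once and remember
        self._entries[url] = (now, feed)
        return feed
```

With something like this in place, the target site sees at most one request per TTL window for a given URL, no matter how many people poll the resulting feed or how often.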
Old 05-18-2005, 09:39 AM   #5
stobs
At least for now they don't cache. I tried to fetch a google-news search with wotzwot.com:
google news

It gave me new links every few minutes.
That could change, of course.

-Stobs.
Old 05-18-2005, 02:13 PM   #6
Alexander Turcic
Quote:
Originally Posted by stobs
At least for now they don't cache. I tried to fetch a google-news search with wotzwot.com:
google news

It gave me new links every few minutes.
That could change, of course.

-Stobs.
Yup, I think someone should contact the developer and ask them to consider adding caching.
All times are GMT -4. The time now is 07:47 AM.