

NewsRaider scrapes sites for news


Alexander Turcic
07-24-2005, 06:24 AM
PalmAddicts has a post (http://palmaddict.typepad.com/palmaddicts/2005/07/news_raider.html) this morning on a new application called NewsRaider (http://www.newsraider.com/index.php). The Proporta/Tomeraider guys advertise it as:

A revolutionary application for Windows that allows any News, Reviews or Magazine site to have its articles "raided" and converted into a single rolling news service. NewsRaider is free. We guarantee that no other product or service can give you so much news in such a distilled format. NewsRaider is fast. The articles are downloaded before you want to read them (It sits in the background taking up little system resources but huge amounts of news resources). NewsRaider does not use RSS or other syndication. It goes straight to the source. This means that you get the news content you want, whenever you want.

Really sorry, but I don't see the benefit in scraping (http://en.wikipedia.org/wiki/Screen_scraping) news over using official RSS feeds. NewsRaider has scraping scripts for CNN, BBC News, and Guardian, even though each one of them already offers feeds for various sections of their site. And if you really have to scrape content, wouldn't it be better to learn how to do it with Regular Expressions + Sitescooper (http://www.mobileread.com/forums/search.php?query=Sitescooper&do=process) instead of studying an incompatible "Raid Script" language?
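For readers unfamiliar with the regular-expression approach mentioned above, here is a minimal sketch in Python. It is not how NewsRaider or Sitescooper actually works; the sample markup, the `story` class name, and the function name `scrape_headlines` are all made up for illustration. It also shows why scraping is fragile compared with a feed: the pattern is tied to one site's HTML and breaks when that HTML changes.

```python
import re

# Hypothetical markup for illustration only; real news sites use
# different HTML, and it changes without notice.
SAMPLE_HTML = """
<div class="story"><a href="/news/1.html">First headline</a></div>
<div class="ad"><a href="/ads/buy.html">Buy now</a></div>
<div class="story"><a href="/news/2.html">Second headline</a></div>
"""

# Match only links that sit directly inside a "story" block,
# capturing the URL and the link text.
STORY_LINK = re.compile(
    r'<div class="story">\s*<a href="([^"]+)">([^<]+)</a>',
    re.IGNORECASE,
)


def scrape_headlines(html):
    """Return (url, title) pairs for links inside story blocks."""
    return [(url, title.strip()) for url, title in STORY_LINK.findall(html)]


if __name__ == "__main__":
    for url, title in scrape_headlines(SAMPLE_HTML):
        print(url, "-", title)
```

The advert link is skipped because it does not sit inside a `story` block. An RSS feed would deliver the same URL and title pairs in a documented format, with no site-specific pattern to maintain.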

csmith75
07-24-2005, 08:16 AM
I've had Newsraider on my laptop for a number of months now and it's pretty easy to use since it just downloads news in the background. I used to use it with Tomeraider on my Axim x50 and take the news with me in the morning since it was quicker than downloading all my feeds via RSS (plus you get the full content). I stopped using it though because it just mysteriously stopped working with Tomeraider.

hacker
07-24-2005, 08:42 AM
Anyone happen to know the UserAgent string or netblock they are using, so I can be sure to block them? Thanks.

Laurens
07-24-2005, 11:27 AM
The user agent is "NewsRaider". I wouldn't worry about this app hijacking too much bandwidth. The interface is spectacularly ugly and unintuitive, so I don't think many people will want to use this app to begin with. Its only redeeming value is that the pages do come out sort of right on the PPC client. Admittedly, it does do what it says.
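For anyone who does want to block on that string, a rough sketch of a user-agent filter as Python WSGI middleware follows. The only detail taken from this thread is the "NewsRaider" string; the names `BLOCKED_AGENTS`, `is_blocked`, and `block_agents` are illustrative, and the case-insensitive substring match is an assumption about how the client reports itself. Keep in mind that the User-Agent header is self-reported, so a client can change it at any time.

```python
# Substrings to reject, compared case-insensitively. "newsraider" comes
# from the user agent quoted in this thread; extend as needed.
BLOCKED_AGENTS = ("newsraider",)


def is_blocked(user_agent):
    """Return True if the User-Agent contains a blocked substring."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_AGENTS)


def block_agents(app):
    """Wrap a WSGI app so that blocked agents receive 403 Forbidden."""
    def wrapper(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else is passed through to the wrapped application.
        return app(environ, start_response)
    return wrapper
```

The same check is usually cheaper to do in the web server configuration itself; the middleware form is only meant to make the logic explicit.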

hacker
07-24-2005, 11:51 AM
I was under the impression that it just pounded the hell out of your pages trying to find newsworthy sections from some basic word-analysis.

I've got enough bandwidth to throw around, but I won't if 500,000 people all hit the same domains with badly behaved clients that ignorantly hammer pages looking for something that resembles "news".

MatYadabyte
08-12-2005, 03:38 AM
Hi

NewsRaider is actually very bandwidth efficient. It doesn’t download the content that is not relevant to the article.

I agree that it is one ugly interface, but we are going to start working on that pretty soon. First we are going to do the Palm version to complement the PPC and PC versions. If anyone would like to beta test this, we would be delighted.

Thanks very much,

And any questions, comments or criticisms you have, just ask

Mat

MatYadabyte
08-12-2005, 03:50 AM
Hi Alexander

NewsRaider over RSS advantages:


• NewsRaider downloads articles before you read them, unlike with RSS.
• NewsRaider downloads articles from sources that do not provide full or even any syndication.
• NewsRaider brings together thousands of articles a day that can be browsed, searched and filtered faster than any other method available. Am I wrong here?
• NewsRaider removes all unwanted content from sites, including popups, scripts, banners and adverts.
• NewsRaider provides a very unique reading experience. There is no other way to get so much news content so efficiently.

Thanks,

Mat

Any questions, just ask :)