How to rewrite New Scientist links


pingtiao
02-05-2007, 03:47 AM
I am trying to rewrite the links on the New Scientist page, but am having no luck. They very cleverly run their RSS feeds through another company, so it seems impossible (?) to remove the ads there; instead I am trying to work from the front page.

I am attempting to:
1. Include only articles and no ads
2. Rewrite the article URLs so they point only to the printer-friendly versions

OK, here is what the links look like, and what they should become:
http://www.newscientist.com/article/dn11099-vaccine-zaps-allergy-in-record-time.html
to
http://www.newscientist.com/article.ns?id=dn11099&print=true

and others like
http://environment.newscientist.com/article/dn11096-key-climate-report-sparks-global-call-to-action.html
to
http://environment.newscientist.com/article.ns?id=dn11096&print=true


So far I have tried to write filters that take the first section up to ".newscientist", then the article ID up to the first hyphen (so "dn11096" above), and then append "&print=true" to get the final printer-friendly version.
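A minimal sketch of that approach in Python, assuming Python's re module behaves like the link-rewriting tool's regex engine (the tool isn't named in the thread, so its dialect may differ). The pattern and the to_print_url helper are hypothetical, written only for illustration. Instead of counting characters, "([^-/]+)" takes the article ID up to the first hyphen, whatever its length.

```python
import re

# Assumption: Python's re module stands in for the rewrite tool's regex
# engine. Hypothetical pattern: group 1 is the scheme and host up to ".com"
# (the host must contain "newscientist"); group 2 is the article ID up to
# the first hyphen.
ARTICLE_URL = re.compile(r"(http://[^/]*newscientist[^/]*\.com)/article/([^-/]+)-.*")


def to_print_url(url: str) -> str:
    """Hypothetical helper: return the printer-friendly URL, or url unchanged."""
    m = ARTICLE_URL.fullmatch(url)
    if m is None:
        return url  # not an article link, leave it alone
    host, article_id = m.groups()
    return f"{host}/article.ns?id={article_id}&print=true"


print(to_print_url("http://environment.newscientist.com/article/dn11096-key-climate-report-sparks-global-call-to-action.html"))
# http://environment.newscientist.com/article.ns?id=dn11096&print=true
```

Links that don't match, such as the site's home page, are passed through unchanged.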

Here are my trial filters (which don't do the job!):

http://(.*)\.newscientist\.com/article\.ns?id=(.*)&(.*)
changing to
http://$1\.newscientist\.com/article\.ns?id=$2\&print=true

Any help would be greatly appreciated!
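Checking the trial filter in Python (assuming Python's re module is a fair stand-in for the rewrite tool's regex engine) shows two separate problems. First, the pattern describes the printer-friendly target form ("article.ns?id=") rather than the "/article/dn11099-..." links that are actually on the page. Second, the unescaped "?" makes the preceding "s" optional instead of matching a literal question mark, so the pattern does not even match a URL that is already in the target form.

```python
import re

# The trial filter from the post above, unchanged.
# Assumption: Python's re module stands in for the rewrite tool's engine.
TRIAL = r"http://(.*)\.newscientist\.com/article\.ns?id=(.*)&(.*)"

# A front-page link in the form the filter is supposed to rewrite.
front_page = "http://www.newscientist.com/article/dn11099-vaccine-zaps-allergy-in-record-time.html"

# A link already in the printer-friendly target form.
target = "http://www.newscientist.com/article.ns?id=dn11099&print=true"

# Problem 1: the pattern describes the target form, so it never matches
# the "/article/dn11099-..." links on the front page.
print(re.search(TRIAL, front_page))  # None

# Problem 2: "s?" means "optional s", not "s followed by a literal ?",
# so the pattern fails even on the target form.
print(re.search(TRIAL, target))  # None

# Escaping the "?" fixes problem 2 but not problem 1.
ESCAPED_Q = r"http://(.*)\.newscientist\.com/article\.ns\?id=(.*)&(.*)"
print(re.search(ESCAPED_Q, target) is not None)  # True
print(re.search(ESCAPED_Q, front_page))  # None
```

The backslashes in the replacement ("\." and "\&") may also cause trouble, depending on whether the tool treats them as escapes or copies them literally into the rewritten link.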

DTM
02-05-2007, 01:39 PM
The following seems to work. I tried it on their news page with success.

Filter:
(.*)newscientist\.com/article/dn(.{5,5}).*

Rewrite as:
$1newscientist.com/article.ns?id=dn$2&print=true


The expression (.{5,5}) captures any string of exactly five characters. (Literally, any string of length between 5 and 5.)
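The same filter can be tried outside the rewrite tool with a short Python sketch. Two assumptions: Python's re module stands in for the tool's regex engine, and the tool's "$1"/"$2" group references are written "\g<1>"/"\g<2>" in Python. The rewrite helper name is hypothetical.

```python
import re

# DTM's filter, unchanged. Assumption: Python's re module stands in for
# the rewrite tool's regex engine.
FILTER = r"(.*)newscientist\.com/article/dn(.{5,5}).*"

# DTM's "Rewrite as" line; the tool's $1/$2 become \g<1>/\g<2> in Python.
REWRITE = r"\g<1>newscientist.com/article.ns?id=dn\g<2>&print=true"


def rewrite(url: str) -> str:
    """Hypothetical helper: apply the rule if the whole URL matches, else return it unchanged."""
    m = re.fullmatch(FILTER, url)
    return m.expand(REWRITE) if m else url


print(rewrite("http://www.newscientist.com/article/dn11099-vaccine-zaps-allergy-in-record-time.html"))
# http://www.newscientist.com/article.ns?id=dn11099&print=true

# (.{5,5}) means exactly five characters, the same as (.{5}).
print(bool(re.fullmatch(r".{5,5}", "11099")), bool(re.fullmatch(r".{5,5}", "1109")))
# True False
```

Two limits of this rule follow from the pattern itself: a host such as "newscientisttech.com" is left untouched, because "newscientist\.com" must appear literally, and because exactly five characters are captured, an ID with more digits (if one existed) would be cut short.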

pingtiao
02-05-2007, 02:40 PM
Excellent, works perfectly!

Thanks DTM :cool:

DTM
02-05-2007, 04:50 PM
You're welcome!

I noticed that some of the links still don't get rewritten because they include newscientisttech in the URL. I think the following should catch that (but have not tried it).

Filter:
(.*)newscientist(.*)\.com/article/dn(.{5,5}).*

Rewrite as:
$1newscientist$2.com/article.ns?id=dn$3&print=true

(Note that the old $2 is now $3.)
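A Python sketch of this second filter, assuming Python's re module stands in for the rewrite tool's regex engine and writing the tool's "$1"/"$2"/"$3" as "\g<1>"/"\g<2>"/"\g<3>". The newscientisttech URL in the example is made up for illustration, and the rewrite helper name is hypothetical. Because the new group between "newscientist" and ".com" can match an empty string, ordinary newscientist.com links still come out as they did with the first filter.

```python
import re

# DTM's second, untested filter. Assumption: Python's re module stands in
# for the rewrite tool's regex engine. Group 2 holds whatever sits between
# "newscientist" and ".com" (e.g. "tech"), which is why the ID is now group 3.
FILTER = r"(.*)newscientist(.*)\.com/article/dn(.{5,5}).*"

# The tool's $1/$2/$3 become \g<1>/\g<2>/\g<3> in Python.
REWRITE = r"\g<1>newscientist\g<2>.com/article.ns?id=dn\g<3>&print=true"


def rewrite(url: str) -> str:
    """Hypothetical helper: apply the rule if the whole URL matches, else return it unchanged."""
    m = re.fullmatch(FILTER, url)
    return m.expand(REWRITE) if m else url


# Made-up newscientisttech link, for illustration only.
print(rewrite("http://www.newscientisttech.com/article/dn11100-example.html"))
# http://www.newscientisttech.com/article.ns?id=dn11100&print=true
```

The extra "(.*)" is permissive: it matches any characters, slashes included, between "newscientist" and ".com". A tighter "([^/]*)" would confine it to the host name, should that ever matter.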