RegExp quandary


csh
05-02-2005, 08:12 PM
I'm trying to filter a quick document made up of links to news.com.com articles that I want to follow up on. For example:

<A HREF="http://news.com.com/Quote+of+the+day+Hedge+fund+guy+laughing+butt+off/2110-1010_3-5690203.html?tag=st_lh">Article</A>

which I wanted to modify to:

<A HREF="http://news.com.com/2110-1010_3-5690203.html?tag=st.util.print">Article</A>

Here's what I came up with for the filter script:
//Script for converting news.com.com articles to print version with referrer
//Setup for checking for an RE
var regexp = /news\.com\.com/.*\+.*/

document.onanchorlink = function(link) {
    // Set referrer to original uri
    link.referrer = link.uri;
    //If URL contains match to regexp
    if ( link.uri.match(regexp) != null) {
        //Then
        // Rewrite uri to point to printable version
        var junk = link.uri.match(regexp)[0];
        link.uri = link.uri.replace(junk, "/news.com.com/");
        link.uri = link.uri.replace("tag=st_lh", "tag-st.util.print");
        //Endif
    };
};

Problem is I end up with a status of:
Error: Exception reading script file "null": missing name after . operator (<embedded> #3)

What is wrong with my script? (Where did I go wrong?)

Any patient help would be greatly appreciated.

coolacid
05-03-2005, 11:47 AM
I'm by no means good at this, but try writing your regex like this:

var regexp = "/news\.com\.com/.*\\+.*/";

It went through fine, but I don't get the same links as you in my news, so I can't test it.

csh
05-07-2005, 08:32 AM
Tried that too (and also tried .*\\\+.*). It seems to be a problem with escaping the metacharacter "+".

BTW, noticed an error in the last "replace" line. This:
link.uri = link.uri.replace("tag=st_lh", "tag-st.util.print");
should be:
link.uri = link.uri.replace("tag=st_lh", "tag=st.util.print");

Laurens
05-07-2005, 08:45 AM
You need to escape the forward slash after ".com":

var regexp = /news\.com\.com\/.*\+.*/
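For anyone reading along, here is a small sketch of why the escape matters, assuming ordinary JavaScript regex literal rules apply in the filter environment. Without the backslash, the literal ends at the slash after ".com", so the parser sees /news\.com\.com/ followed by ".*", which fits the "missing name after . operator" error reported above. The names `escaped`, `fromString`, and `sample` are made up for illustration.

```javascript
// The unescaped form appears only in a comment because it does not parse:
//   var regexp = /news\.com\.com/.*\+.*/
// The literal is closed by the slash after ".com", leaving ".*\+.*/" behind.

// Escaped slash: the whole pattern stays inside one literal.
var escaped = /news\.com\.com\/.*\+.*/;

// Same pattern built from a string. "/" needs no escaping here,
// but every backslash has to be doubled inside the string.
var fromString = new RegExp("news\\.com\\.com/.*\\+.*");

// Shortened sample URL, invented for this sketch.
var sample = "http://news.com.com/Quote+of+the+day/2110-1010_3-5690203.html";
var bothMatch = escaped.test(sample) && fromString.test(sample);
```

Both forms should match the same text; the only difference is how the pattern is written down.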

csh
05-09-2005, 06:47 PM
Thanks, Laurens!

Didn't realize the leading and trailing '/' characters were bounding the RE. Now I see what was amiss. Explains why I was also ending up with the wrong number of '/' characters in the modified URL.
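One more thing that may be worth checking: in standard JavaScript the ".*" after the "+" is greedy and runs to the end of the URL, so the matched text includes the article id and the tag, and replacing it would drop them. Below is a rough sketch of a variant that matches only the headline segment, assuming the links always look like the example in the first post. `toPrintUri` is a hypothetical helper, not part of the original script.

```javascript
// Hypothetical helper: converts a headline-style news.com.com link
// to the print version shown in the first post.
function toPrintUri(uri) {
    // news.com.com/, then a run of non-slash characters that contains
    // at least one "+", then the slash that ends the headline segment.
    var headline = /news\.com\.com\/[^\/]*\+[^\/]*\//;
    if (!headline.test(uri)) {
        return uri; // no headline segment found; leave the link untouched
    }
    return uri
        .replace(headline, "news.com.com/")
        .replace("tag=st_lh", "tag=st.util.print");
}

var original = "http://news.com.com/Quote+of+the+day+Hedge+fund+guy+laughing+butt+off/2110-1010_3-5690203.html?tag=st_lh";
var printable = toPrintUri(original);
// printable is "http://news.com.com/2110-1010_3-5690203.html?tag=st.util.print"
```

Inside the onanchorlink handler this would presumably be used as link.uri = toPrintUri(link.uri); after the referrer is set, though I have not tested it in the filter itself.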