Zap2it URLs Not Parsing (Possible with Plucker Desktop, However)

g-funkster · 03-03-2005, 08:47 PM

If anyone can help me fix this, I would LOVE it.

The URLs I'm interested in match this regular expression in Plucker Desktop, but in Sunrise they get ignored. In fact, even when I tell it to go to depth 1 without any URL filtering, it still does not see them, while Plucker Desktop does. Sunrise works for all my other websites, but Zap2it just won't work. The links are unusual, but here is what they look like:

<LI><a class=headlines href='/tveditorial/tve_main/1,1002,271|93918|1|,00.html'>'Trial by Jury' Fiddles with 'Law & Order' Blueprint</a></LI>

<LI><a class=headlines href='/tveditorial/tve_main/1,1002,271|93914|1|,00.html'>FOX Is February Sweeps Idol</a></LI>

....

Please help me find a way to get these links. Sunrise is just amazing... Please help!
Thanks!

Colin Dunstan · 03-05-2005, 03:03 AM

Perhaps a parsing bug in Sunrise's regex engine? I remember Laurens mentioning a small bug in the regex, though I'm not sure it's related to your problem. Or perhaps Sunrise doesn't like the | character in URIs.

g-funkster · 03-05-2005, 12:12 PM

Herm,

I rather like the theory that the | character is the problem, but that seems odd, since nothing in the HTML spec should give the engine a reason to choke on it.

The same goes for the comma, which I originally thought was the cause.

Or maybe it's that the href value is in single quotes rather than double quotes?

Or maybe the engine is looking for a literal "<a\s*href=", whereas here there is a class attribute between the a and the href instead of just whitespace?
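The last two guesses are easy to check in isolation. Below is a minimal JavaScript sketch comparing a strict pattern of the form "<a\s*href=" with one that tolerates other attributes before href and accepts single, double, or no quotes. Both patterns are hypothetical illustrations; Sunrise's actual link extractor isn't shown in this thread.

```javascript
// Hypothetical patterns for illustration; not taken from Sunrise.
// Strict: "href" must follow "<a" directly (only whitespace in between).
const STRICT = /<a\s*href=["']([^"']*)["']/gi;
// Tolerant: allows other attributes (e.g. class=headlines) before href and
// accepts double-quoted, single-quoted, or unquoted values.
const TOLERANT = /<a\b[^>]*?\bhref\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/gi;

function extractHrefs(html, pattern) {
  const hrefs = [];
  for (const m of html.matchAll(pattern)) {
    // Only one capture group is filled, depending on the quoting style.
    hrefs.push(m[1] ?? m[2] ?? m[3]);
  }
  return hrefs;
}

// The two sample links from the post above.
const html =
  "<LI><a class=headlines href='/tveditorial/tve_main/1,1002,271|93918|1|,00.html'>'Trial by Jury' Fiddles with 'Law & Order' Blueprint</a></LI>\n" +
  "<LI><a class=headlines href='/tveditorial/tve_main/1,1002,271|93914|1|,00.html'>FOX Is February Sweeps Idol</a></LI>";

console.log(extractHrefs(html, STRICT));   // []
console.log(extractHrefs(html, TOLERANT)); // the two /tveditorial/... paths
```

The tolerant pattern returns both paths with the pipes intact, so a link extractor written along these lines would not, by itself, be tripped up by the pipe characters; the failure could lie in a later step.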

I've thrown out a few suggestions there. I'm actually writing this post on my Treo 650; that's how much I love this thing. I'd do anything to make my experience with it better, and that includes throwing Plucker Desktop out the door.

Thanks for listening, guys. Keep the suggestions coming and perhaps we can get to the bottom of this.

Laurens · 03-05-2005, 01:07 PM

Well, it's actually the pipe symbol '|' that is causing problems. The URI class responsible for resolving links insists on this character being escaped using '%7c'. This will be fixed in the next revision.
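Until that revision is out, one possible workaround is to percent-encode the pipe yourself before the link reaches the URI class, if your setup offers a hook at that point (the thread doesn't confirm one). Below is a minimal JavaScript sketch of the encoding step. The character check is a simplified stand-in for a strict RFC 3986 parser, an assumption rather than Sunrise's actual URI class, and www.zap2it.com as the base URL is likewise assumed. Note that %7c and %7C are equivalent, since the hex digits in percent-encoding are case-insensitive.

```javascript
// Simplified stand-in for a strict URI parser (assumption, not Sunrise's code):
// characters RFC 3986 allows unescaped in a path, plus '%' for existing escapes.
const STRICT_PATH = /^[A-Za-z0-9\-._~!$&'()*+,;=:@\/%]*$/;

// Percent-encode every '|' (0x7C) so a strict parser accepts the path.
function escapePipes(href) {
  return href.replace(/\|/g, "%7C");
}

const raw = "/tveditorial/tve_main/1,1002,271|93918|1|,00.html";
const escaped = escapePipes(raw);

console.log(STRICT_PATH.test(raw));     // false: '|' is not allowed unescaped
console.log(STRICT_PATH.test(escaped)); // true

// Assumed base URL; the WHATWG URL class keeps %7C as-is when resolving.
console.log(new URL(escaped, "http://www.zap2it.com/").href);
// http://www.zap2it.com/tveditorial/tve_main/1,1002,271%7C93918%7C1%7C,00.html
```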

g-funkster · 03-05-2005, 01:23 PM

!

That's great. Thanks for coming by and reading. Hopefully I've helped a little in your quest towards commercializing this product. I saw so much potential in it that I actually uninstalled AvantGo a few days ago.

Thanks again for all your work, I really appreciate it.

lescarleton · 08-04-2005, 12:19 AM

Hi,

I'm getting the same problem with Computerworld, where you need to take the article number from the middle of a comma-separated URL to build the print-page URL.

document.onanchorlink = function(link) {
    if (link.depth == 1) {
        // var artid = link.uri.match(".*\/.*\,(.*)\,.*")[1] <<my first try
        var artid = link.uri.match("(.*)\/(.*)$")[2]
        artid = artid.match("(.*)\,(.*)")[1]
        link.uri = "http://www.computerworld.com/printthis/2005/0,4814," + artid + ",00.html"
    }
};

This is trying to map
http://www.computerworld.com/hardwar...XXXXXX,00.html
to
http://www.computerworld.com/printth...XXXXXX,00.html

I've tried a plain comma, %2c, and \x2c to represent the commas, but to no avail. I get the error "Cannot read property "1" from null".

Strangely enough, on Windows this works for some Computerworld URLs but not others, while on Linux it fails for all of them.
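One plausible cause of the null error (a reading of the code, not something confirmed in the thread): the second match uses greedy groups, so (.*),(.*) captures everything up to the last comma rather than just the article number, and on any link whose last path segment contains no comma at all (an index page ending in "/", say) match() returns null, so reading [1] throws. The JavaScript sketch below reproduces the original two-step extraction next to a null-checked version built on the story pattern used later in this thread. The example URLs are hypothetical, since the real ones are elided above.

```javascript
// The original two-step extraction, rewritten with regex literals.
function originalArtid(uri) {
  const last = uri.match(/(.*)\/(.*)$/)[2]; // text after the last '/'
  const m = last.match(/(.*),(.*)/);        // greedy: splits at the LAST comma
  return m === null ? null : m[1];          // the original read m[1] unconditionally
}

// Null-checked version: pull the article number out of
// "0,<section>,<article>,00.html" and skip anything else.
function toPrintUrl(uri) {
  const m = uri.match(/0,\d+,(\d+),00\.html/);
  if (m === null) return null; // not a story link: leave it alone
  return "http://www.computerworld.com/printthis/2005/0,4814," + m[1] + ",00.html";
}

// Hypothetical URLs for illustration.
const story = "http://www.computerworld.com/example/story/0,10801,123456,00.html";
const index = "http://www.computerworld.com/example/";

console.log(originalArtid(story)); // 0,10801,123456 (more than the article number)
console.log(originalArtid(index)); // null, where the original code would throw
console.log(toPrintUrl(story));    // http://www.computerworld.com/printthis/2005/0,4814,123456,00.html
console.log(toPrintUrl(index));    // null
```

Inside document.onanchorlink, the same idea amounts to assigning link.uri only when the match is not null, which is essentially what the working version later in this thread does.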

Cheers!

...Les...

Laurens · 08-04-2005, 04:06 PM

See the attached ZIP for the SDL and script files. I only tried it out with the RSS feed.

Their RSS feeds are here.

lescarleton · 08-10-2005, 07:57 PM

Thanks Laurens!

I've made one more change to the code so that it skips links to index pages and only rewrites links to stories. What I have now is this, and it seems to work!

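// wednesday and formatDate() are defined elsewhere in the script (not shown in this post).
document.name += " " + formatDate(wednesday, "yyyy-MM-dd");

// Matches the "0,<section>,<article>,00.html" tail of a story URL.
var storyPattern = /0,\d+,\d+,00\.html/;

document.onanchorlink = function(link) {
    var matches = link.uri.match(storyPattern);
    // Only rewrite links that match the pattern and contain "story"; this skips the index pages.
    if ((matches != null) && (link.uri.match(/story/))) {
        var story = matches[0];
        // Swap section 10801 for 4814, the number used in the print URLs.
        story = story.replace(",10801,", ",4814,");
        // Keep the original story URL as the referrer.
        link.referrer = link.uri;
        link.uri = "http://www.computerworld.com/printthis/2005/" + story;
    }
};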

Cheers!

...Les...
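To check the rewrite rule outside Sunrise, the body of the onanchorlink handler above can be run as a plain JavaScript function on mock link objects. This is a sketch under stated assumptions: the mock objects carry only the uri and referrer fields the handler touches, the document.name/formatDate line is left out because it depends on Sunrise, and the URLs are hypothetical.

```javascript
const storyPattern = /0,\d+,\d+,00\.html/;

// Same logic as the onanchorlink handler above, as a standalone function.
function rewriteStoryLink(link) {
  const matches = link.uri.match(storyPattern);
  if (matches !== null && /story/.test(link.uri)) {
    const story = matches[0].replace(",10801,", ",4814,");
    link.referrer = link.uri;
    link.uri = "http://www.computerworld.com/printthis/2005/" + story;
  }
  return link;
}

// Hypothetical mock links: one story page, one non-story page.
const storyLink = rewriteStoryLink({ uri: "http://www.computerworld.com/example/story/0,10801,123456,00.html" });
const otherLink = rewriteStoryLink({ uri: "http://www.computerworld.com/example/index/0,10801,99,00.html" });

console.log(storyLink.uri);      // http://www.computerworld.com/printthis/2005/0,4814,123456,00.html
console.log(storyLink.referrer); // the original story URL
console.log(otherLink.uri);      // unchanged: no "story" in the URL
```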
