Zap2it URLs Not Parsing... Possible with Plucker Desktop, However
g-funkster 03-03-2005, 07:47 PM If anyone can help me fix this, I would LOVE it.
The URLs I'm interested in match this regular expression in Plucker Desktop, but in Sunrise they get ignored. In fact, when I tell it to go to depth 1 without any URL filtering, it still does not see them... while Plucker Desktop does. Sunrise works for all my other websites, but Zap2it just won't work. The links are unusual, but still, they look like this:
<LI><a class=headlines href='/tveditorial/tve_main/1,1002,271|93918|1|,00.html'>'Trial by Jury' Fiddles with 'Law & Order' Blueprint</a></LI>
<LI><a class=headlines href='/tveditorial/tve_main/1,1002,271|93914|1|,00.html'>FOX Is February Sweeps Idol</a></LI>
....
Please help me find a way to simply get these links. Sunrise is just amazing... Please help!
Thanks!
Colin Dunstan 03-05-2005, 02:03 AM Perhaps it's a parsing bug in Sunrise's regex engine? I remember Laurens mentioning a small bug in the regex, though I'm not sure whether it's related to your problem. Or perhaps Sunrise doesn't like the | character in URIs.
g-funkster 03-05-2005, 11:12 AM Herm,
I rather like the theory that the | character is the problem, but it seems odd, since nothing in the HTML spec should give the engine trouble with it.
Nor should the comma, which I had thought was causing the problem.
Or maybe it's that the href value is in single quotes rather than double quotes?
Or maybe the engine looks for a literal "<a\s*href=", whereas here a class attribute sits between the a and the href instead of just whitespace?
I've thrown out a few suggestions there. I'm actually writing this post on my Treo 650; that's how much I love this thing. I'd do anything to make my experience with it better, and that includes throwing Plucker Desktop out the door.
Thanks for listening guys, and keep the suggestions coming and perhaps we can get to the bottom of this.
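The quoting and class-attribute theories above are easy to check in isolation. A minimal sketch, assuming a simple regex scan of the page; this is not Sunrise's or Plucker's actual link parser, and `extractHrefs` is a hypothetical helper. The naive `<a\s*href=` pattern misses the Zap2it link, while a pattern that allows other attributes and either quote style picks it up with the pipes intact.

```javascript
// Hypothetical helper: pull href values out of <a> tags with a regex scan.
// Illustration only; not how Sunrise or Plucker actually parse HTML.
function extractHrefs(html) {
  // "<a", then any other attributes, then href= with a double-quoted,
  // single-quoted, or unquoted value.
  const anchorHref = /<a\b[^>]*?\bhref\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/gi;
  const hrefs = [];
  let m;
  while ((m = anchorHref.exec(html)) !== null) {
    // Exactly one of the three capture groups is set per match.
    hrefs.push(m[1] ?? m[2] ?? m[3]);
  }
  return hrefs;
}

// The naive pattern from the thread: only whitespace between "a" and "href".
const naive = /<a\s*href=/i;

const sample =
  "<LI><a class=headlines href='/tveditorial/tve_main/1,1002,271|93918|1|,00.html'>" +
  "'Trial by Jury' Fiddles with 'Law & Order' Blueprint</a></LI>";

console.log(naive.test(sample));   // false: class=headlines sits in between
console.log(extractHrefs(sample)); // the single-quoted href, pipes intact
```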
Laurens 03-05-2005, 12:07 PM Well, it's actually the pipe symbol '|' that is causing problems. The URI class responsible for resolving links insists on this character being escaped using '%7c'. This will be fixed in the next revision.
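For reference, the escaped form Laurens describes can be produced with a targeted replace. A minimal sketch, assuming the raw href is available as a plain string; `escapePipes` and the Zap2it base URL are illustrative assumptions, and the standard `URL` class (browsers, Node.js) stands in for Sunrise's own URI class, whose API isn't shown here.

```javascript
// Hypothetical helper: percent-encode every "|" as %7C, the escaped form
// the Sunrise URI class expects according to the thread.
function escapePipes(href) {
  return href.replace(/\|/g, "%7C");
}

// Assumed page URL, for illustration only.
const base = "http://www.zap2it.com/";
const raw = "/tveditorial/tve_main/1,1002,271|93918|1|,00.html";

const escaped = escapePipes(raw);
// Resolve the escaped relative link against the assumed page URL.
const absolute = new URL(escaped, base).href;

console.log(escaped);  // /tveditorial/tve_main/1,1002,271%7C93918%7C1%7C,00.html
console.log(absolute); // http://www.zap2it.com/tveditorial/tve_main/1,1002,271%7C93918%7C1%7C,00.html
```

A targeted replace is used rather than `encodeURI()`, which would also turn any `%` already present in the link into `%25`.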
g-funkster 03-05-2005, 12:23 PM !
That's great. Thanks for stopping by and reading. Hopefully I've helped somewhat in your quest to commercialize this product. I saw so much potential in it that I actually uninstalled AvantGo a few days ago.
Thanks again for all your work, I really appreciate it.
lescarleton 08-03-2005, 11:19 PM Hi,
I'm having the same problem with Computerworld, where you need to pull the article number out of the middle of a comma-separated URL to build the print-page URL.
document.onanchorlink = function(link) {
    if (link.depth == 1) {
        // var artid = link.uri.match(".*\/.*\,(.*)\,.*")[1]   <<my first try
        var artid = link.uri.match("(.*)\/(.*)$")[2]
        artid = artid.match("(.*)\,(.*)")[1]
        link.uri = "http://www.computerworld.com/printthis/2005/0,4814," + artid + ",00.html"
    }
};
This is trying to map
http://www.computerworld.com/hardwaretopics/storage/story/0,10801,XXXXXX,00.html
to
http://www.computerworld.com/printthis/2005/0,4814,XXXXXX,00.html
I've tried a literal comma, %2c and \x2c to represent the commas, but to no avail: I get the error "Cannot read property "1" from null".
Strangely enough, on Windows this works for some Computerworld URLs but not others, while on Linux it fails for all of them.
Cheers!
...Les...
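The error means one of the `match()` calls returned `null`, i.e. the pattern did not match that particular link (for example, a link whose last path segment contains no comma), and indexing `[1]` on `null` throws. The second pattern is also greedy: on `0,10801,XXXXXX,00.html`, `(.*)\,(.*)` captures `0,10801,XXXXXX` rather than just the article number. A minimal null-safe sketch of the mapping, assuming story URLs always end in `0,<section>,<article>,00.html`; `toPrintUrl` is a hypothetical helper and `123456` is a made-up article number.

```javascript
// Matches the trailing "/0,<section>,<article>,00.html" of a story URL
// and captures only the article number.
const storyUrl = /\/0,\d+,(\d+),00\.html$/;

// Hypothetical helper: returns the print-page URL, or null when the link
// is not a story URL (instead of throwing on a null match).
function toPrintUrl(uri) {
  const m = uri.match(storyUrl);
  if (m === null) {
    return null; // not a story link; leave it alone
  }
  return "http://www.computerworld.com/printthis/2005/0,4814," + m[1] + ",00.html";
}

console.log(toPrintUrl(
  "http://www.computerworld.com/hardwaretopics/storage/story/0,10801,123456,00.html"));
// http://www.computerworld.com/printthis/2005/0,4814,123456,00.html
console.log(toPrintUrl("http://www.computerworld.com/hardwaretopics/storage/")); // null
```

Inside the `document.onanchorlink` handler, `link.uri` would then only be reassigned when `toPrintUrl(link.uri)` returns something other than `null`.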
Laurens 08-04-2005, 03:06 PM See the attached ZIP for the SDL and script files. I've only tried it with the RSS feed.
Their RSS feeds are here (http://computerworld.com/news/xml/index/).
lescarleton 08-10-2005, 06:57 PM Thanks Laurens!
I've made one more change to the code so that it skips links to index pages and keeps only stories. Here's what I have, and it seems to work!
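// formatDate() and wednesday are assumed to be defined elsewhere in the script (not shown).
document.name += " " + formatDate(wednesday, "yyyy-MM-dd");

// Story URLs end in 0,<section>,<article>,00.html
var storyPattern = /0,\d+,\d+,00\.html/;

document.onanchorlink = function(link) {
    var matches = link.uri.match(storyPattern);
    // Only rewrite story links; index pages are left untouched.
    if ((matches != null) && (link.uri.match(/story/))) {
        var story = matches[0];
        // Replace whatever section ID the story uses with the 4814 ID of the print pages.
        story = story.replace(/^0,\d+,/, "0,4814,");
        link.referrer = link.uri;
        link.uri = "http://www.computerworld.com/printthis/2005/" + story;
    }
};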
Cheers!
...Les...