#1
Junior Member
Posts: 5
Karma: 10
Join Date: Mar 2005
Zap2it URLs not parsing (works with Plucker Desktop, though)
If anyone can help me fix this, I would LOVE it.
The URLs I'm interested in match the regular expression I use in Plucker Desktop, but in Sunrise they get ignored. In fact, even when I tell it to go to depth 1 without any URL filtering, it still doesn't see them, while Plucker Desktop does. Sunrise works for all my other websites, but Zap2it just won't work. The links are unusual, but they look like this:

```html
<LI><a class=headlines href='/tveditorial/tve_main/1,1002,271|93918|1|,00.html'>'Trial by Jury' Fiddles with 'Law & Order' Blueprint</a></LI>
<LI><a class=headlines href='/tveditorial/tve_main/1,1002,271|93914|1|,00.html'>FOX Is February Sweeps Idol</a></LI>
...
```

Please help me find a way to simply get these links. Sunrise is just amazing. Please help! Thanks!
#2
Is papyrophobic!
Posts: 1,926
Karma: 1009999
Join Date: Aug 2003
Location: USA
Device: Dell Axim
Perhaps it's a parsing bug in Sunrise's regex engine? I remember Laurens mentioning a small bug in the regex engine; I'm not sure whether it's related to your problem. Or perhaps Sunrise doesn't like the | character in URIs.
#3
Junior Member
Posts: 5
Karma: 10
Join Date: Mar 2005
I suppose...
Hmm,
I rather like the theory that the | character is the problem, but that seems odd, since the HTML spec shouldn't have anything to do with whether the engine can handle it. Nor should the comma, which is what I first thought the problem was. Or maybe it's that the href value is in single quotes rather than double quotes? Or maybe the engine is looking for a literal `<a\s*href=`, but here there's a class attribute between the `a` and the `href`? Those are my suggestions. I'm actually writing this post on my Treo 650, that's how much I love this thing. I'd do anything to make my experience with it better, and that includes throwing Plucker Desktop out the door. Thanks for listening, guys; keep the suggestions coming and perhaps we can get to the bottom of this.
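To check the last two theories offline, here is a sketch that runs a naive anchor pattern and a more tolerant one over the Zap2it sample markup from the first post. Neither pattern is Sunrise's actual link parser (the thread doesn't show it); `naive`, `tolerant` and `extract` are hypothetical names used only for illustration.

```javascript
// Illustration only: neither regex is Sunrise's real link parser.
var sample =
  "<LI><a class=headlines href='/tveditorial/tve_main/1,1002,271|93918|1|,00.html'>" +
  "'Trial by Jury' Fiddles with 'Law & Order' Blueprint</a></LI>" +
  "<LI><a class=headlines href='/tveditorial/tve_main/1,1002,271|93914|1|,00.html'>" +
  "FOX Is February Sweeps Idol</a></LI>";

// Naive: only whitespace allowed between "<a" and "href", double quotes only.
var naive = /<a\s*href="([^"]*)"/gi;

// Tolerant: other attributes may precede href, and either quote style works
// (the closing quote must match the opening one via the \1 backreference).
var tolerant = /<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1/gi;

// Collect capture group `group` from every match of a global regex.
function extract(pattern, html, group) {
  var hrefs = [];
  var m;
  pattern.lastIndex = 0; // reset state left over from earlier exec() calls
  while ((m = pattern.exec(html)) !== null) {
    hrefs.push(m[group]);
  }
  return hrefs;
}

console.log(extract(naive, sample, 1));    // prints []
console.log(extract(tolerant, sample, 2)); // the two Zap2it hrefs, pipes intact
```

The tolerant pattern finds both links with their `|` characters untouched, so a regex of this kind can locate them; whether a URI resolver then accepts an unescaped `|` is a separate question.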
#4
Jah Blessed
Posts: 1,295
Karma: 1373
Join Date: Apr 2003
Location: The Netherlands
Device: iPod Touch
Well, it's actually the pipe symbol '|' that is causing problems. The URI class responsible for resolving links insists on this character being escaped using '%7c'. This will be fixed in the next revision.
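Until that revision ships, one possible workaround is to percent-encode the pipes yourself. This is a minimal sketch, assuming you can get at the raw href string before Sunrise resolves it; this thread doesn't confirm that any hook (such as `document.onanchorlink`) runs early enough for that. `escapePipes` is a hypothetical helper name.

```javascript
// Hypothetical helper: percent-encode literal '|' characters in an href so
// that a strict URI parser, which rejects an unescaped '|', accepts it.
function escapePipes(href) {
  // '|' is 0x7C and is not a legal unescaped URI character (RFC 3986).
  // Hex digits in percent-encoding are case-insensitive: %7C == %7c.
  return href.replace(/\|/g, "%7C");
}

var raw = "/tveditorial/tve_main/1,1002,271|93918|1|,00.html";
console.log(escapePipes(raw));
// prints /tveditorial/tve_main/1,1002,271%7C93918%7C1%7C,00.html
```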
#5
Junior Member
Posts: 5
Karma: 10
Join Date: Mar 2005
!
That's great. Thanks for stopping by and reading. Hopefully I've helped somewhat in your quest to commercialize this product. I saw so much potential in it that I actually uninstalled AvantGo a few days ago. Thanks again for all your work; I really appreciate it.
#6
Junior Member
Posts: 3
Karma: 10
Join Date: Mar 2004
Device: Clie NX90/Psion/A920
Similar problem with Computerworld
Hi,
I'm getting the same problem with Computerworld, where I need the article number from the middle of a comma-separated URL to build the URL of the print page:

```javascript
document.onanchorlink = function(link) {
  if (link.depth == 1) {
    // var artid = link.uri.match(".*\/.*\,(.*)\,.*")[1]   << my first try
    var artid = link.uri.match("(.*)\/(.*)$")[2]
    artid = artid.match("(.*)\,(.*)")[1]
    link.uri = "http://www.computerworld.com/printthis/2005/0,4814," + artid + ",00.html"
  }
};
```

This is trying to map http://www.computerworld.com/hardwar...XXXXXX,00.html to http://www.computerworld.com/printth...XXXXXX,00.html. I've tried a plain comma, %2c and \x2c to represent commas, but to no avail; I get the error "Cannot read property "1" from null". Strangely enough, on Windows this works for some Computerworld URLs but not others, while on Linux it fails for all of them.

Cheers!
...Les...
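The "Cannot read property "1" from null" error means one of the `match` calls returned `null`: some depth-1 link didn't fit the pattern, and indexing `[1]` on `null` throws. Note also that the greedy `(.*)` in `"(.*)\,(.*)"` captures everything up to the *last* comma, not just the article number. Below is a minimal null-safe sketch. It assumes article URLs contain `0,<number>,<articleId>,00.html` (the shape that `storyPattern` matches later in this thread); `articlePattern` and `toPrintUri` are hypothetical names.

```javascript
// Assumption: article URLs end in "0,<section>,<articleId>,00.html".
var articlePattern = /0,\d+,(\d+),00\.html$/;

// Return the print-page URL for an article link, or null if the link
// doesn't look like an article (instead of indexing into a null match).
function toPrintUri(uri) {
  var m = uri.match(articlePattern);
  if (m === null) {
    return null;
  }
  return "http://www.computerworld.com/printthis/2005/0,4814," + m[1] + ",00.html";
}

// Install the Sunrise hook only when a document object is present,
// so the helper can also be exercised outside Sunrise.
if (typeof document !== "undefined") {
  document.onanchorlink = function (link) {
    if (link.depth == 1) {
      var printUri = toPrintUri(link.uri);
      if (printUri !== null) {
        link.uri = printUri; // rewrite only links that matched
      }
    }
  };
}
```

Using a regex literal (`/.../`) rather than a string also removes a layer of escaping: inside a JavaScript string, `"\,"` is simply `","`, so the backslashes in the original patterns had no effect.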
#7
Jah Blessed
Posts: 1,295
Karma: 1373
Join Date: Apr 2003
Location: The Netherlands
Device: iPod Touch
Computerworld SDL
See attached ZIP for SDL and script file. I only tried it out with the RSS feed.
Their RSS feeds are here.
Last edited by Laurens; 08-04-2005 at 04:09 PM. Reason: Added link to RSS feeds
#8
Junior Member
Posts: 3
Karma: 10
Join Date: Mar 2004
Device: Clie NX90/Psion/A920
Thanks Laurens!
I've made one more change to the code to make it exclude links to indexes and keep only stories. Here's what I have, and it seems to work!

```javascript
document.name += " " + formatDate(wednesday, "yyyy-MM-dd");

// Matches the "0,<number>,<number>,00.html" part of a Computerworld URL
var storyPattern = /0,\d+,\d+,00\.html/;

document.onanchorlink = function(link) {
  var matches = link.uri.match(storyPattern);
  // Rewrite only links that match storyPattern and also contain "story";
  // all other links pass through unchanged
  if ((matches != null) && (link.uri.match(/story/))) {
    var story = matches[0];
    story = story.replace(",10801,", ",4814,");
    link.referrer = link.uri;
    link.uri = "http://www.computerworld.com/printthis/2005/" + story;
  }
};
```

Cheers!
...Les...
| Thread | Thread Starter | Forum | Replies | Last Post |
| --- | --- | --- | --- | --- |
| can't get plucker desktop to work with rss feeds | darchon | Reading and Management | 4 | 01-22-2006 12:34 PM |
| Plucker desktop questions | macsek | Reading and Management | 5 | 07-18-2005 04:25 AM |
| how to try JPluck & Plucker Desktop? | jeffcarp | Reading and Management | 3 | 11-10-2003 06:15 AM |
| How to use the jxl in plucker desktop | confusedvorlon | Reading and Management | 1 | 08-21-2003 12:23 PM |
| Plucker Desktop & JpluckX | multisyn | Reading and Management | 5 | 06-03-2003 02:39 PM |