Thread: Rss2Book
03-29-2007, 03:57 AM #148
adinb
RSS & Gadget Addict!
Posts: 82
Karma: 67
Join Date: May 2005
Location: Albuquerque, NM
Device: Sony PRS-500, iPod Touch, iPhone
Now that I'm getting better with more complex .NET regexes, I can articulate potential bug #2 a little more clearly:

- When the "apply extractor pattern to linked content" option is enabled, the Link Reformatter field still uses the groups from the link element (i.e., guid, link, etc.) instead of the groups from the link extractor pattern.

I'll use "The Raw Story" as an example. It's a pretty basic RSS feed with the link element = 'link'. There's a printable version of each story, but you have to follow the link element and use the link extractor pattern on the followed link. (For this example I'll say that we grabbed 'http://rawstory.com/news/2007/Colbert_invites_Rom_Emanuel_on_show_0327.html')

On the followed link, I'll apply the regex "action='(http://rawstory.com/printstory.php\?story=\d+)'>" to snag the proper URL for the printable version. With this regex, I should be able to set the link reformatter to just {0}, since the group captures the entire link. (Yeah, I could optimize the regex, but I like 'em a little more readable, vice using backreferences, etc.)
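To make the expected behavior concrete, here is a minimal Python sketch (not Rss2Book's code, which is .NET) of what the reformatter should do when the extractor is applied to the followed link. It assumes {0} is filled with the pattern's first capture group; the page fragment SAMPLE_PAGE and the helper reformat_link are hypothetical. Python's re handles this particular pattern the same way .NET does.

```python
import re
from typing import Optional

# Hypothetical markup for the followed article page (an assumption, not copied
# from rawstory.com); only the form's action attribute matters here.
SAMPLE_PAGE = "<form method='post' action='http://rawstory.com/printstory.php?story=5513'>"

# Extractor pattern from the post: group 1 captures the whole printable URL.
EXTRACTOR = re.compile(r"action='(http://rawstory.com/printstory.php\?story=\d+)'>")


def reformat_link(page_html: str, pattern: "re.Pattern[str]", template: str) -> Optional[str]:
    """Run the extractor on the followed page and fill {0} with capture group 1."""
    match = pattern.search(page_html)
    if match is None:
        return None  # extractor found nothing on the followed page
    return template.format(match.group(1))


if __name__ == "__main__":
    # With a template of just "{0}", the result is the captured printable URL.
    print(reformat_link(SAMPLE_PAGE, EXTRACTOR, "{0}"))
```

With the template "{0}", this yields http://rawstory.com/printstory.php?story=5513, which is what the log should show.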

Looking in the log, the reformatted link ends up as "http://rawstory.com/news/2007/Colbert_invites_Rom_Emanuel_on_show_0327.html" instead of "http://rawstory.com/printstory.php?story=5513".

Testing a little more: if I move the parens so the regex is "action='http://rawstory.com/printstory.php\?story=(\d+)'>" (which makes {0}=5513) and set the link reformatter field to "http://rawstory.com/printstory.php?story={0}" (which should again produce "http://rawstory.com/printstory.php?story=5513"), I get the following reformatted link (copied from the log):
"http://rawstory.com/printstory.php?story=http://rawstory.com/news/2007/Colbert_invites_Rom_Emanuel_on_show_0327.html"

Which is why it initially looks like the extractor isn't being applied to linked content.
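A small Python model of the symptom, contrasting what should happen with what the log shows. observed_link is only a guess at the behavior inferred from the log output, not Rss2Book's actual code; SAMPLE_PAGE, expected_link, and observed_link are hypothetical names.

```python
import re

# Values from the post; SAMPLE_PAGE is a hypothetical stand-in for the article's HTML.
ARTICLE_LINK = "http://rawstory.com/news/2007/Colbert_invites_Rom_Emanuel_on_show_0327.html"
SAMPLE_PAGE = "<form method='post' action='http://rawstory.com/printstory.php?story=5513'>"

# Second variant of the extractor: group 1 captures only the story id.
ID_PATTERN = re.compile(r"action='http://rawstory.com/printstory.php\?story=(\d+)'>")
TEMPLATE = "http://rawstory.com/printstory.php?story={0}"


def expected_link(page_html: str) -> str:
    """What should happen: {0} is filled from the extractor's group on the followed page."""
    match = ID_PATTERN.search(page_html)
    if match is None:
        raise ValueError("extractor pattern did not match the followed page")
    return TEMPLATE.format(match.group(1))


def observed_link(article_link: str) -> str:
    """What the log suggests happens: {0} is filled from the feed's link element instead."""
    return TEMPLATE.format(article_link)


if __name__ == "__main__":
    print("expected:", expected_link(SAMPLE_PAGE))
    print("observed:", observed_link(ARTICLE_LINK))
```

The "observed" line reproduces the string from the log exactly, which is consistent with the reformatter reading the link element rather than the extractor's groups.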

If there's some undocumented selector that forces the link reformatter field to use the link extractor pattern when following the link element, I'd ***love*** to see it.