09-21-2006, 01:05 PM   #4
PippoPippini
Join Date: Sep 2006
Device: Palm TX
Originally Posted by DTM
I think you're out of luck on this one. They've gone to extreme lengths to make it impossible, as far as I can see, to identify the link to the printable version.

If you open the "printable" window, right-click and look at the page properties, you'll see that its URL is very long and very complex. It includes a phrase that is not part of the original page and also includes an eight-digit number that is not found anywhere in the source on the original page. If that information isn't there, then there is no way Sunrise is going to find it.

But your problem is even worse. You need to be able to construct the "printable" link not from the information in the main article page, but rather from just the information on the RSS page you're starting with. That means that the information that uniquely identifies the printable version must be in the link you start with. It's just not there. Sorry.

I think I have a similar problem with the RSS feeds from Reuters.

Articles linked from the RSS feed are split across multiple pages. There is a link to a printable version, but it opens in a pop-up, and its syntax uses a string of text taken from the article's URL.

Looking at the Bloomberg RSS feeds, I think it's probably easy to link to the printable page, because the printable link just has a "#" at the end.

I also looked at the Washington Post feed.

In the RSS feed, links look like this:

The printable one is:

The printable version of the article ends with "_pf", which has to be inserted before the ".html" of the main article URL.
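As a rough sketch of that rule, here is a Python `re` version. The pattern, the function name `wapo_printable`, and the example URL are all hypothetical (the sample links above didn't survive), and whatever link-rewriting syntax Sunrise actually accepts may differ. It also assumes an optional query string after ".html" should be kept.

```python
import re

# Matches a washingtonpost.com URL ending in ".html" (optionally followed by a
# query string) that does not already end in "_pf.html".
# Group 1: everything before ".html"; group 2: the query string, possibly empty.
WAPO_ARTICLE = re.compile(
    r"^(https?://(?:www\.)?washingtonpost\.com/[^?#]+?)(?<!_pf)\.html((?:\?[^#]*)?)$"
)


def wapo_printable(url: str) -> str:
    """Return the assumed printable URL: insert "_pf" before the final ".html".

    URLs that don't match (other sites, links that are already printable)
    are returned unchanged.
    """
    return WAPO_ARTICLE.sub(r"\1_pf.html\2", url)


if __name__ == "__main__":
    # Hypothetical article URL, for illustration only.
    print(wapo_printable(
        "http://www.washingtonpost.com/wp-dyn/content/article/2006/09/20/AR2006092001234.html"
    ))
    # prints http://www.washingtonpost.com/wp-dyn/content/article/2006/09/20/AR2006092001234_pf.html
```

The negative lookbehind `(?<!_pf)` keeps the rule from adding a second "_pf" if it runs on a link that is already the printable one.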

If anyone is interested in linking these feeds, could you help me write a regular expression for these two feeds?

I also download the feed from one of the major Italian newspapers, Corriere della Sera. Their printable link is the same URL, just without the "s" in the final ".shtml" extension. If I learn how to rewrite links properly ...
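The Corriere della Sera rule is an even simpler substitution. Again this is only a Python `re` sketch: the pattern, the function name `corriere_printable`, and the example URL are hypothetical, and it is not Sunrise's own rewriting syntax. It assumes the article link sits on corriere.it and may carry a query string that should be kept.

```python
import re

# Matches a corriere.it URL whose path ends in ".shtml" (optionally followed by
# a query string). Group 1: everything before ".shtml"; group 2: query or "".
CORRIERE_ARTICLE = re.compile(
    r"^(https?://(?:www\.)?corriere\.it/[^?#]+)\.shtml((?:\?[^#]*)?)$"
)


def corriere_printable(url: str) -> str:
    """Return the assumed printable URL: the final ".shtml" becomes ".html".

    Non-matching URLs (other sites, links already ending in ".html")
    are returned unchanged.
    """
    return CORRIERE_ARTICLE.sub(r"\1.html\2", url)


if __name__ == "__main__":
    # Hypothetical article URL, for illustration only.
    print(corriere_printable(
        "http://www.corriere.it/Primo_Piano/Esteri/2006/09_Settembre/21/esempio.shtml"
    ))
    # prints http://www.corriere.it/Primo_Piano/Esteri/2006/09_Settembre/21/esempio.html
```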

