Old 08-24-2006, 01:14 PM   #1
goducks
Junior Member
goducks began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Aug 2006
Device: Palm TX
retrieving printer-friendly versions...

I've been trying to figure this out for a week to no avail so if anyone can help, I'd be really grateful. There are several writers on Sports Illustrated online you can get RSS feeds for, but SI.com spreads the articles out over 3 or 4 pages, so I'd like to nab just the printer-friendly version, but I can't seem to configure Sunrise XP to do it correctly.

For example, Dr Z's feed is at http://rss.cnn.com/rss/si_dr_z.rss
All of the articles off it start with http://sportsillustrated.cnn.com/2006/writers/dr_z*

and on the first page of each is a link to a single-page printer friendly version that begins http://si.printthis.clickability.com/*

How in the world do I get Sunrise to get that version?
Old 08-25-2006, 08:35 AM   #2
DTM
Intentionally Left Blank
DTM ought to be getting tired of karma fortunes by now.
Posts: 172
Karma: 300106
Join Date: Feb 2006
Location: Royal Oak, MI, USA
Device: Nook STR
I think you're out of luck on this one. They've gone to extreme lengths to make it impossible--as far as I can see--to identify the link to the printable version.

If you open the "printable" window, right-click and look at the page properties, you'll see that its URL is very long and very complex. It includes a phrase that is not part of the original page and also includes an eight-digit number that is not found anywhere in the source on the original page. If that information isn't there, then there is no way Sunrise is going to find it.

But your problem is even worse. You need to be able to construct the "printable" link not from the information in the main article page, but rather from just the information on the RSS page you're starting with. That means that the information that uniquely identifies the printable version must be in the link you start with. It's just not there. Sorry.
Old 08-25-2006, 11:14 AM   #3
goducks
Junior Member
goducks began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Aug 2006
Device: Palm TX
Thanks DTM. I was starting to guess that myself. I hadn't realized that Sunrise had to reconstruct the link from the RSS feed address, though now that makes sense. I had thought we were somehow training it to find the printer-friendly version off of the articles themselves; the Wiki training article didn't make that clear.
Old 09-21-2006, 12:05 PM   #4
PippoPippini
Member
PippoPippini began at the beginning.
 
Posts: 19
Karma: 29
Join Date: Sep 2006
Device: Palm TX
Quote:
Originally Posted by DTM
I think you're out of luck on this one. They've gone to extreme lengths to make it impossible--as far as I can see--to identify the link to the printable version.

If you open the "printable" window, right-click and look at the page properties, you'll see that its URL is very long and very complex. It includes a phrase that is not part of the original page and also includes an eight-digit number that is not found anywhere in the source on the original page. If that information isn't there, then there is no way Sunrise is going to find it.

But your problem is even worse. You need to be able to construct the "printable" link not from the information in the main article page, but rather from just the information on the RSS page you're starting with. That means that the information that uniquely identifies the printable version must be in the link you start with. It's just not there. Sorry.
Hi.

I think I have a similar problem with the RSS feeds from Reuters.

The articles linked from the RSS feed are split across multiple pages. There is a link to a printable version, but it opens in a pop-up, with a syntax that uses a string of text taken from the article's URL.

Looking at the Bloomberg RSS feeds, I think it is probably easy to link to the printable page, because the printable link only has a "#" at the end.

I also analyzed the feed of Washington Post.

In the RSS feed links are like this:
http://www.washingtonpost.com/wp-dyn...v=rss_business

The printable one is:
http://www.washingtonpost.com/wp-dyn...001064_pf.html

The printable link ends with "_pf", which has to be inserted before the ".html" of the main article URL.

If anyone is interested in these feeds, could you help me write a regular expression for the two of them?

I also download the feed of one of the major Italian newspapers, Corriere della Sera. Their printable link is the same URL without the "s" in the final ".shtml" extension. If I learn how to rewrite links properly ...

Bye

Pippo
Old 09-29-2006, 11:10 AM   #5
PippoPippini
Member
PippoPippini began at the beginning.
 
Posts: 19
Karma: 29
Join Date: Sep 2006
Device: Palm TX
Quote:
Originally Posted by PippoPippini

...
Looking at the Bloomberg RSS feeds, I think it is probably easy to link to the printable page, because the printable link only has a "#" at the end.

...
HI.

I tried rewriting the links in Bloomberg's feed.

The link filter is http://www\.bloomberg\.com(.*), while the rewrite rule I wrote is http://www.bloomberg.com$1#

But it doesn't work. What's wrong?

G.
Old 10-05-2006, 12:44 PM   #6
DTM
Intentionally Left Blank
DTM ought to be getting tired of karma fortunes by now.
Posts: 172
Karma: 300106
Join Date: Feb 2006
Location: Royal Oak, MI, USA
Device: Nook STR
I haven't forgotten you!

This one's a bit tougher than most, but I think I have it. I just want to do some more testing and will then post the answer.

(They use some code numbers that I'm afraid might change from day to day, so I don't want to give you a "solution" that will fail tomorrow.)
Old 10-06-2006, 12:12 PM   #7
DTM
Intentionally Left Blank
DTM ought to be getting tired of karma fortunes by now.
Posts: 172
Karma: 300106
Join Date: Feb 2006
Location: Royal Oak, MI, USA
Device: Nook STR
Quote:
Originally Posted by PippoPippini
I tried rewriting the links in Bloomberg's feed.

The link filter is http://www\.bloomberg\.com(.*), while the rewrite rule I wrote is http://www.bloomberg.com$1#

But it doesn't work. What's wrong?
Got it!

Your rule doesn't work because the printable page is not just the original URL with a # appended. That's what you see when you hover on the link, but when you actually click the link, it runs a JavaScript routine that constructs the real URL.
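
A side note on why appending "#" cannot help in any case: everything after "#" in a URL is a fragment, which the client keeps to itself and does not send to the server. The sketch below uses Python's standard urllib.parse.urldefrag, not anything from Sunrise, to show that the rewritten link still names the same page as the original. How Sunrise itself treats a trailing "#" is an assumption here.

```python
from urllib.parse import urldefrag

# Article link from the feed, and the same link after the "append #" rewrite.
article = "http://www.bloomberg.com/apps/news?pid=20601068&sid=avAOrwcRZaAU&refer=economy"
rewritten = article + "#"

# urldefrag splits a URL into the part sent to the server and the fragment.
requested, fragment = urldefrag(rewritten)

print(requested == article)  # True: the server is asked for the very same page
print(repr(fragment))        # '': the fragment carries no information at all
```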

This is a good example to use to show how to crack the more advanced problems. To find the real URL for the printable version, click the link for the printer version. When the printer-friendly box opens, right-click on it and select Properties from Internet Explorer or View Page Info from Firefox. That will give you the actual URL for the page.

The example I used was the Economy RSS feed. One of the article pages was:

http://www.bloomberg.com/apps/news?pid=20601068&sid=avAOrwcRZaAU&refer=economy

The printable version was:

http://www.bloomberg.com/apps/news?pid=20670001&refer=economy&sid=avAOrwcRZaAU

Comparing the two, we see that there are two changes: the pid number is different and the "sid" section has swapped places with the "refer" section.

We are fortunate because it turns out that all of the printable pages for all of the feeds have a pid of 20670001. Our rewrite rule, then, must change the pid number and move the sid segment to the end.

The filter looks like this:

http://www\.bloomberg\.com/apps/news\?pid=.*&sid=(.*)&refer=economy

Notice that the first .* is not in parentheses, because we don't need to save the value; the number changes in the rewrite. The second is needed, however. Notice also that you must backslash periods and question marks.

The rewritten expression looks like this:

http://www.bloomberg.com/apps/news?pid=20670001&refer=economy&sid=$1

It is identical up to the pid, where we substitute the new number 20670001, then we follow that with the "refer" segment and end with the "sid" segment. The $1 inserts the sid code that we captured from the original link.
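
For anyone who wants to try the rule outside Sunrise, here is a minimal sketch in Python. It assumes Sunrise's filter behaves like an ordinary regular expression and that $1 corresponds to Python's \g<1>; the Sunrise engine itself is not involved. It also shows why the question mark has to be backslashed: without the backslash it makes the preceding "s" optional instead of matching a literal "?".

```python
import re

# Filter and rewrite from the post; Sunrise's $1 is written \g<1> in Python.
FILTER = r"http://www\.bloomberg\.com/apps/news\?pid=.*&sid=(.*)&refer=economy"
REWRITE = r"http://www.bloomberg.com/apps/news?pid=20670001&refer=economy&sid=\g<1>"

article = "http://www.bloomberg.com/apps/news?pid=20601068&sid=avAOrwcRZaAU&refer=economy"

# Apply the rule: the old pid is dropped, the captured sid moves to the end.
printable = re.sub(FILTER, REWRITE, article)
print(printable)

# The same filter with the question mark left unescaped no longer matches,
# because "s?" now means "an optional s" rather than "s followed by ?".
unescaped = FILTER.replace(r"\?", "?")
print(re.search(unescaped, article))  # None
```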

I checked only a couple of the feeds, but this appears to work for all of them. The only variation from one feed to another is that the "refer" part changes from "economy" to "politics", etc.

Enjoy!

----------------Edit------------------

It occurred to me that you may want to use a variation on the above, as follows:

Filter:
http://www\.bloomberg\.com/apps/news\?pid=.*&sid=(.*)&refer=(.*)

Rewrite:
http://www.bloomberg.com/apps/news?pid=20670001&refer=$2&sid=$1

This allows you to use exactly the same rule for all of the feeds, making it easier to copy and modify the Sunrise documents for each feed.
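
The two-group version can be checked the same way. This is only a sketch under the same assumption ($1 and $2 behave like Python's \g<1> and \g<2>); the "politics" link below is hypothetical and simply reuses the pid and sid from the economy example.

```python
import re

# General filter and rewrite: group 1 is the sid, group 2 the feed name.
FILTER = r"http://www\.bloomberg\.com/apps/news\?pid=.*&sid=(.*)&refer=(.*)"
REWRITE = r"http://www.bloomberg.com/apps/news?pid=20670001&refer=\g<2>&sid=\g<1>"


def printable(url: str) -> str:
    """Return the printer-friendly URL for a Bloomberg article link."""
    return re.sub(FILTER, REWRITE, url)


# Hypothetical links for two feeds, differing only in the refer value.
base = "http://www.bloomberg.com/apps/news?pid=20601068&sid=avAOrwcRZaAU&refer="
results = {section: printable(base + section) for section in ("economy", "politics")}

for section, url in results.items():
    print(section, url)
```

If a link ever carried extra parameters, replacing each (.*) with ([^&]*) would stop a group from swallowing a following "&..." segment; that is an optional tightening, not something the rule above needs for the links shown in this thread.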

Last edited by DTM; 10-07-2006 at 11:24 PM.
Old 10-10-2006, 10:17 AM   #8
PippoPippini
Member
PippoPippini began at the beginning.
 
Posts: 19
Karma: 29
Join Date: Sep 2006
Device: Palm TX
Quote:
Originally Posted by DTM
Got it!
...
GREAT !!

It works !

I don't know how to thank you for your efforts.

I'm now trying to work out rules for other newspaper sites: it's the only way I know to make my contribution.

Bye

Gaetano
Old 10-10-2006, 12:50 PM   #9
PippoPippini
Member
PippoPippini began at the beginning.
 
Posts: 19
Karma: 29
Join Date: Sep 2006
Device: Palm TX
OK, here are two new rewrite rules for other newspapers.

Washington Post:

Pattern: http://www\.washingtonpost\.com(.*)\.html(.*)
Rewrite rule: http://www.washingtonpost.com$1_pf.html

Corriere della Sera (Italian newspaper)

Pattern: http://www\.corriere\.it(.*)shtml
Rewrite rule: http://www.corriere.it$1html
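
A quick way to see what these two rules produce, sketched in Python with $1 written as \g<1>. Both article links are made up for illustration (the real paths are longer), and Sunrise's own regex engine is assumed to behave the same way.

```python
import re

# (pattern, rewrite) pairs from the post, with $1 translated to \g<1>.
RULES = {
    "washingtonpost": (
        r"http://www\.washingtonpost\.com(.*)\.html(.*)",
        r"http://www.washingtonpost.com\g<1>_pf.html",
    ),
    "corriere": (
        r"http://www\.corriere\.it(.*)shtml",
        r"http://www.corriere.it\g<1>html",
    ),
}

# Hypothetical feed links; only their shape matters here.
wp_link = "http://www.washingtonpost.com/example/story.html?src=rss"
cs_link = "http://www.corriere.it/example/articolo.shtml"

# Group 2 of the Washington Post pattern swallows the query string,
# and the rewrite never uses it, so it is dropped from the result.
wp_print = re.sub(*RULES["washingtonpost"], wp_link)
cs_print = re.sub(*RULES["corriere"], cs_link)

print(wp_print)
print(cs_print)
```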

Now I'm analyzing International Herald Tribune and Reuters

With IHT I tried this rule:
Pattern: http://www\.iht\.com(.*)
Rewrite rule: http://www.iht.com/bin/print_ipub.php?file=$1

But it doesn't work: articles downloaded have only title, author and date.

See you later

Gaetano
Old 10-10-2006, 01:02 PM   #10
doctorow
Guru
doctorow ought to be getting tired of karma fortunes by now.
Posts: 914
Karma: 3410461
Join Date: May 2004
Device: Kindle Touch
You should probably use http://www\.iht\.com/articles/(.*) as pattern, and then rewrite with http://www.iht.com/bin/print_ipub.php?file=/articles/$1
Old 10-11-2006, 03:55 AM   #11
PippoPippini
Member
PippoPippini began at the beginning.
 
Posts: 19
Karma: 29
Join Date: Sep 2006
Device: Palm TX
Quote:
Originally Posted by doctorow
You should probably use http://www\.iht\.com/articles/(.*) as pattern, and then rewrite with http://www.iht.com/bin/print_ipub.php?file=/articles/$1
I tried, but it still doesn't work.

The "/articles/" pattern doesn't change the behaviour of the filter.

I'll continue to investigate.

Is anyone interested in the Reuters feed?

This is the pattern rule:
http://today\.reuters\.com(.*)type=businessNews&story(.*)
and this is the rewrite rule: http://today.reuters.com/misc/PrinterFriendlyPopup.aspx?type=businessNews&story$2
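
To make the two groups visible, here is a small Python sketch of the same rule ($2 written as \g<2>). The article link and its story ID are invented for the example; only the "type=businessNews&story" part comes from the rule itself.

```python
import re

PATTERN = r"http://today\.reuters\.com(.*)type=businessNews&story(.*)"
REWRITE = (
    r"http://today.reuters.com/misc/PrinterFriendlyPopup.aspx"
    r"?type=businessNews&story\g<2>"
)

# Hypothetical feed link.
link = "http://today.reuters.com/news/articlebusiness.aspx?type=businessNews&storyID=EXAMPLE-STORY-ID"

match = re.match(PATTERN, link)
page_part = match.group(1)   # the article page path, thrown away by the rewrite
story_part = match.group(2)  # everything after "story", kept as $2

printable = re.sub(PATTERN, REWRITE, link)
print(page_part)
print(story_part)
print(printable)
```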

Enjoy

Gaetano
Old 10-11-2006, 04:08 AM   #12
Laurens
Jah Blessed
Laurens is no ebook tyro.
Posts: 1,295
Karma: 1373
Join Date: Apr 2003
Location: The Netherlands
Device: iPod Touch
Quote:
Originally Posted by PippoPippini
With IHT I tried this rule:
Pattern: http://www\.iht\.com(.*)
Rewrite rule: http://www.iht.com/bin/print_ipub.php?file=$1

But it doesn't work: articles downloaded have only title, author and date.
This pattern is correct, but unfortunately the Tidy HTML parser that Sunrise XP uses can't handle this page. It sometimes has trouble dealing with inline JavaScript.

As a fallback, you can use http://mobile.iht.com/
Old 10-11-2006, 09:06 AM   #13
PippoPippini
Member
PippoPippini began at the beginning.
 
Posts: 19
Karma: 29
Join Date: Sep 2006
Device: Palm TX
Quote:
Originally Posted by Laurens
As a fallback, you can use http://mobile.iht.com/
Well, it works.

I rewrote the pattern rule as: http://www\.iht\.com(.*)php
The rewrite rule is: http://mobile.iht.com/$1xhtml

I still start from the links in the RSS feed and rewrite them to the corresponding pages on the mobile version of the site.

Linking directly to http://mobile.iht.com/ and downloading the articles from there is not useful if you want articles from sections that are not listed on the mobile home page but appear only on its second page.
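
One detail worth knowing about this rule, sketched in Python with $1 written as \g<1>: because the captured group already starts with "/", the rewritten link contains a double slash after the host name. The rule is reported to work as it is, so the server evidently tolerates that; the second pattern below is only a hypothetical tidier variant, and the article link is made up.

```python
import re

PATTERN = r"http://www\.iht\.com(.*)php"
REWRITE = r"http://mobile.iht.com/\g<1>xhtml"

# Hypothetical feed link.
link = "http://www.iht.com/articles/2006/10/11/business/example.php"

# Rule as posted: group 1 starts with "/", so the result has "//" after the host.
mobile = re.sub(PATTERN, REWRITE, link)
print(mobile)

# Hypothetical variant: keep the slash outside the group to get a single "/".
TIDY_PATTERN = r"http://www\.iht\.com/(.*)php"
tidy = re.sub(TIDY_PATTERN, REWRITE, link)
print(tidy)
```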

Thanks for your help Laurens.

Gaetano
Old 10-14-2006, 11:09 AM   #14
The Collector
Junior Member
The Collector began at the beginning.
 
Posts: 1
Karma: 10
Join Date: Oct 2006
Been trying to crunch out NYT by brute force, but I'm being rather unsuccessful. First post dolt here. :/

(http://www\.nytimes\.com)/(2006)/(..)/(..)/(world)/(africa|americas|asia|europe|middleeast)/(.*\.html?ref=world)
Rewrite:
$1/$2/$3/$4/$5/$6/$7&pagewanted=print

With my luck it's something simple. I'm pretty sure that many of the rewrites are redundant, and I should attempt to clean it up, but it would seem futile if I can't even get it to work properly in the first place.
Old 10-14-2006, 11:13 AM   #15
Laurens
Jah Blessed
Laurens is no ebook tyro.
Posts: 1,295
Karma: 1373
Join Date: Apr 2003
Location: The Netherlands
Device: iPod Touch
For NYT feeds and rewriting patterns, look in the Showcase

Pattern: http://.*\.?nytimes\.com/.*\.html?.*
Rewrite: $&&pagewanted=print
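
A rough Python sketch of the difference between the two NYT rules, assuming $& stands for the whole matched text (written \g<0> in Python) and that Sunrise's engine treats "?" the way ordinary regular expressions do. The article link is hypothetical. In the longer pattern the unescaped "?" makes the "l" of "html" optional instead of matching the literal "?" in the link, so that pattern never matches; in the short pattern the same "html?" is harmless because the trailing .* absorbs the rest of the link.

```python
import re

# Hypothetical feed link in the shape both patterns are aimed at.
link = "http://www.nytimes.com/2006/10/14/world/africa/14example.html?ref=world"

# Short pattern from the Showcase; $& becomes \g<0> (the whole match).
SHOWCASE = r"http://.*\.?nytimes\.com/.*\.html?.*"
printable = re.sub(SHOWCASE, r"\g<0>&pagewanted=print", link)
print(printable)

# Brute-force pattern from the earlier post: "l?" cannot consume the literal
# "?" before "ref=world", so there is no match at all.
BRUTE = (
    r"(http://www\.nytimes\.com)/(2006)/(..)/(..)/(world)/"
    r"(africa|americas|asia|europe|middleeast)/(.*\.html?ref=world)"
)
brute_match = re.search(BRUTE, link)
print(brute_match)  # None

# Escaping the question mark is enough to make the brute-force pattern match.
escaped_match = re.search(BRUTE.replace("html?ref", r"html\?ref"), link)
print(escaped_match is not None)  # True
```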