#1 |
Member
Posts: 12
Karma: 10
Join Date: Sep 2011
Location: Chicago, Illinois, USA
Device: Nook Simple Touch
|
Chicago Tribune Recipe appears broken
It looks like the Chicago Tribune has added a step before the initial page is shown when you follow a feed link. A page displays with the message "click here to continue to article" in multiple languages. If you click that link, it proceeds normally. This seems to happen once per browser, so I'm guessing it sets a cookie. If you don't click, the page proceeds automatically after a wait of perhaps 30 seconds. Unfortunately, the current Chicago Tribune recipe isn't equipped to handle this new speed bump. As a result, every article in the converted output contains only that link and no article text.
|
#2 |
Member
Posts: 12
Karma: 10
Join Date: Sep 2011
Location: Chicago, Illinois, USA
Device: Nook Simple Touch
|
I see now that when you follow some links in a Chicago Tribune RSS feed, there are a few pages of ads to wait or click through. At the moment I'm not seeing the "click here to continue to article" page, but I'm on a different PC with a different browser, etc., so it's hard to say for sure whether the content or just the environment has changed since earlier today.
|
#3 |
Member
Posts: 12
Karma: 10
Join Date: Sep 2011
Location: Chicago, Illinois, USA
Device: Nook Simple Touch
|
Here's the source of the interim page that's causing the problem:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head>
<title>Advertisement</title>
<style> A { color: gray; font-family: Arial; font-size: 10pt; font-weight: bold; } </style>
</head>
<body onload="setTimeout( 'location.href = \'http://www.chicagotribune.com/news/chi-rose-voted-allstar-starter-20120202,0,7880501.story?track=rss\'',18000);" >
<div align="right"><p style="width: 250px; align:left; text-align:left; color: gray;">
<a href="http://www.chicagotribune.com/news/chi-rose-voted-allstar-starter-20120202,0,7880501.story?track=rss">click here to continue to article</a><br>
<a href="http://www.chicagotribune.com/news/chi-rose-voted-allstar-starter-20120202,0,7880501.story?track=rss">cliquez ici pour lire l'article</a><br>
<a href="http://www.chicagotribune.com/news/chi-rose-voted-allstar-starter-20120202,0,7880501.story?track=rss">weiter zum Artikel</a><br>
<a href="http://www.chicagotribune.com/news/chi-rose-voted-allstar-starter-20120202,0,7880501.story?track=rss">clicca qui per visualizzare l'articolo</a>
<a href="http://www.chicagotribune.com/news/chi-rose-voted-allstar-starter-20120202,0,7880501.story?track=rss">weiter zum Artikel</a><br>
<a href="http://www.chicagotribune.com/news/chi-rose-voted-allstar-starter-20120202,0,7880501.story?track=rss">ir a la noticia</a><br>
<a href="http://www.chicagotribune.com/news/chi-rose-voted-allstar-starter-20120202,0,7880501.story?track=rss">klik hier om door te gaan naar het artikel</a><br>
<a href="http://www.chicagotribune.com/news/chi-rose-voted-allstar-starter-20120202,0,7880501.story?track=rss">Yazıya devam etmek için tıklayın</a><br>
<a href="http://www.chicagotribune.com/news/chi-rose-voted-allstar-starter-20120202,0,7880501.story?track=rss">Перейти к статье</a><br>
<a href="http://www.chicagotribune.com/news/chi-rose-voted-allstar-starter-20120202,0,7880501.story?track=rss">继续阅读文章,请点击这里</a><br>
<a href="http://www.chicagotribune.com/news/chi-rose-voted-allstar-starter-20120202,0,7880501.story?track=rss">Tovább a cikkre</a>
</p></div>
<div align="center"><div align="center">
<SCRIPT language='JavaScript1.1' SRC="http://ad.doubleclick.net/adj/N3867.289335.MEDIAFED.COM/B6175432.2;sz=300x250;pc=[TPAS_ID];click=http://da.feedsportal.com/c/34253/f/622809/s/1c5c323e/l/0L0Schicagotribune0N0Cnews0Cchi0Erose0Evoted0Eallstar0Estarter0E20A120A20A20H0A0H7880A50A10Bstory0Dtrack0Frss/iac.htm?cp_lnk=7824_;ord=1472651?"> </SCRIPT>
</div>
<script language="javascript"> document.write( "<img src=\"http://da.feedsportal.com/c/34253/f/622809/camp/7824/iad.gif\" />" ); </script>
</div>
</body>
</html>
Last edited by cornfieldcraig; 02-02-2012 at 10:01 PM.
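For anyone who wants to poke at a saved copy of this interim page outside calibre, here is a minimal sketch using only the Python standard library. It pulls out the target of the English "continue" link and the delay from the `onload` handler (18000 ms in the source above, so about 18 seconds). The names `ContinueLinkFinder`, `continue_link`, `redirect_delay_ms` and the `example.com` sample page are made up for illustration; this is not part of calibre or of the Tribune recipe.

```python
import re
from html.parser import HTMLParser

CONTINUE_TEXT = "click here to continue to article"


class ContinueLinkFinder(HTMLParser):
    """Remember the href of the first <a> whose text equals CONTINUE_TEXT."""

    def __init__(self):
        super().__init__()
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text chunks seen inside that <a>
        self.target = None  # href of the matching link, once found

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            if self.target is None and text == CONTINUE_TEXT:
                self.target = self._href
            self._href = None


def continue_link(html):
    """Return the article URL from an interim page, or None if absent."""
    finder = ContinueLinkFinder()
    finder.feed(html)
    return finder.target


def redirect_delay_ms(html):
    """Return the numeric delay passed to setTimeout, or None if absent."""
    match = re.search(r"setTimeout\(.*?,\s*(\d+)\s*\)", html, re.S)
    return int(match.group(1)) if match else None


# Hypothetical, cut-down stand-in for the page source posted above.
SAMPLE = r"""<html><body onload="setTimeout( 'location.href = \'http://example.com/story,0,1.story\'',18000);">
<a href="http://example.com/story,0,1.story">click here to continue to article</a><br>
<a href="http://example.com/story,0,1.story">weiter zum Artikel</a>
</body></html>"""

print(continue_link(SAMPLE))
print(redirect_delay_ms(SAMPLE))
```

Matching on the English link text is an assumption that the wording stays stable; if the feed provider changes it, the lookup would need updating.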
#4 |
creator of calibre
Posts: 45,196
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Try adding this to the recipe:
Code:
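def skip_ad_pages(self, soup):
    # Look for the English link text on the interim ad page
    text = soup.find(text='click here to continue to article')
    if text:
        a = text.parent
        url = a.get('href')
        if url:
            # Fetch the real article and return its raw HTML
            return self.index_to_soup(url, raw=True)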
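Since an earlier post mentions "a few pages of ads" on some links, here is a rough standalone sketch of the same idea extended to follow several interim pages in a row, with a hop limit so it cannot loop forever. It does not use calibre at all: `resolve_article`, the `fetch` callable, `max_hops`, and the `example.com` pages are hypothetical names for illustration, and the regex assumes the link markup looks like the source posted above (plain `href="..."`, no HTML entities in the URL).

```python
import re

# Matches the English "continue" link as it appears in the posted source.
CONTINUE_RE = re.compile(
    r'<a\s+href="([^"]+)"\s*>\s*click here to continue to article\s*</a>',
    re.IGNORECASE,
)


def resolve_article(html, fetch, max_hops=3):
    """Follow "continue" links until a page without one is reached.

    fetch is a hypothetical callable that takes a URL and returns HTML text.
    Returns the last HTML fetched, or None if the first page was not an ad
    page (the method above likewise returns nothing in that case).
    """
    hops = 0
    while hops < max_hops:
        match = CONTINUE_RE.search(html)
        if match is None:
            break
        html = fetch(match.group(1))
        hops += 1
    return html if hops else None


# Hypothetical pages: one more interim page, then the real article.
PAGES = {
    "http://example.com/ad2": (
        '<a href="http://example.com/story">'
        "click here to continue to article</a>"
    ),
    "http://example.com/story": "<h1>Real article</h1>",
}
FIRST = '<a href="http://example.com/ad2">click here to continue to article</a>'

print(resolve_article(FIRST, PAGES.__getitem__))
```

Inside a recipe, the fetching would be done by `self.index_to_soup(url, raw=True)` as in the code above rather than by a separate `fetch` function.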
#5 |
Member
Posts: 12
Karma: 10
Join Date: Sep 2011
Location: Chicago, Illinois, USA
Device: Nook Simple Touch
|
Thanks Kovid. Worked like a charm.
|