Old 07-26-2011, 11:51 AM   #1
leo738
Enthusiast
leo738 began at the beginning.
 
Posts: 39
Karma: 10
Join Date: Jul 2011
Device: Kindle 3
Irish Times - Recipe Problem

Hello All,

I'm currently running calibre on a Beagleboard XM (http://en.wikipedia.org/wiki/BeagleBoard). Because it's based on ARM architecture I'm stuck using version 0.7.4 of Calibre unless I want to do a full cross-compilation.

Anyway, I'm having problems with the Irish Times recipe. I grabbed the recipe from v0.8.1 of Calibre; here's an excerpt of the important bits:

Code:
('Ireland', 'http://rss.feedsportal.com/c/851/f/10845/index.rss'),
('World', 'http://rss.feedsportal.com/c/851/f/10846/index.rss'),
('Finance', 'http://rss.feedsportal.com/c/851/f/10847/index.rss'),
('Features', 'http://rss.feedsportal.com/c/851/f/10848/index.rss'),
('Sport', 'http://rss.feedsportal.com/c/851/f/10849/index.rss'),
('Opinion', 'http://rss.feedsportal.com/c/851/f/10850/index.rss'),
('Letters', 'http://rss.feedsportal.com/c/851/f/10851/index.rss'),
('Magazine', 'http://www.irishtimes.com/feeds/rss/newspaper/magazine.rss'),
('Health', 'http://rss.feedsportal.com/c/851/f/10852/index.rss'),
('Education & Parenting', 'http://rss.feedsportal.com/c/851/f/10853/index.rss'),
('Motors', 'http://rss.feedsportal.com/c/851/f/10854/index.rss'),
('An Teanga Bheo', 'http://www.irishtimes.com/feeds/rss/newspaper/anteangabheo.rss'),
('Commercial Property', 'http://www.irishtimes.com/feeds/rss/newspaper/commercialproperty.rss'),
('Science Today', 'http://www.irishtimes.com/feeds/rss/newspaper/sciencetoday.rss'),
('Property', 'http://www.irishtimes.com/feeds/rss/newspaper/property.rss'),
('The Tickets', 'http://www.irishtimes.com/feeds/rss/newspaper/theticket.rss'),
('Weekend', 'http://www.irishtimes.com/feeds/rss/newspaper/weekend.rss'),
('News features', 'http://www.irishtimes.com/feeds/rss/newspaper/newsfeatures.rss'),
('Obituaries', 'http://www.irishtimes.com/feeds/rss/newspaper/obituaries.rss'),

However, it's still not fully correct. The 'Frontpage' section is OK, but most of the other sections appear as:

Code:
click here to continue to article
cliquez ici pour lire l'article
weiter zum Artikel
clicca qui per visualizzare l'articolo weiter zum Artikel
ir a la noticia
klik hier om door te gaan naar het artikel
Yazıya devam etmek için tıklayın
Перейти к статье
继续阅读文章,请点击这里

Strangely enough, I've experienced no problems with 'The Economist' or 'The Week'.

Does anybody have any ideas what might be causing this?

Many thanks,

Leo
Old 07-26-2011, 12:11 PM   #2
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by leo738 View Post
I'm currently running calibre on a Beagleboard XM (http://en.wikipedia.org/wiki/BeagleBoard). Because it's based on ARM architecture I'm stuck using version 0.7.4 of Calibre unless I want to do a full cross-compilation.
Cool!
Quote:
Anyway I'm having problems with the Irish Times recipe.
Strangely enough I've experienced no problems with 'The Economist' or 'The Week'.

Does anybody have any ideas what might be causing this?
I get the same thing with my Firefox browser that you are seeing with Calibre's recipe system. I block Flash and scripts. When I go to that site, it hangs, telling me it's trying to display an advertisement and the script is not running. You'd need to rewrite the recipe to fix this.
Old 07-26-2011, 04:31 PM   #3
phiznlil
Member
phiznlil began at the beginning.
 
Posts: 16
Karma: 12
Join Date: Mar 2011
Device: kindle 3
I have seen this occasionally, though usually it affects only one story at the beginning of a section. Mine downloaded at 7am this morning, and I see what you are seeing only for the first story under "Ireland". I am not sure what the cause is, though.
Old 07-26-2011, 04:42 PM   #4
oneillpt
Connoisseur
oneillpt began at the beginning.
 
Posts: 62
Karma: 46
Join Date: Feb 2011
Device: Kindle 3 (cracked screen!); PW1; Oasis
Quote:
Originally Posted by leo738 View Post
Anyway I'm having problems with the Irish Times recipe. I grabbed the recipe from v 0.8.1 of Calibre, here's an excerpt of the important bits:
Here is a fix for now, pending any possible fix from the authors. I've left the line replaced as a comment. No guarantee that this fix will continue to work - this fix is based on the structure of the redirected urls as of today, and this may change again in future. If a change is needed in the next few days I'll post it on this thread, otherwise if a change is needed later I'll post it as a new thread. (Your "important bits" are not in fact the real "important bits" here. The print_version routine, just after the feeds, is where modification was needed).

Spoiler:
Code:
__license__  = 'GPL v3'
__copyright__ = "2008, Derry FitzGerald. 2009 Modified by Ray Kinsella and David O'Callaghan, 2011 Modified by Phil Burns"
'''
irishtimes.com
'''
import re

from calibre.web.feeds.news import BasicNewsRecipe

class IrishTimes(BasicNewsRecipe):
    title          = u'The Irish Times'
    encoding  = 'ISO-8859-15'
    __author__    = "Derry FitzGerald, Ray Kinsella, David O'Callaghan and Phil Burns"
    language = 'en_IE'
    timefmt = ' (%A, %B %d, %Y)'


    oldest_article = 1.0
    max_articles_per_feed  = 100
    no_stylesheets = True
    simultaneous_downloads= 5

    r = re.compile(r'.*(?P<url>http:\/\/(www.irishtimes.com)|(rss.feedsportal.com\/c)\/.*\.html?).*')
    remove_tags    = [dict(name='div', attrs={'class':'footer'})]
    extra_css      = 'p, div { margin: 0pt; border: 0pt; text-indent: 0.5em } .headline {font-size: large;} \n .fact { padding-top: 10pt  }'

    feeds          = [
                      ('Frontpage', 'http://www.irishtimes.com/feeds/rss/newspaper/index.rss'),
                      ('Ireland', 'http://www.irishtimes.com/feeds/rss/newspaper/ireland.rss'),
                      ('World', 'http://www.irishtimes.com/feeds/rss/newspaper/world.rss'),
                      ('Finance', 'http://www.irishtimes.com/feeds/rss/newspaper/finance.rss'),
                      ('Features', 'http://www.irishtimes.com/feeds/rss/newspaper/features.rss'),
                      ('Sport', 'http://www.irishtimes.com/feeds/rss/newspaper/sport.rss'),
                      ('Opinion', 'http://www.irishtimes.com/feeds/rss/newspaper/opinion.rss'),
                      ('Letters', 'http://www.irishtimes.com/feeds/rss/newspaper/letters.rss'),
                      ('Magazine', 'http://www.irishtimes.com/feeds/rss/newspaper/magazine.rss'),
                      ('Health', 'http://www.irishtimes.com/feeds/rss/newspaper/health.rss'),
                      ('Education & Parenting', 'http://www.irishtimes.com/feeds/rss/newspaper/education.rss'),
                      ('Motors', 'http://www.irishtimes.com/feeds/rss/newspaper/motors.rss'),
                      ('An Teanga Bheo', 'http://www.irishtimes.com/feeds/rss/newspaper/anteangabheo.rss'),
                      ('Commercial Property', 'http://www.irishtimes.com/feeds/rss/newspaper/commercialproperty.rss'),
                      ('Science Today', 'http://www.irishtimes.com/feeds/rss/newspaper/sciencetoday.rss'),
                      ('Property', 'http://www.irishtimes.com/feeds/rss/newspaper/property.rss'),
                      ('The Tickets', 'http://www.irishtimes.com/feeds/rss/newspaper/theticket.rss'),
                      ('Weekend', 'http://www.irishtimes.com/feeds/rss/newspaper/weekend.rss'),
                      ('News features', 'http://www.irishtimes.com/feeds/rss/newspaper/newsfeatures.rss'),
                      ('Obituaries', 'http://www.irishtimes.com/feeds/rss/newspaper/obituaries.rss'),
                    ]


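    def print_version(self, url):
        # Feeds that redirect through feedsportal hand back an encoded URL that
        # lands on an ad page, so rebuild the irishtimes.com print URL from it.
        if url.count('rss.feedsportal.com'):
            #u = url.replace('0Bhtml/story01.htm','_pf0Bhtml/story01.htm')
            # Skip 'irishtimes' (10 characters) plus the 2 encoded characters after it.
            u = url.find('irishtimes')
            u = 'http://www.irishtimes.com' + url[u + 12:]
            # Treat '0C' as an encoded '/'.
            u = u.replace('0C', '/')
            # Drop every 'A', which turns sequences such as '0A' into '0'.
            u = u.replace('A', '')
            # Swap the encoded '.html' tail for the printer-friendly suffix.
            u = u.replace('0Bhtml/story01.htm', '_pf.html')
        else:
            # Direct irishtimes.com links only need the '_pf' suffix.
            u = url.replace('.html','_pf.html')
        return u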

    def get_article_url(self, article):
        return article.link
Old 07-26-2011, 04:43 PM   #5
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by phiznlil View Post
I have seen this occasionally, usually it is only one story though at the beginning of a section. Mine downloaded at 7am this morning and I see what you are seeing only for the first story under "Ireland", I am not sure what the cause is though.
It's trying to use scripting to show you an ad. The recipe would have to be written to bypass the whole process.
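
For anyone wanting to confirm that a download landed on the ad page rather than the article, here is a minimal sketch that checks fetched HTML for some of the placeholder phrases listed in the first post. The function name `looks_like_interstitial` and the marker list are illustrative assumptions, not part of calibre or of the Irish Times recipe.

```python
# Placeholder phrases copied from the symptom listing in the first post.
# The list is an assumption about what the ad page contains; extend as needed.
INTERSTITIAL_MARKERS = (
    'click here to continue to article',
    "cliquez ici pour lire l'article",
    'weiter zum artikel',
    'ir a la noticia',
)


def looks_like_interstitial(html):
    """Return True if the page text contains any known placeholder phrase."""
    text = html.lower()
    return any(marker in text for marker in INTERSTITIAL_MARKERS)


if __name__ == '__main__':
    ad_page = '<a href="#">Click here to continue to article</a>'
    article = '<h1>An ordinary article</h1><p>Body text.</p>'
    print(looks_like_interstitial(ad_page))
    print(looks_like_interstitial(article))
```

This only detects the problem; it does not bypass the ad page.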
Old 07-26-2011, 06:33 PM   #6
oneillpt
Connoisseur
oneillpt began at the beginning.
 
Posts: 62
Karma: 46
Join Date: Feb 2011
Device: Kindle 3 (cracked screen!); PW1; Oasis
Quote:
Originally Posted by Starson17 View Post
It's trying to use scripting to show you an ad. The recipe would have to be written to bypass the whole process.
We posted almost simultaneously. The fix I posted builds the onward link to the article and uses it as the "printable version" of the page with the ad to which you are directed from the rss feed.
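
To make the rewriting concrete, here is a standalone sketch of the same string steps as the posted print_version, so they can be tried outside calibre. The function name `rebuild_print_url` and both sample URLs are made up for illustration; the feedsportal sample is only assumed to follow the pattern the fix expects (an encoded path after 'irishtimes', ending in '0Bhtml/story01.htm').

```python
def rebuild_print_url(url):
    """Standalone copy of the string steps in the posted print_version."""
    if 'rss.feedsportal.com' in url:
        start = url.find('irishtimes')
        # Skip 'irishtimes' plus the two encoded characters that follow it.
        rebuilt = 'http://www.irishtimes.com' + url[start + 12:]
        rebuilt = rebuilt.replace('0C', '/')   # treat '0C' as an encoded '/'
        rebuilt = rebuilt.replace('A', '')     # drops every 'A', so '0A' becomes '0'
        rebuilt = rebuilt.replace('0Bhtml/story01.htm', '_pf.html')
    else:
        rebuilt = url.replace('.html', '_pf.html')
    return rebuilt


# Hypothetical URLs, invented to match the shapes the fix handles.
redirected = ('http://rss.feedsportal.com/c/851/f/10845/s/abc123/l/'
              '0L0Sirishtimes0N0Cnewspaper0Cireland0C20A110C0A7260C'
              '12345678901230Bhtml/story01.htm')
direct = 'http://www.irishtimes.com/newspaper/frontpage/2011/0726/1234567890123.html'

if __name__ == '__main__':
    print(rebuild_print_url(redirected))
    print(rebuild_print_url(direct))
```

One caveat that follows from the code itself: because every 'A' is removed, a path that legitimately contains a capital 'A' would be altered as well.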
Old 07-26-2011, 06:56 PM   #7
phiznlil
Member
phiznlil began at the beginning.
 
Posts: 16
Karma: 12
Join Date: Mar 2011
Device: kindle 3
This seems to be working nicely for now
Old 07-27-2011, 07:30 AM   #8
markvdvelde
Connoisseur
markvdvelde began at the beginning.
 
Posts: 54
Karma: 12
Join Date: Jan 2011
Device: Kindle
My feeling is it mostly happens when you're downloading from multiple sources simultaneously. If you schedule downloading news with a few minutes in between each source, the result might be better. That's my experience at least.
Old 07-27-2011, 02:17 PM   #9
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by oneillpt View Post
The fix I posted builds the onward link to the article and uses it as the "printable version" of the page with the ad to which you are directed from the rss feed.
Great! I'm glad someone tackled it and provided a fix.
Old 07-29-2011, 11:55 AM   #10
leo738
Enthusiast
leo738 began at the beginning.
 
Posts: 39
Karma: 10
Join Date: Jul 2011
Device: Kindle 3
Many thanks for the fix, it's been working for the last 2 days at least!

Leo
Old 08-31-2011, 12:15 PM   #11
leo738
Enthusiast
leo738 began at the beginning.
 
Posts: 39
Karma: 10
Join Date: Jul 2011
Device: Kindle 3
The latest fix seems to be working consistently now, but I've noticed that the Irish Times occasionally posts the same story under multiple RSS links. Is there some way to prevent the same story being included multiple times?

Thanks,

Leo
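
One possible approach to the duplicate question, sketched under assumptions rather than tested against calibre 0.7.4: filter the parsed feeds so that each story is kept only the first time it appears. The helper `drop_duplicate_articles` is a hypothetical name, and the `Feed` and `Article` classes below are stand-ins that only mimic the `articles`, `url` and `title` attributes of calibre's feed objects. In a recipe, the helper could be called from an overridden `parse_feeds`, as the comment shows. Later calibre releases also have an `ignore_duplicate_articles` recipe option, which would not be available in 0.7.4.

```python
def drop_duplicate_articles(feeds, key=lambda article: article.url):
    """Keep each article only in the first feed where its key appears."""
    seen = set()
    for feed in feeds:
        unique = []
        for article in feed.articles:
            k = key(article)
            if k in seen:
                continue
            seen.add(k)
            unique.append(article)
        feed.articles = unique
    return feeds


# Inside the recipe class, the wiring might look like this (assumption):
#
#     def parse_feeds(self):
#         feeds = BasicNewsRecipe.parse_feeds(self)
#         return drop_duplicate_articles(feeds)


# Stand-ins for calibre's feed and article objects, for trying this out.
class Article(object):
    def __init__(self, title, url):
        self.title = title
        self.url = url


class Feed(object):
    def __init__(self, title, articles):
        self.title = title
        self.articles = articles


frontpage = Feed('Frontpage', [
    Article('Story one', 'http://www.irishtimes.com/one.html'),
    Article('Story two', 'http://www.irishtimes.com/two.html'),
])
ireland = Feed('Ireland', [
    Article('Story two', 'http://www.irishtimes.com/two.html'),
    Article('Story three', 'http://www.irishtimes.com/three.html'),
])
feeds = drop_duplicate_articles([frontpage, ireland])

if __name__ == '__main__':
    for feed in feeds:
        print(feed.title, [a.title for a in feed.articles])
```

If the same story arrives under different redirected URLs in different feeds, matching on the title instead (`key=lambda a: a.title`) may work better.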