08-17-2011, 09:28 AM   #1
emai7s2
Connoisseur
Posts: 97
Karma: 128170
Join Date: Sep 2008
Device: Palm & PPC
Liberation recipe downloading only headlines

I think Liberation changed something on their website this past weekend. Since then, Calibre only downloads news headlines from Liberation without the accompanying articles.
08-18-2011, 04:02 PM   #2
oneillpt
Connoisseur
Posts: 62
Karma: 46
Join Date: Feb 2011
Device: Kindle 3 (cracked screen!); PW1; Oasis
Quote:
Originally Posted by emai7s2
I think Liberation changed something on their website this past weekend. Since then, Calibre only downloads news headlines from Liberation without the accompanying articles.
Here is a quick fix to use pending any revision by the author. It probably still retains lines which have now become redundant. It also seems to find a few headlines which I do not see in the RSS feeds - possibly photo features?

Code:
#!/usr/bin/env python

__license__   = 'GPL v3'
__copyright__ = '2008, Darko Miletic <darko.miletic at gmail.com>'
'''
liberation.fr
'''

from calibre.web.feeds.news import BasicNewsRecipe

class Liberation(BasicNewsRecipe):
    title                 = u'Liberation'
    __author__            = 'Darko Miletic'
    description           = 'News from France'
    language              = 'fr'

    oldest_article        = 7      # in days
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False  # fetch each linked article page, not the RSS summary

    # Legacy option for the old LRF output format
    html2lrf_options = ['--base-font-size', '10']

    # Only these elements of each article page are kept.
    # The commented-out entries appear to be the pre-redesign selectors;
    # the 'article' and 'entry' divs take their place.
    keep_only_tags    = [
                           dict(name='h1')
                          #,dict(name='div', attrs={'class':'object-content text text-item'})
                          ,dict(name='div', attrs={'class':'article'})
                          #,dict(name='div', attrs={'class':'articleContent'})
                          ,dict(name='div', attrs={'class':'entry'})
                        ]
    # Drop everything after the extra toolbox ...
    remove_tags_after = [ dict(name='div',attrs={'class':'toolbox extra_toolbox'}) ]
    # ... and strip layout clutter, embedded objects and toolboxes from what is kept
    remove_tags    = [
                        dict(name='p', attrs={'class':'clear'})
                       ,dict(name='ul', attrs={'class':'floatLeft clear'})
                       ,dict(name='div', attrs={'class':'clear floatRight'})
                       ,dict(name='object')
                       ,dict(name='div', attrs={'class':'toolbox'})
                       ,dict(name='div', attrs={'class':'cartridge cartridge-basic-bubble cat-zoneabo'})
                       #,dict(name='div', attrs={'class':'clear block block-call-items'})
                       ,dict(name='div', attrs={'class':'block-content'})
                     ]

    # (section title, RSS feed URL)
    feeds          = [
                         (u'La une', u'http://www.liberation.fr/rss/laune')
                        ,(u'Monde' , u'http://www.liberation.fr/rss/monde')
                        ,(u'Sports', u'http://www.liberation.fr/rss/sports')
                     ]
08-19-2011, 03:52 AM   #3
emai7s2
Quote:
oneillpt wrote: Here is a quick fix to use pending any revision by the author. It probably still retains lines which have now become redundant. It also seems to find a few headlines which I do not see in the RSS feeds - possibly photo features?
Thank you oneillpt - I will give it a try this weekend. I've never added a script like this to Calibre before, so it will be a challenge!

08-20-2011, 12:47 AM   #4
emai7s2
Hi oneillpt, I tried your 'quick fix' first thing this morning: it worked! It wasn't a challenge at all: copy recipe, paste recipe, save recipe, download recipe. I think it even finds more articles than the previous recipe.

Thanks again for your help!

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Financial Times recipe downloading slowly, empty pages | mapex | Recipes | 34 | 06-06-2013 06:27 AM
MacWorld recipe - only headlines - no articles | simonz | Recipes | 4 | 06-04-2011 09:02 AM
Truncation of the NYTimes Headlines recipe | Nanoox | Recipes | 7 | 03-05-2011 10:49 PM
Beneath Ceaseless Skies recipe for direct epub downloading | duckpuppy | Recipes | 5 | 02-23-2011 10:12 PM
Downloading several years of blogposts via custom recipe | flyash | Calibre | 4 | 01-01-2011 02:02 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.