Old 12-05-2010, 09:26 AM   #1
veezh
plus ça change
Posts: 94
Karma: 32134
Join Date: Dec 2009
Location: France
Device: Kindle PW2
Updated recipe for Le Monde?

Just wondering if some kind soul could update the recipe for Le Monde. I have tried playing around a bit with the built-in recipe, but it looks like the website has undergone a lot of changes since the recipe was written, and the results I'm getting are pretty hit-and-miss. Any help greatly appreciated!

Edit: sorry, here's the link to the feeds:
http://www.lemonde.fr/web/rss/0,48-0,1-0,0.html

Last edited by veezh; 12-06-2010 at 08:06 AM.
Old 12-06-2010, 04:11 PM   #2
Artemis_A
Train reader
Posts: 10
Karma: 15
Join Date: Nov 2010
Device: Kindle3
I would be interested, too.
Old 12-09-2010, 06:57 AM   #3
veezh
In case others can get some use out of this: I cobbled together my own recipe for Le Monde. I'm not a programmer and my Python skills are very limited, but it seems to be working OK.

The recipe also replaces the straight quotation marks used on the site with proper guillemets (angle-bracket quotes) and non-breaking spaces.

Code:
import re
from calibre.web.feeds.recipes import BasicNewsRecipe

class LeMonde(BasicNewsRecipe):
    title                  = 'Le Monde'
    description            = 'Actualités'
    oldest_article         = 1
    max_articles_per_feed  = 100
    no_stylesheets         = True
    #delay                  = 1
    use_embedded_content   = False
    encoding               = 'cp1252'
    publisher              = 'lemonde.fr'
    language               = 'fr_FR'
    category               = 'news'  # assumed placeholder value; must be defined before conversion_options uses it
    conversion_options = {
        'comments'        : description,
        'tags'            : category,
        'language'        : language,
        'publisher'       : publisher,
        'linearize_tables': True,
    }

    remove_empty_feeds = True

    filterDuplicates = True

    def preprocess_html(self, soup):
        # Replace every link that wraps plain text with just that text
        for alink in soup.findAll('a'):
            if alink.string is not None:
                tstr = alink.string
                alink.replaceWith(tstr)
        return soup

    preprocess_regexps = [
        (re.compile(r' \''), lambda match: ' ‘'),
        (re.compile(r'\''), lambda match: '’'),
        (re.compile(r'"<'), lambda match: '&nbsp;&raquo;<'),
        (re.compile(r'>"'), lambda match: '>&laquo;&nbsp;'),
        (re.compile(r'&rsquo;"'), lambda match: '&rsquo;«&nbsp;'),
        (re.compile(r' "'), lambda match: ' &laquo;&nbsp;'),
        (re.compile(r'" '), lambda match: '&nbsp;&raquo; '),
        (re.compile(r'\("'), lambda match: '(&laquo;&nbsp;'),
        (re.compile(r'"\)'), lambda match: '&nbsp;&raquo;)'),
        (re.compile(r'"\.'), lambda match: '&nbsp;&raquo;.'),
        (re.compile(r'",'), lambda match: '&nbsp;&raquo;,'),
        (re.compile(r'"\?'), lambda match: '&nbsp;&raquo;?'),
        (re.compile(r'":'), lambda match: '&nbsp;&raquo;:'),
        (re.compile(r'";'), lambda match: '&nbsp;&raquo;;'),
        (re.compile(r'"\!'), lambda match: '&nbsp;&raquo;!'),
        (re.compile(r' :'), lambda match: '&nbsp;:'),
        (re.compile(r' ;'), lambda match: '&nbsp;;'),
        (re.compile(r' \?'), lambda match: '&nbsp;?'),
        (re.compile(r' \!'), lambda match: '&nbsp;!'),
        (re.compile(r'\s»'), lambda match: '&nbsp;»'),
        (re.compile(r'«\s'), lambda match: '«&nbsp;'),
        (re.compile(r' %'), lambda match: '&nbsp;%'),
        (re.compile(r'\.jpg&nbsp;&raquo; border='), lambda match: '.jpg'),
        (re.compile(r'\.png&nbsp;&raquo; border='), lambda match: '.png'),
        ]

    keep_only_tags    = [
                       dict(name='div', attrs={'class':['contenu']})
                        ]

    remove_tags_after = [dict(id='appel_temoignage')]

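    def get_article_url(self, article):
        # Blog posts are filtered out by returning None (the original did this implicitly)
        link = article.get('link')
        if link and 'blog' not in link:
            return link
        return None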

    feeds          = [
                      ('A la une', 'http://www.lemonde.fr/rss/une.xml'),
                      ('International', 'http://www.lemonde.fr/rss/tag/international.xml'),
                      ('Europe', 'http://www.lemonde.fr/rss/tag/europe.xml'),
                      (u'Société', 'http://www.lemonde.fr/rss/tag/societe.xml'),
                      ('Economie', 'http://www.lemonde.fr/rss/tag/economie.xml'),
                      (u'Médias', 'http://www.lemonde.fr/rss/tag/actualite-medias.xml'),
                      (u'Planète', 'http://www.lemonde.fr/rss/tag/planete.xml'),
                      ('Culture', 'http://www.lemonde.fr/rss/tag/culture.xml'),
                      ('Technologies', 'http://www.lemonde.fr/rss/tag/technologies.xml'),
                      ('Livres', 'http://www.lemonde.fr/rss/tag/livres.xml'),

                    ]

    def get_cover_url(self):
        cover_url = None
        soup = self.index_to_soup('http://www.lemonde.fr/web/monde_pdf/0,33-0,1-0,0.html')
        link_item = soup.find('div',attrs={'class':'pg-gch'})

        if link_item and link_item.img:
            cover_url = link_item.img['src']

        return cover_url
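
For anyone who wants to try the quote clean-up outside calibre, here is a minimal sketch in plain Python. It is not the recipe's exact rule list: the opening and closing rules are condensed into single patterns, and the names `RULES` and `frenchify` are made up for the illustration. The idea is the same as in `preprocess_regexps`: quote rules run first, then the rules that put a non-breaking space before `:`, `;`, `?`, `!` and `%`.

```python
import re

# Condensed, hypothetical versions of the recipe's substitution rules.
# Order matters: quotes are converted before the punctuation spacing rules run.
RULES = [
    # Opening quote: a straight quote right after a space, '>' or '('
    (re.compile(r'([ >(])"'), r'\1&laquo;&nbsp;'),
    # Closing quote: a straight quote right before a space, '<' or punctuation
    (re.compile(r'"([ <.,;:?!)])'), r'&nbsp;&raquo;\1'),
    # French spacing: non-breaking space before : ; ? ! %
    (re.compile(r' ([:;?!%])'), r'&nbsp;\1'),
]


def frenchify(text):
    """Apply each rule in turn and return the converted text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == '__main__':
    print(frenchify('<p>Il a dit "bonjour" : vraiment ?</p>'))
```

Note that rules like these also hit straight quotes inside HTML attributes, which is presumably why the recipe ends with two rules that patch up `.jpg` and `.png` image tags.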

Last edited by veezh; 12-09-2010 at 06:59 AM.
Old 12-09-2010, 06:17 PM   #4
Artemis_A
Thank you very much :-)

That seems to be working. The only change I had to make was to remove the conversion options; they caused error messages.
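
A guess at how errors of this kind can arise, since the actual messages are not quoted here: a dictionary defined at class level can only use names that were assigned earlier in the class body. The sketch below is plain Python with no calibre involved; `try_define` and the `'news'` value are hypothetical and only serve to show the behaviour for a `'tags': category` entry.

```python
def try_define(with_category):
    """Build a tiny class whose dict refers to `category`, with or without defining it."""
    try:
        if with_category:
            class Recipe(object):
                description = 'Actualités'
                category = 'news'  # hypothetical value, defined before it is used
                conversion_options = {'comments': description, 'tags': category}
        else:
            class Recipe(object):
                description = 'Actualités'
                # `category` was never assigned, so this line raises NameError
                conversion_options = {'comments': description, 'tags': category}
    except NameError as err:
        return 'NameError: %s' % err
    return Recipe.conversion_options


if __name__ == '__main__':
    print(try_define(False))
    print(try_define(True))
```

If that is the cause, defining the missing name above the dictionary should work as well as removing the options altogether.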
Old 12-29-2010, 01:53 PM   #5
Jackgal
Junior Member
Posts: 1
Karma: 10
Join Date: Dec 2010
Device: Sony PRS-650
Hello,

The EPUB file generated by the "Le Monde" recipe is not displayed correctly on the Sony PRS-650: pages containing pictures are blank after the title.

Is there a way to discard the pictures, either in the conversion process or in the recipe?

Thanks!
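
One possible approach from the recipe side, sketched under assumptions: add one more `(pattern, function)` pair of the kind the recipe already lists in `preprocess_regexps`, this time deleting `<img>` tags. The names `DROP_IMAGES` and `apply_rules` are made up for the demonstration, which runs in plain Python; whether this cures the blank pages on the PRS-650 is untested.

```python
import re

# Hypothetical extra rule, in the same (pattern, function) form as the recipe's rules:
# match a whole <img ...> tag and replace it with nothing.
DROP_IMAGES = (re.compile(r'<img\b[^>]*>', re.IGNORECASE), lambda match: '')


def apply_rules(html, rules):
    """Apply each (compiled pattern, replacement function) pair in order."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


if __name__ == '__main__':
    page = '<h1>Titre</h1><p><img src="photo.jpg" alt="" />Texte</p>'
    print(apply_rules(page, [DROP_IMAGES]))
```

calibre recipes also have a `remove_tags` attribute, so `remove_tags = [dict(name='img')]` in the recipe class may be a tidier way to get the same effect.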
Old 01-20-2011, 09:06 PM   #6
SeaBookGuy
Can one read too much?
Posts: 1,751
Karma: 1064957
Join Date: Aug 2010
Location: Seattle, USA
Device: Sony 650
I tried it a couple of days ago: some articles were okay, but others were only headlines with no body. Is that because it's a pay site?
Tags
french, monde, recipe
