MobileRead Forums > E-Book Software > Calibre > Recipes

Old 04-29-2012, 03:07 PM   #1
atordo
Connoisseur
Posts: 89
Karma: 19669
Join Date: Apr 2012
Device: Kindle Touch
Help with Wordpress feed (El Mundo Today)

I'm trying to create a recipe for:
http://www.elmundotoday.com/feed/

I've tweaked several WordPress recipes found in this very forum, to no avail: the index is always empty (no articles), although downloading the feed manually shows that the articles are there.

The feed is compressed with gzip, but I guess this should not be a problem for Calibre?

Below is my last attempt:
Code:
class AdvancedUserRecipe1335711936(BasicNewsRecipe):
    title          = u'El Mundo Today'
    description = 'La actualidad del mañana'
    cover_url = 'http://www.elmundotoday.com/wp-content/themes/EarthlyTouch/images/logo.png'
    oldest_article = 365
    max_articles_per_feed = 100
    auto_cleanup = False
    no_stylesheets = True
    language = 'es_ES'
    use_embedded_content  = True

    feeds  = [(u'El Mundo Today', u'http://www.elmundotoday.com/feed/')]
TIA.
Old 04-29-2012, 06:42 PM   #2
atordo
I should have tried this before posting:

I uncompressed the RSS file, copied it to the data directory of a very simple web server running on my computer, then pointed the feed in the recipe to localhost. Articles now show up.

So it seems gzip compression was indeed the problem.
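
For anyone who wants to confirm the same diagnosis without setting up a local web server, here is a minimal sketch in plain Python (standard library only, not calibre's API). It checks whether a downloaded feed body starts with the gzip magic bytes and decompresses it if so. The function name `maybe_gunzip` and the sample feed content are made up for illustration.

```python
import gzip

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of every gzip stream


def maybe_gunzip(raw):
    """Return raw decompressed if it looks like gzip data, else unchanged."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw


# Hypothetical feed body, standing in for the bytes fetched from the site.
sample_feed = b'<rss version="2.0"><channel><item/></channel></rss>'
compressed = gzip.compress(sample_feed)

print(maybe_gunzip(compressed) == sample_feed)   # compressed input is restored
print(maybe_gunzip(sample_feed) == sample_feed)  # plain input passes through
```

If the raw bytes saved from the real feed begin with those two magic bytes, the server is sending gzip data that the client did not decode.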
Old 04-29-2012, 10:55 PM   #3
kovidgoyal
creator of calibre
Posts: 43,842
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
If you want calibre to handle gzip transparently, use

Code:
def get_browser(self):
    br = BasicNewsRecipe.get_browser(self)
    br.set_handle_gzip(True)
    return br
That should do the trick, though I haven't tested it.
Old 04-30-2012, 06:25 AM   #4
atordo
Thanks Kovid, that did it. Below is a working version of the recipe in case someone else is interested in the site:

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class ElMundoTodayRecipe(BasicNewsRecipe):
    title = 'El Mundo Today'
    description = u'La actualidad del mañana'
    category = 'Noticias, humor'
    cover_url = 'http://www.elmundotoday.com/wp-content/themes/EarthlyTouch/images/logo.png'
    oldest_article = 30
    max_articles_per_feed = 30
    auto_cleanup = True
    no_stylesheets = True
    language = 'es_ES'
    use_embedded_content  = True

    feeds = [('El Mundo Today', 'http://www.elmundotoday.com/feed/')]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        br.set_handle_gzip(True)
        return br
Old 06-05-2012, 11:18 PM   #5
atordo
Updated version with better page parsing and some CSS for eye candy.

Code:
import re
from calibre.web.feeds.news import BasicNewsRecipe

class ElMundoTodayRecipe(BasicNewsRecipe):
    title = 'El Mundo Today'
    description = u'La actualidad del mañana'
    category = 'Noticias, humor'
    cover_url = 'http://www.elmundotoday.com/wp-content/themes/EarthlyTouch/images/logo.png'
    oldest_article = 30
    max_articles_per_feed = 60
    auto_cleanup = False
    no_stylesheets = True
    remove_javascript = True
    language = 'es_ES'
    use_embedded_content  = False

    preprocess_regexps = [
        (re.compile(r'</title>.*<!--Begin Article Single-->', re.DOTALL),
        lambda match: '</title><body>'),
        #(re.compile(r'^\t{5}<a href.*Permanent Link to ">$'), lambda match: ''),
        #(re.compile(r'\t{5}</a>$'), lambda match: ''),
        (re.compile(r'<div class="social4i".*</body>', re.DOTALL),
        lambda match: '</body>'),
    ]

    keep_only_tags = [
        dict(name='div', attrs={'class':'post-wrapper'})
    ]

    remove_attributes = [ 'href', 'title', 'alt' ]

    extra_css = '''
        .antetitulo{font-variant:small-caps; font-weight:bold} .articleinfo{font-size:small}
        img{margin-bottom:0.4em; display:block; margin-left:auto; margin-right:auto}
    '''

    feeds = [('El Mundo Today', 'http://www.elmundotoday.com/feed/')]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        br.set_handle_gzip(True)
        return br
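
To see what the two active `preprocess_regexps` entries do, the sketch below applies them with plain `re` to a small invented page. It only approximates how calibre uses the list (each pattern is substituted in order), and both the `sample_html` content and the `apply_regexps` helper are assumptions made for illustration, not the site's real markup or calibre's code.

```python
import re

# The two active patterns from the recipe above, unchanged.
preprocess_regexps = [
    (re.compile(r'</title>.*<!--Begin Article Single-->', re.DOTALL),
     lambda match: '</title><body>'),
    (re.compile(r'<div class="social4i".*</body>', re.DOTALL),
     lambda match: '</body>'),
]

# Hypothetical page, invented to mimic the markers the patterns look for.
sample_html = (
    '<html><head><title>Titular</title></head>\n'
    '<body><div id="menu">navigation</div>\n'
    '<!--Begin Article Single-->\n'
    '<div class="post-wrapper"><p>Texto del articulo</p></div>\n'
    '<div class="social4i">share buttons</div>\n'
    '<div id="footer">footer</div></body></html>'
)


def apply_regexps(html, rules):
    """Apply each (pattern, replacement) pair in order."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html


cleaned = apply_regexps(sample_html, preprocess_regexps)
print(cleaned)
```

The first pattern drops everything between the title and the article marker, and the second drops everything from the social-sharing block to the end of the body, so only the article wrapper is left for `keep_only_tags` to pick up.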
Old 06-06-2012, 02:23 AM   #6
Terisa de morgan
Grand Sorcerer
Posts: 6,227
Karma: 11768331
Join Date: Jun 2009
Location: Madrid, Spain
Device: Kobo Clara/Aura One/Forma,XiaoMI 5, iPad, Huawei MediaPad, YotaPhone 2
Thank you, I'm interested (and always surprised by their news).
Old 06-06-2012, 01:49 PM   #7
atordo
Glad to know it's of use to someone else.