10-11-2011, 05:12 PM   #1
thoraxe
Device: Kindle
Recipe for "Robb Wolf"

Trying to play with Calibre instead of fighting with the browser on the Kindle, just for giggles.

Starting to go through my various blogs, and started with http://www.robbwolf.com

Here's the recipe so far:
PHP Code:
from calibre.web.feeds.recipes import BasicNewsRecipe

class RobbWolf(BasicNewsRecipe):
    title                 = u'Robb Wolf - Paleo Solution'
    __author__            = 'Erik M Jacobs'
    oldest_article        = 7
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    feeds                 = [(u'Robb Wolf - Paleo Solution', u'http://feeds.feedburner.com/RobbWolfThePaleoSolution?format=xml')]
    # keep_only_tags takes a list of tag specs, so the dict is wrapped in a list
    keep_only_tags        = [dict(id='content')]
    remove_tags_after     = [dict(name='div', attrs={'class':['endpost']})]
    remove_tags           = [dict(name='div', attrs={'align':['center']}),
                             dict(name='div', attrs={'class':['postinfo']})]
The main issue I'm having is that the h2 is a link and falls inside the content, which seems to confuse Calibre. I end up with a single page on the Kindle with just the article title, and then the real article begins on the next page.
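
One thing that might be worth trying, assuming the linked h2 really is what produces the extra title page: BasicNewsRecipe has a preprocess_regexps attribute, a list of (compiled pattern, replacement function) pairs applied to the downloaded HTML. The sketch below only shows the pattern at work on a made-up snippet; the sample markup and the apply_regexps helper are hypothetical stand-ins, not Calibre code, and the real page markup may differ.

```python
import re

# Same (pattern, replacement function) shape that preprocess_regexps uses.
# Assumption: the duplicate title is an <h2> whose only content is a link.
preprocess_regexps = [
    (re.compile(r'<h2[^>]*>\s*<a\b[^>]*>.*?</a>\s*</h2>',
                re.DOTALL | re.IGNORECASE),
     lambda match: ''),
]


def apply_regexps(html, rules):
    """Hypothetical helper: apply each rule in order, as a stand-in for Calibre."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


# Made-up markup, only for illustration.
sample = ('<div id="content">'
          '<h2><a href="http://example.com/post">Post title</a></h2>'
          '<p>Body text.</p>'
          '<h2>Plain subheading</h2>'
          '</div>')

cleaned = apply_regexps(sample, preprocess_regexps)
print(cleaned)
```

In the recipe itself the list would go in the class body next to remove_tags. Plain h2 subheadings without a link are left alone by this pattern.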

Is it possible to use regular expressions in the keep_only_tags, remove_tags, and similar lines?
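
As far as I can tell from other recipes, the attribute values in these tag specs can be compiled patterns from re.compile, since the specs are handed on to BeautifulSoup. A minimal sketch of what that might look like follows; spec_matches is a hypothetical helper that only imitates the matching so the idea can be tried without Calibre, and the class names are just the ones from my recipe.

```python
import re

# Tag specs in the same shape as the recipe, with a regex as an attribute value.
remove_tags = [
    dict(name='div', attrs={'class': re.compile(r'^(postinfo|endpost)$')}),
    dict(name='div', attrs={'align': 'center'}),
]


def spec_matches(spec, name, attrs):
    """Hypothetical stand-in for tag matching; this is not Calibre's own code."""
    if spec.get('name') not in (None, name):
        return False
    for key, wanted in spec.get('attrs', {}).items():
        value = attrs.get(key)
        if value is None:
            return False
        if hasattr(wanted, 'search'):
            # Compiled pattern: the attribute value must match it.
            if not wanted.search(value):
                return False
        elif value != wanted:
            # Plain string: the attribute value must be equal.
            return False
    return True


print(spec_matches(remove_tags[0], 'div', {'class': 'postinfo'}))
print(spec_matches(remove_tags[0], 'div', {'class': 'post'}))
```

The anchors in the pattern keep it from also matching longer class names that merely contain "postinfo" or "endpost".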

This is a standard WordPress blog, but the feed only presents the abstracts. I tried adapting the recipe for Mish's Global Economic Analysis, but I still ended up with just the abstracts and no real articles.

Any suggestions here?