I'm trying out Calibre instead of fighting with the browser on the Kindle, just for giggles.
I'm working through the various blogs I read, and started with
http://www.robbwolf.com
Here's the recipe so far:
from calibre.web.feeds.recipes import BasicNewsRecipe

class RobbWolf(BasicNewsRecipe):
    title = u'Robb Wolf - Paleo Solution'
    __author__ = 'Erik M Jacobs'
    oldest_article = 7  # days
    max_articles_per_feed = 100
    no_stylesheets = True
    # Download each linked article page instead of using the feed's own content
    use_embedded_content = False
    feeds = [(u'Robb Wolf - Paleo Solution', u'http://feeds.feedburner.com/RobbWolfThePaleoSolution?format=xml')]
    # keep_only_tags expects a list of tag specs, not a bare dict
    keep_only_tags = [dict(id='content')]
    remove_tags_after = [dict(name='div', attrs={'class':['endpost']})]
    remove_tags = [dict(name='div', attrs={'align':['center']}),
                   dict(name='div', attrs={'class':['postinfo']})]
The main issue I'm having is that the h2 title is a link and falls inside the content, which seems to confuse Calibre. I end up with a single page on the Kindle with just the article title, and then the real article begins on the next page.
Is it possible to use regexp in the keep/remove/etc tags lines?
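From what I can tell, the tag specs get handed to BeautifulSoup's findAll, which accepts compiled regular expressions in place of plain strings, so something like the following might work. This is only a sketch: the pattern `^post` is a made-up example (it would catch `postinfo` and any other class starting with `post`), not something taken from the site's actual markup.

```python
import re

# Hypothetical spec: remove every div whose class starts with "post".
# Assumes Calibre passes these dicts through to BeautifulSoup's findAll,
# which accepts a compiled regex wherever a plain string is allowed.
remove_tags = [
    dict(name='div', attrs={'class': re.compile(r'^post')}),
    dict(name='div', attrs={'align': ['center']}),
]
```

In the recipe itself the `import re` would go at the top of the file and the `remove_tags` assignment inside the class body.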
This is a standard WordPress blog, but only the abstracts are presented. I tried messing around with the recipe for Mish's Global Economic Analysis but ended up basically only getting the abstracts and no real articles.
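One thing that may be worth trying with a FeedBurner feed: a number of existing recipes override `get_article_url` so that Calibre fetches the original post URL rather than the FeedBurner redirect. Below is a rough sketch of that idea, not a confirmed fix for the abstracts problem. The mixin class name is hypothetical, and falling back to the plain `link` entry is my own assumption; it relies on the parsed feed entry being dict-like and exposing a `feedburner_origlink` key when the feed provides one.

```python
class FeedBurnerLinkMixin(object):
    """Hypothetical helper: prefer the original post URL over the FeedBurner one."""

    def get_article_url(self, article):
        # 'feedburner_origlink' is only present when the feed supplies it;
        # otherwise fall back to the ordinary link (assumed fallback).
        return article.get('feedburner_origlink') or article.get('link')
```

To use it, the method could simply be pasted into the recipe class, or the recipe could be declared as `class RobbWolf(FeedBurnerLinkMixin, BasicNewsRecipe)`.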
Any suggestions here?