10-11-2011, 05:12 PM   #1
thoraxe
Junior Member
thoraxe began at the beginning.
 
Posts: 1
Karma: 10
Join Date: Oct 2011
Device: Kindle
Recipe for "Robb Wolf"

Trying to play with Calibre instead of fighting with the browser on the Kindle, just for giggles.

I'm going through my various blogs, starting with http://www.robbwolf.com

Here's the recipe so far:
Code:
from calibre.web.feeds.recipes import BasicNewsRecipe

class RobbWolf(BasicNewsRecipe):
    title = u'Robb Wolf - Paleo Solution'
    __author__ = 'Erik M Jacobs'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    feeds = [(u'Robb Wolf - Paleo Solution', u'http://feeds.feedburner.com/RobbWolfThePaleoSolution?format=xml')]
    # Keep only the main content container
    keep_only_tags = [dict(id='content')]
    # Drop everything after the end-of-post marker
    remove_tags_after = [dict(name='div', attrs={'class': ['endpost']})]
    # Strip centered divs and the post-info block
    remove_tags = [dict(name='div', attrs={'align': ['center']}),
                   dict(name='div', attrs={'class': ['postinfo']})]
Main issue I'm having is that the h2 is a link and falls inside of the content, which seems to confuse Calibre. I end up with a single page on the Kindle with just the article title, and then the real article begins on the next page.

Is it possible to use regexp in the keep/remove/etc tags lines?

This is a standard WordPress blog, but the feed only carries abstracts. I tried adapting the recipe for Mish's Global Economic Analysis, but I still end up with only the abstracts and no full articles.

Any suggestions here?
10-12-2011, 09:13 AM   #2
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by thoraxe
Is it possible to use regexp in the keep/remove/etc tags lines?
Yes.

Here's how I used it in the Skeptic recipe to remove div tags with an id that started with "follow":
Code:
    # Needs "import re" at the top of the recipe file.
    remove_tags = [dict(name='div', attrs={'class':['Introduction','divider']}),
                  dict(name='div', attrs={'id':['feature', 'podcast']}),
                  # A compiled regex can stand in for a literal attribute value
                  dict(name='div', attrs={'id':re.compile(r'follow.*', re.DOTALL|re.IGNORECASE)}),
                  dict(name='hr'),
                  ]
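To see which id values a pattern like that would catch before putting it in a recipe, it can be tried out with nothing but Python's re module. This is only a rough sketch: the sample ids and the ids_matching helper are made up for illustration, and the thread doesn't say whether calibre applies the pattern with match() or search(), so both are shown. The anchored variant is an assumption about how to stay strict about "starts with" in either case.

```python
import re

# Same pattern as in the Skeptic recipe above.
unanchored = re.compile(r'follow.*', re.DOTALL | re.IGNORECASE)
# Hypothetical stricter variant: '^' pins the match to the start of the id.
anchored = re.compile(r'^follow', re.IGNORECASE)

# Made-up id values, for illustration only.
sample_ids = ['follow-us', 'FollowTwitter', 'content', 'please-follow']


def ids_matching(pattern, ids, use_search):
    """Return the ids the pattern accepts, via search() or match()."""
    test = pattern.search if use_search else pattern.match
    return [i for i in ids if test(i)]


# match() only looks at the start of the string.
started_with = ids_matching(unanchored, sample_ids, use_search=False)
# search() accepts the pattern anywhere in the string.
contains = ids_matching(unanchored, sample_ids, use_search=True)
# With '^', search() behaves like "starts with" as well.
anchored_hits = ids_matching(anchored, sample_ids, use_search=True)

print(started_with)
print(contains)
print(anchored_hits)
```

If the unanchored pattern turns out to remove more than intended, the '^' version is the safer one to try.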
All times are GMT -4.