06-17-2011, 03:46 AM
#2
Enthusiast
Posts: 30
Karma: 12
Join Date: Jun 2011
Location: India
Device: Kindle 3g
There's no need to use the print version. Even if you use it, by adding " def print_version(self, url): return url + '#printMode'", you don't get a print page in calibre: it still parses all the images, header, footer, etc. Here's my recipe. It works fine and fetches all the articles; no problems detected.
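A likely reason the '#printMode' trick changes nothing: everything after '#' is a URL fragment, and HTTP clients never send the fragment to the server, so the downloaded page is identical with or without it. (That the site's print layout is switched on client-side by JavaScript is my assumption.) The sketch below uses only the standard library; print_version mirrors the one-liner above as a plain function, and request_target and the article URL are hypothetical, for illustration only.

```python
from urllib.parse import urldefrag, urlsplit

def print_version(url):
    # Same transformation as the recipe one-liner: append a '#printMode' fragment.
    return url + '#printMode'

def request_target(url):
    # Hypothetical helper: the part of the URL that goes on the HTTP request line
    # (path plus query string). The fragment after '#' is never included.
    parts = urlsplit(url)
    target = parts.path or '/'
    if parts.query:
        target += '?' + parts.query
    return target

article = 'http://online.wsj.com/article/SB0000.html'  # hypothetical article URL
printed = print_version(article)

print(urldefrag(printed))  # the URL and the fragment, split apart
print(request_target(article) == request_target(printed))  # True
```

Both URLs produce the same request, which fits what you see in calibre: the normal article page, images and all.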
Quote:
#created by sexymax15 ....sexymax15@gmail.com
#Wall Street Journal(Spanish) recipe
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1308289809(BasicNewsRecipe):
    title = u'Wall Street Journal(Spanish)'
    oldest_article = 7            # skip articles older than 7 days
    max_articles_per_feed = 20
    use_embedded_content = False  # download each article page instead of using the feed text
    remove_empty_feeds = True
    no_stylesheets = True
    remove_javascript = True
    # Drop every image plus page furniture identified by these class attributes
    remove_tags = [dict(name='img'),
                   {'class': ['header', 'articleSection first', 'articleThumbnail_1',
                              'insettipUnit insetZoomTarget', 'insetZoomTargetBox',
                              'insettipBox ', 'insettip']}]
    # Keep only the article body, byline and headline box (a list of tag specs)
    keep_only_tags = [{'class': ['articlePage', 'byline',
                                 'articleHeadlineBox headlineType-newswire']}]
    extra_css = ''' h1 {font-family:georgia,serif;font-size: large} '''
    feeds = [(u'Wall Street Journal(Spanish)', u'http://online.wsj.com/xml/rss/3_7687.xml')]
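To see roughly what the class lists in the recipe select, here is a minimal stand-in built on Python's standard html.parser, not calibre's own BeautifulSoup-based code. It assumes a tag matches when its whole class attribute equals one of the listed strings (which is what multi-word entries like 'articleHeadlineBox headlineType-newswire' suggest); calibre's real matching may differ. The ClassSpecMatcher class and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser

# Class strings copied from the recipe's keep_only_tags.
KEEP_CLASSES = ['articlePage', 'byline', 'articleHeadlineBox headlineType-newswire']

class ClassSpecMatcher(HTMLParser):
    """Hypothetical stand-in: records start tags whose full class attribute
    exactly equals one of the wanted strings."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tags without a class give None.
        cls = dict(attrs).get('class')
        if cls in self.wanted:
            self.matches.append((tag, cls))

# Invented sample page, loosely shaped like an article page.
SAMPLE = '''
<div class="header">Site header</div>
<div class="articleHeadlineBox headlineType-newswire"><h1>Titular</h1></div>
<div class="byline">Byline</div>
<div class="articlePage"><p>Texto</p><img src="x.jpg"></div>
'''

matcher = ClassSpecMatcher(KEEP_CLASSES)
matcher.feed(SAMPLE)
print(matcher.matches)
```

The 'header' div is not matched, so in the recipe it would be discarded by keep_only_tags (it is also listed in remove_tags).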