MobileRead Forums - View Single Post

sexymax15 · 06-17-2011, 03:46 AM

There's no need to use the print version. Even if you request it with "def print_version(self, url): return url + '#printMode'", you don't get a print page in calibre; it still parses all the images, header, footer, etc. Here's my recipe. It works fine and fetches all the articles, with no problems detected.

Quote:

#created by sexymax15 ....sexymax15@gmail.com
#Wall Street Journal(Spanish) recipe
import re

from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.chardet import xml_to_unicode

class AdvancedUserRecipe1308289809(BasicNewsRecipe):
    title = u'Wall Street Journal(Spanish)'
    oldest_article = 7  # skip articles older than this many days
    max_articles_per_feed = 20
    use_embedded_content = False  # fetch the full article pages, not just the feed text

    remove_empty_feeds = True
    no_stylesheets = True
    remove_javascript = True
    # Drop every image, plus the page header and inset boxes (matched by class).
    remove_tags = [dict(name='img'),{'class':['header','articleSection first','articleThumbnail_1','insettipUnit insetZoomTarget','insetZoomTargetBox','insettipBox ','insettip']}]
    # Keep only the article body, byline and headline box.
    # Wrapped in a list, which is the documented form for keep_only_tags.
    keep_only_tags = [{'class':['articlePage','byline','articleHeadlineBox headlineType-newswire']}]
    extra_css = ''' h1 {font-family:georgia,serif;font-size: large} '''

    feeds = [(u'Wall Street Journal(Spanish)', u'http://online.wsj.com/xml/rss/3_7687.xml')]
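
On the '#printMode' remark above, one likely reason the suffix makes no difference: everything after '#' in a URL is a fragment, and HTTP clients normally do not send the fragment to the server, so the page that gets fetched would be the same regular article page. Here is a minimal sketch in plain Python 3, using only the standard library rather than calibre itself. The article URL is a made-up placeholder, and the standalone print_version function is a simplified stand-in for the recipe method.

```python
from urllib.parse import urldefrag


def print_version(url):
    # Same transformation as the one-liner quoted in the post,
    # written as a plain function instead of a recipe method.
    return url + '#printMode'


# Hypothetical article URL, for illustration only.
article = 'http://online.wsj.com/article/EXAMPLE.html'
printable = print_version(article)

# urldefrag splits the fragment off; the first part is the URL
# that would actually be requested from the server.
requested, fragment = urldefrag(printable)

print(requested == article)  # True: the request URL is unchanged
print(fragment)              # printMode
```

If that is what happens here, it would explain why the header, footer and images still have to be stripped with remove_tags and keep_only_tags, as the recipe does.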
