Old 06-17-2011, 03:46 AM   #2
sexymax15
Enthusiast
Posts: 30
Karma: 12
Join Date: Jun 2011
Location: India
Device: Kindle 3g
There is no need to use the print version. Even if you add " def print_version(self, url):return url + '#printMode'" to the recipe,
you don't get a print page in calibre. It still parses all the images, header, footer, etc. Here's my recipe. It works fine and fetches all the articles, with no problems detected.
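One possible explanation for this (my assumption, it is not stated above) is that everything after the '#' in a URL is a fragment, which is handled on the client side and is not part of the HTTP request. The sketch below uses only the Python standard library, not calibre, and a made-up article URL, to show that the address the server sees is the same with or without '#printMode':

```python
from urllib.parse import urldefrag


def print_version(url):
    # Same transformation as the print_version override quoted above.
    return url + '#printMode'


def server_visible_url(url):
    # Strip the fragment: it is not sent to the server in the request.
    return urldefrag(url).url


# Hypothetical article URL, used only for illustration.
article = 'http://online.wsj.com/article/EXAMPLE.html'
printed = print_version(article)

print(printed)
print(server_visible_url(printed) == article)
```

If that assumption holds, the server returns the normal page either way, so cleaning it up with remove_tags and keep_only_tags is the practical route.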


Quote:
#created by sexymax15 ....sexymax15@gmail.com
#Wall Street Journal(Spanish) recipe
import re

from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.chardet import xml_to_unicode

class AdvancedUserRecipe1308289809(BasicNewsRecipe):
    title = u'Wall Street Journal(Spanish)'
    oldest_article = 7  # in days
    max_articles_per_feed = 20
    use_embedded_content = False

    remove_empty_feeds = True
    no_stylesheets = True
    remove_javascript = True
    # Drop images plus the header, thumbnail and inset boxes
    remove_tags = [
        dict(name='img'),
        {'class': ['header', 'articleSection first', 'articleThumbnail_1',
                   'insettipUnit insetZoomTarget', 'insetZoomTargetBox',
                   'insettipBox ', 'insettip']},
    ]
    # Keep only the article page, byline and headline (a list of specs, like remove_tags)
    keep_only_tags = [
        {'class': ['articlePage', 'byline',
                   'articleHeadlineBox headlineType-newswire']},
    ]
    extra_css = ''' h1 {font-family:georgia,serif;font-size: large} '''

    feeds = [(u'Wall Street Journal(Spanish)', u'http://online.wsj.com/xml/rss/3_7687.xml')]
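To make the two limits in the recipe concrete, here is a rough stand-alone model of what oldest_article = 7 and max_articles_per_feed = 20 mean. It is only a sketch of the idea, not calibre's actual code: select_articles and the article dictionaries are hypothetical names invented for this example.

```python
from datetime import datetime, timedelta

OLDEST_ARTICLE_DAYS = 7      # mirrors oldest_article in the recipe
MAX_ARTICLES_PER_FEED = 20   # mirrors max_articles_per_feed in the recipe


def select_articles(articles, now):
    """Keep articles no older than the cutoff, then cap the count."""
    cutoff = now - timedelta(days=OLDEST_ARTICLE_DAYS)
    recent = [a for a in articles if a['published'] >= cutoff]
    return recent[:MAX_ARTICLES_PER_FEED]


# Hypothetical feed: 30 articles, one published per day going back in time.
now = datetime(2011, 6, 17)
feed = [{'title': 'article %d' % i, 'published': now - timedelta(days=i)}
        for i in range(30)]

for article in select_articles(feed, now):
    print(article['title'], article['published'].date())
```

With a feed that publishes more than 20 items a week, the count limit is the one that matters; with a slow feed, the age limit is.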
Screenshot: [image not included]

Attached Files
File Type: zip Wall Street Journal(Spanish)_1121.zip (695 Bytes, 271 views)