View Single Post
Old 05-17-2010, 05:10 PM   #1929
mwheinz
award-winning bozo
mwheinz can program the VCR without an owner's manual.mwheinz can program the VCR without an owner's manual.mwheinz can program the VCR without an owner's manual.mwheinz can program the VCR without an owner's manual.mwheinz can program the VCR without an owner's manual.mwheinz can program the VCR without an owner's manual.mwheinz can program the VCR without an owner's manual.mwheinz can program the VCR without an owner's manual.mwheinz can program the VCR without an owner's manual.mwheinz can program the VCR without an owner's manual.mwheinz can program the VCR without an owner's manual.
 
Posts: 258
Karma: 172703
Join Date: Sep 2009
Location: Philadelphia
Device: Kobo Libra 2
American Prospect Recipe

American Prospect Recipe

sdow1 - try this recipe. It's very simple, strips out all formatting at the moment.

Code:
import re

class AdvancedUserRecipe1273850169(BasicNewsRecipe):
    title          = u'American Prospect'
    oldest_article = 7
    max_articles_per_feed = 100
    recursions = 0
    no_stylesheets = True
    remove_javascript = True

    keep_only_tags = [dict(name=['p','img'])]
	
    preprocess_regexps = [ 
        (re.compile('\r'),lambda match: ''),
        (re.compile(r'<head.*?<title>', re.DOTALL|re.IGNORECASE), lambda match: '<head><title>'),
        (re.compile(r'</title>.*?</head>', re.DOTALL|re.IGNORECASE), lambda match: '</title></head>'),
        (re.compile(r'<body.*?<div class="pad_10L10R">', re.DOTALL|re.IGNORECASE), lambda match: '<body><div>'),
        (re.compile(r'</div>.*</body>', re.DOTALL|re.IGNORECASE), lambda match: '</div></body>'),
    ]

    feeds       = [(u'Articles', u'feed://www.prospect.org/articles_rss.jsp')]

Last edited by mwheinz; 05-17-2010 at 07:44 PM.
mwheinz is offline