View Single Post
Old 05-17-2010, 06:10 PM   #1929
mwheinz
award-winning bozo
mwheinz can program the VCR without an owner's manual.mwheinz can program the VCR without an owner's manual.mwheinz can program the VCR without an owner's manual.mwheinz can program the VCR without an owner's manual.mwheinz can program the VCR without an owner's manual.mwheinz can program the VCR without an owner's manual.mwheinz can program the VCR without an owner's manual.mwheinz can program the VCR without an owner's manual.mwheinz can program the VCR without an owner's manual.mwheinz can program the VCR without an owner's manual.mwheinz can program the VCR without an owner's manual.
 
Posts: 253
Karma: 172703
Join Date: Sep 2009
Location: Philadelphia
Device: Sony PRS-600
American Prospect Recipe

American Prospect Recipe

sdow1 - try this recipe. It's very simple, strips out all formatting at the moment.

Code:
import re

class AdvancedUserRecipe1273850169(BasicNewsRecipe):
    title          = u'American Prospect'
    oldest_article = 7
    max_articles_per_feed = 100
    recursions = 0
    no_stylesheets = True
    remove_javascript = True

    keep_only_tags = [dict(name=['p','img'])]
	
    preprocess_regexps = [ 
        (re.compile('\r'),lambda match: ''),
        (re.compile(r'<head.*?<title>', re.DOTALL|re.IGNORECASE), lambda match: '<head><title>'),
        (re.compile(r'</title>.*?</head>', re.DOTALL|re.IGNORECASE), lambda match: '</title></head>'),
        (re.compile(r'<body.*?<div class="pad_10L10R">', re.DOTALL|re.IGNORECASE), lambda match: '<body><div>'),
        (re.compile(r'</div>.*</body>', re.DOTALL|re.IGNORECASE), lambda match: '</div></body>'),
    ]

    feeds       = [(u'Articles', u'feed://www.prospect.org/articles_rss.jsp')]

Last edited by mwheinz; 05-17-2010 at 08:44 PM.
mwheinz is offline