Old 07-02-2010, 07:29 PM   #2232
schnortz
Device: Nook
Starson17... thanks for responding.

Here is the recipe I am using. It includes your suggested change, even though that change was unsuccessful. I hope I'm not violating etiquette by posting the code.

Quote:
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '2009 Kovid Goyal <kovid at kovidgoyal.net>'

import re

from calibre.web.feeds.news import BasicNewsRecipe


class AppletonPostCrescent(BasicNewsRecipe):
    title = u'Appleton Post Crescent'
    oldest_article = 2
    language = 'en'

    __author__ = 'Joseph Kitzmiller and Sujata Raman'
    max_articles_per_feed = 25
    no_stylesheets = True
    use_embedded_content = False
    remove_javascript = True
    encoding = 'cp1252'
    cover_url = u'http://www.postcrescent.com/ic/assets/frontpage.pdf'
    publisher = 'Appleton Post Crescent, Gannett'
    category = 'news, Appleton, Fox Cities, Wisconsin'

    extra_css = '''
        h1{font-family:Arial,Helvetica,sans-serif; font-size:large; color:#0E5398; }
        h2{color:#666666;}
        .blog_title{color:#4E0000; font-family:Georgia,"Times New Roman",Times,serif; font-size:large;}
        .sidebar-photo{font-family:Arial,Helvetica,sans-serif; color:#333333; font-size:30%;}
        .blog_post{font-family:Arial,Helvetica,sans-serif; color:#222222; font-size:xx-small;}
        .article-bodytext{font-family:Arial,Helvetica,sans-serif; font-size:xx-small; color:#222222;font-weight:normal;}
        .ratingbyline{font-family:Arial,Helvetica,sans-serif; color:#333333; font-size:50%;}
        .author{font-family:Arial,Helvetica,sans-serif; color:#777777; font-size:50%;}
        .date{font-family:Arial,Helvetica,sans-serif; color:#777777; font-size:50%;}
        .padding{font-family:Arial,Helvetica,sans-serif; font-size:70%; color:#222222; font-weight:normal;}
    '''

    # The suggested change: strip an empty <p></p> followed by a <div>...</div>.
    # Note that '.*' with re.DOTALL is greedy, so the match runs through the
    # last </div> that follows it in the page.
    preprocess_regexps = [
        (re.compile(r'<p></p><div.*</div>', re.IGNORECASE | re.DOTALL), lambda match: ''),
    ]

    # Keep only the article body and the sidebar photo.
    keep_only_tags = [dict(name='div', attrs={'class': ['padding', 'sidebar-photo']})]

    # Drop embedded media, comments, tool bars and related-story boxes.
    remove_tags = [
        dict(name=['object', 'link', 'table', 'embed', 'script', 'noscript']),
        dict(name='div', attrs={'id': ['pluckcomments', 'StoryChat']}),
        dict(name='div', attrs={'class': ['article-tools', 'padding article-sidebar', 'articleflex-container', 'poster-container', 'newslist', 'footer-container', 'sidebar-related', 'sub']}),
        dict(name='p', attrs={'class': ['posted', 'tags']}),
    ]

    feeds = [
        (u'Breaking News', u'http://www.postcrescent.com/apps/pbcs.dll/misc?URL=/templates/RSSbreaking.pbs&mime=xml'),
        (u'Latest Headlines', u'http://www.postcrescent.com/apps/pbcs.dll/misc?URL=/templates/RSSlatest.pbs&mime=xml'),
        (u'Local News', u'http://www.postcrescent.com/apps/pbcs.dll/misc?URL=/templates/RSSlocal.pbs&mime=xml'),
        (u'Sports', u'http://www.postcrescent.com/apps/pbcs.dll/misc?URL=/templates/RSSsports.pbs&mime=xml'),
        (u'Buzz Blog', u'http://sitelife.postcrescent.com/ver1.0/Blog/BlogRss?plckBlogId=Blog:9a8980f0-f726-439c-8c4e-1dc0f788941e'),
        (u'Weekend Blog', u'http://sitelife.postcrescent.com/ver1.0/Blog/BlogRss?plckBlogId=Blog:9dbf4deb-0468-41b7-a0c7-3a777c03d64c'),
    ]

    def preprocess_html(self, soup):
        # Remove inline style and font-face attributes so extra_css applies.
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll(face=True):
            del item['face']
        return soup
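
In case it helps narrow things down, here is a small standalone sketch of how the pattern in preprocess_regexps behaves outside calibre. The sample markup is invented for illustration (I am only assuming the real page looks roughly like this), and the non-greedy variant is just something to compare against, not a confirmed fix.

```python
import re

# Invented sample markup; the real postcrescent.com page structure is an assumption.
SAMPLE = (
    '<div class="padding">'
    '<p>First paragraph.</p>'
    '<p></p><div class="box">Additional Information</div>'
    '<p>Second paragraph.</p>'
    '<div class="sidebar-photo">Photo caption</div>'
    '</div>'
)

# Same pattern as in the recipe: '.*' is greedy and, with DOTALL, spans newlines.
greedy = re.compile(r'<p></p><div.*</div>', re.IGNORECASE | re.DOTALL)
# Variant for comparison: '.*?' stops at the first </div> it can.
lazy = re.compile(r'<p></p><div.*?</div>', re.IGNORECASE | re.DOTALL)

greedy_result = greedy.sub('', SAMPLE)
lazy_result = lazy.sub('', SAMPLE)

print(greedy_result)
print(lazy_result)
```

With this sample, the greedy pattern removes everything from the empty paragraph through the final closing div, including the second paragraph, while the non-greedy one removes only the box. The non-greedy version would still cut too early if the box itself contained nested divs.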
And as requested... here is a link to an article that has the "Additional Information box"... http://www.postcrescent.com/article/...AA&located=rss

And yes, I meant articles. Here is their Local News RSS Feed... http://www.postcrescent.com/apps/pbc...l.pbs&mime=xml As of now, it contains a couple of "Photos: ..." entries.
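
To make clear what I mean by those entries, here is a rough standalone sketch of the kind of title check I have in mind, assuming the goal is to skip them. The function name and the sample titles are made up for illustration. I believe the same test could be applied to the articles returned by parse_feeds in a recipe, but I have not tried that.

```python
def drop_by_title_prefix(titles, prefix='Photos:'):
    """Return the titles that do not start with the given prefix."""
    return [t for t in titles if not t.strip().startswith(prefix)]


# Invented titles, only to show the behaviour.
sample_titles = [
    'Photos: Fourth of July parade',
    'City council approves budget',
    ' Photos: Flooding downtown',
]

kept = drop_by_title_prefix(sample_titles)
print(kept)
```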

Thanks in advance.