04-29-2013, 07:58 PM   #2
Camper65
Enthusiast
Camper65 began at the beginning.
 
Posts: 32
Karma: 10
Join Date: Apr 2011
Device: Kindle wifi; Dell 2in1

Found part of the solution: the documents are downloading now. The next step is to clean up the output before it gets turned into an ebook. The original recipe needed a complete rewrite, and since it's a rewrite, I'm putting my own info into it.

So far the code is as follows:

Code:
import re  # needed once the commented-out keep_only_tags below is enabled

from calibre.web.feeds.news import BasicNewsRecipe

class dotnetMagazine (BasicNewsRecipe):
    __author__ = u'Bonni Salles'
    __version__ = '1.0'
    __license__   = 'GPL v3'
    __copyright__ = u'2013, Bonni Salles'
    title                 = '.net '
    oldest_article        = 7
    no_stylesheets        = True
    encoding              = 'utf8'
    use_embedded_content  = False
    language              = 'en'
    remove_empty_feeds    = True
    extra_css             = ' body{font-family: Arial,Helvetica,sans-serif } img{margin-bottom: 0.4em} '

#   remove_tags_before = dict(id='header')
#   remove_tags_after = [dict(name='footer')]

#   keep_only_tags = [
#         dict(name='article', attrs={'class': re.compile('^node.*$', re.IGNORECASE)}),
#         ]
#   remove_tags = [
#         dict(name='span', attrs={'class': 'comment-count'}),
#         dict(name='div', attrs={'class': 'item-list share-links'}),
#         dict(name='footer'),
#         ]
#   remove_attributes = ['border', 'cellspacing', 'align', 'cellpadding', 'colspan', 'valign', 'vspace', 'hspace', 'alt', 'width', 'height', 'style']
#   extra_css = 'img {max-width: 100%; display: block; margin: auto;} .captioned-image div {text-align: center; font-style: italic;}'


    feeds = [
               (u'net', u'http://feeds.feedburner.com/net/topstories')
            ]
Next I need to read up on how to remove tags before the HTML is processed; there's a lot on the page that isn't needed. It took a week to figure out that the recipe needed a complete rewrite.
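
For the tag cleanup, a minimal sketch of how it might look is below. It is not tested against the live site. The selectors are the ones from the commented-out lines in the recipe above, and it is only an assumption that they still match the page markup. The class name DotNetCleanupSketch and the fallback stub class are made up for illustration; the stub is there only so the file can be loaded outside calibre.

```python
import re

try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Outside calibre: hypothetical stand-in so the sketch can still be loaded.
    class BasicNewsRecipe(object):
        pass


class DotNetCleanupSketch(BasicNewsRecipe):
    title = '.net (cleanup sketch)'
    oldest_article = 7
    no_stylesheets = True
    use_embedded_content = False
    language = 'en'

    feeds = [(u'net', u'http://feeds.feedburner.com/net/topstories')]

    # Keep only the article body. Assumption: it sits in an <article>
    # element whose class starts with "node".
    keep_only_tags = [
        dict(name='article',
             attrs={'class': re.compile('^node.*$', re.IGNORECASE)}),
    ]

    # Then drop the clutter inside it. Assumption: these class names
    # still exist on the page.
    remove_tags = [
        dict(name='span', attrs={'class': 'comment-count'}),
        dict(name='div', attrs={'class': 'item-list share-links'}),
        dict(name='footer'),
    ]

    # Strip presentational attributes so extra_css controls the layout.
    remove_attributes = ['style', 'width', 'height']
```

To check the result quickly, calibre's command line can fetch just a couple of articles, for example `ebook-convert dotnet.recipe .epub --test -vv` (dotnet.recipe is a placeholder file name). If the output comes back empty, the keep_only_tags pattern is probably not matching anything and is the first thing to loosen.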