Old 06-30-2011, 07:22 PM   #1
Junior Member
luczak began at the beginning.
Posts: 2
Karma: 10
Join Date: Jun 2011
Device: All New Nook Touch
(broken recipe) not working

The recipe is not working correctly. It creates a proper section menu with the titles of the articles and their summaries, but the articles themselves are blank.

Here is the current code for reference:
from calibre.web.feeds.news import BasicNewsRecipe
import re

class Cracked(BasicNewsRecipe):
    title                 = u''
    __author__            = u'Nudgenudge'
    language              = 'en'
    description           = "America's Only Humor and Video Site, since 1958"
    publisher             = 'Cracked'
    category              = 'comedy, lists'
    oldest_article        = 2
    delay                 = 10
    max_articles_per_feed = 2
    no_stylesheets        = True
    encoding              = 'cp1252'
    remove_javascript     = True
    use_embedded_content  = False
    INDEX                 = u''
    extra_css             = """
                                .pageheader_type{font-size: x-large; font-weight: bold; color: #828D74}
                                .pageheader_title{font-size: xx-large; color: #394128}
                                .pageheader_byline{font-size: small; font-weight: bold; color: #394128}
                                .score_bg {display: inline; width: 100%; margin-bottom: 2em}
                                .score_column_1{ padding-left: 10px; font-size: small; width: 50%}
                                .score_column_2{ padding-left: 10px; font-size: small; width: 50%}
                                .score_column_3{ padding-left: 10px; font-size: small; width: 50%}
                                .score_header{font-size: large; color: #50544A}
                                .bodytext{display: block}
                                body{font-family: Helvetica,Arial,sans-serif}
                            """

    conversion_options = {
                          'comment'   : description
                        , 'tags'      : category
                        , 'publisher' : publisher
                        , 'language'  : language
                        , 'linearize_tables' : True
                        }

    keep_only_tags    =  [
                        dict(name='div', attrs={'class':['Column1']})
                        ]

    feeds = [(u'Articles', u'')]

    def get_article_url(self, article):
        return article.get('guid',  None)

    def cleanup_page(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        # Replace plain-text links with their text
        for alink in soup.findAll('a'):
            if alink.string is not None:
                tstr = alink.string
                alink.replaceWith(tstr)
        # Drop ads, social widgets, comments, navigation and article images
        for div_to_remove in soup.findAll('div', attrs={'id':['googlead_1','fb-like-article','comments_section']}):
            div_to_remove.extract()
        for div_to_remove in soup.findAll('div', attrs={'class':['share_buttons_col_1','GenericModule1']}):
            div_to_remove.extract()
        for div_to_remove in soup.findAll('div', attrs={'class':re.compile("prev_next")}):
            div_to_remove.extract()
        for ul_to_remove in soup.findAll('ul', attrs={'class':['Nav6']}):
            ul_to_remove.extract()
        for image in soup.findAll('img', attrs={'alt': 'article image'}):
            image.extract()

    def append_page(self, soup, appendtag, position):
        pager = soup.find('a',attrs={'class':'next_arrow_active'})
        if pager:
            nexturl = self.INDEX + pager['href']
            soup2 = self.index_to_soup(nexturl)
            texttag = soup2.find('div', attrs={'class':re.compile("userStyled")})
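            newpos = len(texttag.contents)
            # Recurse so later pages are appended too, then move this page's text into the article
            self.append_page(soup2, texttag, newpos)
            texttag.extract()
            self.cleanup_page(texttag)
            appendtag.insert(position, texttag)
        else:
            self.cleanup_page(appendtag)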

    def preprocess_html(self, soup):
        self.append_page(soup, soup.body, 3)
        return self.adeify_images(soup)
Old 07-04-2011, 12:10 AM   #2
Junior Member
UnWeave began at the beginning.
Posts: 4
Karma: 12
Join Date: Jun 2011
Device: none
I eventually managed to hack something together for Cracked, which works (for now):

from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class Cracked(BasicNewsRecipe):
    title                 = u''
    language              = 'en'
    description           = "America's Only HumorSite since 1958"
    publisher             = 'Cracked'
    category              = 'comedy, lists'
    oldest_article        = 3 #days
    max_articles_per_feed = 100
    no_stylesheets        = True
    encoding              = 'ascii'
    remove_javascript     = True
    use_embedded_content  = False

    feeds = [ (u'Articles', u'') ]

    conversion_options = {
                          'comment'   : description
                        , 'tags'      : category
                        , 'publisher' : publisher
                        , 'language'  : language
                        }

    remove_tags_before = dict(id='PrimaryContent')
    remove_tags_after = dict(name='div', attrs={'class':'shareBar'})
    remove_tags = [ dict(name='div', attrs={'class':['social']}),
                    dict(name='div', attrs={'id':['inline-share-buttons']}),
                    dict(name='span', attrs={'class':['views']}),
                    #dict(name=('img')),
                    ]

    def appendPage(self, soup, appendTag, position):
        # Check if article has multiple pages
        pageNav = soup.find('nav', attrs={'class':'PaginationContent'})
        if pageNav:
            # Check not at last page
            nextPage = pageNav.find('a', attrs={'class':'next'})
            if nextPage:
                nextPageURL = nextPage['href']
                nextPageSoup = self.index_to_soup(nextPageURL)
                # 8th <section> tag contains article content
                nextPageContent = nextPageSoup.findAll('section')[7]
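                newPosition = len(nextPageContent.contents)
                # Recurse for any further pages, then splice this page into the article
                self.appendPage(nextPageSoup, nextPageContent, newPosition)
                nextPageContent.extract()
                appendTag.insert(position, nextPageContent)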

    def preprocess_html(self, soup):
        self.appendPage(soup, soup.body, 3)
        return soup

With all the images in the articles, I find the resulting file is around 4MB, so you may want to change oldest_article to 2 instead. You can also remove the # in front of dict(name=('img')) to strip all the images. You get a much smaller file and (on my Kindle) the next page loads quicker, but you'll obviously be missing some content, and the captions will still be there.
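
For anyone unsure what those two tweaks look like, here is a rough sketch of the settings as plain Python data, outside of any recipe class. Only the 'social' filter is copied from the recipe; the rest of the filter list is left out, and the values are just the ones suggested above, not tested recommendations.

```python
# Sketch only: the two size-related settings as plain data.
# In the real recipe these are class attributes of the recipe class.
remove_tags = [
    dict(name='div', attrs={'class': ['social']}),  # taken from the recipe
    dict(name='img'),  # the line with its leading # removed: strips every <img>
]

# Fetch fewer days of articles to keep the output small.
oldest_article = 2  # days

for tag_filter in remove_tags:
    print(tag_filter)
```

Each entry in remove_tags is an ordinary dict, so commenting a line in or out is all it takes to switch a filter on or off.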

I haven't applied any extra formatting to it, and I haven't tried it for every kind of article, though it will properly stitch together their 2-pagers.
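
The stitching idea itself can be sketched without calibre. This is only an illustration of the "follow the next link and append the content" loop, not the recipe's actual code: fetch_page, the page dictionaries, max_pages and fake_site are all made up for the example.

```python
def stitch_pages(first_url, fetch_page, max_pages=10):
    """Collect the content of an article that is split over several pages.

    fetch_page is a hypothetical callable returning a dict with the page's
    'content' and the URL of the 'next' page (None on the last page).
    """
    parts = []
    seen = set()
    url = first_url
    while url is not None and url not in seen and len(parts) < max_pages:
        seen.add(url)            # guard against pagination loops
        page = fetch_page(url)
        parts.append(page['content'])
        url = page.get('next')   # None once the last page is reached
    return '\n'.join(parts)


# Stand-in for the site: a two-page article held in a dict.
fake_site = {
    'page1': {'content': 'First half.', 'next': 'page2'},
    'page2': {'content': 'Second half.', 'next': None},
}

article = stitch_pages('page1', fake_site.__getitem__)
print(article)
```

In the recipe, the fetching is done with self.index_to_soup and the next link is the 'next' anchor inside the PaginationContent nav. The sketch uses a loop with a page limit rather than recursion, so a broken pager cannot make it run forever.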

If you find any problems with it let me know.