09-05-2010, 11:01 PM   #2651
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by TonytheBookworm View Post
I know how you enjoy the food recipe recipes so here is one you might enjoy.
The food recipes are for the wife - I'll pass it along. Thanks!

Quote:
You might want to modify the formatting a little to get rid of the two || (I can't figure out how to do it, even with a findall). Also, the little thumbnail gets put next to the start of the text, where a <br> after the image would be better (another thing I'm not sure how to do).
This version deals with both issues:

BUCKMASTERS RECIPES
Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag

class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = 'BuckMasters In The Kitchen'
    language = 'en'
    __author__ = 'TonytheBookworm & Starson17'
    description = 'Learn how to cook all those outdoor varmints'
    publisher = 'BuckMasters.com'
    category = 'food,cooking,recipes'
    oldest_article = 365
    max_articles_per_feed = 100
    conversion_options = {'linearize_tables' : True}
    masthead_url = 'http://www.buckmasters.com/Portals/_default/Skins/BM_10/images/header_bg.jpg'
    keep_only_tags = [
        dict(name='table', attrs={'class':['containermaster_black']})
    ]
    remove_tags_after = [dict(name='div', attrs={'align':['left']})]
    feeds = [
        ('Recipes', 'http://www.buckmasters.com/DesktopModules/DnnForge%20-%20NewsArticles/RSS.aspx?TabID=292&ModuleID=658&MaxCount=25'),
    ]

    def preprocess_html(self, soup):
        # Remove the element holding the selected top-menu link; this is
        # what gets rid of the stray "||" separators.
        item = soup.find('a', attrs={'class':['MenuTopSelected']})
        if item:
            item.parent.extract()
        # Give every image its own paragraph so the text starts below it.
        for img_tag in soup.findAll('img'):
            parent_tag = img_tag.parent
            if parent_tag.name == 'a':
                # Linked thumbnail: replace the link with a <p> holding only the image.
                new_tag = Tag(soup, 'p')
                new_tag.insert(0, img_tag)
                parent_tag.replaceWith(new_tag)
            elif parent_tag.name == 'p':
                # Image inside a text paragraph: rebuild it as
                # <div><p><img></p><p>text</p></div>.
                if self.tag_to_string(parent_tag) != '':
                    new_div = Tag(soup, 'div')
                    new_tag = Tag(soup, 'p')
                    new_tag.insert(0, img_tag)  # moves the image out of the text paragraph
                    parent_tag.replaceWith(new_div)
                    new_div.insert(0, new_tag)
                    new_div.insert(1, parent_tag)
        return soup
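
If you want to see the two fixes in isolation without running calibre, here's a minimal sketch of the same idea using Python's standard-library xml.etree.ElementTree instead of calibre's bundled BeautifulSoup. It assumes the fragment is well-formed XML, and the sample markup (the "menu" paragraph, thumb.jpg, big.jpg) is hypothetical, not the real BuckMasters page. The helpers clean, _detach and _parent_map are illustrative only, not part of calibre.

```python
import xml.etree.ElementTree as ET


def _parent_map(root):
    # ElementTree keeps no parent pointers, so rebuild a child -> parent map.
    return {child: parent for parent in root.iter() for child in parent}


def _detach(parent, child):
    """Remove child from parent without losing the text that follows it (its tail)."""
    index = list(parent).index(child)
    if child.tail:
        if index > 0:
            prev = parent[index - 1]
            prev.tail = (prev.tail or '') + child.tail
        else:
            parent.text = (parent.text or '') + child.tail
    parent.remove(child)
    child.tail = None


def clean(root):
    # Fix 1: drop the element wrapping the first selected menu link,
    # along with the "||" separators inside it.
    selected = next((a for a in root.iter('a')
                     if a.get('class') == 'MenuTopSelected'), None)
    if selected is not None:
        parents = _parent_map(root)
        menu = parents.get(selected)
        holder = parents.get(menu)
        if holder is not None:
            _detach(holder, menu)

    # Fix 2: give each image its own paragraph ahead of the text.
    for img in list(root.iter('img')):
        parents = _parent_map(root)  # rebuilt because the tree changes each pass
        parent = parents.get(img)
        holder = parents.get(parent)
        if holder is None:
            continue
        if parent.tag == 'a':
            # Linked thumbnail: swap the link for a bare <p> holding the image.
            index = list(holder).index(parent)
            parent.remove(img)
            img.tail = None
            new_p = ET.Element('p')
            new_p.append(img)
            new_p.tail = parent.tail
            holder.remove(parent)
            holder.insert(index, new_p)
        elif parent.tag == 'p' and ''.join(parent.itertext()).strip():
            # Image inside a text paragraph: <div><p><img/></p><p>text</p></div>.
            _detach(parent, img)
            new_p = ET.Element('p')
            new_p.append(img)
            index = list(holder).index(parent)
            div = ET.Element('div')
            div.tail, parent.tail = parent.tail, None
            holder.remove(parent)
            div.extend([new_p, parent])
            holder.insert(index, div)
    return root


# Hypothetical markup standing in for a recipe page.
SAMPLE = (
    '<div>'
    '<p class="menu"><a class="MenuTopSelected">Recipes</a> || <a href="#">Home</a></p>'
    '<p><img src="thumb.jpg"/>Brown the venison in a hot skillet.</p>'
    '<a href="big.jpg"><img src="big.jpg"/></a>'
    '</div>'
)

if __name__ == '__main__':
    print(ET.tostring(clean(ET.fromstring(SAMPLE)), encoding='unicode'))
```

Running it prints the rewritten fragment with the menu (and its ||) gone and each image sitting in its own <p> above the text. The parent map is rebuilt for every image because each rewrite moves elements around, which is the ElementTree stand-in for BeautifulSoup's live .parent.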