Old 10-03-2012, 11:17 PM   #1
xoxojackie
Junior Member
 
Posts: 3
Karma: 10
Join Date: Sep 2012
Device: 1st Gen Nook, Kindle PW
Television Without Pity

Would love a recipe that would fetch recaps from http://twop.com. I hope someone can help!

Thanks
Jackie
Old 10-05-2012, 01:23 PM   #2
Krittika Goyal
Vox calibre
 
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
It will be available in the next calibre release; I sent it to Kovid.
Old 10-11-2012, 06:56 AM   #3
sdow1
Connoisseur
 
Posts: 55
Karma: 10
Join Date: Apr 2010
Location: new york city
Device: nook, ipad
Just a note - the recipe only pulls the first page of multi-page recaps.
Old 10-12-2012, 11:57 AM   #4
Krittika Goyal
Vox calibre
 
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
Can you send me a link to one of the multi-page articles?
Old 10-13-2012, 12:28 PM   #5
copyrite
Wizard
 
Posts: 1,814
Karma: 4985051
Join Date: Sep 2010
Location: Maryland
Device: ...lots! ;) mostly reading on a Kindle Voyage
Here's an example:

http://www.televisionwithoutpity.com...are-both-1.php
Old 10-13-2013, 08:25 PM   #6
Snarkastica
Junior Member
 
Posts: 1
Karma: 10
Join Date: Oct 2013
Device: PookPort Plus
TWoP Recap Recipe

Not sure if you're still interested in this recipe, but I'm a huge fan of TWoP and I've been looking for a way to capture their recaps in a format like this, so I ended up writing one. It grabs all the pages of a multi-page recap and combines them into a single article.

The following code can be used in a few ways:

1) Grab the latest recaps for all active shows from the site-wide RSS feed. This is the default configuration.

2) Grab the latest recaps for a specific show by adding its RSS feed to the feeds list (see the short sketch after this list). The usual format is http://www.televisionwithoutpity.com...W-NAME/rss.xml.

3) With a couple of small modifications, you can instead pull down a show's entire collection of recaps. I did this with parse_index because the individual show feeds don't contain links to all episodes. If you do this, I'd also recommend uncommenting reverse_article_order so the recaps come out in episode order.
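For option 2, the only change is the feeds entry. A minimal sketch (SHOW-NAME is a placeholder; paste in the show's actual feed URL from point 2):

Code:
    feeds = [
        ('SHOW-NAME Recaps',
         'http://www.televisionwithoutpity.com/.../SHOW-NAME/rss.xml'),  # placeholder: use the show's real feed URL
    ]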

Code:
import re

from calibre.web.feeds.news import BasicNewsRecipe
from BeautifulSoup import Tag  # Tag is only used by the commented-out headline conversion below

class TelevisionWithoutPity(BasicNewsRecipe):
    title          = u'Television Without Pity'
    language       = 'en'
    __author__     = 'Snarkastica'
    SHOW = 'http://www.televisionwithoutpity.com/show/SHOW-NAME-HERE/recaps/' # Used for pulling down an entire show, not just the RSS feed
    oldest_article = 7 #days
    max_articles_per_feed = 25
    #reverse_article_order=True # Useful for entire show, to display in episode order
    #encoding = 'cp1252'
    use_embedded_content = False

    preprocess_regexps = [(re.compile(r'<span class="headline_recap_title .*?>', re.DOTALL|re.IGNORECASE), lambda match: '<span class="headline_recap_title">')]
    keep_only_tags = [dict(name='span', attrs={'class':'headline_recap_title'}), dict(name='p', attrs={'class':'byline'}), dict(name='div', attrs={'class':'body_recap'}), dict(name='h1')]
    no_stylesheets = True

    # Comment this out and uncomment parse_index() below to retrieve a single show
    feeds          = [
        ('Latest Recaps',
         'http://www.televisionwithoutpity.com/rss.xml'),
    ]

    '''
    This method can be used to grab all recaps for a single show.
    Set the SHOW constant at the beginning of this file to the URL for a show's recap page
    (the page listing all recaps, usually of the form
    http://www.televisionwithoutpity.com/show/SHOW-NAME/recaps/
    where SHOW-NAME is the hyphenated name of the show).

    To use:
    1. Comment out feeds = [...] earlier in this file
    2. Set the SHOW constant to the show's recap page
    3. Uncomment the parse_index() function below
    '''

    '''
    def parse_index(self):
        soup = self.index_to_soup(self.SHOW)
        feeds = []
        articles = []
        showTitle = soup.find('h1').string
        recaps = soup.find('table')
        for ep in recaps.findAll('tr'):
            epData = ep.findAll('td')
            epNum = epData[0].find(text=True).strip()
            if not epNum == "Ep.":
                epT = self.tag_to_string(epData[1].find('em')).strip()
                epST = " (or " + self.tag_to_string(epData[1].find('h3')).strip() + ")"
                epTitle = epNum + ": " + epT + epST
                epData[1].find('em').extract()
                epURL = epData[1].find('a', href=True)
                epURL = epURL['href']
                epSum = self.tag_to_string(epData[1].find('p')).strip()
                epDate = epData[2].find(text=True).strip()
                epAuthor = self.tag_to_string(epData[4].find('p')).strip()
                articles.append({'title':epTitle, 'url':epURL, 'description':epSum, 'date':epDate, 'author':epAuthor})
        feeds.append((showTitle, articles))
        #self.abort_recipe_processing("test")
        return feeds
    '''

    # This will add subsequent pages of multipage recaps to a single article page
    def append_page(self, soup, appendtag, position):
        if (soup.find('p',attrs={'class':'pages'})): # If false, will still grab single-page recaplets
            pager = soup.find('p',attrs={'class':'pages'}).find(text='Next')
            if pager:
                nexturl = pager.parent['href']
                soup2 = self.index_to_soup(nexturl)
                texttag = soup2.find('div', attrs={'class':'body_recap'})
                for it in texttag.findAll(style=True):
                    del it['style']
                newpos = len(texttag.contents)          
                self.append_page(soup2,texttag,newpos)
                texttag.extract()
                appendtag.insert(position,texttag)

    def preprocess_html(self, soup):
        self.append_page(soup, soup.body, 3)
        return soup

    # Remove the multi-page links (we had to keep these in for append_page(), but they can go away now).
    # Could have used CSS to hide, but some readers ignore CSS.
    def postprocess_html(self, soup, first_fetch):
        paginator = soup.findAll('p', attrs={'class':'pages'})
        if paginator:
            for p in paginator:
                p.extract()

        # TODO: Fix this so it converts the headline class into a heading 1
        #titleTag = Tag(soup, "h1")
        #repTag = soup.find('span', attrs={'class':'headline_recap_title'})
        #titleTag.insert(0, repTag.contents[0])
        #repTag.extract()
        #soup.body.insert(1, titleTag)
        return soup
This is the first recipe I've done, so maybe there are a few things I could do differently, but it worked for my purposes. If anyone has suggestions, I'm happy to learn.
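
(To try the recipe out, you can save it as a .recipe file and either add it through calibre's custom news source dialog or run it directly with ebook-convert, e.g. ebook-convert twop.recipe twop.epub --test - the file names are just placeholders, and --test keeps the test download small.)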

There are a couple of TODOs for the next version: converting the episode headline into a heading 1 (one possible approach is sketched below), and making each season its own section when you pull down an entire show, instead of having all episodes in one list.
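
For the heading 1 TODO, here's one possible (untested) sketch of postprocess_html(), assuming the BeautifulSoup 3 style Tag/replaceWith API that the recipe already imports for the commented-out attempt:

Code:
    def postprocess_html(self, soup, first_fetch):
        for p in soup.findAll('p', attrs={'class':'pages'}):
            p.extract()  # drop the multi-page pager links
        # Promote the recap headline span to a real <h1>
        repTag = soup.find('span', attrs={'class':'headline_recap_title'})
        if repTag is not None:
            titleTag = Tag(soup, 'h1')                      # BS3-style tag construction
            titleTag.insert(0, self.tag_to_string(repTag))  # copy the headline text into the new tag
            repTag.replaceWith(titleTag)                    # swap the span for the heading
        return soup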

Hope this works for you. Let me know if you have questions.