10-03-2012, 11:17 PM | #1
Junior Member
Posts: 3
Karma: 10
Join Date: Sep 2012
Device: 1st Gen Nook, Kindle PW
Television Without Pity
Would love a recipe that would fetch recaps from http://twop.com. I hope someone can help!
Thanks, Jackie
10-05-2012, 01:23 PM | #2
Vox calibre
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
It will be available in the next calibre release; I sent it to Kovid.
10-11-2012, 06:56 AM | #3
Connoisseur
Posts: 55
Karma: 10
Join Date: Apr 2010
Location: new york city
Device: nook, ipad
Just a note - the recipe is only pulling the first page of multipage recaps.
10-12-2012, 11:57 AM | #4
Vox calibre
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
Can you send me a link to one of the multipage articles?
10-13-2012, 12:28 PM | #5
Wizard
Posts: 1,814
Karma: 4985051
Join Date: Sep 2010
Location: Maryland
Device: ...lots! ;) mostly reading on a Kindle Voyage
10-13-2013, 08:25 PM | #6
Junior Member
Posts: 1
Karma: 10
Join Date: Oct 2013
Device: PookPort Plus
TWoP Recap Recipe
Not sure if you're still interested in this recipe, but I'm a huge fan of TWoP and I've been looking for a way to capture their recaps in a format like this, so I ended up writing one. This will grab all pages from a multipage recap and make them into a single article.
The following code can be used in a few ways:

1) Grab the latest recaps for all active shows from the sitewide RSS feed. This is the default configuration.
2) Grab the latest recaps from a specific show by adding its RSS feed to the feeds list. http://www.televisionwithoutpity.com...W-NAME/rss.xml is the usual format.
3) By making a couple of small modifications, you can instead pull down the entire collection of a show's recaps. I did this with parse_index() because the individual show feeds don't contain links to all episodes. If you do this, I would recommend uncommenting reverse_article_order as well, so you get the recaps in show order.

Code:
import re

from calibre.web.feeds.news import BasicNewsRecipe
from BeautifulSoup import Tag  # used by the commented-out headline TODO below


class TelevisionWithoutPity(BasicNewsRecipe):
    title = u'Television Without Pity'
    language = 'en'
    __author__ = 'Snarkastica'

    # Used for pulling down an entire show, not just the RSS feed
    SHOW = 'http://www.televisionwithoutpity.com/show/SHOW-NAME-HERE/recaps/'

    oldest_article = 7  # days
    max_articles_per_feed = 25
    #reverse_article_order = True  # Useful for an entire show, to display in episode order
    #encoding = 'cp1252'
    use_embedded_content = False

    # Normalize the recap title span so keep_only_tags can match it
    preprocess_regexps = [
        (re.compile(r'<span class="headline_recap_title .*?>', re.DOTALL | re.IGNORECASE),
         lambda match: '<span class="headline_recap_title">'),
    ]
    keep_only_tags = [
        dict(name='span', attrs={'class': 'headline_recap_title'}),
        dict(name='p', attrs={'class': 'byline'}),
        dict(name='div', attrs={'class': 'body_recap'}),
        dict(name='h1'),
    ]
    no_stylesheets = True

    # Comment this out and configure parse_index() to retrieve a single show
    feeds = [
        ('Latest Recaps', 'http://www.televisionwithoutpity.com/rss.xml'),
    ]

    '''
    This method can be used to grab all recaps for a single show.
    Set the SHOW constant at the beginning of this file to the URL for a
    show's recap page (the page listing all recaps), usually of the form
    http://www.televisionwithoutpity.com/show/SHOW-NAME/recaps/
    where SHOW-NAME is the hyphenated name of the show.

    To use:
    1. Comment out feeds = [...] earlier in this file
    2. Set the SHOW constant to the show's recap page
    3. Uncomment the following function
    '''

    '''
    def parse_index(self):
        soup = self.index_to_soup(self.SHOW)
        feeds = []
        articles = []
        showTitle = soup.find('h1').string
        recaps = soup.find('table')
        for ep in recaps.findAll('tr'):
            epData = ep.findAll('td')
            epNum = epData[0].find(text=True).strip()
            if not epNum == "Ep.":  # skip the table's header row
                epT = self.tag_to_string(epData[1].find('em')).strip()
                epST = " (or " + self.tag_to_string(epData[1].find('h3')).strip() + ")"
                epTitle = epNum + ": " + epT + epST
                epData[1].find('em').extract()
                epURL = epData[1].find('a', href=True)
                epURL = epURL['href']
                epSum = self.tag_to_string(epData[1].find('p')).strip()
                epDate = epData[2].find(text=True).strip()
                epAuthor = self.tag_to_string(epData[4].find('p')).strip()
                articles.append({'title': epTitle, 'url': epURL,
                                 'description': epSum, 'date': epDate,
                                 'author': epAuthor})
        feeds.append((showTitle, articles))
        #self.abort_recipe_processing("test")
        return feeds
    '''

    # This will add subsequent pages of multipage recaps to a single article page
    def append_page(self, soup, appendtag, position):
        # If there is no pager, this is a single-page recaplet and is kept as-is
        if soup.find('p', attrs={'class': 'pages'}):
            pager = soup.find('p', attrs={'class': 'pages'}).find(text='Next')
            if pager:
                nexturl = pager.parent['href']
                soup2 = self.index_to_soup(nexturl)
                texttag = soup2.find('div', attrs={'class': 'body_recap'})
                for it in texttag.findAll(style=True):
                    del it['style']
                newpos = len(texttag.contents)
                # Recurse in case there are further pages after this one
                self.append_page(soup2, texttag, newpos)
                texttag.extract()
                appendtag.insert(position, texttag)

    def preprocess_html(self, soup):
        self.append_page(soup, soup.body, 3)
        return soup

    # Remove the multipage links (we had to keep these in for append_page(),
    # but they can go away now). Could have used CSS to hide them, but some
    # readers ignore CSS.
    def postprocess_html(self, soup, first_fetch):
        paginator = soup.findAll('p', attrs={'class': 'pages'})
        if paginator:
            for p in paginator:
                p.extract()
        # TODO: Fix this so it converts the headline class into a heading 1
        #titleTag = Tag(soup, "h1")
        #repTag = soup.find('span', attrs={'class':'headline_recap_title'})
        #titleTag.insert(0, repTag.contents[0])
        #repTag.extract()
        #soup.body.insert(1, titleTag)
        return soup

There are a couple of TODOs for the next version: specifically, changing the episode headline into a heading 1, and making each season its own section when you pull an entire show, instead of having all episodes in one. Hope this works for you. Let me know if you have questions.
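If you want to test the recipe before a calibre release picks it up, you can save it as a .recipe file and feed it straight to ebook-convert on the command line. The file name here is just an example of whatever you saved it as:

Code:
ebook-convert twop.recipe twop.epub --test -vv

The --test flag only fetches a couple of articles from the first couple of feeds, and -vv makes the output verbose, so it's a quick way to check the multipage stitching before doing a full run.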