#1
|
Junior Member
Posts: 3
Karma: 10
Join Date: Sep 2012
Device: 1st Gen Nook, Kindle PW
Television Without Pity
Would love a recipe that would fetch recaps from http://twop.com. I hope someone can help!
Thanks, Jackie
#2
|
Vox calibre
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
It will be available in the next calibre release. I sent it to Kovid.
#3
|
Connoisseur
Posts: 55
Karma: 10
Join Date: Apr 2010
Location: New York City
Device: nook, ipad
Just a note - the recipe is only pulling the first page of the full recaps.
#4
|
Vox calibre
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
Can you send me a link to one of the multipage articles?
#5
|
Wizard
Posts: 1,841
Karma: 4985051
Join Date: Sep 2010
Location: Maryland
Device: Kindle
#6
|
Junior Member
Posts: 1
Karma: 10
Join Date: Oct 2013
Device: PookPort Plus
TWoP Recap Recipe
Not sure if you're still interested in this recipe, but I'm a huge fan of TWoP and I've been looking for a way to capture their recaps in a format like this, so I ended up writing one. This will grab all pages from a multipage recap and make them into a single article.
The following code can be used in a few ways:
1) Grab the latest recaps for all active shows from the RSS feed. This is the default configuration.
2) Grab the latest recaps for a specific show by adding its RSS feed to the feeds list; http://www.televisionwithoutpity.com...W-NAME/rss.xml is the usual format.
3) With a couple of small modifications, pull down a show's entire collection of recaps instead. I did this with parse_index() because the individual show feeds don't contain links to all episodes. If you go this route, I'd also recommend uncommenting reverse_article_order so you get the recaps in episode order. (There's a rough configuration sketch for this after the recipe code below.)

Code:
import re

from calibre.web.feeds.news import BasicNewsRecipe
from BeautifulSoup import Tag


class TelevisionWithoutPity(BasicNewsRecipe):
    title = u'Television Without Pity'
    language = 'en'
    __author__ = 'Snarkastica'
    # Used for pulling down an entire show, not just the RSS feed
    SHOW = 'http://www.televisionwithoutpity.com/show/SHOW-NAME-HERE/recaps/'
    oldest_article = 7  # days
    max_articles_per_feed = 25
    #reverse_article_order = True  # Useful for an entire show, to display in episode order
    #encoding = 'cp1252'
    use_embedded_content = False
    preprocess_regexps = [
        (re.compile(r'<span class="headline_recap_title .*?>', re.DOTALL | re.IGNORECASE),
         lambda match: '<span class="headline_recap_title">'),
    ]
    keep_only_tags = [
        dict(name='span', attrs={'class': 'headline_recap_title'}),
        dict(name='p', attrs={'class': 'byline'}),
        dict(name='div', attrs={'class': 'body_recap'}),
        dict(name='h1'),
    ]
    no_stylesheets = True

    # Comment this out and configure parse_index() to retrieve a single show
    feeds = [
        ('Latest Recaps', 'http://www.televisionwithoutpity.com/rss.xml'),
    ]

    '''
    This method can be used to grab all recaps for a single show.
    Set the SHOW constant at the beginning of this file to the URL for a show's
    recap page (the page listing all recaps, usually of the form
    http://www.televisionwithoutpity.com/show/SHOW-NAME/recaps/
    where SHOW-NAME is the hyphenated name of the show).
    To use:
    1. Comment out feeds = [...] earlier in this file
    2. Set the SHOW constant to the show's recap page
    3. Uncomment the following function
    '''
    '''
    def parse_index(self):
        soup = self.index_to_soup(self.SHOW)
        feeds = []
        articles = []
        showTitle = soup.find('h1').string
        recaps = soup.find('table')
        for ep in recaps.findAll('tr'):
            epData = ep.findAll('td')
            epNum = epData[0].find(text=True).strip()
            if not epNum == "Ep.":
                epT = self.tag_to_string(epData[1].find('em')).strip()
                epST = " (or " + self.tag_to_string(epData[1].find('h3')).strip() + ")"
                epTitle = epNum + ": " + epT + epST
                epData[1].find('em').extract()
                epURL = epData[1].find('a', href=True)
                epURL = epURL['href']
                epSum = self.tag_to_string(epData[1].find('p')).strip()
                epDate = epData[2].find(text=True).strip()
                epAuthor = self.tag_to_string(epData[4].find('p')).strip()
                articles.append({'title': epTitle, 'url': epURL, 'description': epSum,
                                 'date': epDate, 'author': epAuthor})
        feeds.append((showTitle, articles))
        #self.abort_recipe_processing("test")
        return feeds
    '''

    # This adds subsequent pages of multipage recaps to a single article page
    def append_page(self, soup, appendtag, position):
        # Single-page recaplets have no pager, so there is nothing to append
        if soup.find('p', attrs={'class': 'pages'}):
            pager = soup.find('p', attrs={'class': 'pages'}).find(text='Next')
            if pager:
                nexturl = pager.parent['href']
                soup2 = self.index_to_soup(nexturl)
                texttag = soup2.find('div', attrs={'class': 'body_recap'})
                for it in texttag.findAll(style=True):
                    del it['style']
                newpos = len(texttag.contents)
                self.append_page(soup2, texttag, newpos)
                texttag.extract()
                appendtag.insert(position, texttag)

    def preprocess_html(self, soup):
        self.append_page(soup, soup.body, 3)
        return soup

    # Remove the multipage links (we had to keep them for append_page(), but they can go away now).
    # Could have used CSS to hide them, but some readers ignore CSS.
    def postprocess_html(self, soup, first_fetch):
        print("entering post")  # debug output
        paginator = soup.findAll('p', attrs={'class': 'pages'})
        if paginator:
            for p in paginator:
                p.extract()
        # TODO: Fix this so it converts the headline class into a heading 1
        #titleTag = Tag(soup, "h1")
        #repTag = soup.find('span', attrs={'class':'headline_recap_title'})
        #titleTag.insert(0, repTag.contents[0])
        #repTag.extract()
        #soup.body.insert(1, titleTag)
        return soup
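For reference, here is a rough, untested sketch of the whole-show setup described in (3) above. It only shows the class-level attributes you would change, and the show slug SOME-SHOW is a made-up placeholder, not a real recap page.

Code:
# Untested sketch: switching the recipe from the RSS feed to a single show's archive.
# 'SOME-SHOW' is a placeholder slug; use the hyphenated name from the show's recap page.

# 1. Comment out the default feeds list:
# feeds = [
#     ('Latest Recaps', 'http://www.televisionwithoutpity.com/rss.xml'),
# ]

# 2. Point SHOW at the show's recap index page:
SHOW = 'http://www.televisionwithoutpity.com/show/SOME-SHOW/recaps/'

# 3. Uncomment parse_index() (remove the surrounding triple quotes) and,
#    optionally, turn on episode ordering:
reverse_article_order = True

Once that is done, the recipe can be tested from the command line with something like ebook-convert twop.recipe twop.epub --test (the .recipe filename is just an example) before adding it as a custom recipe in the calibre GUI.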
If anyone has suggestions, I'm happy to learn. There are a couple of TODOs for the next version: changing the episode headline into a heading 1, and making each season its own section when you pull an entire show, instead of having all the episodes in one. Hope this works for you; let me know if you have questions.
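On the first of those TODOs, here is an untested sketch of the headline-to-heading-1 conversion for postprocess_html(). It reuses the Tag import already at the top of the recipe and the recipe's own tag_to_string() helper, but treat it as a starting point rather than a drop-in fix.

Code:
# Untested: replace the headline span with a real <h1>.
# Goes inside postprocess_html(), in place of the commented-out TODO block.
repTag = soup.find('span', attrs={'class': 'headline_recap_title'})
if repTag is not None:
    titleTag = Tag(soup, "h1")
    titleTag.insert(0, self.tag_to_string(repTag))
    repTag.replaceWith(titleTag)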