Old 10-31-2014, 05:37 PM   #4
ireadtheinternet
Here is what I ended up with so far, just to give an idea.

Code:
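    def preprocess_html(self, soup):
        IMDB_BASE = 'http://www.imdb.com'

        # The title page shows a truncated summary that links to the full one.
        truncated_summary = soup.find('p', attrs={'itemprop': ['description']})
        if truncated_summary is None:
            # No summary paragraph on this page, so leave the soup untouched.
            return soup

        link_to_full_summary = truncated_summary.find('a')
        if link_to_full_summary is not None:
            # Fetch the linked page and swap in its full plot summary.
            full_summary_soup = self.index_to_soup(IMDB_BASE + link_to_full_summary['href'])
            full_plot_summary = full_summary_soup.find('p', attrs={'class': ['plotSummary']})
            if full_plot_summary is not None:
                truncated_summary.replaceWith(full_plot_summary)

        return soup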
I will post the whole recipe when done.
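
One detail in the code above is how the full-summary URL gets built: it concatenates IMDB_BASE and the link's href, which only works if the href is site-relative. Here is a minimal sketch of a more forgiving way to build it, assuming the href could be either relative or absolute. The helper name full_summary_url and the example paths are made up for illustration and are not part of the recipe. It is written for Python 3; on Python 2 the same urljoin function lives in the urlparse module.

```python
from urllib.parse import urljoin

IMDB_BASE = 'http://www.imdb.com'


def full_summary_url(href):
    """Resolve a link's href against the site base.

    Hypothetical helper, not part of the recipe. urljoin returns an
    absolute href unchanged and resolves a relative one against the
    base, so both kinds of link give a usable URL.
    """
    return urljoin(IMDB_BASE, href)


# Placeholder paths, only to show the two cases.
print(full_summary_url('/title/tt0000000/plotsummary'))
print(full_summary_url('http://www.imdb.com/title/tt0000000/plotsummary'))
```

In the recipe this would replace the IMDB_BASE + link_to_full_summary['href'] expression, if the links ever turn out to be absolute.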

EDITED: 11/4 - Today I learned that the full summary is already on the same page, below the truncated summary and the credits, so I took this bit of code out of the recipe since it is not needed.

Last edited by ireadtheinternet; 11-04-2014 at 01:03 AM.