Old 01-10-2011, 03:15 PM   #6
bcollier
Device: Kindle DX
Thanks. Is there a reason the "print" statements don't show up on the command line when they run from within a recipe? When I do a print "hello world" from elsewhere in the application, it prints to the command line in Windows (when launching with calibre-debug -g). Or is there a way to write to a log file? I have some strange things happening, and it would help to be able to see what is happening with the text.

Also, is there a way to see the MOBI metadata (the summaries for each article) without having to copy the file to my Kindle each time? MOBI readers will show the metadata for the whole book, but I haven't found one that shows it for each article within the file.

Quote:
Originally Posted by GRiker
This populate_article_metadata() function was once in the NYTimes recipe, but was removed at some point. You can use it as a point of reference:

Spoiler:
Code:
def populate_article_metadata(self, article, soup, first):
    '''
    Extract author and description from article, add to article metadata
    '''
    def extract_author(soup):
        # Prefer the byline stored in a <meta name="byl"> or <meta name="CLMST"> tag
        byline = soup.find('meta', attrs={'name': ['byl', 'CLMST']})
        if byline:
            author = byline['content']
        else:
            # Try for <div class="byline">
            byline = soup.find('div', attrs={'class': 'byline'})
            if byline:
                author = byline.renderContents()
            else:
                # No byline found: dump the page for debugging
                print(soup.prettify())
                return None
        return author

    def extract_description(soup):
        # Prefer the <meta name="description"> tag (some pages have a trailing space in the name)
        description = soup.find('meta', attrs={'name': ['description', 'description ']})
        if description:
            return self.massageNCXText(description['content'])
        else:
            # Take first paragraph of article
            articlebody = soup.find('div', attrs={'id': 'articlebody'})
            if not articlebody:
                # Try again with class instead of id
                articlebody = soup.find('div', attrs={'class': 'articlebody'})
                if not articlebody:
                    print('populate_article_metadata.extract_description(): Did not find <div id="articlebody">:')
                    print(soup.prettify())
                    return None
            paras = articlebody.findAll('p')
            for p in paras:
                # Skip empty paragraphs
                if p.renderContents():
                    return self.massageNCXText(self.tag_to_string(p, use_alt=False))
            return None

    article.author = extract_author(soup)
    article.summary = article.text_summary = extract_description(soup)


G
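
For trying the description fallback outside calibre (so the summaries can be checked without building a book and copying it to the Kindle), here is a rough standalone sketch. It is not the recipe code: it swaps BeautifulSoup for Python 3's standard html.parser, skips massageNCXText() and tag_to_string(), and the names DescriptionFinder and extract_description are made up for this example. It assumes the same page layout as the recipe above: a <meta name="description"> tag, or else a <div> whose id or class is "articlebody".

```python
from html.parser import HTMLParser


class DescriptionFinder(HTMLParser):
    """Hypothetical helper: records the meta description and the first
    non-empty <p> found inside the articlebody <div>."""

    def __init__(self):
        super().__init__()
        self.meta_description = None
        self.first_paragraph = None
        self._body_depth = 0   # > 0 while inside the articlebody <div>
        self._in_p = False     # True while collecting text of a candidate <p>
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'meta' and (attrs.get('name') or '').strip() == 'description':
            if self.meta_description is None:
                self.meta_description = attrs.get('content')
        elif tag == 'div':
            if self._body_depth:
                # Nested <div> inside the article body
                self._body_depth += 1
            elif 'articlebody' in (attrs.get('id'), attrs.get('class')):
                self._body_depth = 1
        elif tag == 'p' and self._body_depth and self.first_paragraph is None:
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == 'div' and self._body_depth:
            self._body_depth -= 1
        elif tag == 'p' and self._in_p:
            self._in_p = False
            text = ''.join(self._buf).strip()
            if text:  # skip empty paragraphs, as the recipe does
                self.first_paragraph = text

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)


def extract_description(html):
    """Return the meta description if present, else the first non-empty
    paragraph of the article body, else None."""
    finder = DescriptionFinder()
    finder.feed(html)
    finder.close()
    return finder.meta_description or finder.first_paragraph
```

Feeding it the saved HTML of one article, e.g. extract_description(open('article.html', encoding='utf-8').read()), should give a quick idea of what summary a page would produce, though the real recipe may clean the text further.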