Thanks. Is there a reason the print statements don't show up on the command line from within a recipe? When I do a print "hello world" from elsewhere in the application, it prints to the command line in Windows (when launching with calibre-debug -g). Or is there a way to write to a log file? I have some strange things happening, and it would help to be able to see what is happening to the text.
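Here is a minimal sketch of one way to get a log file, assuming any writable location is acceptable. The hypothetical helper `debug_dump()` below is not part of calibre. It appends a labelled chunk of text to a file in the system temp directory, so the text a recipe is working with can be inspected after the run. Separately, calibre's `BasicNewsRecipe` provides `self.log(...)`, and running a recipe with `ebook-convert myrecipe.recipe .epub --test -vv` prints that output to the console.

```python
import datetime
import os
import tempfile

# Hypothetical helper, not part of calibre. The default path is an assumption;
# any writable file location works.
DEBUG_PATH = os.path.join(tempfile.gettempdir(), "recipe_debug.txt")


def debug_dump(label, text, path=DEBUG_PATH):
    """Append a labelled block of text to a plain-text debug file."""
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    # Append mode: every call adds to the file instead of overwriting it
    with open(path, "a", encoding="utf-8") as f:
        f.write("--- %s %s ---\n" % (stamp, label))
        f.write(text)
        f.write("\n")
```

Inside a recipe method you might call, for example, `debug_dump('article soup', soup.prettify())` and then open the file in a text editor once the download finishes.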
Also, is there a way to see the MOBI metadata (the summaries for each article) without having to copy the file to my Kindle each time? MOBI readers show the metadata for the whole book, but I haven't found anything that shows it for each article within the file.
Quote:
Originally Posted by GRiker
This populate_article_metadata() function was once in the NYTimes recipe, but was removed at some point. You can use it as a point of reference:
Code:
def populate_article_metadata(self, article, soup, first):
    '''
    Extract author and description from article, add to article metadata
    '''
    def extract_author(soup):
        # Prefer the byline <meta> tag (name="byl" or name="CLMST")
        byline = soup.find('meta', attrs={'name': ['byl', 'CLMST']})
        if byline:
            return byline['content']
        # Otherwise try for <div class="byline">
        byline = soup.find('div', attrs={'class': 'byline'})
        if byline:
            # tag_to_string() gives plain text rather than raw HTML markup
            return self.tag_to_string(byline)
        print('populate_article_metadata.extract_author(): no byline found:')
        print(soup.prettify())
        return None

    def extract_description(soup):
        # Match name="description" and also name="description " (trailing space)
        description = soup.find('meta', attrs={'name': ['description', 'description ']})
        if description:
            # massageNCXText() is assumed to be defined elsewhere in the recipe
            # class (it came from the NYTimes recipe, not BasicNewsRecipe)
            return self.massageNCXText(description['content'])
        # Otherwise take the first non-empty paragraph of the article
        articlebody = soup.find('div', attrs={'id': 'articlebody'})
        if not articlebody:
            # Try again with class instead of id
            articlebody = soup.find('div', attrs={'class': 'articlebody'})
        if not articlebody:
            print('populate_article_metadata.extract_description(): Did not find <div id="articlebody">:')
            print(soup.prettify())
            return None
        for p in articlebody.findAll('p'):
            text = self.tag_to_string(p, use_alt=False)
            if text.strip():
                return self.massageNCXText(text)
        return None

    article.author = extract_author(soup)
    article.summary = article.text_summary = extract_description(soup)
G
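To check what the authors and summaries will look like without copying a MOBI file to the Kindle each time, one option is to run the same extraction logic against an article saved as HTML. The sketch below is a stdlib-only approximation using Python 3's `html.parser`, not calibre or BeautifulSoup. It follows roughly the same fallback order as the function above. For the author, it tries the `byl`/`CLMST` meta tags, then `<div class="byline">`. For the summary, it tries the `description` meta tag, then the first non-empty `<p>` inside the `articlebody` div. The names `ArticleMetaParser` and `extract_metadata` are hypothetical. The sketch also skips the NYTimes-specific `massageNCXText()` cleanup, so its output may differ slightly from what ends up in the book.

```python
from html.parser import HTMLParser


class ArticleMetaParser(HTMLParser):
    """Hypothetical helper: collect <meta> tags, byline text and article-body paragraphs."""

    def __init__(self):
        super().__init__()
        self.meta = {}           # <meta name="..."> -> content
        self.paragraphs = []     # text of each <p> inside the article body
        self.byline_parts = []   # text fragments inside <div class="byline">
        self._divs = []          # stack of open divs: "body", "byline" or None
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and "name" in attrs:
            self.meta[attrs["name"]] = attrs.get("content") or ""
        elif tag == "div":
            if "articlebody" in (attrs.get("id"), attrs.get("class")):
                self._divs.append("body")
            elif attrs.get("class") == "byline":
                self._divs.append("byline")
            else:
                self._divs.append(None)
        elif tag == "p" and "body" in self._divs:
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "div" and self._divs:
            self._divs.pop()
        elif tag == "p" and self._in_p:
            # Collapse runs of whitespace into single spaces
            self.paragraphs.append(" ".join("".join(self._buf).split()))
            self._in_p = False

    def handle_data(self, data):
        if "byline" in self._divs:
            self.byline_parts.append(data)
        if self._in_p:
            self._buf.append(data)


def extract_metadata(html):
    """Return (author, summary), with None for whichever part is not found."""
    parser = ArticleMetaParser()
    parser.feed(html)
    parser.close()
    meta = parser.meta
    byline = " ".join("".join(parser.byline_parts).split())
    author = meta.get("byl") or meta.get("CLMST") or byline or None
    summary = meta.get("description") or meta.get("description ")
    if not summary:
        # Fall back to the first non-empty paragraph of the article body
        summary = next((p for p in parser.paragraphs if p), None)
    return author, summary
```

For example, `extract_metadata(open('article.html', encoding='utf-8').read())` returns an `(author, summary)` tuple, which makes it easy to spot articles where neither the meta tags nor the body div were found.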