New York Times Descriptions Not Working

bcollier · 01-07-2011, 12:35 PM

Hello,

This is my first time posting, but I hope to get more involved. I love Calibre and hope I can contribute back somehow. A few things follow, but #1 is the most important.

1) One thing that could substantially improve the Calibre New York Times news subscription is having the article descriptions in the menu. The descriptions are in the menu for the Front Page articles, but not for any other articles (see what I'm talking about in the two pictures of my Kindle). It would be very easy to fill in the description with the first two sentences from the article in the case where the description is blank rather than leave it blank.

I spent several hours trying to customize the recipe to use the first two sentences but couldn't figure out how to get a hold of the text body.

In the handle_article function, these lines set the description, from what I can tell:

description = ''
pubdate = strftime('%a, %d %b')
summary = div.find(True, attrs={'class':'summary'})
if summary:
    description = self.tag_to_string(summary, use_alt=False)

How would we update these lines to use the first two sentences of the article instead of the blank string?

2) On the topic of the NYT, what is the best time of day to schedule the New York Times download? I've been using 6am, but at that time there are only 1 or 2 articles in the Front Page section. At 8am this morning I accidentally downloaded again and noticed that the Front Page section had filled up with 7 articles. Has anyone experimented with this? I am experimenting now, downloading the web version every hour to see at about what time the NYT adds articles to these versions.

3) Also on the topic of the NYT: for a long time they have been talking about a paywall for web content (http://www.nytimes.com/2010/01/21/bu...a/21times.html). Has anyone heard if/when this is going into effect (this month?) and how that will affect the Calibre download?

I would love to help with Calibre development. I write Python for work, so the language is no problem; I just need to learn the ins and outs of how the system works and what improvements are needed.

kovidgoyal · 01-07-2011, 02:29 PM

1) Use the populate_article_metadata method.

3) It won't make any difference. calibre supports paywalled sites just fine; see WSJ for an example.

bcollier · 01-07-2011, 04:06 PM

OK, thanks for the quick response. Where is the documentation for the article object being passed in? I'm just looking for the main article text and can't seem to get it in populate_article_metadata. I have:

if len(article.text_summary) == 0:
    article.text_summary = "the first two sentences of the article"

Should I somehow pull the main article text from the soup, or is it already parsed in the article object?

Quote:

Originally Posted by kovidgoyal

1) Use the populate_article_metadata method.

3) It won't make any difference. calibre supports paywalled sites just fine; see WSJ for an example.

kovidgoyal · 01-07-2011, 04:31 PM

Look at feeds.__init__.

You have to pull the content from the soup.
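
A minimal sketch of the approach suggested above, not the actual NYT recipe code: when the summary is blank, pull paragraph text from the soup and keep the first two sentences. populate_article_metadata(self, article, soup, first) is the hook named in this thread. first_sentences is a hypothetical helper, treating every <p> tag as body text is an assumption about the page markup, and the sentence split is naive (it will break on abbreviations such as "Mr.").

```python
import re


def first_sentences(text, count=2):
    """Return the first `count` sentences of `text` (hypothetical helper).

    Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    """
    normalized = ' '.join(text.split())  # collapse runs of whitespace
    sentences = re.split(r'(?<=[.!?])\s+', normalized)
    return ' '.join(sentences[:count])


def populate_article_metadata(self, article, soup, first):
    # Intended to live inside the recipe class as a method.
    if article.text_summary:
        return  # the feed already supplied a description; keep it
    # Assumption: the article body is made up of the page's <p> tags.
    paragraphs = soup.findAll('p')
    body = ' '.join(self.tag_to_string(p, use_alt=False) for p in paragraphs)
    article.text_summary = first_sentences(body)
```

In a recipe, the second function would go inside the class, with first_sentences defined at module level. Whether article.summary also needs to be set for the description to appear in the menu is not settled in this thread.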

GRiker · 01-08-2011, 08:58 AM

Quote:

Originally Posted by bcollier

OK, thanks for the quick response. Where is the documentation for the article object being passed in? I'm just looking for the main article text and can't seem to get it in populate_article_metadata. I have:

if len(article.text_summary) == 0:
    article.text_summary = "the first two sentences of the article"

Should I somehow pull the main article text from the soup, or is it already parsed in the article object?

This populate_article_metadata() function was once in the NYTimes recipe, but was removed at some point. You can use it as a point of reference:

Spoiler:

G

bcollier · 01-10-2011, 03:15 PM

Thanks. Is there a reason "print" statements don't show up on the command line from within a recipe? When I do a print "hello world" from elsewhere in the application, it prints to the command line in Windows (when calling with calibre-debug -g). Or is there a method for writing to a log file? I have some strange things happening, and it would be helpful to be able to see what is happening with the text.

Also, is there a way to see the MOBI metadata (the summaries for each article) without having to copy the file to my Kindle each time? MOBI readers will show the metadata for the whole book, but I don't see anything that shows it for every article within the file.

Quote:

Originally Posted by GRiker

This populate_article_metadata() function was once in the NYTimes recipe, but was removed at some point. You can use it as a point of reference:

Spoiler:

G

GRiker · 01-10-2011, 04:04 PM

Quote:

Originally Posted by bcollier

Thanks. Is there a reason "print" statements don't show up on the command line from within a recipe? When I do a print "hello world" from elsewhere in the application, it prints to the command line in Windows (when calling with calibre-debug -g). Or is there a method for writing to a log file? I have some strange things happening, and it would be helpful to be able to see what is happening with the text.

Use self.log() to print diagnostics.

Quote:

Also, is there a way to see the MOBI metadata (the summaries for each article) without having to copy the file to my Kindle each time? MOBI readers will show the metadata for the whole book, but I don't see anything that shows it for every article within the file.

I don't understand what you're asking. If you want to see the summaries while the recipe's being built, write a diagnostic subroutine to dump the metadata.

G
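
A rough sketch of the kind of diagnostic subroutine described above, so the summaries can be checked while the recipe runs instead of on the Kindle. dump_article_metadata is a hypothetical name, and the title and text_summary attributes are assumed from the snippets earlier in the thread. The log argument defaults to print so the function can be tried outside calibre; inside a recipe you would pass self.log.

```python
def dump_article_metadata(articles, log=print):
    """Log each article's title and summary; return how many summaries are blank."""
    blank = 0
    for index, article in enumerate(articles, start=1):
        # Treat a missing, None, or whitespace-only summary as blank.
        summary = (getattr(article, 'text_summary', '') or '').strip()
        if not summary:
            blank += 1
        log('%d. %s' % (index, getattr(article, 'title', '(untitled)')))
        log('    summary: %s' % (summary or '(blank)'))
    return blank
```

One possible use, assuming it is called from inside populate_article_metadata: dump_article_metadata([article], log=self.log).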

bcollier · 01-11-2011, 02:17 PM

Thanks! This worked great and sped up the work a lot. I'll start a new thread with my proposed changes to the NYT recipes.

Quote:

Originally Posted by GRiker

Use self.log() to print diagnostics.

I don't understand what you're asking. If you want to see the summaries while the recipe's being built, write a diagnostic subroutine to dump the metadata.

G

