Old 11-15-2010, 02:53 PM   #1
ajmoraal
Junior Member
ajmoraal began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Nov 2010
Device: BeBook One 2010
Help for populate_article_metadata

I'm working on a recipe for a site that has the publication date and author only on the article pages, not on the index page.
So I thought I could override populate_article_metadata to set this data on the article object like this:
Code:
def populate_article_metadata(self, article, soup, first):
	article.date = soup.find('div', {"class": "date"}).contents[0].strip()
	article.author = soup.find('div', {"class": "author"}).contents[0].strip()
However, it doesn't work. I now get the following error for every article it tries to download:

Code:
3% Article download failed: u'Some article'
Could not fetch link http://www.somedomain.com/somearticle
Any idea what I'm doing wrong?
Old 11-15-2010, 03:24 PM   #2
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by ajmoraal View Post
It doesn't work ..
Any idea what I'm doing wrong?
populate_article_metadata is used in the _postprocess_html method of news.py. At that point, I'm pretty sure the feeds have already been parsed and the index page for each feed has already been constructed. The news system is at the end of processing an article page. I don't think you can expect to set article.date for the index page at this point and have it appear on the index page for the associated article. (If that's what you are trying to do)
Old 11-15-2010, 03:38 PM   #3
ajmoraal
Quote:
Originally Posted by Starson17 View Post
I don't think you can expect to set article.date for the index page at this point and have it appear on the index page for the associated article. (If that's what you are trying to do)
Sort of. I was trying to filter out all articles that are older than 2 days. As there's no indication of the age on the index page, I had to take it from the article page.

Anyway, thanks for the clarification.
Old 11-15-2010, 03:53 PM   #4
Starson17
Quote:
Originally Posted by ajmoraal View Post
Sort of. I was trying to filter out all articles that are older than 2 days. As there's no indication of the age on the index page, I had to take it from the article page.

Anyway, thanks for the clarification.
I'm not sure I clarified it much, but if it's of any help, I've never seen populate_article_metadata used in any recipe. The Article object has these elements:
Code:
Title       : 
URL         :
Author      :
Summary     :
Date        :
Has content :
I'm not totally sure how you would use populate_article_metadata. For example, I wouldn't think you could change the URL, since you'd already used it to download the article.

I have seen information passed from the index page to the article, but not the other way around. It should be possible, but I'd think you would have to construct the feed manually using parse_feeds by first parsing the feed, then grabbing any additional info you want from each article page, then building the Feed object you want and returning that.
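The filtering step itself can be pictured with a minimal sketch, which is not calibre's API. It assumes the recipe has already collected a list of article dictionaries and has read a publication time for each one from its article page. The helper name `keep_recent` and the `published` key are hypothetical; only the 2-day cutoff comes from this thread.

```python
from datetime import datetime, timedelta


def keep_recent(articles, max_age_days=2, now=None):
    """Return only the articles published within the last max_age_days.

    `articles` is a list of dicts; each dict is assumed to carry a
    'published' datetime (hypothetical key, filled in by the recipe
    after reading the article page). Articles without one are dropped.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=max_age_days)
    return [a for a in articles
            if a.get('published') is not None and a['published'] >= cutoff]
```

The surviving dictionaries could then be returned from whichever method builds the article list, so that old articles never reach the download stage.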

Edit: Perhaps Kovid will give us some insight.
Old 11-16-2010, 11:57 AM   #5
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 45,328
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
The use of populate_article_metadata is correct. You need to post the full error message (i.e. the traceback, not just the message saying the article failed to download) for me to help you.
Old 11-16-2010, 03:34 PM   #6
ajmoraal
Unfortunately, it doesn't generate a stack trace, which is why I'm a bit stuck. It just prints a warning for every article and then goes on to the next one.

I've attached the recipe, in case you want to try it.
I'm running it on Calibre 0.7.7 as packaged in Debian Squeeze.
Attached Files
File Type: txt trouw.txt (3.2 KB, 272 views)
Old 11-16-2010, 03:38 PM   #7
kovidgoyal
Stack traces are only printed if you run in verbose mode (i.e. with -vv).
Old 11-16-2010, 03:56 PM   #8
ajmoraal
Thanks, I managed to solve it by looking at the stack trace.
The issue was that I was searching for HTML tags that had already been stripped (via remove_tags_before and remove_tags_after), so soup.find returned None and contents[0] was accessed on a NoneType.
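
A defensive version of the lookup might look like the sketch below. It assumes the same `div` class names as the first post. `first_text` is a hypothetical helper, and `ExampleRecipe` is a plain stand-in class so the snippet runs without calibre; in a real recipe the method would live on the recipe class. It is written for Python 3.

```python
def first_text(soup, css_class):
    """Return the stripped leading text of <div class=css_class>, or None.

    soup.find() returns None when the tag is absent (for example because
    remove_tags_before/remove_tags_after already stripped it), so check
    before touching .contents.
    """
    tag = soup.find('div', {'class': css_class})
    if tag is None or not tag.contents:
        return None
    first = tag.contents[0]
    # The first child may itself be a tag rather than text; skip it then.
    if not isinstance(first, str):
        return None
    return first.strip() or None


class ExampleRecipe:
    # Stand-in for the real recipe class (assumption for illustration).
    def populate_article_metadata(self, article, soup, first):
        author = first_text(soup, 'author')
        if author is not None:
            article.author = author
```

The date could be read the same way with `first_text(soup, 'date')`; which format article.date expects is not settled in this thread, so the sketch leaves it out.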

Btw, shouldn't "ebook-convert -?" return info about the -vv option?
Old 11-16-2010, 03:57 PM   #9
Starson17
Quote:
Originally Posted by kovidgoyal View Post
The use of populate_article_metadata is correct
This is interesting. I had never seen populate_article_metadata used in anything, so I played around with it a bit. You certainly can modify the index page. I easily changed article.title and article.text_summary. Those changes appeared on the index page.

I could also change article.author (although I don't see it used anywhere, so I'm not sure why you would want to change it.) When I tried to change article.date, it seemed to accept it (no errors), but it didn't appear on the index page. I used:
Code:
article.date = datetime.datetime.now()
When I intentionally used the wrong date format, I got "Could not fetch link" errors. I suspect the date format was wrong when using:
Code:
article.date = soup.find('div', {"class": "date"}).contents[0].strip()
Even if the date format was correct, I'm not sure it would change the index page. Nothing I did changed it.
Old 11-16-2010, 04:12 PM   #10
kovidgoyal
@ajmoraal: ebook-convert test.recipe .epub -h

is what you are looking for.

@starson17: article author info is used when creating special periodical downloads for the Kindle and/or SONY. As for date not changing, could be any number of things, I haven't got the time right now to look into it

Last edited by kovidgoyal; 11-16-2010 at 04:31 PM.
Old 11-16-2010, 04:24 PM   #11
Starson17
Quote:
Originally Posted by kovidgoyal View Post
@starson17: article author info is used when creating special periodical downloads for the Kindle and/or SONY.
Thanks for that bit of info.
Quote:
As for date not changing, could be any number of things, I haven't got the time right now to look into it
I have no immediate use for this anyway. For all I know, I did something wrong during the testing.