#1 |
Junior Member
Posts: 4
Karma: 10
Join Date: Nov 2010
Device: BeBook One 2010
|
Help for populate_article_metadata
I'm working on a recipe for a certain site that has the publication date and author on the article pages only, not on the index page.
So I thought I could override populate_article_metadata to set this data on the article object like this:
Code:
    def populate_article_metadata(self, article, soup, first):
        article.date = soup.find('div', {"class": "date"}).contents[0].strip()
        article.author = soup.find('div', {"class": "author"}).contents[0].strip()
The download then reports:
Code:
    3% Article download failed: u'Some article'
    Could not fetch link http://www.somedomain.com/somearticle
#2 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
populate_article_metadata is used in the _postprocess_html method of news.py. At that point, I'm pretty sure the feeds have already been parsed and the index page for each feed has already been constructed. The news system is at the end of processing an article page. I don't think you can expect to set article.date for the index page at this point and have it appear on the index page for the associated article. (If that's what you are trying to do)
#3 | |
Junior Member
Posts: 4
Karma: 10
Join Date: Nov 2010
Device: BeBook One 2010
|
Quote:
Anyway, thanks for the clarification.
#4 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Code:
    Title :
    URL :
    Author :
    Summary :
    Date :
    Has content :
I have seen information passed from the index page to the article, but not the other way around. It should be possible, but I'd think you would have to construct the feed manually using parse_feeds by first parsing the feed, then grabbing any additional info you want from each article page, then building the Feed object you want and returning that.
edit: Perhaps Kovid will give us some insight.
#5 |
creator of calibre
Posts: 45,328
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The use of populate_article_metadata is correct. You need to post the full error message for me to help you (i.e. the traceback, not just the message saying the article failed to download).
#6 |
Junior Member
Posts: 4
Karma: 10
Join Date: Nov 2010
Device: BeBook One 2010
|
Unfortunately it doesn't generate a stack trace, which is why I'm a bit stuck. It just prints a warning for every article and then goes on to the next one.
I've attached the recipe, in case you want to try it. I'm running it on Calibre 0.7.7 as packaged in Debian Squeeze.
#7 |
creator of calibre
Posts: 45,328
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Stack traces are only printed if you run in verbose mode (i.e. with -vv).
#8 |
Junior Member
Posts: 4
Karma: 10
Join Date: Nov 2010
Device: BeBook One 2010
|
Thanks, I managed to solve it now by looking at the stack trace.
The issue was that I was searching for HTML tags that had already been stripped (via remove_tags_before and remove_tags_after), so find() returned None and contents[0] was called on a NoneType. Btw, shouldn't "ebook-convert -?" return info about the -vv option?
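For anyone hitting the same NoneType failure, here is a minimal sketch of a more defensive version of the override. It assumes the soup behaves like BeautifulSoup, where find() returns None when nothing matches, and that the first child of the matched tag is plain text. The helper first_text and the mixin name SafeMetadataMixin are hypothetical names made up for this example; in a real recipe the method would sit directly in the BasicNewsRecipe subclass.

```python
def first_text(soup, name, css_class):
    """Return the stripped first child of the matching tag, or None.

    find() returns None when the tag is absent, for example because
    remove_tags_before / remove_tags_after already stripped it.
    Assumes the first child, when present, is a text node.
    """
    tag = soup.find(name, {"class": css_class})
    if tag is None or not tag.contents:
        return None
    return tag.contents[0].strip()


class SafeMetadataMixin(object):
    # Hypothetical mixin used only to keep the sketch self-contained.

    def populate_article_metadata(self, article, soup, first):
        # Only overwrite a field when the page actually provided a value.
        date = first_text(soup, 'div', 'date')
        if date is not None:
            article.date = date
        author = first_text(soup, 'div', 'author')
        if author is not None:
            article.author = author
```

With this shape a missing div leaves the article untouched instead of failing the download, although running with -vv is still the way to see what actually went wrong.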
#9 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
This is interesting. I had never seen populate_article_metadata used in anything, so I played around with it a bit. You certainly can modify the index page. I easily changed article.title and article.text_summary. Those changes appeared on the index page.
I could also change article.author (although I don't see it used anywhere, so I'm not sure why you would want to change it). When I tried to change article.date, it seemed to accept it (no errors), but it didn't appear on the index page. I used:
Code:
    article.date = datetime.datetime.now()
Code:
    article.date = soup.find('div', {"class": "date"}).contents[0].strip()
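The title and summary experiment described above might look roughly like the sketch below. It is only an illustration: the mixin name IndexFieldsMixin is hypothetical, the choice of the h1 and p tags is an assumption about the page layout, and the first child of each tag is assumed to be plain text. The `first` argument is used to skip the later pages of a multi-page article.

```python
class IndexFieldsMixin(object):
    # Hypothetical mixin; in a real recipe the method would be defined
    # directly on the BasicNewsRecipe subclass.

    def populate_article_metadata(self, article, soup, first):
        if not first:
            # Later pages of a multi-page article: leave metadata alone.
            return
        heading = soup.find('h1')  # assumed location of the headline
        if heading is not None and heading.contents:
            article.title = heading.contents[0].strip()
        intro = soup.find('p')  # assumed location of the lead paragraph
        if intro is not None and intro.contents:
            article.text_summary = intro.contents[0].strip()
```

This covers only the two fields reported to show up on the index page; it makes no claim about why article.date did not.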
#10 |
creator of calibre
Posts: 45,328
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
@ajmoraal: ebook-convert test.recipe .epub -h is what you are looking for.
@starson17: article author info is used when creating special periodical downloads for the Kindle and/or SONY. As for the date not changing, it could be any number of things; I haven't got the time right now to look into it.
Last edited by kovidgoyal; 11-16-2010 at 04:31 PM.
#11 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote: