03-28-2010, 09:02 PM | #1 |
Hack
Posts: 34
Karma: 12
Join Date: Dec 2009
Device: Kobo Aura HD, Kindle Paperwhite
|
Article Dates with parse_index
I've been hunting for a recipe example that takes a date parsed from HTML and converts it into the proper format so that the article date displays correctly.
It seems that all the examples append 'date':''. I can't find anything in the documentation that specifies what format to use, and it doesn't work when I append, for example: articles.append({'title':title, 'url':url, 'description':desc, 'date':'Thursday, July 12, 2007'}) |
03-29-2010, 05:10 AM | #2 |
creator of calibre
Posts: 43,924
Karma: 22669820
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The date you set above is, IIRC, used only in the index of articles in any given section. What date are you trying to set? The date used in the title of the downloaded ebook?
|
03-29-2010, 07:32 AM | #3 |
Hack
Posts: 34
Karma: 12
Join Date: Dec 2009
Device: Kobo Aura HD, Kindle Paperwhite
|
Article Date
I am trying to set the date for the article that is shown after the title in the article index... but it always shows the time of creation rather than a date that I attempt to set. I assumed that I had an incorrect date format and that was why it was not being set.
Ultimately I am hoping that dates I set for the articles can be used by the recipe (oldest_article) to determine whether or not to include an article from the "feed" I've created with parse_index |
03-29-2010, 12:55 PM | #4 |
creator of calibre
Posts: 43,924
Karma: 22669820
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
oldest_article is only used for RSS processing. If you are writing a parse_index yourself, just compare the dates and skip the old articles yourself.
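A minimal sketch of that manual filtering, for illustration only. It assumes each article dict carries a 'date' string in the 'Thursday, July 12, 2007' style used earlier in the thread; the helper name keep_recent and the DATE_FORMAT constant are hypothetical and not part of calibre.

```python
from datetime import datetime, timedelta

# Assumed format of the 'date' strings, e.g. 'Thursday, July 12, 2007'.
DATE_FORMAT = '%A, %B %d, %Y'


def keep_recent(articles, oldest_article_days, now=None):
    """Return the articles whose 'date' is no older than the cutoff.

    Hypothetical helper: call it on the article list inside parse_index
    before the list is added to feeds.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=oldest_article_days)
    recent = []
    for article in articles:
        try:
            published = datetime.strptime(article['date'], DATE_FORMAT)
        except (KeyError, ValueError):
            # No usable date: keep the article rather than drop it silently.
            recent.append(article)
            continue
        if published >= cutoff:
            recent.append(article)
    return recent


if __name__ == '__main__':
    sample = [
        {'title': 'new', 'date': 'Thursday, July 12, 2007'},
        {'title': 'old', 'date': 'Friday, June 1, 2007'},
        {'title': 'undated'},
    ]
    kept = keep_recent(sample, 7, now=datetime(2007, 7, 14))
    print([a['title'] for a in kept])  # ['new', 'undated']
```

Keeping undated articles is a design choice in this sketch; dropping them instead only requires removing the append in the except branch.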
|
04-13-2010, 11:52 AM | #5 |
Hack
Posts: 34
Karma: 12
Join Date: Dec 2009
Device: Kobo Aura HD, Kindle Paperwhite
|
parse_index and setting dates for index of articles
Back to the original topic, it doesn't appear that:
articles.append({'title':title, 'url':url, 'description':desc, 'date':'Thursday, July 12, 2007'})
actually sets the date in the index of articles. The index of articles, at least using parse_index, *always* uses the date/time at the moment of creation.
------------------------------
FYI: I'm taking a directory of saved web pages and using ebook-convert to convert them all into an epub:
------------------------------
#!/usr/bin/env python
__license__ = 'GPL v3'
'''
Directory to Epub
'''
import string
import time
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag, NavigableString

class ImportDirectory(BasicNewsRecipe):
    title = 'Energy Bulletin'
    description = 'EnergyBulletin.net is a clearinghouse for information regarding the peak in global energy supply.'
    INDEX = 'http://localhost/~myaccount/Scrapbook/'
    language = 'en'
    keep_only_tags = [dict(id='main_content')]
    remove_tags = [dict(name='div', attrs={'class':'links'})]
    no_stylesheets = True

    def parse_index(self):
        articles = []
        soup = self.index_to_soup(self.INDEX)
        feeds = []
        for node in soup.findAll('tr'):
            x = node.find('img',attrs={'src':'/icons/folder.gif'})
            a = node.find('a', href=True)
            if a is not None and x is not None:
                url = a['href']
                url = 'http://localhost/~charlesallen/Scrapbook/'+url
                desc = None
                newsoup = self.index_to_soup(url)
                if newsoup is not None:
                    atitle = newsoup.find('title')
                    title = self.tag_to_string(atitle)
                    adate = newsoup.find('span',attrs={'class':'date-display-single'})
                    pubdt = self.tag_to_string(adate)
                    mytime = time.strptime(pubdt,"%b %d %Y")
                    dt = time.strftime('%A, %d %B, %Y',mytime)
                    origin = newsoup.find('div',attrs={'class':'origin'})
                    author = self.tag_to_string(origin)
                    self.log('\tFound article ',title,' at ', url, 'origin: ',author)
                    articles.append({'title':title, 'url':url, 'description':'','date':dt})
        feeds.append(('Articles', articles))
        return feeds

Last edited by EnergyLens; 04-14-2010 at 07:34 AM. |
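For reference, the date conversion step in the recipe above can be tried on its own, outside calibre. This is only a sketch: the function name display_date is hypothetical, the two format strings are the ones from the recipe, and the whitespace and comma cleanup is an added assumption about how the scraped text might vary. time.strptime raises ValueError on text that does not match the format, so the sketch returns an empty string in that case instead of aborting.

```python
import time


def display_date(raw, in_fmt='%b %d %Y', out_fmt='%A, %d %B, %Y'):
    """Convert a scraped date string to the display form used in the recipe.

    Hypothetical helper. Returns '' when the text does not match in_fmt.
    """
    # Drop commas and collapse runs of whitespace before parsing.
    cleaned = ' '.join(raw.replace(',', ' ').split())
    try:
        parsed = time.strptime(cleaned, in_fmt)
    except ValueError:
        return ''
    return time.strftime(out_fmt, parsed)


if __name__ == '__main__':
    print(display_date('Jul 12 2007'))       # Thursday, 12 July, 2007
    print(display_date(' Jul 12,  2007 '))   # Thursday, 12 July, 2007
    print(repr(display_date('')))            # ''
```

The month and weekday names depend on the active locale; the outputs shown assume the default English names.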
04-14-2010, 01:56 AM | #6 |
creator of calibre
Posts: 43,924
Karma: 22669820
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Maybe that's the case, I'll have to look at the code to be sure. Open a ticket and I'll get to it when I have some time.
|
04-21-2010, 10:13 PM | #7 |
Hack
Posts: 34
Karma: 12
Join Date: Dec 2009
Device: Kobo Aura HD, Kindle Paperwhite
|
I'm beginning to suspect that ebook-convert also ignores the --level1-toc= (and related) options when parse_index is used. I've gotten --levelX-toc to work fine when converting .txt documents and individual .html documents to .epub, but cannot make it work with recipes that use parse_index.
Perhaps I'm not understanding something, but I expected it to build a TOC from the XPath matches in each article returned in feeds. P.S. Am I right that --foo= is the only command line argument that ebook-convert will accept apart from those documented? I was trying to pass a command line date to my recipes and just happened to use --foo= the first time, and it worked. All other attempts to pass command line variables cause ebook-convert to stop with an exception saying that there is no such option. Ah, the power of FOO! (Please don't remove --foo= as that is my only way to pass my own command line arguments !-) |
|