Old 03-28-2010, 09:02 PM   #1
EnergyLens
Hack
EnergyLens began at the beginning.
 
Posts: 16
Karma: 10
Join Date: Dec 2009
Device: Sony Reader 600
Article Dates with parse_index

I've been hunting for a recipe example that takes a date parsed from html and converts it into the proper format so that the article date displays correctly.

It seems that all the examples append
'date':''

I can't find anything in the documentation that specifies what format to use, and it doesn't work when I append, for example:

articles.append({'title':title, 'url':url, 'description':desc, 'date':'Thursday, July 12, 2007'})
Old 03-29-2010, 05:10 AM   #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.

Posts: 25,449
Karma: 4961459
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
The date you set above is, IIRC, used only in the index of articles in any given section. What date are you trying to set? The date used in the title of the downloaded ebook?
Old 03-29-2010, 07:32 AM   #3
EnergyLens
Article Date

I am trying to set the date for the article that is shown after the title in the article index... but it always shows the time of creation rather than a date that I attempt to set. I assumed that I had an incorrect date format and that was why it was not being set.

Ultimately I am hoping that the dates I set for the articles can be used by the recipe (oldest_article) to determine whether or not to include an article from the "feed" I've created with parse_index.
Old 03-29-2010, 12:55 PM   #4
kovidgoyal
oldest_article is only used for RSS processing. If you are writing a parse_index yourself, just compare the dates and skip the old articles yourself.
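A minimal sketch of what skipping old articles by hand could look like inside a parse_index, assuming the scraped date strings look like "Mar 25 2010". The helper name is_recent, its parameters, and the sample data are hypothetical and not part of calibre:

```python
from datetime import datetime, timedelta

# Hypothetical helper, not a calibre API: decides whether a scraped
# date string is recent enough for the article to be kept.
def is_recent(pubdt, max_age_days, now=None, fmt='%b %d %Y'):
    now = now or datetime.now()
    try:
        published = datetime.strptime(pubdt.strip(), fmt)
    except ValueError:
        # Assumption: keep articles whose date cannot be parsed.
        return True
    return now - published <= timedelta(days=max_age_days)

# Made-up sample data standing in for what parse_index would scrape.
now = datetime(2010, 3, 29)
candidates = [
    {'title': 'New', 'url': 'http://localhost/new', 'date': 'Mar 25 2010'},
    {'title': 'Old', 'url': 'http://localhost/old', 'date': 'Jul 12 2007'},
]

# Only append the articles that pass the age check.
articles = [a for a in candidates if is_recent(a['date'], 7, now=now)]
print([a['title'] for a in articles])
```

In a real recipe the check would sit just before articles.append(...), with max_age_days taken from whatever cutoff the recipe wants to apply.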
Old 04-13-2010, 11:52 AM   #5
EnergyLens
parse_index and setting dates for index of articles

Back to the original topic, it doesn't appear that:

articles.append({'title':title, 'url':url, 'description':desc, 'date':'Thursday, July 12, 2007'})

actually sets the date in the index of articles. The index of articles, at least using parse_index, *always* uses the date/time at the moment of creation.

------------------------------
FYI: I'm taking a directory of saved web pages and using ebook-convert to convert them all into an epub:
------------------------------

#!/usr/bin/env python

__license__ = 'GPL v3'
'''
Directory to Epub
'''
import time

from calibre.web.feeds.news import BasicNewsRecipe


class ImportDirectory(BasicNewsRecipe):

    title = 'Energy Bulletin'
    description = 'EnergyBulletin.net is a clearinghouse for information regarding the peak in global energy supply.'
    # Directory listing of the saved web pages, served locally
    INDEX = 'http://localhost/~myaccount/Scrapbook/'
    language = 'en'
    keep_only_tags = [dict(id='main_content')]
    remove_tags = [dict(name='div', attrs={'class':'links'})]

    no_stylesheets = True

    def parse_index(self):
        articles = []

        soup = self.index_to_soup(self.INDEX)

        feeds = []
        for node in soup.findAll('tr'):
            # Only rows showing a folder icon point to a saved page
            x = node.find('img', attrs={'src':'/icons/folder.gif'})
            a = node.find('a', href=True)
            if a is not None and x is not None:
                # Build the article URL relative to INDEX
                url = self.INDEX + a['href']
                newsoup = self.index_to_soup(url)
                if newsoup is not None:
                    atitle = newsoup.find('title')
                    title = self.tag_to_string(atitle)
                    # Publication date as shown on the page, e.g. "Jul 12 2007"
                    adate = newsoup.find('span', attrs={'class':'date-display-single'})
                    pubdt = self.tag_to_string(adate)
                    mytime = time.strptime(pubdt, "%b %d %Y")
                    dt = time.strftime('%A, %d %B, %Y', mytime)
                    origin = newsoup.find('div', attrs={'class':'origin'})
                    author = self.tag_to_string(origin)
                    self.log('\tFound article ', title, ' at ', url, 'origin: ', author)
                    articles.append({'title':title, 'url':url, 'description':'', 'date':dt})

        # One section containing every article found
        feeds.append(('Articles', articles))

        return feeds
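
The date conversion in the recipe can be tried on its own, outside calibre. This is only a sketch: the function name format_article_date is hypothetical, the input format "%b %d %Y" is assumed from the recipe above, and returning an empty string for a missing or unparseable date is my own choice rather than anything calibre requires:

```python
import time

# Hypothetical helper, not a calibre API: reformats a scraped date string.
def format_article_date(pubdt, in_fmt='%b %d %Y', out_fmt='%A, %d %B, %Y'):
    """Return pubdt reformatted with out_fmt, or '' if it cannot be parsed."""
    try:
        parsed = time.strptime(pubdt.strip(), in_fmt)
    except (ValueError, AttributeError):
        # ValueError: text does not match in_fmt; AttributeError: pubdt is None.
        return ''
    return time.strftime(out_fmt, parsed)

print(format_article_date('Jul 12 2007'))  # prints "Thursday, 12 July, 2007"
```

Day and month names depend on the active locale; the output shown assumes the default C/English locale.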

Last edited by EnergyLens; 04-14-2010 at 07:34 AM.
Old 04-14-2010, 01:56 AM   #6
kovidgoyal
Maybe that's the case, I'll have to look at the code to be sure. Open a ticket and I'll get to it when I have some time.
Old 04-21-2010, 10:13 PM   #7
EnergyLens
I'm beginning to suspect that ebook-convert also ignores --level1-toc= & etc. directives when parse_index is used. I've gotten --levelX-toc to work fine when converting .txt documents and individual .html documents to .epub, but cannot make it work with recipes that use parse_index.

Perhaps I'm not understanding something, but I expected it to build a TOC from the Xpath matches in each article returned in feeds.

P.S. Am I right that --foo= is the only command line argument that ebook-convert will accept apart from those documented? I was trying to pass a command line date to my recipes and just happened to use --foo= the first time, and it worked. All other attempts to pass command line variables cause ebook-convert to stop with an exception that there is no such option. Ah, the power of FOO! (Please don't remove --foo=, as that is my only way to pass my own command line arguments !-)