Old 04-13-2010, 11:52 AM   #5
EnergyLens
parse_index and setting dates for index of articles

Back to the original topic, it doesn't appear that:

articles.append({'title':title, 'url':url, 'description':desc, 'date':'Thursday, July 12, 2007'})

actually sets the date shown in the index of articles. At least when using parse_index, the index of articles *always* shows the date and time at which it was created.
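
For reference, here is a minimal sketch of the structure that parse_index is expected to return, written as plain Python so it runs without calibre. The helper names make_article and build_feeds and the sample title and URL are made up for illustration; only the dictionary keys come from the line quoted above.

```python
def make_article(title, url, desc, date):
    """Build one article entry using the same keys as the line quoted above."""
    return {'title': title, 'url': url, 'description': desc, 'date': date}


def build_feeds(feed_title, articles):
    """parse_index returns a list of (feed title, list of article dicts) tuples."""
    return [(feed_title, articles)]


# Hypothetical sample data, for illustration only
articles = [make_article('Sample article', 'http://localhost/sample/', '',
                         'Thursday, July 12, 2007')]
feeds = build_feeds('Articles', articles)
```

The 'date' value is just a string carried along with each article. Whether it ends up in the generated index is the part that does not seem to work for me.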

------------------------------
FYI: I'm taking a directory of saved web pages and using ebook-convert with this recipe to convert them all into a single EPUB:
------------------------------

#!/usr/bin/env python

__license__ = 'GPL v3'
'''
Directory to Epub
'''
import time

from calibre.web.feeds.news import BasicNewsRecipe


class ImportDirectory(BasicNewsRecipe):

    title = 'Energy Bulletin'
    description = 'EnergyBulletin.net is a clearinghouse for information regarding the peak in global energy supply.'
    # Directory listing of the saved web pages
    INDEX = 'http://localhost/~myaccount/Scrapbook/'
    language = 'en'
    keep_only_tags = [dict(id='main_content')]
    remove_tags = [dict(name='div', attrs={'class':'links'})]

    no_stylesheets = True

    def parse_index(self):
        articles = []

        soup = self.index_to_soup(self.INDEX)

        feeds = []
        for node in soup.findAll('tr'):
            # Only rows with a folder icon and a link are saved pages
            x = node.find('img', attrs={'src':'/icons/folder.gif'})
            a = node.find('a', href=True)
            if a is not None and x is not None:
                # Build the URL from INDEX instead of a second hard-coded path
                url = self.INDEX + a['href']
                newsoup = self.index_to_soup(url)
                if newsoup is not None:
                    atitle = newsoup.find('title')
                    title = self.tag_to_string(atitle)
                    adate = newsoup.find('span', attrs={'class':'date-display-single'})
                    pubdt = self.tag_to_string(adate)
                    # Parse e.g. 'Jul 12 2007' and rewrite it in long form
                    mytime = time.strptime(pubdt, "%b %d %Y")
                    dt = time.strftime('%A, %d %B, %Y', mytime)
                    origin = newsoup.find('div', attrs={'class':'origin'})
                    author = self.tag_to_string(origin)
                    self.log('\tFound article ', title, ' at ', url, 'origin: ', author)
                    articles.append({'title':title, 'url':url, 'description':'', 'date':dt})

        # All articles go into a single feed
        feeds.append(('Articles', articles))

        return feeds
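
The date handling in the recipe can be tried on its own, outside calibre. This is only a sketch: the function name reformat_date is made up for illustration, and it assumes the page date looks like 'Jul 12 2007' and that Python is using its default (English) month and day names.

```python
import time


def reformat_date(pubdt, in_fmt="%b %d %Y", out_fmt="%A, %d %B, %Y"):
    """Parse a short date string and return it in the long form used for 'date'."""
    # strip() is an added precaution against stray whitespace around the text
    parsed = time.strptime(pubdt.strip(), in_fmt)
    return time.strftime(out_fmt, parsed)


print(reformat_date("Jul 12 2007"))  # Thursday, 12 July, 2007
```

If the text does not match the input format, time.strptime raises ValueError, so a page with a missing or differently formatted date would stop the recipe at that point.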

Last edited by EnergyLens; 04-14-2010 at 07:34 AM.