01-30-2012, 08:01 PM   #5
oneillpt
Quote:
Originally Posted by faber1971
Is there a tutorial for dummies?
Here is a recipe for some weather (meteo) and snow reports, which may help you, as meteo sites are among those you mention. (The recipe is for use when skiing in Valle d'Aosta. It collects the cross-country snow reports for two sites, the webcams for those two sites, the summary snow report for Valle d'Aosta, the weather bulletin, and the weather station map.) The recipe illustrates ways of accessing various information without RSS.

Code:
from calibre.constants import config_dir, CONFIG_DIR_MODE
import os, os.path, urllib
from hashlib import md5
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1327958094(BasicNewsRecipe):
    title          = u'Aosta'
    language  = 'it'
    oldest_article = 7
    max_articles_per_feed = 100
    #auto_cleanup = True
    #encoding               = 'iso-8859-1'
    #INDEX='http://www.    (not needed)
    #no_stylesheets = True
    keep_only_tags    = [
                          dict(name='div', attrs={'id':'col_dx'})
                          ,dict(name='div', attrs={'id':'content'})
                          ,dict(name='img', attrs={'id':'ctl00_ContentPlaceHolder1_imgRvdaMap'})
                        ]
    remove_tags    = [
                           dict(name='table', attrs={'id':'camstable'})
                        ]

    def parse_index(self):
        self.log('==> parse_index')
        recipe_dir = os.path.join(config_dir,'recipes')
        self.log('recipe_dir: ', recipe_dir)
        hash_dir = os.path.join(recipe_dir,'recipe_storage')
        self.log('hash_dir: ', hash_dir)
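        # replace '/' so the title cannot be read as a path separator
        # (no .encode() here: str arguments to bytes.replace() fail on Python 3)
        feed_dir = os.path.join(hash_dir,self.title.replace('/',':'))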
        self.log('feed_dir: ', feed_dir)
        if not os.path.isdir(feed_dir):
            os.makedirs(feed_dir,mode=CONFIG_DIR_MODE)
        article_wc1 = os.path.join(feed_dir,'cognejpg.html')
        self.log('article_wc1: ', article_wc1)
        # open() works on both Python 2 and Python 3; file() exists only on Python 2
        with open(article_wc1,'w') as f:
            f.write('<html><body><div id="content"><h1>Cogne</h1><img src="http://www.regione.vda.it/gestione/webcamgallery/showimage.aspx?nomefile=cogne.jpg" border=0 height="480" width="640" /></div></body></html>')
        article_wc2 = os.path.join(feed_dir,'ferretjpg.html')
        with open(article_wc2,'w') as f:
            f.write('<html><body><div id="content"><h1>Val Ferret</h1><img src="http://www.regione.vda.it/gestione/webcamgallery/showimage.aspx?nomefile=ferret.jpg" border=0 height="480" width="640" /></div></body></html>')
        article_wc3 = os.path.join(feed_dir,'cartine.html')
        with open(article_wc3,'w') as f:
            f.write('<html><body><div id="content"><h1>Cartine Meteo</h1><img src="http://gestionewww.regione.vda.it/territorio/centrofunzionale/meteo/stazioni/img/Cartine/TMP_Default.jpg" border=0 height="368" width="570" /></div></body></html>')
        feeds = []

        section_title = 'Aosta'
        articles = []
        articles.append({'title':'Bollettino neve fondo: Cogne', 'url':'http://www.regione.vda.it/turismo/prima_di_partire/in_tempo_reale/fondo_dettaglio_i.asp?infopath=/turismo/prima_di_partire/in_tempo_reale/bollettino_fondo_i.asp&fk_analoc=18&pk=380'}) 
        articles.append({'title':'Bollettino neve fondo: Courmayeur/Val Ferret', 'url':'http://www.regione.vda.it/turismo/prima_di_partire/in_tempo_reale/fondo_dettaglio_i.asp?infopath=/turismo/prima_di_partire/in_tempo_reale/bollettino_fondo_i.asp&fk_analoc=43&pk=381'}) 
        articles.append({'title':'Cogne webcam', 'url':'file:///' + article_wc1})
        articles.append({'title':'Val Ferret webcam', 'url':'file:///' + article_wc2})
        #articles.append({'title':'Webcams', 'url':'http://www.regione.vda.it/turismo/prima_di_partire/in_tempo_reale/webcam_i.asp'}) 
        articles.append({'title':'Bollettino neve piste da fondo: Aosta', 'url':'http://www.regione.vda.it/turismo/prima_di_partire/in_tempo_reale/bollettino_fondo_i.asp'})
        articles.append({'title':'Bollettino meteorologico: Aosta', 'url':'http://www.regione.vda.it/turismo/prima_di_partire/in_tempo_reale/meteo_i.asp'}) 
        articles.append({'title':'Cartine Meteo', 'url':'file:///' + article_wc3})    
        if articles:
            feeds.append((section_title, articles))
        self.log('~~ feeds:')
        self.log(feeds)
        self.log('~~ feeds ~~')
        return feeds


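The first three lines are the imports for local recipe storage, which I have found the simplest way to include the webcam images and the station map. (Only config_dir, CONFIG_DIR_MODE and os are actually used in this recipe; urllib and md5 are not.) Any suggestion of an easier way would be welcome.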
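As a rough sketch of the storage-directory step on its own, outside calibre: here base_dir stands in for calibre's config_dir and the default mode stands in for CONFIG_DIR_MODE. Both helper names (safe_dir_name, storage_dir_for) are hypothetical and exist only for this illustration.

```python
import os
import tempfile


def safe_dir_name(title):
    """Replace '/' so the title cannot be read as a path separator."""
    return title.replace('/', ':')


def storage_dir_for(base_dir, title, mode=0o700):
    """Create (if needed) and return a per-recipe storage directory.

    base_dir is an assumption standing in for calibre's config_dir;
    mode is an assumption standing in for CONFIG_DIR_MODE.
    """
    feed_dir = os.path.join(base_dir, 'recipes', 'recipe_storage',
                            safe_dir_name(title))
    if not os.path.isdir(feed_dir):
        os.makedirs(feed_dir, mode=mode)
    return feed_dir


if __name__ == '__main__':
    # Use a throwaway directory so the sketch does not touch real settings.
    with tempfile.TemporaryDirectory() as base:
        print(storage_dir_for(base, u'Aosta'))
```

Calling storage_dir_for() a second time with the same title simply returns the existing directory.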

The four commented lines are left to show that I have found it best to avoid auto_cleanup and stylesheet suppression, and that no encoding or INDEX variable was needed. They are kept as a reminder that your recipe may have different requirements.

The keep_only_tags entries are those needed for this recipe. The remove_tags entry does nothing here, and could be commented out. Again, it is left to remind you that you may need a real remove_tags. (There is no table with id="camstable" in the HTML sources used.)
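To make the matching rule concrete, here is a simplified sketch in plain Python. It is not calibre's implementation (as far as I know, calibre passes each dict to BeautifulSoup, which also accepts richer matchers); matches_spec and is_kept are hypothetical helpers that handle only exact name and attribute equality.

```python
# The same specs as in the recipe above.
KEEP_ONLY_TAGS = [
    dict(name='div', attrs={'id': 'col_dx'}),
    dict(name='div', attrs={'id': 'content'}),
    dict(name='img', attrs={'id': 'ctl00_ContentPlaceHolder1_imgRvdaMap'}),
]


def matches_spec(tag_name, tag_attrs, spec):
    """Return True if a tag has the spec's name and every listed attribute."""
    if spec.get('name') is not None and spec['name'] != tag_name:
        return False
    wanted = spec.get('attrs', {})
    return all(tag_attrs.get(key) == value for key, value in wanted.items())


def is_kept(tag_name, tag_attrs, specs=KEEP_ONLY_TAGS):
    """Return True if the tag itself matches at least one spec."""
    return any(matches_spec(tag_name, tag_attrs, spec) for spec in specs)


if __name__ == '__main__':
    print(is_kept('div', {'id': 'content'}))   # this tag matches a spec
    print(is_kept('img', {'src': 'cam.jpg'}))  # a bare img matches none
```

A tag that does not match by itself survives only if it sits inside an element that does match, which is why a bare img needs a wrapper such as a div with id="content".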

The recipe defines various file paths and saves three files in local recipe storage for later use, for the webcams and station map. Note that I have included <div id="content"> in these files, as without a tag found in the keep_only_tags these images and map would be lost.
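The three nearly identical write blocks could be folded into one helper. This is only a sketch, assuming Python 3: write_image_page and PAGE_TEMPLATE are hypothetical names, and the HTML escaping is an addition that the original recipe does not perform.

```python
import html
import os
import tempfile

# Wrapper page: the <div id="content"> is what keep_only_tags matches.
PAGE_TEMPLATE = (
    '<html><body><div id="content"><h1>{heading}</h1>'
    '<img src="{src}" border="0" height="{height}" width="{width}" />'
    '</div></body></html>'
)


def write_image_page(feed_dir, filename, heading, src, width, height):
    """Write a one-image HTML page into feed_dir and return its path."""
    path = os.path.join(feed_dir, filename)
    page = PAGE_TEMPLATE.format(
        heading=html.escape(heading),
        src=html.escape(src, quote=True),  # escapes '&' and quotes in the URL
        height=int(height),
        width=int(width),
    )
    with open(path, 'w', encoding='utf-8') as f:
        f.write(page)
    return path


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as feed_dir:
        path = write_image_page(
            feed_dir, 'cognejpg.html', 'Cogne',
            'http://www.regione.vda.it/gestione/webcamgallery/'
            'showimage.aspx?nomefile=cogne.jpg',
            640, 480)
        print(path)
```

Inside parse_index() each webcam or map would then need a single call, with feed_dir computed as in the recipe.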

The rest of the recipe builds a feed from remote sources and these local files. Uncomment the commented articles.append() line if you would like to include the thumbnail webcam images of all sites.
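The value that parse_index() returns is a list of (section title, list of article dicts) pairs. A minimal sketch of building it, assuming Python 3 for pathlib; remote_article, local_article and build_feeds are hypothetical helper names, and the example.invalid URL is a placeholder.

```python
import os
import tempfile
from pathlib import Path


def remote_article(title, url):
    """An article fetched straight from the web."""
    return {'title': title, 'url': url}


def local_article(title, path):
    """An article read from a local file, addressed by a file:// URL."""
    return {'title': title, 'url': Path(path).resolve().as_uri()}


def build_feeds(section_title, articles):
    """Return the list of (section, articles) pairs, skipping empty sections."""
    return [(section_title, articles)] if articles else []


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as feed_dir:
        page = os.path.join(feed_dir, 'cognejpg.html')
        with open(page, 'w') as f:
            f.write('<html><body><div id="content"></div></body></html>')
        articles = [
            remote_article('Bollettino (placeholder)', 'http://example.invalid/'),
            local_article('Cogne webcam', page),
        ]
        print(build_feeds('Aosta', articles))
```

On Linux or macOS, 'file:///' + path gives four slashes because the path already starts with '/'. That evidently worked for this recipe, but as_uri() produces the conventional three-slash form.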

Some logging lines have been left in the recipe to help if the Job details are viewed. These can of course be removed. I hope this helps as a tutorial.