Old 01-28-2012, 06:06 AM   #1
faber1971
Enthusiast
Posts: 46
Karma: 10
Join Date: Dec 2011
Device: Kindle 3
Recipes without RSS: possible?

Suppose I need to create a recipe from a website that does not have an RSS feed. Let's also assume that I'm not interested in the whole website or its news, but only in one specific page. For example, look at this page:
http://www.probabiliformazioni.org/
I simply need to turn this one webpage into a recipe (in which I could use the remove_tags option and so on to make the page more Kindle-friendly). Is this possible in calibre?
It could be very useful for TV guides, weather (meteo) websites and so on, that is, for websites that have only one interesting page and no RSS feed (or an RSS feed that does not include the specific page I need).
Thanks, guys.
Old 01-28-2012, 06:14 AM   #2
kovidgoyal
creator of calibre
Posts: 45,195
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Use parse_index()
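
For a single page, a minimal sketch of what this could look like follows. The class name, the helper single_page_index() and the commented-out remove_tags entry are placeholders chosen for illustration, not something taken from calibre or from the site. The try/except around the import is only there so the helper can be tried outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Not running inside calibre: fall back to a plain base class so the
    # helper below can still be imported and tried on its own.
    BasicNewsRecipe = object


def single_page_index(section_title, article_title, url):
    """Build the structure parse_index() returns for one page:
    a list of (section title, list of article dicts) tuples."""
    article = {'title': article_title, 'url': url,
               'date': '', 'description': ''}
    return [(section_title, [article])]


class SinglePageRecipe(BasicNewsRecipe):  # hypothetical name
    title = 'Probabili formazioni'
    language = 'it'
    no_stylesheets = True
    # Placeholder: replace with tags that really exist on the page.
    # remove_tags = [dict(name='div', attrs={'id': 'some_sidebar'})]

    def parse_index(self):
        # No RSS needed: the "feed" is one hand-written entry.
        return single_page_index(self.title, self.title,
                                 'http://www.probabiliformazioni.org/')
```

The point is that parse_index() only has to return section titles and article URLs; where those URLs come from (an RSS feed, a scraped page, or a hard-coded string as here) is up to the recipe.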
Old 01-28-2012, 06:29 AM   #3
faber1971
Is there a tutorial for dummies?
Old 01-28-2012, 07:09 AM   #4
kovidgoyal
http://manual.calibre-ebook.com/news.html The real-life example there is an example of using parse_index().
Old 01-30-2012, 08:01 PM   #5
oneillpt
Connoisseur
Posts: 63
Karma: 46
Join Date: Feb 2011
Device: Kindle 3 (cracked screen!); PW1; Oasis
Quote:
Originally Posted by faber1971 View Post
Is there a tutorial for dummies?
Here is a recipe for some meteo and snow reports, which may help, as meteo sites are among those you mention. (The recipe is meant for skiing in Valle d'Aosta. It collects the cross-country snow reports for two sites, the webcams for those two sites, the summary snow report for Valle d'Aosta, the meteo bulletin and the meteo station map.) The recipe illustrates ways of accessing various information without RSS.

Spoiler:
Code:
from calibre.constants import config_dir, CONFIG_DIR_MODE
import os, os.path, urllib
from hashlib import md5
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1327958094(BasicNewsRecipe):
    title          = u'Aosta'
    language  = 'it'
    oldest_article = 7
    max_articles_per_feed = 100
    #auto_cleanup = True
    #encoding               = 'iso-8859-1'
    #INDEX='http://www.    (not needed)
    #no_stylesheets = True
    keep_only_tags    = [
                          dict(name='div', attrs={'id':'col_dx'})
                          ,dict(name='div', attrs={'id':'content'})
                          ,dict(name='img', attrs={'id':'ctl00_ContentPlaceHolder1_imgRvdaMap'})
                        ]
    remove_tags    = [
                           dict(name='table', attrs={'id':'camstable'})
                        ]

    def parse_index(self):
        self.log('==> parse_index')
        recipe_dir = os.path.join(config_dir,'recipes')
        self.log('recipe_dir: ', recipe_dir)
        hash_dir = os.path.join(recipe_dir,'recipe_storage')
        self.log('hash_dir: ', hash_dir)
        # str.replace() on the title itself; encoding to bytes first breaks on Python 3
        feed_dir = os.path.join(hash_dir,self.title.replace('/',':'))
        self.log('feed_dir: ', feed_dir)
        if not os.path.isdir(feed_dir):
            os.makedirs(feed_dir,mode=CONFIG_DIR_MODE)
        # Write small local HTML pages that wrap the webcam images and the station map.
        # open() is used because the file() builtin exists only in Python 2.
        article_wc1 = os.path.join(feed_dir,'cognejpg.html')
        self.log('article_wc1: ', article_wc1)
        with open(article_wc1,'w') as f:
            f.write('<html><body><div id="content"><h1>Cogne</h1><img src="http://www.regione.vda.it/gestione/webcamgallery/showimage.aspx?nomefile=cogne.jpg" border=0 height="480" width="640" /></div></body></html>')
        article_wc2 = os.path.join(feed_dir,'ferretjpg.html')
        with open(article_wc2,'w') as f:
            f.write('<html><body><div id="content"><h1>Val Ferret</h1><img src="http://www.regione.vda.it/gestione/webcamgallery/showimage.aspx?nomefile=ferret.jpg" border=0 height="480" width="640" /></div></body></html>')
        article_wc3 = os.path.join(feed_dir,'cartine.html')
        with open(article_wc3,'w') as f:
            f.write('<html><body><div id="content"><h1>Cartine Meteo</h1><img src="http://gestionewww.regione.vda.it/territorio/centrofunzionale/meteo/stazioni/img/Cartine/TMP_Default.jpg" border=0 height="368" width="570" /></div></body></html>')
        feeds = []

        section_title = 'Aosta'
        articles = []
        articles.append({'title':'Bollettino neve fondo: Cogne', 'url':'http://www.regione.vda.it/turismo/prima_di_partire/in_tempo_reale/fondo_dettaglio_i.asp?infopath=/turismo/prima_di_partire/in_tempo_reale/bollettino_fondo_i.asp&fk_analoc=18&pk=380'}) 
        articles.append({'title':'Bollettino neve fondo: Courmayeur/Val Ferret', 'url':'http://www.regione.vda.it/turismo/prima_di_partire/in_tempo_reale/fondo_dettaglio_i.asp?infopath=/turismo/prima_di_partire/in_tempo_reale/bollettino_fondo_i.asp&fk_analoc=43&pk=381'}) 
        articles.append({'title':'Cogne webcam', 'url':'file:///' + article_wc1})
        articles.append({'title':'Val Ferret webcam', 'url':'file:///' + article_wc2})
        #articles.append({'title':'Webcams', 'url':'http://www.regione.vda.it/turismo/prima_di_partire/in_tempo_reale/webcam_i.asp'}) 
        articles.append({'title':'Bollettino neve piste da fondo: Aosta', 'url':'http://www.regione.vda.it/turismo/prima_di_partire/in_tempo_reale/bollettino_fondo_i.asp'})
        articles.append({'title':'Bollettino meteorologico: Aosta', 'url':'http://www.regione.vda.it/turismo/prima_di_partire/in_tempo_reale/meteo_i.asp'}) 
        articles.append({'title':'Cartine Meteo', 'url':'file:///' + article_wc3})    
        if articles:
            feeds.append((section_title, articles))
        self.log('~~ feeds:')
        self.log(feeds)
        self.log('~~ feeds ~~')
        return feeds


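The first three lines are needed to use local recipe storage, which I have found to be the simplest way to include the webcam images and the station map (any suggestion of an easier way would be welcome). Strictly speaking, only config_dir, CONFIG_DIR_MODE and os are actually used in this recipe.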

The four commented lines are left in to show that I have found it best to avoid auto_cleanup and stylesheet suppression here, and that no encoding or INDEX variable was needed. They are a reminder that your recipe may have different requirements.

The keep_only_tags entries are those needed for this recipe. The remove_tags entry does nothing here and could be commented out. Again, it is left in to remind you that you may need a real remove_tags. (There is no table with id="camstable" in the HTML sources used.)
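
To check whether a keep_only_tags or remove_tags entry matches anything at all in a page, a small stand-alone helper along these lines may be handy. It is only a sketch using the standard library: count_matches() and _SpecFinder are names made up here, and it compares attribute values as exact strings, which is a simplification of the more flexible matching calibre does itself.

```python
from html.parser import HTMLParser


class _SpecFinder(HTMLParser):
    """Counts tags whose name and attributes match a calibre-style spec."""

    def __init__(self, tag_name, wanted_attrs):
        super().__init__()
        self.tag_name = tag_name
        self.wanted_attrs = wanted_attrs
        self.matches = 0

    def handle_starttag(self, tag, attrs):
        found = dict(attrs)
        if tag == self.tag_name and all(
                found.get(key) == value
                for key, value in self.wanted_attrs.items()):
            self.matches += 1


def count_matches(html, spec):
    """Return how many tags in html match dict(name=..., attrs={...})."""
    finder = _SpecFinder(spec['name'], spec.get('attrs', {}))
    finder.feed(html)
    return finder.matches


page = '<div id="content"><h1>Cogne</h1><img src="cogne.jpg" /></div>'
print(count_matches(page, dict(name='div', attrs={'id': 'content'})))      # 1
print(count_matches(page, dict(name='table', attrs={'id': 'camstable'})))  # 0
```

A count of 0 for a remove_tags entry means it is harmless but useless for that page; a count of 0 for every keep_only_tags entry means the page would come out empty.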

The recipe defines various file paths and saves three files in local recipe storage for later use, for the webcams and station map. Note that I have included <div id="content"> in these files, as without a tag found in the keep_only_tags these images and map would be lost.
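
The three nearly identical write blocks could be folded into one helper, roughly as sketched below. write_image_page() and its wrapper_id parameter are hypothetical names, and the temporary directory only stands in for the feed_dir used in the recipe.

```python
import os
import tempfile
from html import escape
from pathlib import Path


def write_image_page(directory, filename, heading, img_url, wrapper_id='content'):
    """Write a tiny HTML page wrapping one image and return its file:// URL.

    wrapper_id must name a div that keep_only_tags keeps, otherwise the
    image would be stripped again when the recipe cleans the page.
    """
    os.makedirs(directory, exist_ok=True)
    path = Path(directory) / filename
    html = ('<html><body><div id="{0}"><h1>{1}</h1>'
            '<img src="{2}" /></div></body></html>').format(
                escape(wrapper_id), escape(heading), escape(img_url))
    path.write_text(html, encoding='utf-8')
    # as_uri() builds a well-formed file:// URL on both Windows and Unix
    return path.resolve().as_uri()


# Example use with a throwaway directory (in a recipe this would be feed_dir):
demo_dir = tempfile.mkdtemp()
demo_url = write_image_page(
    demo_dir, 'cognejpg.html', 'Cogne',
    'http://www.regione.vda.it/gestione/webcamgallery/showimage.aspx?nomefile=cogne.jpg')
print(demo_url)  # a file:// URL usable as an article 'url'
```

Letting pathlib build the URL avoids having to think about how many slashes follow file: on each platform.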

The rest of the recipe builds a feed from remote sources and these local files. Uncomment the commented articles.append() line if you would like to include the thumbnail webcam images of all sites.

Some logging lines have been left in the recipe to help if the Job details are viewed. These can of course be removed. I hope this helps as a tutorial.
Old 02-01-2012, 11:44 AM   #6
faber1971
Thanks. I will glance at it.
Old 02-02-2012, 02:14 PM   #7
kiavash
Old Linux User
Posts: 36
Karma: 12
Join Date: Jan 2012
Device: NST
Also check out the Atlantic recipe (http://bazaar.launchpad.net/~kovid/c...tlantic.recipe).

The def parse_index(self) section is written really well and is pretty easy to adapt for your site.
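
The general pattern of a parse_index() that scrapes an index page for links looks roughly like the sketch below. This is not the Atlantic recipe's code: articles_from_index(), _LinkCollector and the must_contain filter are made up for illustration. Inside calibre the page is normally fetched and parsed with self.index_to_soup(); the sketch uses only the standard library so it can be run anywhere.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            # Collapse runs of whitespace in the link text
            title = ' '.join(''.join(self._text).split())
            if title:
                self.links.append((self._href, title))
            self._href = None


def articles_from_index(html, base_url, must_contain=''):
    """Turn the links of an index page into parse_index() article dicts."""
    collector = _LinkCollector()
    collector.feed(html)
    return [{'title': title, 'url': urljoin(base_url, href)}
            for href, title in collector.links
            if must_contain in href]
```

In a recipe, parse_index() would then return something like [('Section name', articles_from_index(page_html, index_url, '/news/'))], where the '/news/' filter depends entirely on how the site builds its URLs.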
All times are GMT -4.