
MobileRead Forums > E-Book Software > Calibre > Recipes

03-12-2015, 03:09 PM   #1
nanodreams
Junior Member
Posts: 1
Karma: 10
Join Date: Mar 2015
Device: Kindle
New Recipe - www.diariodeburgos.es

Hi Guys!

I'd like to ask for help: I've spent hours trying different approaches to extract the content I need, without success.

In the end, the best approach was the auto_cleanup option, which detects exactly the content I want, except that it removes the article's photo.

The RSS feed I'd like to parse is:

http://www.diariodeburgos.es/rss/DBPortada.xml

I'm using the following code:
Code:
        from calibre.web.feeds.news import BasicNewsRecipe


        class DiarioDeBurgos(BasicNewsRecipe):
            title = u'Diario de Burgos'
            oldest_article = 1                   # only fetch articles from the last day
            max_articles_per_feed = 10
            ignore_duplicate_articles = {'url'}  # skip articles whose URL was already fetched
            use_embedded_content = False         # download the linked article page, not the RSS summary
            no_stylesheets = True                # do not download the site's CSS
            auto_cleanup = True                  # let calibre's readability-based heuristics extract the body

            feeds = [
                (u'Portada', u'http://www.diariodeburgos.es/rss/DBPortada.xml'),
            ]

            def get_cover_url(self):
                # Front-page image used as the e-book cover
                return 'http://i.promecal.es/Portadas/DB-G.jpg'

I tried the auto_cleanup_keep attribute, but it doesn't seem to work for me. I'd like to keep the div with the id `divImgNoticia0`; the tag looks like this:

<div id="divImgNoticia0" class="GaleriaNoticiaFoto" ...

I tried the following, with no luck:

auto_cleanup_keep = '//div[@id="divImgNoticia0"]'

I'd really appreciate it if someone could help me figure out what I'm doing wrong. auto_cleanup_keep looks easy to use, but somehow it isn't working.
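One thing worth ruling out is that the XPath simply doesn't match anything in the HTML calibre actually downloads (for example, if the photo div were inserted by JavaScript, which a plain recipe download does not run). A minimal standard-library sketch for checking the expression against a saved copy of the page follows. It assumes a simplified, well-formed sample of the article markup: `SAMPLE_PAGE` and `find_keep_targets` are hypothetical names, and ElementTree only understands a subset of XPath (for real, messy HTML, `lxml.html` with its `.xpath()` method is the more robust choice).

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a saved article page.
# ElementTree needs well-formed XML, so real pages may need lxml.html instead.
SAMPLE_PAGE = """<html><body>
  <div class="Titular">Headline</div>
  <div id="divImgNoticia0" class="GaleriaNoticiaFoto">
    <img src="photo.jpg" alt="photo"/>
  </div>
  <span id="ctl00_cph2Columnas_lblTextoNoticia">Article text</span>
</body></html>"""


def find_keep_targets(page, xpath):
    """Return the elements matched by an XPath expression like the one
    passed to auto_cleanup_keep.

    ElementTree's findall() expects a path relative to the root element,
    so an absolute '//tag[...]' is rewritten to './/tag[...]'.
    """
    root = ET.fromstring(page)
    if xpath.startswith("//"):
        xpath = "." + xpath
    return root.findall(xpath)


# Same expression as in the auto_cleanup_keep attempt above
matches = find_keep_targets(SAMPLE_PAGE, '//div[@id="divImgNoticia0"]')
for el in matches:
    print(el.tag, el.attrib)
```

If the expression matches in a copy of the page as calibre fetched it, the XPath itself is probably fine and the cause lies elsewhere; if it matches nothing, the div is likely not in the downloaded HTML at all.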

The idea is to keep only these tags:

<div class="Titular">
<span id="ctl00_cph2Columnas_lblTextoNoticia">
<div id="divImgNoticia0" class="GaleriaNoticiaFoto" style="cursor:pointer;cursor:hand">

I also tried keep_only_tags, with no luck either: in that case the element `ctl00_cph2Columnas_lblTextoNoticia` is not included in the output.
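In a recipe, keep_only_tags takes a list of tag specs such as `dict(name='div', attrs={'class': 'Titular'})`. To see which of the three elements listed above such specs would select in a given page, the standard-library toy below can help. It is not calibre's matching code: it compares attribute values exactly, and it folds a match nested inside another match into its parent. `KEEP_SPECS`, `KeepOnlyMatcher` and `SAMPLE_PAGE` are hypothetical names, and the sample markup is an assumption, not the real page.

```python
from html.parser import HTMLParser

# Specs shaped like calibre keep_only_tags entries; here they only feed the toy matcher.
KEEP_SPECS = [
    dict(name="div", attrs={"class": "Titular"}),
    dict(name="span", attrs={"id": "ctl00_cph2Columnas_lblTextoNoticia"}),
    dict(name="div", attrs={"id": "divImgNoticia0"}),
]

# Elements that never have a closing tag, so they must not change the depth count.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class KeepOnlyMatcher(HTMLParser):
    """Toy matcher: records each outermost element matching a spec plus its text."""

    def __init__(self, specs):
        super().__init__()
        self.specs = specs
        self.depth = 0   # nesting depth inside a kept element (0 = outside)
        self.kept = []   # entries of [tag, attrs dict, collected text]

    def _matches(self, tag, attrs):
        # Exact comparison of every attribute named in the spec
        return any(
            spec["name"] == tag
            and all(attrs.get(k) == v for k, v in spec["attrs"].items())
            for spec in self.specs
        )

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            if tag not in VOID_TAGS:
                self.depth += 1
            return
        if self._matches(tag, attrs):
            self.kept.append([tag, attrs, ""])
            if tag not in VOID_TAGS:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.kept[-1][2] += data


SAMPLE_PAGE = """<html><body>
<div class="Menu">Navigation</div>
<div class="Titular"><h1>Headline</h1></div>
<div id="divImgNoticia0" class="GaleriaNoticiaFoto"><img src="photo.jpg"></div>
<span id="ctl00_cph2Columnas_lblTextoNoticia">Body <b>text</b></span>
<div class="Comentarios">Comments</div>
</body></html>"""

matcher = KeepOnlyMatcher(KEEP_SPECS)
matcher.feed(SAMPLE_PAGE)
matcher.close()
for tag, attrs, text in matcher.kept:
    print(tag, attrs.get("id") or attrs.get("class"), repr(text))
```

If the span is found in a saved copy of the real article but is still missing from the calibre output, the next step is to compare that copy with what the recipe actually downloaded, for example by running the recipe through ebook-convert with its `--test` and `--debug-pipeline` options.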

Many thanks in advance for your help and time.

Regards,
Nano.

Last edited by PeterT; 03-12-2015 at 05:09 PM. Reason: Edited to include [code] . [/code] to make the script easier to read

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
recipe for newspaper subscription www.nd.nl | NicoDeMus! | Recipes | 0 | 11-24-2011 05:02 PM
http://www.cfo.com/magazine/ recipe request | jonathan22 | Recipes | 0 | 09-10-2011 02:50 AM
How to create recipe for http://www.pm-magazin.de/ | xXxXxXxXxXx | Recipes | 3 | 05-17-2011 09:57 AM
Recipe for www.diariodeibiza.es ready | quini | Recipes | 0 | 04-29-2011 02:09 PM
recipe request: www.aldaily.com | jshzh | Recipes | 0 | 02-07-2011 01:00 AM


All times are GMT -4.

