Old 03-12-2015, 03:09 PM   #1
nanodreams
Device: Kindle
Post New Recipe - www.diariodeburgos.es

Hi Guys!

I'd like to ask for help: I have spent hours trying different approaches to get the content I need, and I can't get it to work.

In the end, the best approach was the auto_cleanup option, which detects exactly what I want except that it removes the photo that goes with each news article.

The RSS feed I'd like to parse is:

http://www.diariodeburgos.es/rss/DBPortada.xml

I'm using the following code:
Code:
        from calibre.web.feeds.news import BasicNewsRecipe


        class DiarioDeBurgos(BasicNewsRecipe):
            title = u'Diario de Burgos'
            oldest_article = 1               # in days
            max_articles_per_feed = 10
            ignore_duplicate_articles = {'url'}
            use_embedded_content = False     # download the linked article pages
            no_stylesheets = True
            auto_cleanup = True              # finds the article text, but drops the photo

            feeds = [
                (u'Portada', u'http://www.diariodeburgos.es/rss/DBPortada.xml'),
            ]

            def get_cover_url(self):
                # Fixed URL used as the cover image
                return 'http://i.promecal.es/Portadas/DB-G.jpg'

I tried to use the 'auto_cleanup_keep' option, but it doesn't seem to work for me. I'd like to keep the div with the id `divImgNoticia0`; the tag looks like this:

<div id="divImgNoticia0" class="GaleriaNoticiaFoto" ...

I tried the following, but with no luck:

auto_cleanup_keep = '//div[@id="divImgNoticia0"]'

I'd really appreciate it if someone could help me identify what I'm doing wrong. The auto_cleanup_keep option looks easy to use, but somehow it is not working.
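
One way to narrow this down is to check, outside calibre, whether the XPath expression selects the photo div at all. The sketch below does that with Python's standard library against a made-up, simplified page (`SAMPLE` is hypothetical; the real markup on diariodeburgos.es is larger and may differ). If the expression matches here but the photo is still missing from the e-book, the cause is probably somewhere other than the XPath itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for an article page.
SAMPLE = """
<html><body>
  <div class="Titular">Headline</div>
  <div id="divImgNoticia0" class="GaleriaNoticiaFoto"><img src="photo.jpg"/></div>
  <span id="ctl00_cph2Columnas_lblTextoNoticia">Article text</span>
  <div id="sidebar">Unrelated</div>
</body></html>
"""

# The expression from the post, unchanged.
KEEP_XPATH = '//div[@id="divImgNoticia0"]'


def find_matches(markup, xpath):
    """Return the elements selected by an absolute '//...' XPath.

    ElementTree only accepts relative paths on an element, so a leading
    '.' is prepended. calibre itself uses lxml, which takes the
    expression as written.
    """
    root = ET.fromstring(markup.strip())
    return root.findall("." + xpath)


found = find_matches(SAMPLE, KEEP_XPATH)
print(len(found), [el.get("class") for el in found])
```

In the recipe, `auto_cleanup_keep` is a class attribute that sits next to `auto_cleanup = True`. This check only covers the expression, not how the cleanup step treats the matched element.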

The idea is to keep only the tags

<div class="Titular">
<span id="ctl00_cph2Columnas_lblTextoNoticia">
<div id="divImgNoticia0" class="GaleriaNoticiaFoto" style="cursor:pointer;cursor:hand">

I also tried the 'keep_only_tags' option, but no luck there either: in that case the element 'ctl00_cph2Columnas_lblTextoNoticia' is not included in the output.
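
For reference, here is a minimal sketch of how the three elements listed above could be written in the `dict(name=..., attrs=...)` form that calibre recipes use for `keep_only_tags`. It runs as plain Python, so it does not need calibre. It assumes `auto_cleanup` is switched off for this approach so that the automatic extraction does not discard elements before the filter is applied; that ordering is an assumption and has not been checked against this site. `is_kept` is a hypothetical helper for illustration only and is not part of calibre.

```python
# Selector specs for the three elements named in the post.
# In a recipe these would be class attributes of DiarioDeBurgos.
keep_only_tags = [
    dict(name='div', attrs={'class': 'Titular'}),
    dict(name='div', attrs={'id': 'divImgNoticia0'}),
    dict(name='span', attrs={'id': 'ctl00_cph2Columnas_lblTextoNoticia'}),
]

# Assumption: manual filtering replaces the automatic cleanup.
auto_cleanup = False


def is_kept(name, attrs):
    """Hypothetical helper: True if a tag matches one of the specs.

    Uses exact attribute comparison; calibre's own matching is more
    flexible, so this only illustrates the intent of the list.
    """
    return any(
        spec['name'] == name
        and all(attrs.get(key) == value for key, value in spec['attrs'].items())
        for spec in keep_only_tags
    )


print(is_kept('span', {'id': 'ctl00_cph2Columnas_lblTextoNoticia'}))
print(is_kept('div', {'id': 'sidebar'}))
```

Note that the text element is a `span`, not a `div`, so a spec that names the wrong tag for that id would silently match nothing.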

Many thanks in advance for your help and time.

Regards,
Nano.

Last edited by PeterT; 03-12-2015 at 05:09 PM. Reason: Edited to include [code] . [/code] to make the script easier to read