03-12-2015, 03:09 PM | #1 |
Junior Member
Posts: 1
Karma: 10
Join Date: Mar 2015
Device: Kindle
New Recipe - www.diariodeburgos.es
Hi Guys!
I'd like to ask for help, as I have spent hours trying different approaches to extract the content I need, without success. In the end, the best approach was the auto_cleanup option, which detects exactly what I want, except that it removes the photo that goes with each news article. The RSS feed I'd like to parse is: http://www.diariodeburgos.es/rss/DBPortada.xml
I'm using the following code:
Code:
import time
from calibre.ptempfile import PersistentTemporaryFile
from calibre.web.feeds.news import BasicNewsRecipe

class DiarioDeBurgos(BasicNewsRecipe):
    title = u'Diario de Burgos'
    oldest_article = 1
    max_articles_per_feed = 10
    ignore_duplicate_articles = {'url'}
    use_embedded_content = False
    no_stylesheets = True
    auto_cleanup = True
    feeds = [
        (u'Portada', u'http://www.diariodeburgos.es/rss/DBPortada.xml'),
    ]

    def get_cover_url(self):
        return 'http://i.promecal.es/Portadas/DB-G.jpg'

The photo is in this element:

<div id="divImgNoticia0" class="GaleriaNoticiaFoto" ...

I tried the following, but with no luck:

auto_cleanup_keep = '//div[@id="divImgNoticia0"]'

I'd really appreciate it if someone could help me identify what I'm doing wrong. auto_cleanup_keep looks easy to use, but somehow it is not working. The idea is to keep only these tags:

<div class="Titular">
<span id="ctl00_cph2Columnas_lblTextoNoticia">
<div id="divImgNoticia0" class="GaleriaNoticiaFoto" style="cursor:pointer;cursor:hand">

I also tried 'keep_only_tags', but no luck either: in that case the element 'ctl00_cph2Columnas_lblTextoNoticia' is not included in the output.

Many thanks in advance for your help and time.

Regards,
Nano.

Last edited by PeterT; 03-12-2015 at 05:09 PM. Reason: Edited to include [code] . [/code] to make the script easier to read
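For reference, the three elements listed above could be written as a keep_only_tags value along the following lines. This is only a sketch: it assumes calibre's usual dict(name=..., attrs={...}) form for tag specifications, and the tag names and attribute values are copied from the post, not checked against the live page. It is shown as a plain Python assignment so that it runs on its own; in a recipe it would sit in the class body next to the other attributes. How it behaves together with auto_cleanup = True is not verified here.

```python
# Sketch only: tag names and attributes are taken from the post above
# and have not been checked against the current page markup.
# In a real recipe this assignment would go inside the recipe class body.
keep_only_tags = [
    # presumably the headline
    dict(name='div', attrs={'class': 'Titular'}),
    # presumably the article text
    dict(name='span', attrs={'id': 'ctl00_cph2Columnas_lblTextoNoticia'}),
    # the photo container that auto_cleanup was removing
    dict(name='div', attrs={'id': 'divImgNoticia0'}),
]
```

Each entry names one element by tag and attribute, so a missing element in the output can be traced back to a single line of the list.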