02-27-2011, 03:49 PM   #1
partymonkey
Junior Member
 
Posts: 9
Karma: 10
Join Date: Feb 2011
Device: Kindle
Recipe for La Nacion (Argentina) has issues

In the last few days, I've noticed issues with the La Nacion feed:
1) Some articles download as gobbledygook while others don't, and which ones get scrambled doesn't seem to be consistent.
2) Articles contain additional content, such as related articles, comments, recommended articles, etc.
3) The logo/header directory location changed.

I tried to figure out all three issues and made some changes to the recipe. I've fixed #3 and several cases of #2, but I'm still getting cases of #1.

Can anyone help me with #1? This is new to me, so I'm not sure where to go next.

Here's my edited recipe, based on the built-in one:
__license__ = 'GPL v3'
__copyright__ = '2008-2010, Darko Miletic <darko.miletic at gmail.com>'
'''
lanacion.com.ar
'''

from calibre.web.feeds.news import BasicNewsRecipe

class Lanacion(BasicNewsRecipe):
    title = u'La Nacion'
    __author__ = u'Darko Miletic'
    description = "lanacion.com - Informacion actualizada las 24 horas, con noticias de Argentina y del mundo"
    publisher = u'La Nacion S.A.'
    category = 'news, politics, Argentina'
    oldest_article = 1
    max_articles_per_feed = 100
    use_embedded_content = False
    no_stylesheets = True
    language = 'es_AR'
    publication_type = 'newspaper'
    remove_empty_feeds = True
    # Logo/header image at its new location
    cover_url = 'http://www.lanacion.com.ar/_ui/desktop/imgs/layout/logos/ln341x47.gif'
    masthead_url = 'http://www.lanacion.com.ar/_ui/desktop/imgs/layout/logos/ln341x47.gif'
    extra_css = """ h1{font-family: Georgia,serif}
        h2{color: #626262}
        body{font-family: Arial,sans-serif}
        img{margin-top: 0.5em; margin-bottom: 0.2em; display: block}
        .notaFecha{color: #808080}
        .notaEpigrafe{font-size: x-small}
        .topNota h1{font-family: Arial,sans-serif}
    """

    conversion_options = {
        'comment'  : description
        , 'tags'     : category
        , 'publisher': publisher
        , 'language' : language
    }

    # Keep only the article container
    keep_only_tags = [dict(name='div', attrs={'class':['nota floatFix','nota floatfix','topNota','nota','post']})]
    # Strip toolbars, share/comment boxes, polls and related-article blocks
    remove_tags = [
        dict(name='div', attrs={'class':'notaComentario floatFix noprint'})
        ,dict(name='ul', attrs={'class':['cajaHerramientas cajaTop noprint','herramientas noprint']})
        ,dict(name='li', attrs={'class':'floatFix'})
        ,dict(name='div', attrs={'class':['cajaHerramientas noprint','cajaHerramientas floatFix']})
        ,dict(attrs={'class':['titulosMultimedia','derecha','techo color','encuesta','izquierda compartir','floatFix','videoCentro','leyo','relacionadas','relac noprint']})
        ,dict(name=['iframe','embed','object','form','base','hr','meta','link','input'])
    ]
    # Remove everything that follows the tags / related-articles section
    remove_tags_after = dict(attrs={'class':['tags','nota-destacado','leyo','relacionadas','floatFix ultimasNoticias']})
    remove_attributes = ['height','width','visible','onclick','data-count','name']

    feeds = [
        (u'Ultimas noticias', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?origen=2')
        ,(u'Politica', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=30')
        ,(u'Economia', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=272')
        ,(u'Deportes', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=131')
        ,(u'Informacion General', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=21')
        ,(u'Cultura', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=1')
        ,(u'Opinion', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=28')
        ,(u'Espectaculos', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=120')
        ,(u'Exterior', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=7')
        ,(u'Ciencia&Salud', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=498')
        ,(u'Revista', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=494')
        ,(u'Enfoques', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=421')
        ,(u'Comercio Exterior', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=347')
        ,(u'Tecnologia', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=432')
        ,(u'Arquitectura', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=366')
        ,(u'Turismo', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=504')
        ,(u'Al volante', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=371')
        ,(u'El Campo', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=337')
        ,(u'Moda y Belleza', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=1312')
        ,(u'Inmuebles Comerciales', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=1363')
        ,(u'Countries', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=1348')
        ,(u'adnCultura', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=6734')
        ,(u'The Wall Street Journal Americas', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=6373')
        ,(u'Management', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=7380')
        ,(u'Bicentenario', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=7276')
    ]

    def preprocess_html(self, soup):
        # Drop inline styles so extra_css controls the formatting
        for item in soup.findAll(style=True):
            del item['style']
        return self.adeify_images(soup)
03-02-2011, 06:47 AM   #2
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
In cases like yours, you should report a bug in the calibre bug tracker. There is no need to post the recipe, since it is already included with the software. I'll look into it today and update the recipe.
03-03-2011, 12:15 PM   #3
partymonkey
Junior Member
Thanks for your help and the pointers.

I only posted the recipe because I made changes to the tags to remove some of the unwanted content after the article's text, and thought those might be useful to others helping me with this.

If you could post back once you've had a chance to look at the recipe, and let me know whether you can figure out how to keep some of the articles from being scrambled/encoded, that would be awesome.

Thanks again.
03-03-2011, 02:47 PM   #4
kiklop74
Guru
There is an issue with this site and calibre. I'm not sure whether it is a bug in calibre or in the site's software. I'll look into this some more.
03-05-2011, 07:47 PM   #5
kiklop74
Guru
I found the problem: it seems that some articles in this site's feeds are just redirects, and for some reason calibre obtains garbage content when a redirect occurs...

A quick solution is to skip such links (I detected one pattern, related to blog posts).
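A minimal sketch of that quick fix, assuming the redirecting links can be recognized from their URL alone. The actual blog-post pattern is not given in the thread, so the `blogs.` host prefix and the example URLs below are hypothetical; the code is plain Python rather than calibre's API, and how it would be hooked into the recipe is left open.

```python
from urllib.parse import urlparse

# Hypothetical host prefixes for feed links that turned out to be redirects.
# The real pattern found in the La Nacion feeds is not stated in the thread.
REDIRECT_HOST_PREFIXES = ("blogs.",)


def is_redirect_link(url: str) -> bool:
    """Return True if the URL's host matches one of the known redirect patterns."""
    host = urlparse(url).netloc.lower()
    return host.startswith(REDIRECT_HOST_PREFIXES)


def skip_redirect_links(urls):
    """Keep only the article URLs that are not known redirects."""
    return [u for u in urls if not is_redirect_link(u)]


if __name__ == "__main__":
    # Made-up example URLs, for illustration only.
    feed_links = [
        "http://www.lanacion.com.ar/1354321-ejemplo",
        "http://blogs.lanacion.com.ar/ejemplo/post",
    ]
    print(skip_redirect_links(feed_links))  # ['http://www.lanacion.com.ar/1354321-ejemplo']
```

Matching on the host keeps the check cheap (no extra request per link); if the redirects could only be detected by following them, a URL check like this would not be enough.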

Kovid, do you have an idea how to handle such a case?

Would you mind taking a look? (I will supply the needed instructions and the modified recipe.)
03-05-2011, 07:52 PM   #6
kovidgoyal
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Sure, I'll take a look.
03-05-2011, 07:59 PM   #7
kiklop74
Guru
OK, here is the issue:

http://bugs.calibre-ebook.com/ticket/9295
03-07-2011, 10:02 AM   #8
partymonkey
Junior Member
Thank you both for your help.
This one is beyond my calibre comprehension at this time.
03-10-2011, 09:31 AM   #9
kiklop74
Guru
I think I resolved the garbage content issue. I had to add a large delay between article downloads and remove the latest news feed, which often had incorrect links that led to garbage content.
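For reference, calibre's BasicNewsRecipe has a `delay` attribute (seconds to wait between consecutive downloads), and dropping a feed means deleting its entry from `feeds`. The updated recipe is not posted in this thread, so the plain-Python sketch below only illustrates what those two changes do. It is not calibre's implementation; the `fetch` callable and the delay values are assumptions.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple

# (title, url) pairs, the same shape as the entries in a recipe's `feeds` list.
Feed = Tuple[str, str]


def drop_feeds(feeds: Iterable[Feed], titles_to_drop: Iterable[str]) -> List[Feed]:
    """Remove feeds by title, e.g. the unreliable 'Ultimas noticias' feed."""
    drop = set(titles_to_drop)
    return [f for f in feeds if f[0] not in drop]


def fetch_with_delay(urls: Iterable[str], fetch: Callable[[str], bytes],
                     delay: float) -> Dict[str, bytes]:
    """Fetch URLs one at a time, sleeping `delay` seconds between requests.

    `fetch` is a hypothetical stand-in for the real HTTP download.
    """
    results: Dict[str, bytes] = {}
    for i, url in enumerate(urls):
        if i > 0 and delay > 0:
            time.sleep(delay)  # pause between consecutive downloads
        results[url] = fetch(url)
    return results


if __name__ == "__main__":
    feeds = [
        (u'Ultimas noticias', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?origen=2'),
        (u'Politica', u'http://www.lanacion.com.ar/herramientas/rss/index.asp?categoria_id=30'),
    ]
    kept = drop_feeds(feeds, ["Ultimas noticias"])
    print([title for title, _ in kept])  # ['Politica']
    # Fake fetcher so the example runs without network access.
    pages = fetch_with_delay(["a", "b"], lambda url: url.encode(), delay=0.1)
    print(sorted(pages))  # ['a', 'b']
```

Downloading sequentially with a pause trades speed for fewer simultaneous requests to the site, which is the kind of workaround described above when rapid requests seemed to produce garbage.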
03-15-2011, 06:13 PM   #10
partymonkey
Junior Member
Thank you for your help in improving this recipe.
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Recipe for La Nacion of Costa Rica | amontiel69 | Recipes | 3 | 01-28-2013 11:53 AM
Patch: Ticket 9168 (Allow maximum number of copies (issues) per recipe) | spedinfargo | Development | 3 | 02-25-2011 10:35 PM
Hello from Argentina | Ftedin | Introduce Yourself | 8 | 08-29-2010 02:33 PM
Hi from Argentina | chalten | Introduce Yourself | 5 | 12-13-2009 10:10 PM
Hi from Argentina | ferfer | Introduce Yourself | 2 | 01-12-2008 11:52 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.