03-29-2011, 05:37 AM   #1
Tito HX
Junior Member
Posts: 2
Karma: 10
Join Date: Mar 2011
Device: Kindle v3
Recipe for Periodismo Humano

Hi, everyone,

I made a recipe for Periodismo Humano, a Spanish online newspaper focused on human rights.

Here is the source code:

Code:
import re
from calibre.web.feeds.news import BasicNewsRecipe

class PeriodismoHumanoRecipe(BasicNewsRecipe):

	title = u'Periodismo Humano'
	__author__ = 'Herrero, L.'
	description = 'Periodismo Humano: información que sí importa'
	oldest_article = 7
	max_articles_per_feed = 100
	no_stylesheets = True
	encoding = 'utf-8'
	publisher = 'Periodismo Humano'
	category = 'news, Spain, world'
	language = 'es_ES'
	publication_type = 'newsportal'
	remove_empty_feeds = True
	remove_javascript = True
	use_embedded_content = False  # Unwanted parts are stripped further down

	remove_attributes = ['height','width']

	extra_css = """
		.wp-caption-text{font-family: sans-serif; font-style:italic; font-size:80%; text-align: justify; }
		"""

	# Keep the site's own styling for now
	#extra_css = """
		# p{text-align: justify; font-size: 100%}
		# body{ text-align: justify; font-size:100% }
		# h5{font-family: sans-serif; font-size:100%; font-weight:bold; text-align: left; }
		# h4{font-family: sans-serif; font-size:110%; font-weight:bold; text-align: left; }
		# h3{font-family: sans-serif; font-size:120%; font-weight:bold; text-align: left; }
		# h2{font-family: sans-serif; font-size:150%; font-weight:bold; text-align: left; }
		# h1{font-family: sans-serif; font-size:150%; font-weight:bold; text-align: left; }
		# img {float:left; padding:5px 20px 10px 5px;}
		# .author-date-head{font-family: sans-serif; font-size:120%; text-align: left; }
		# .subtitle{font-family: sans-serif; font-size:120%; font-weight:bold; text-align: justify; }
	#	.wp-caption-text{font-family: sans-serif; font-style:italic; font-size:80%; text-align: justify; }	# Smaller, italic photo captions
	#	"""

	conversion_options = {
		'comments' : description,
		'tags' : category,
		'language' : language,
		'publisher' : publisher
		}

	def get_cover_url(self):
		return 'http://periodismohumano.com/files/2010/01/logoconslogan-300x211.jpg'

	# Keep only the content div and its children
	keep_only_tags = [
		dict(name='div', attrs={'id':'content'})
		]

	# Some of those children are not wanted
	remove_tags = [
		# The category title is removed with a regex (see preprocess_regexps)
		dict(name='ul', attrs={'id':'share-services-head'}),	# Top share buttons
		dict(name='div', attrs={'id':'share'}), 				# Bottom share buttons
		dict(name='div', attrs={'class':'clearfix share'}), 	# Another bottom share block
		dict(name='div', attrs={'id':'comments-title'}), 		# Comments
		dict(name='div', attrs={'class':'navigation_comment clearfix'}),
		dict(name='ol', attrs={'class':'commentlist'}),
		dict(name='div', attrs={'id':'respond'}),
		dict(name='div', attrs={'id':'sidebar'}),
		# Comments
		dict(name='div', attrs={'class':'comments'}),
		# Section titles
		dict(name='h1', attrs={'id':'header-logo'})
	]

	# Feeds taken from the section bar, http://periodismohumano.com/el-equipo and http://periodismohumano.com/seccion/enfoques
	feeds = [
		# Sections
		(u'Sociedad', u'http://periodismohumano.com/seccion/sociedad/feed/atom'),
		(u'Economía', u'http://periodismohumano.com/seccion/economia/feed/atom'),
		(u'Migración', u'http://periodismohumano.com/seccion/migracion/feed/atom'),
		(u'Mujer', u'http://periodismohumano.com/seccion/mujer/feed/atom'),
		(u'En conflicto', u'http://periodismohumano.com/seccion/en-conflicto/feed/atom'),
		(u'Culturas', u'http://periodismohumano.com/seccion/culturas/feed/atom'),
		(u'Cooperación', u'http://periodismohumano.com/seccion/cooperacion/feed/atom'),
		# Enfoques (opinion blogs)
		(u'P+HD (Redacción)',u'http://pmasdh.periodismohumano.com/feed/atom'),
		(u'Alianzas (Leila Nachawati)', u'http://alianzas.periodismohumano.com/feed/atom'),
		(u'Con Papeles (Javier Galparsoso)', u'http://conpapeles.periodismohumano.com/feed/atom'),
		(u'Consume y Muere (Xóse A. López)', u'http://consumeymuere.periodismohumano.com/feed/atom'),
		(u'El Gran Juego (Carlos Sardiña)', u'http://elgranjuego.periodismohumano.com/feed/atom'),
		(u'El Minotauro Anda Suelto (Olga Rodríguez)', u'http://minotauro.periodismohumano.com/feed/atom'),
		(u'Fronteras (Juan José Téllez)', u'http://tellez.periodismohumano.com/feed/atom'),
		(u'Inquietudes Bárbaras (Luís García Montero)', u'http://garciamontero.periodismohumano.com/feed/atom'),
		(u'La Madeja de Ariana (Ariadna Alvarado)', u'http://ariana.periodismohumano.com/feed/atom'),
		(u'Pandoras Invisibles (Helena Maleno)', u'http://pandoras.periodismohumano.com/feed/atom'),
		(u'Ruedas de Molino (Luis Acebal)',u'http://ruedasdemolino.periodismohumano.com/feed/atom'),
		(u'Toma la Palabra', u'http://tomalapalabra.periodismohumano.com/feed/atom'),
		(u'Tras la Política (David Martos)', u'http://traslapolitica.periodismohumano.com/feed/atom'),
		# Civil society
		(u'Amnistía Internacional', u'http://amnistiainternacional.periodismohumano.com/feed/atom'),
		(u'Greenpeace', u'http://greenpeace.periodismohumano.com/feed/atom'),
		(u'Médicos sin Fronteras', u'http://msf.periodismohumano.com/feed/atom'),
		# Citizen journalism
		(u'Bottup', u'http://bottup.periodismohumano.com/feed/atom')
	]

	preprocess_regexps = [
		# Remove the category title
		(re.compile(r'<h2 class="category cat-title.*?</h2>', re.DOTALL|re.IGNORECASE), lambda m: ''),
		# Flag where a video used to be (it probably won't play in an e-book)
		(re.compile(r'<param name="movie"', re.DOTALL|re.IGNORECASE), lambda match: '<small>[Video] </small><param name="movie"')
	]
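As a quick sanity check outside calibre, the two `preprocess_regexps` above can be exercised with plain `re` on a hand-written HTML fragment. This is a minimal sketch: the sample markup is invented for illustration, and the loop only approximates how calibre applies each (compiled pattern, function) pair to the raw downloaded HTML before parsing it.

```python
import re

# Same two (pattern, replacement-function) pairs as in the recipe above.
PREPROCESS_REGEXPS = [
    # Remove the category title
    (re.compile(r'<h2 class="category cat-title.*?</h2>', re.DOTALL | re.IGNORECASE),
     lambda m: ''),
    # Flag where an embedded video used to be
    (re.compile(r'<param name="movie"', re.DOTALL | re.IGNORECASE),
     lambda m: '<small>[Video] </small><param name="movie"'),
]


def apply_preprocess_regexps(html, regexps=PREPROCESS_REGEXPS):
    """Apply each pattern in order to the raw HTML.

    Approximation of what calibre does with preprocess_regexps,
    not calibre's own code.
    """
    for pattern, func in regexps:
        html = pattern.sub(func, html)
    return html


# Hypothetical sample markup, invented for this example.
SAMPLE = (
    '<div id="content">'
    '<h2 class="category cat-title">Sociedad</h2>'
    '<p>Texto del articulo.</p>'
    '<object><param name="movie" value="x.swf"></object>'
    '</div>'
)

if __name__ == '__main__':
    print(apply_preprocess_regexps(SAMPLE))
```

Because the category pattern is non-greedy and uses `re.DOTALL`, it stops at the first closing `</h2>` even when the heading spans several lines, so the article text after it is left alone.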
Now, a couple of questions. The site's content is published under a Creative Commons license, and I'd like to show that license in the final document. Is there any way to do that, or can you suggest an alternative?

Also, I'm trying to submit the recipe to the main calibre development as described here, but it seems the bug tracking system has changed. How can I send the recipe now?

If anybody has any suggestions, feel free to share them (simple ones, please, I'm a newbie!)

Cheers

Tito HX
03-29-2011, 07:57 AM   #2
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Tito HX
The site's content is published under a Creative Commons license, and I'd like to show that license in the final document. Is there any way to do that, or can you suggest an alternative?
You can add it to the description, so it shows up before the recipe is run. To add content to the articles themselves, use BeautifulSoup's Tag class to create a new tag, or modify an existing tag, in preprocess_html.
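The suggestion above works on the parsed page in `preprocess_html`. A lighter alternative, swapped in here because it needs nothing beyond `re`, is one more `preprocess_regexps` entry that injects a license line into the raw HTML. This sketch assumes the article body sits in `<div id="content">` (the div the recipe's `keep_only_tags` keeps); the notice wording and the `license` class name are hypothetical, and the exact Creative Commons variant should be checked on the site. Since calibre empties `<body>` and refills it with the `keep_only_tags` matches, the notice has to go inside that div, so it is placed at the top of the article.

```python
import re

# Hypothetical notice text: check which Creative Commons variant the site
# actually uses before relying on this wording.
LICENSE_HTML = (
    '<p class="license"><small>Source: Periodismo Humano. '
    'Content published under a Creative Commons license.</small></p>'
)

# Matches the opening tag of the div kept by keep_only_tags, so the notice is
# inserted inside it and is not thrown away with the rest of the page.
CONTENT_DIV = re.compile(r'(<div[^>]*\bid="content"[^>]*>)', re.IGNORECASE)

# A (pattern, function) pair in the same shape as the recipe's
# preprocess_regexps entries, so it could be appended to that list.
LICENSE_REGEXP = (CONTENT_DIV, lambda m: m.group(1) + LICENSE_HTML)


def add_license(html):
    """Insert the license notice right after the opening <div id="content">."""
    pattern, func = LICENSE_REGEXP
    return pattern.sub(func, html)


if __name__ == '__main__':
    page = ('<body><div id="sidebar"></div>'
            '<div id="content"><p>Article</p></div></body>')
    print(add_license(page))
```

In the recipe, `LICENSE_REGEXP` would be appended to `preprocess_regexps` (with the regex and notice defined at module level or in the class). If the site's markup changes often, the BeautifulSoup route in `preprocess_html` is likely the sturdier option.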

Tags
periodismo humano, recipe




All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.