MobileRead Forums > E-Book Software > Calibre > Recipes

11-29-2014, 08:01 AM #1
cyttorak
Member
Posts: 18
Karma: 10
Join Date: Nov 2014
Device: Kobo Mini
Avoid extra <p>

Hi

calibre converts this code:
Code:
<div>- recordarles también que mola que de vez en cuando se pase alguien de otros grupos por las reus de <span class="il">comunicación</span> y que esta el correo&nbsp;<a href="mailto:comunicacioninterna@ganemosmadrid.info" target="_blank">comunicacioninterna@<wbr>ganemosmadrid.info</a> para que nos pasen las <span class="il">actas</span> y las necesidades de <span class="il">comunicación</span> que tengan.</div>
into this:

Code:
<div class="calibre6"><p class="calibre9">- recordarles también que mola que de vez en cuando se pase alguien de otros grupos por las reus de </p><span>comunicación</span><p class="calibre9"> y que esta el correo </p><a href="mailto:comunicacioninterna@ganemosmadrid.info" target="_blank">comunicacioninterna@ganemosmadrid.info</a><p class="calibre9"> para que nos pasen las </p><span>actas</span><p class="calibre9"> y las necesidades de </p><span>comunicación</span><p class="calibre9"> que tengan.</p></div>
I don't know why it is putting that extra <p class="calibre9"> next to each <span>.

How can I avoid that?

My code:
Code:
import re

from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1416065639(BasicNewsRecipe):
	title	= u'Ganemos'
	description = 'Actas de Ganemos'
	oldest_article = 365
	max_articles_per_feed = 100
	auto_cleanup = True
	reverse_article_order = True
	remove_empty_feeds = True
	language = 'es_ES'
	category = 'Rss'
	publisher = 'Ganemos'
	publication_type = 'actas'
	remove_attributes = ['class','id','name']
	feeds	= [
		(u'Feminismos', u'http://ganemosmadrid.info/category/actas/actas_feminismos/feed/')
		,(u'Programas y contenido', u'http://ganemosmadrid.info/category/actas/actas_programa/feed/')
		,(u'Candidaturas', u'http://ganemosmadrid.info/category/actas/actas_candidaturas/feed/')
		,(u'Comunicación', u'http://ganemosmadrid.info/category/actas/actas-comunicacion/feed/')
		,(u'Coordinación', u'http://ganemosmadrid.info/category/actas/actas_coordinacion/feed/')
		,(u'Herramientas y metodología', u'http://ganemosmadrid.info/category/actas/actas_herramientas/feed/')
		,(u'Movimiento municipalista', u'http://ganemosmadrid.info/category/actas/actas_movimiento/feed/')
	]
	extra_css = '.calibre_navbar {display:none;}'
	# Substitutions run on each article's downloaded HTML before it is parsed
	preprocess_regexps = [
		(re.compile(u'\xa0'), lambda match: ' ')
		,(re.compile(r'&nbsp;',re.DOTALL|re.IGNORECASE), lambda match: ' ')
		,(re.compile(r'\s*<p[^>]*>\s*</p>\s*',re.DOTALL|re.IGNORECASE), lambda match: '')
		,(re.compile(r'\s*<div[^>]*>\s*</div>\s*',re.DOTALL|re.IGNORECASE), lambda match: '')
	]

	conversion_options = {
		'comments' : description
		,'tags' : category
		,'language' : language
		,'publisher' : publisher
	}

	def get_cover_url(self):
		return 'http://ganemosmadrid.info/wp-content/uploads/2014/11/GM_ORG_SEPT.png'

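	def parse_feeds(self):
		# Normalise a date to dd/mm/yyyy, either from a numeric string f
		# (such as 5/11/14 or 05/11/2014) or from separate day, month name and year
		def parseFecha(d,m,a,f):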
			if f:
				if len(f)==10:
					return f
				sf=re.split(r'[/-]',f)
				d=sf[0]
				m=sf[1]
				if len(m)==1:
					m='0'+m
				try:
					a=sf[2]
				except IndexError:
					a=None
			if len(d)==1:
				d='0'+d
			# Spanish month names become two-digit numbers; numeric months pass through
			meses = {'enero': '01', 'febrero': '02', 'marzo': '03', 'abril': '04',
				'mayo': '05', 'junio': '06', 'julio': '07', 'agosto': '08',
				'septiembre': '09', 'octubre': '10', 'noviembre': '11', 'diciembre': '12'}
			m = m.lower()
			m = meses.get(m, m)
			# No year given: June-December means 2014, January-May means 2015
			if not a:
				if float(m)>5:
					a='2014'
				else:
					a='2015'
			elif len(a)==2:
				a='20'+a
			return d+'/'+m+'/'+a
		ordinal = re.compile(r'^(Acta )?(\d+)(er\.?|o\.?|a\.|º|ª) ', re.IGNORECASE|re.UNICODE)
		fecha1 = re.compile(r'.*?(\d\d?/\d\d/(20)?1\d).*', re.IGNORECASE|re.UNICODE)
		fecha2 = re.compile(r'.*?(\d+) (de )?(enero|febrero|marzo|abril|mayo|junio|julio|agosto|septiembre|octubre|noviembre|diciembre)( de )?(201\d)?.*', re.IGNORECASE|re.UNICODE)
		gts = re.compile(r'.*?(Grupos territoriales|Cultura).*', re.IGNORECASE|re.UNICODE)
		# Let BasicNewsRecipe fetch the feeds, then rewrite every article title
		# as "date ordinal group", using whichever parts the title contains
		feeds = BasicNewsRecipe.parse_feeds(self)
		for f in feeds:
			for a,c in enumerate(f.articles):
				g=u''
				self.log('==> '+c.title)
				mOr = ordinal.match(c.title)
				mF1 = fecha1.match(c.title)
				mF2 = fecha2.match(c.title)
				mGt = gts.match(c.title)
				if mGt:
					g=mGt.group(1).lower().capitalize()
				else:
					g=f.title
				if mOr:
					g=mOr.group(2)+'º '+g
				if mF1:
					g=parseFecha(None,None,None,mF1.group(1))+' '+g
				if mF2:
					g=parseFecha(mF2.group(1),mF2.group(3),mF2.group(5),None)+' '+g
				c.title=g
				self.log('<== '+c.title+'\n')
		return feeds

Last edited by cyttorak; 11-29-2014 at 10:17 AM.
11-30-2014, 06:54 PM #2
ireadtheinternet
Member
Posts: 21
Karma: 10
Join Date: Oct 2014
Device: Android
I think that is normal. My recipes also get calibre# classes inserted. I don't think they cause any harm, and they seem to be needed for some internal processing.
11-30-2014, 10:47 PM #3
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
It happens because calibre flattens all CSS, to make sure the result works as well as possible across all devices.

The end result looks the way it was supposed to on the e-reader screen, but the internals look like highly confusing garbage; the assumption is that converted books are not usually meant to be edited.
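To make "flattening" concrete, here is a toy model in Python: each element's styles are assumed to be already resolved, and every distinct style set then gets one generated class name. That is the kind of markup (class="calibre6", class="calibre9") seen in the converted output above. This is a simplified illustration, not calibre's actual implementation; the function name and the input format are hypothetical, and the class names only copy the calibreN pattern from that output.

```python
def flatten_styles(elements):
    """Toy model of CSS flattening (hypothetical, not calibre's own code).

    elements: list of (tag, computed_style) pairs, where computed_style is a
    dict of CSS properties already resolved for that element.
    Returns the elements relabelled with generated class names, plus a
    stylesheet holding exactly one rule per distinct style.
    """
    class_for_style = {}  # sorted (property, value) tuple -> generated class
    flattened = []
    for tag, style in elements:
        key = tuple(sorted(style.items()))
        if key not in class_for_style:
            # Number classes in order of first appearance: calibre1, calibre2, ...
            class_for_style[key] = 'calibre%d' % (len(class_for_style) + 1)
        flattened.append((tag, class_for_style[key]))
    css = '\n'.join(
        '.%s { %s }' % (name, '; '.join('%s: %s' % prop for prop in key))
        for key, name in class_for_style.items())
    return flattened, css


# The two <p> elements have identical computed styles (property order does
# not matter), so they share one generated class.
doc = [
    ('div', {'display': 'block'}),
    ('p', {'margin': '0', 'text-indent': '1em'}),
    ('p', {'text-indent': '1em', 'margin': '0'}),
]
tags, stylesheet = flatten_styles(doc)
print(tags)
print(stylesheet)
```

The generated names say nothing about where a style came from, which is why the converted HTML is hard to read even when it renders correctly.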

You can run ebook-convert from the command line and pass an output name without an extension; it then writes the un-flattened OEB directory to that location. I'm not really sure who would use that, though. If you want to do anything with the HTML, do it before calibre converts it; then it won't matter what calibre does to it afterward.
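As one sketch of "do it before calibre converts it": the rule below could be appended to the recipe's preprocess_regexps so that each <div> holding only inline markup (text, <span>, <a>) is rewritten as a single <p> before calibre parses the page. This is an untested assumption, not a confirmed fix: it assumes such divs carry no attributes worth keeping, and it has not been checked that it removes every extra <p class="calibre9">. The names INLINE_ONLY_DIV and apply_preprocess are hypothetical; apply_preprocess only mimics, for testing outside calibre, how each (compiled pattern, callable) pair is substituted into the raw HTML.

```python
import re

# Hypothetical extra rule for the recipe's preprocess_regexps: a <div> whose
# content contains no block-level tags becomes one <p>, so the sentence and
# its inline <span>/<a> children stay inside a single paragraph.
# Assumption: attributes on such divs can be dropped.
BLOCK_TAG = r'</?(?:div|p|pre|ul|ol|li|table|blockquote|h[1-6])\b'
INLINE_ONLY_DIV = re.compile(
    r'<div\b[^>]*>((?:(?!' + BLOCK_TAG + r').)*?)</div>',
    re.DOTALL | re.IGNORECASE)

preprocess_regexps = [
    (INLINE_ONLY_DIV, lambda match: '<p>' + match.group(1) + '</p>'),
]


def apply_preprocess(raw, rules):
    """Hypothetical stand-in: run each (pattern, callable) pair over the HTML in order."""
    for pattern, func in rules:
        raw = pattern.sub(func, raw)
    return raw


sample = ('<div>- por las reus de <span class="il">comunicación</span> '
          'y que esta el correo <a href="mailto:comunicacioninterna@ganemosmadrid.info">'
          'comunicacioninterna@ganemosmadrid.info</a> que tengan.</div>')
print(apply_preprocess(sample, preprocess_regexps))
```

A <div> that already contains block content, such as a list, is left untouched, because the tempered pattern refuses to match across any block-level tag.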
All times are GMT -4.