How to delete empty paragraphs?

cyttorak · 11-26-2014, 02:34 PM

Hi

I'm writing this recipe:

Code:

import re

from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1416065639(BasicNewsRecipe):
	title	= u'Ganemos Feminismos'
	oldest_article = 365
	max_articles_per_feed = 100
	auto_cleanup = True
	reverse_article_order = True
	remove_empty_feeds = True
	language = 'es_ES'
	publisher = 'Ganemos'
	publication_type = 'actas'
	feeds	= [(u'Feminismos', u'http://ganemosmadrid.info/category/actas/actas_feminismos/feed/')]
	# Hide the calibre navigation bar and any element with no content
	extra_css = '.calibre_navbar, *:empty {display:none;}'
	preprocess_regexps = [
		# Drop &nbsp; entities from the downloaded HTML
		(re.compile(r'&nbsp;',re.DOTALL|re.IGNORECASE), lambda match: ''),
		# Drop paragraphs that contain only whitespace
		(re.compile(r'\s*<p[^>]*>\s*</p>\s*',re.DOTALL|re.IGNORECASE), lambda match: '')
	]

	def get_cover_url(self):
		return 'http://ganemosmadrid.info/wp-content/uploads/2014/11/GM_ORG_SEPT.png'

but I still see empty paragraphs in my .epub. I see a blank line for each:

Quote:

I get:

Code:

<p class="calibre8"> </p>

How can I delete these empty tags?

Thanks

kovidgoyal · 11-26-2014, 11:09 PM

You need to look at the actual source HTML of the articles in question, not the HTML in the final book. Visiting one of the articles in that feed, I see no <p>&nbsp;</p> in the article HTML. Something in that markup is getting mapped to empty paragraphs by auto_cleanup. You will have to figure out what that is. Or don't use auto_cleanup and instead use keep_only_tags/remove_tags.
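The second suggestion could look roughly like the sketch below. keep_only_tags and remove_tags are standard recipe attributes, but the class name GanemosFeminismosManual and the CSS classes 'entry-content' and 'sharedaddy' are assumptions made for illustration, not values checked against the site; they need to be replaced with whatever the real article source uses.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch can still be loaded outside calibre.
    BasicNewsRecipe = object


class GanemosFeminismosManual(BasicNewsRecipe):
    title = u'Ganemos Feminismos'
    feeds = [(u'Feminismos', u'http://ganemosmadrid.info/category/actas/actas_feminismos/feed/')]

    # Rely on the explicit tag lists below instead of auto_cleanup.
    auto_cleanup = False

    # ASSUMPTION: 'entry-content' is a guess at the article container class.
    # Inspect the article source and adjust.
    keep_only_tags = [dict(name='div', attrs={'class': 'entry-content'})]

    # ASSUMPTION: 'sharedaddy' is a placeholder for widgets to drop.
    remove_tags = [dict(name='div', attrs={'class': 'sharedaddy'})]
```

With this approach only the selected container reaches the book, so stray wrappers outside it cannot turn into blank paragraphs.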

cyttorak · 11-27-2014, 02:41 AM

Thanks, kovidgoyal,

but the solution was this:

Quote:

preprocess_regexps = [
(re.compile(u'\xa0'), lambda match: ' '),
(re.compile(r'&nbsp;',re.DOTALL|re.IGNORECASE), lambda match: ' '),
(re.compile(r'\s*<p[^>]*>\s*</p>\s*',re.DOTALL|re.IGNORECASE), lambda match: '')
]

I saw it here http://stackoverflow.com/questions/1...a0-from-string
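For anyone who wants to try these patterns outside calibre, here is a minimal standalone sketch. It applies the same three substitutions in order to a made-up HTML string; the names RULES, strip_empty_paragraphs and sample are hypothetical and exist only for this illustration.

```python
import re

# The same three substitutions as in the recipe, applied in order.
RULES = [
    (re.compile('\xa0'), ' '),                    # literal non-breaking space
    (re.compile(r'&nbsp;', re.IGNORECASE), ' '),  # the entity form
    (re.compile(r'\s*<p[^>]*>\s*</p>\s*', re.DOTALL | re.IGNORECASE), ''),
]


def strip_empty_paragraphs(html):
    """Turn non-breaking spaces into plain spaces, then drop blank <p> tags."""
    for pattern, replacement in RULES:
        html = pattern.sub(replacement, html)
    return html


# Made-up input: one paragraph holds U+00A0, another holds the entity.
sample = '<div><p>Acta</p>\n<p class="x">\xa0</p>\n<p>&nbsp;</p>\n<p>Fin</p></div>'
print(strip_empty_paragraphs(sample))
```

The order matters: both forms of the non-breaking space are turned into ordinary spaces first, so the last pattern only has to recognise paragraphs that contain plain whitespace.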
