#1
Member
Posts: 18
Karma: 10
Join Date: Nov 2014
Device: Kobo Mini
Hi,
calibre converts this code:
PHP Code:
into this:
PHP Code:
How can I avoid that? My code:
Code:

    class AdvancedUserRecipe1416065639(BasicNewsRecipe):
        title = u'Ganemos'
        description = 'Actas de Ganemos'
        oldest_article = 365
        max_articles_per_feed = 100
        auto_cleanup = True
        reverse_article_order = True
        remove_empty_feeds = True
        language = 'es_ES'
        category = 'Rss'
        publisher = 'Ganemos'
        publication_type = 'actas'
        remove_attributes = ['class','id','name']

        feeds = [
            (u'Feminismos', u'http://ganemosmadrid.info/category/actas/actas_feminismos/feed/')
            ,(u'Programas y contenido', u'http://ganemosmadrid.info/category/actas/actas_programa/feed/')
            ,(u'Candidaturas', u'http://ganemosmadrid.info/category/actas/actas_candidaturas/feed/')
            ,(u'Comunicación', u'http://ganemosmadrid.info/category/actas/actas-comunicacion/feed/')
            ,(u'Coordinación', u'http://ganemosmadrid.info/category/actas/actas_coordinacion/feed/')
            ,(u'Herramientas y metodología', u'http://ganemosmadrid.info/category/actas/actas_herramientas/feed/')
            ,(u'Movimiento municipalista', u'http://ganemosmadrid.info/category/actas/actas_movimiento/feed/')
        ]

        extra_css = '.calibre_navbar {display:none;}'

        preprocess_regexps = [
            (re.compile(u'\xa0'), lambda match: ' ')
            ,(re.compile(r' ',re.DOTALL|re.IGNORECASE), lambda match: ' ')
            ,(re.compile(r'\s*<p[^>]*>\s*</p>\s*',re.DOTALL|re.IGNORECASE), lambda match: '')
            ,(re.compile(r'\s*<div[^>]*>\s*</div>\s*',re.DOTALL|re.IGNORECASE), lambda match: '')
        ]

        conversion_options = {
            'comments' : description
            ,'tags' : category
            ,'language' : language
            ,'publisher' : publisher
        }

        def get_cover_url(self):
            return 'http://ganemosmadrid.info/wp-content/uploads/2014/11/GM_ORG_SEPT.png'

        def parse_feeds (self):
            def parseFecha(d,m,a,f):
                if f:
                    if len(f)==10:
                        return f
                    sf=re.split('[\/\-]',f)
                    d=sf[0]
                    m=sf[1]
                    if len(m)==1:
                        m='0'+m
                    try:
                        a=sf[2]
                    except IndexError:
                        a=None
                if len(d)==1:
                    d='0'+d
                m=m.lower()
                if m=='enero': m='01'
                elif m=='febrero': m='02'
                elif m=='marzo': m='03'
                elif m=='abril': m='04'
                elif m=='mayo': m='05'
                elif m=='junio': m='06'
                elif m=='julio': m='07'
                elif m=='agosto': m='08'
                elif m=='septiembre': m='09'
                elif m=='octubre': m='10'
                elif m=='noviembre': m='11'
                elif m=='diciembre': m='12'
                if not a:
                    if float(m)>5:
                        a='2014'
                    else:
                        a='2015'
                elif len(a)==2:
                    a='20'+a
                return d+'/'+m+'/'+a

            ordinal = re.compile(u'^(Acta )?(\d+)(er\.?|o\.?|a\.|º|ª) ', re.IGNORECASE|re.UNICODE)
            fecha1 = re.compile(u'.*?(\d\d?\/\d\d\/(20)?1\d).*', re.IGNORECASE|re.UNICODE)
            fecha2 = re.compile(u'.*?(\d+) (de )?(enero|febrero|marzo|abril|mayo|junio|julio|agosto|septiembre|octubre|noviembre|diciembre)( de )?(201\d)?.*', re.IGNORECASE|re.UNICODE)
            gts = re.compile(u'.*?(Grupos territoriales|Cultura).*', re.IGNORECASE|re.UNICODE)

            feeds = BasicNewsRecipe.parse_feeds(self)
            for f in feeds:
                for a,c in enumerate(f.articles):
                    g=u''
                    self.log('==> '+c.title)
                    mOr = ordinal.match(c.title)
                    mF1 = fecha1.match(c.title)
                    mF2 = fecha2.match(c.title)
                    mGt = gts.match(c.title)
                    if mGt:
                        g=mGt.group(1).lower().capitalize()
                    else:
                        g=f.title
                    if mOr:
                        g=mOr.group(2)+'º '+g
                    if mF1:
                        g=parseFecha(None,None,None,mF1.group(1))+' '+g
                    if mF2:
                        g=parseFecha(mF2.group(1),mF2.group(3),mF2.group(5),None)+' '+g
                    c.title=g
                    self.log('<== '+c.title+'\n')
            return feeds

Last edited by cyttorak; 11-29-2014 at 10:17 AM.
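For readers unfamiliar with `preprocess_regexps`: each entry in the recipe is a pair of a compiled pattern and a function that receives the match and returns the replacement text. The following is only a standalone sketch of what such a list does to a piece of HTML, not calibre's own code. The helper `clean_html` and the `sample` string are hypothetical names made up for illustration, and only three of the recipe's rules are copied.

```python
import re

# Three (pattern, replacement function) pairs copied from the recipe above.
RULES = [
    # Replace non-breaking spaces with plain spaces.
    (re.compile(u'\xa0'), lambda match: ' '),
    # Drop paragraphs that contain nothing but whitespace.
    (re.compile(r'\s*<p[^>]*>\s*</p>\s*', re.DOTALL | re.IGNORECASE), lambda match: ''),
    # Drop divs that contain nothing but whitespace.
    (re.compile(r'\s*<div[^>]*>\s*</div>\s*', re.DOTALL | re.IGNORECASE), lambda match: ''),
]


def clean_html(html, rules=RULES):
    """Apply each rule in order (hypothetical helper, for illustration only)."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html


# Hypothetical input: one real paragraph, one empty paragraph, one empty div.
sample = u'<div><p>Acta\xa01</p>\n<p class="x"> </p>\n<div id="y"></div></div>'
cleaned = clean_html(sample)
```

The rules run in list order, so the empty paragraph is removed before the empty-div rule is tried.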
#2
Member
Posts: 21
Karma: 10
Join Date: Oct 2014
Device: Android
I think that is normal. I also get calibre# classes inserted in my recipes. I don't think it causes any harm, and it seems to be needed for some internal processing.
#3
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
It is because calibre flattens all CSS, so that the result works as well as possible across all devices.
The end result looks the way it was supposed to on the e-reader screen, and looks like highly confusing garbage in the internals, on the theory that conversions are not usually meant to be edited.

You can use ebook-convert via the command line and pass an output name without an extension to get it to write the un-flattened OEB directory to that location. Not really sure who would use that, actually...

If you want to do anything with the HTML, do it before calibre converts it. Then it won't matter what calibre does to it afterward.
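The ebook-convert suggestion could be scripted roughly like this. It is a minimal sketch, assuming calibre's command-line tools are installed and on the PATH. The helper names and the file names `ganemos.recipe` and `ganemos_oeb` are hypothetical, and the only command-line behaviour relied on is the one described in the post: an output name without an extension is written as an OEB directory.

```python
import shutil
import subprocess
from pathlib import Path


def build_oeb_command(source, out_dir):
    """Build an ebook-convert call whose output name has no extension.

    Hypothetical helper: it only assembles the argument list.
    """
    out = Path(out_dir)
    if out.suffix:
        # With an extension, ebook-convert would pick an output format instead.
        raise ValueError("output name must not have an extension: %s" % out)
    return ["ebook-convert", str(source), str(out)]


def convert_to_oeb(source, out_dir):
    """Run the conversion (requires calibre; not executed by this sketch)."""
    cmd = build_oeb_command(source, out_dir)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("ebook-convert was not found on PATH")
    subprocess.run(cmd, check=True)


# Example call, left commented out because it needs calibre and a recipe file:
# convert_to_oeb("ganemos.recipe", "ganemos_oeb")
```

Separating the command construction from the actual run makes it easy to check the arguments before anything is converted.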