Thank you for the advice, Kovid. Unfortunately, this doesn't solve my problem.
Let's say I have this tag:
Code:
<img class="alignright size-medium wp-image-48200" width="250" height="360" src="image-48200.png" alt="title" style="width: 250px; height: 360px;"></img>
Only the alignright class has CSS, which is:
Code:
float: right;
margin: 0px 0px 5px 10px;
So my recipe has:
Code:
extra_css = '.alignright {float: right; margin: 0px 0px 5px 10px;}'
It works: ebook-convert keeps the alignright class and adds my custom CSS. But if the next image, in this or another article, has the same classes but a different width or height attribute, its class is renamed to alignright{NUMBER}. I tried to remove the width and height attributes with
Code:
remove_attributes = ['width', 'height']
and added auto width and height to my extra_css. But if I remove these attributes, calibre renames the class to something like calibre2 and my extra_css is not applied.
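To make the renaming easier to reason about, here is a simplified Python model of the behavior described above. It is an assumption inferred from the symptoms, not calibre's actual code: each element's styles are flattened into one class named after the element's first class, and a counter is appended whenever that name is already in use for a different set of styles. The function name `assign_flat_classes` is hypothetical.

```python
from collections import defaultdict


def assign_flat_classes(elements):
    """Hypothetical model of class renaming during CSS flattening.

    elements: list of (class_attr, style_dict) pairs.
    Returns the class name each element would get under this model.
    """
    names = defaultdict(int)  # distinct style sets seen so far per base name
    styles = {}               # (base name, frozen styles) -> assigned class
    result = []
    for class_attr, style in elements:
        classes = class_attr.split()
        # Only the first class is used as the base name (assumed).
        base = classes[0] if classes else 'calibre'  # fallback name (assumed)
        key = (base, tuple(sorted(style.items())))
        if key not in styles:
            count = names[base]
            # First style set keeps the bare name, later ones get a number.
            styles[key] = base + (str(count) if count else '')
            names[base] += 1
        result.append(styles[key])
    return result


elements = [
    ('alignright size-medium wp-image-48200', {'float': 'right', 'width': '250px'}),
    ('alignright size-medium wp-image-48201', {'float': 'right', 'width': '300px'}),
    ('size-medium alignright wp-image-48202', {'float': 'right', 'width': '250px'}),
    ('alignright size-medium wp-image-48203', {'float': 'right', 'width': '250px'}),
]
print(assign_flat_classes(elements))
```

In this model, only images whose flattened styles are identical share the plain alignright name; any remaining difference (such as a width) yields a new numbered class, and an element whose first class is size-medium never gets alignright at all.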
There's one more problem. It seems that calibre preserves only the first class name. If the class string of an element is "size-medium alignright wp-image-48200" instead of "alignright size-medium wp-image-48200", then in the output ebook this element won't have the alignright class as expected, but size-medium.
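If calibre really keeps only the first class name, one possible workaround is to reorder each element's class list in preprocess_html so that the alignment class comes first. The helper below is a hypothetical sketch, not a calibre API; it only rewrites the class string, so the numbered renaming described earlier may still occur.

```python
# Alignment classes to move to the front (taken from the extra_css above).
ALIGN_CLASSES = ['alignright', 'alignleft', 'aligncenter']


def promote_class(class_attr, preferred):
    """Return class_attr with the first class found in `preferred` moved to the front.

    Hypothetical helper; classes not in `preferred` keep their relative order.
    """
    classes = class_attr.split()
    for wanted in preferred:
        if wanted in classes:
            classes.remove(wanted)
            return ' '.join([wanted] + classes)
    return ' '.join(classes)


# Possible use inside the recipe's preprocess_html (untested assumption):
#     for tag in soup.findAll(attrs={'class': True}):
#         cls = tag['class']
#         if isinstance(cls, list):  # newer BeautifulSoup returns a list
#             cls = ' '.join(cls)
#         tag['class'] = promote_class(cls, ALIGN_CLASSES)

print(promote_class('size-medium alignright wp-image-48200', ALIGN_CLASSES))
```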
Here's the full recipe if you want to test it yourself or find my explanation not clear enough:
Code:
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:fdm=marker:ai
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Comment
from calibre.ebooks.BeautifulSoup import BeautifulSoup
import re


class FilmOrgPl(BasicNewsRecipe):
    title = u'Film.org.pl'
    __author__ = 'fenuks'
    description = u"Recenzje, analizy, artykuły, rankingi - wszystko o filmie dla miłośników kina. Opisy efektów specjalnych, wersji reżyserskich, remake'ów, sequeli. No i forum filmowe. Jedne z największych w Polsce."
    category = 'film'
    language = 'pl'
    cover_url = 'http://film.org.pl/wp-content/themes/KMF/images/logo_kmf10.png'
    ignore_duplicate_articles = {'title', 'url'}
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    remove_javascript = True
    remove_empty_feeds = True
    use_embedded_content = False
    remove_attributes = ['width', 'height', 'style']
    preprocess_regexps = [(re.compile(ur'<h3>Przeczytaj także:</h3>.*</body>', re.IGNORECASE|re.DOTALL), lambda m: '</body>'),]
    extra_css = '.alignright {float: right; margin: 0px 0px 5px 10px;} .aligncenter {margin: 0px auto; display: block;} .alignleft {float:left; margin-right:5px;}'
    keep_only_tags = [dict(attrs={'class':['content_recenzja']})]
    feeds = [(u'Recenzje', u'http://film.org.pl/r/recenzje/feed/'),
             #(u'Artyku\u0142', u'http://film.org.pl/a/artykul/feed/'),
             #(u'Analiza', u'http://film.org.pl/a/analiza/feed/'),
             #(u'Ranking', u'http://film.org.pl/a/ranking/feed/'),
             #(u'Blog', u'http://film.org.pl/kmf/blog/feed/'),
             #(u'Ludzie', u'http://film.org.pl/a/ludzie/feed/'),
             #(u'Seriale', u'http://film.org.pl/a/seriale/feed/'),
             #(u'Oceanarium', u'http://film.org.pl/a/ocenarium/feed/'),
             #(u'VHS', u'http://film.org.pl/a/vhs-a/feed/')
             ]

    def preprocess_html(self, soup):
        # Map the site's non-standard heading tags to real headings.
        for c in soup.findAll('h11'):
            c.name = 'h1'
        for c in soup.findAll('h16'):
            c.name = 'h2'
        for c in soup.findAll('h17'):
            c.name = 'h3'
        # Drop all line breaks...
        for r in soup.findAll('br'):
            r.extract()
        # ...then add one back after every h8 element.
        for tag in soup.findAll('h8'):
            tag_index = tag.parent.contents.index(tag)
            tag.parent.insert(tag_index+1, BeautifulSoup('<br></br>'))
        return soup