Recipe for Wirtschaftswoche / Wiwo.de (German Business Weekly)
HiHo,
I took the time to build a recipe for the German Wirtschaftswoche, based on Malfi's Handelsblatt recipe. It's already very usable, though there are still two things I'd like to optimize. I hope you guys can help. Let's start with the recipe "as is" first: Code:
##
1. When an article starts with a "place", the source HTML looks as follows: Code:
<span class="hcf-location-mark">New York</span>
2. The end of the article text looks like this in the HTML: Code:
[...]<div id="hcf-footer"><div class="hcf-copyright">
Thanks a lot for your help. I hope the recipe is useful for others, too. Hegi. |
You need to update your version of calibre first. Then just add the following extra CSS to the recipe:
Code:
extra_css = '''
As for the second, there is likely something you are missing. All the best tracking it down :) |
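The CSS in the reply above did not survive in this copy of the thread. Judging from the later posts, which discuss an :after rule for the location mark, it was presumably something along these lines. The rule below is an assumption for illustration, not Kovid's original snippet.

```python
# Assumed rule (not the original from the thread): append ". " after the
# location mark using the CSS :after pseudo-element.
extra_css = '''
.hcf-location-mark:after { content: ". "; }
'''
```

In a recipe this string would be a class attribute of the BasicNewsRecipe subclass.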
Pseudo CSS :after
Hi Kovid,
thanks for your quick reply. My life is a bit crazy these days, so it took me longer to get back to you. And I tried quite a few things in the meantime. Nevertheless, I'm still stuck on the :after CSS pseudo-element. Currently my extra_css looks like this: Code:
extra_css = 'h1 {font-size: 1.6em; text-align: left} \
The changelog also says that as of 0.9.24 it is possible to "reduce the size of downloaded images by lowering their quality". I assume this refers to the options "compress_news_images_max_size" and "compress_news_images_auto_size", but they don't appear to have a significant effect. Very strange! I'm running calibre in an ia32 chroot on a Debian amd64 system, but all seems fine: Code:
[$ calibre --version
Last question: once the recipe is running satisfactorily, this is the place to post the final version, correct? Thanks a lot! Hegi. |
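On the image options mentioned above: as far as the calibre recipe documentation describes them, the two size options only take effect when compress_news_images is also switched on, and they apply to JPEG images only. A minimal sketch, assuming that reading of the documentation is right. The class name is hypothetical, and a plain class stands in for the BasicNewsRecipe subclass so the fragment runs outside calibre.

```python
# Hypothetical fragment: in a real recipe these would be class attributes of
# the BasicNewsRecipe subclass.
class ImageOptionsSketch(object):
    # Master switch; without it the size options below are assumed to be ignored.
    compress_news_images = True
    # Target size is roughly (width * height) / factor bytes per JPEG.
    compress_news_images_auto_size = 16
    # Optional hard cap in KB; when set it is documented to override auto_size.
    compress_news_images_max_size = 50
```

The values 16 and 50 are illustrative, not recommendations from the thread.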
Kovid,
...me again. This is *really* strange: why do I get completely different behaviour and output when I run the recipe from the CLI with ebook-convert than when I run it from the calibre GUI with "download now"? On the CLI things work much more neatly than from the GUI. (E.g. from the CLI the CSS works with the :after pseudo-element and the publisher tag is used, instead of just saying "calibre".) This is weird. I think I'm just going bananas. Hegi. |
Presumably because you are running different versions of calibre.
|
Hi Kovid,
... so, I did a complete clean new install of 0.9.27 using the official binaries (amd64) from your website and the Python installer, and uninstalled the version in the chroot, so now there should be a clean and up-to-date calibre environment. What I notice is the following:
- Whether the :after CSS works or not depends on the selected output format. In the GUI options I have ".mobi" as the preferred output format (in order to email it automatically to my Kindle PW). Previously I made an .epub from the CLI. Now I changed that to ".mobi" as well. RESULT: if the output format is .mobi, the :after CSS does not work; if it is .epub, it does. Could this possibly be a bug?
- The other differences in output seem to be related to the format as well.
- When creating .epub I get a header (menu buttons) and a footer ("downloaded by calibre ...", menu buttons).
So the real issue seems to be why CSS :after does not work with the .mobi format. I would be delighted if this hint helps to uncover a bug. Thanks Hegi. |
The MOBI format has no support for CSS. You must use either EPUB or AZW3, but note that Amazon does not support periodicals in the AZW3 format.
|
@hegi, if you are developing a general recipe for a wide range of readers, you need to be careful with predefined formatting. Use as little as possible. You will find these differences between devices and formats.
|
@Divingduck: Stay cool and calm. I work from two ends: firstly, I want to make the recipe work with general options. Secondly, I want to optimize for my own device. The options for the latter can then be commented out, and whoever likes them can switch them back on. All will be well!
However, the deeper I dig into this, the more complicated things seem to get. And I'm really busy these days, so things progress *very* slowly. Hegi. |
preprocess_html instead of extra_css
Hiho,
... still optimizing ... and still going a bit crazy, since I have only superficial programming skills :blink:. Now, if the clever CSS hint from Kovid won't work for the .mobi format, I ask myself whether I could achieve the same using preprocess_html. What I get as input from the website is: Code:
<span class="hcf-location-mark">Place</span>
Code:
def preprocess_html(self, soup):
Thanks. Hegi. |
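The body of the preprocess_html attempt is not preserved here. One way the idea could be sketched is shown below, assuming the soup passed in is a Beautiful Soup object, as calibre recipes normally receive. The helper name append_period_to_location is hypothetical and not from the thread.

```python
def append_period_to_location(soup):
    """Append '. ' inside every <span class="hcf-location-mark"> element."""
    # findAll is the older Beautiful Soup spelling, which bs4 still accepts.
    for span in soup.findAll('span', attrs={'class': 'hcf-location-mark'}):
        # <span ...>Place</span> becomes <span ...>Place. </span>
        span.append('. ')
    return soup


# Hypothetical wiring inside the recipe class:
#     def preprocess_html(self, soup):
#         return append_period_to_location(soup)
```

Because this changes the HTML itself instead of relying on CSS, the result should not depend on whether the output format honours :after.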
Hey Folks,
I seem to be getting nowhere with my limited attempts at preprocess_html. The results are strange, and I'm having difficulty getting to grips with the Beautiful Soup documentation. Still, couldn't I do the trick more easily with preprocess_regexps? My current status is as follows: Code:
preprocess_regexps = [(re.compile(r'(<span class="hcf-location-mark">.+) (</span>)', re.DOTALL|re.IGNORECASE), lambda match: "\1'. '\2")]
I found some useful examples for preprocess_regexps here; however, I haven't found a documented way to include the match from the search in the replacement. Many thanks in advance for any useful hints in this matter. Hegi. |
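A short sketch of why the replacement above loses the match, using only Python's re module. When the replacement is a function, re inserts its return value verbatim, so backreferences such as \1 are never expanded. The sample HTML is made up, and the pattern is tightened to [^<]* so the replacement problem can be seen in isolation. Note also that the posted pattern requires a space before </span>, which a sample like Place</span> does not contain.

```python
import re

html = '<span class="hcf-location-mark">New York</span>Stocks fell.'
pattern = re.compile(r'(<span class="hcf-location-mark">[^<]*)(</span>)',
                     re.IGNORECASE)

# As posted: the lambda returns a fixed string. In a non-raw literal, "\1" and
# "\2" are the control characters chr(1) and chr(2), so the span is lost.
broken = pattern.sub(lambda match: "\1'. '\2", html)

# Reading the groups from the match object keeps the span and adds '. '.
fixed = pattern.sub(lambda match: match.group(1) + '. ' + match.group(2), html)

print(repr(broken))
print(fixed)
```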
preprocess_regexps -- use of variables in the replace string
:thumbsup: ... me again!
I finally got it working. Here is the regex code that does the trick: Code:
preprocess_regexps = [(re.compile(r'(<span class="hcf-location-mark">[^<]*)(</span>)', re.DOTALL|re.IGNORECASE), lambda match: match.group(1) + '. ' + match.group(2))]
Hegi. |
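To see what the working rule does to a page with more than one location mark, it can be tried outside calibre. The apply_rules helper below is a hypothetical stand-in for calibre's preprocessing step, assumed here to run each (pattern, function) pair over the HTML in order. The sample markup is invented.

```python
import re

preprocess_regexps = [
    (re.compile(r'(<span class="hcf-location-mark">[^<]*)(</span>)',
                re.DOTALL | re.IGNORECASE),
     lambda match: match.group(1) + '. ' + match.group(2)),
]


def apply_rules(html, rules):
    # Hypothetical helper: apply each (pattern, function) pair in turn.
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


sample = ('<span class="hcf-location-mark">New York</span>First. '
          '<span class="hcf-location-mark">Berlin</span>Second.')
result = apply_rules(sample, preprocess_regexps)
print(result)
```

Because [^<]* cannot run past the next tag, each span is matched on its own, which the earlier .+ with re.DOTALL would not guarantee.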
WirtschaftsWoche Online - working recipe optimized
Hi Folks,
... after a couple of weeks of fiddling about, here is my "production quality" recipe for WirtschaftsWoche Online. Enjoy :book2:. The template I began with is from Divingduck, and I got his clearance to post my modified version here: Code:
__license__ = 'GPL v3'
Thanks to all who helped me get there :thumbsup:! Hegi. |
|
Thanks Kovid,
that was really quick! Hegi. |