Hi Divingduck,
Now this is really interesting. I ran the recipe with debugging output via the CLI as follows:
Code:
ebook-convert ~/.config/calibre/custom_recipes/WirtschaftsWoche\ Online_1014.recipe .mobi \
--mobi-file-type=new --output-profile=kindle_pw --debug-pipeline calibre-debug
The original HTML line on the website looks like this:
Code:
<h2 class="c-headline c-headline--article u-margin-m"><span
class="c-overline c-overline--alternate u-uppercase u-letter-spacing u-margin-m c-overline--article">Wandel kostet Milliarden</span> SUV und China sollen Audi wieder nach vorne bringen
</h2>
When I look into the debugging data, I find the following code in the processed directory:
Code:
<h2 class="c-headline"><span class="c-overline">Wandel kostet Milliarden</span> SUV und China sollen Audi wieder nach vorne bringen
</h2>
This is interesting, as the additional class names have been stripped although they are not specified in the remove_tags statement. ... OK ...
This led me to change preprocess_regexps as follows:
Code:
preprocess_regexps = [
    (re.compile(r'(<span class="hcf-location-mark">[^<]*)(</span>)', re.DOTALL|re.IGNORECASE),
     lambda match: match.group(1) + '. ' + match.group(2)),
    (re.compile(r'(<span class="c-overline">[^<]*)(</span>)', re.DOTALL|re.IGNORECASE),
     lambda match: match.group(1) + ': ' + match.group(2)),
]
But unfortunately this does not change the output in the processed directory:
Code:
<h2 class="c-headline"><span class="c-overline">Wandel kostet Milliarden</span> SUV und China sollen Audi wieder nach vorne bringen
</h2>
I just don't get why this isn't working for these tags ...
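One thing that can be checked outside calibre is whether the pattern matches the line as it appears on the website rather than the line in the processed directory. The sketch below uses only Python's re module. It rests on an assumption I have not verified: that preprocess_regexps is applied to the downloaded HTML before the extra class names are stripped. The names strict, loose and add_colon are made up for this illustration, and the loose pattern is only one possible variant.

```python
import re

# The pattern from my preprocess_regexps, unchanged.
strict = re.compile(r'(<span class="c-overline">[^<]*)(</span>)',
                    re.DOTALL | re.IGNORECASE)

# Hypothetical looser variant: allows any whitespace after "<span" and
# only requires "c-overline" to be one of the class names.
loose = re.compile(r'(<span\s+class="[^"]*\bc-overline\b[^"]*">[^<]*)(</span>)',
                   re.DOTALL | re.IGNORECASE)


def add_colon(match):
    # Same replacement as the lambda in the recipe.
    return match.group(1) + ': ' + match.group(2)


# The line as it appears on the website (note the line break after "<span").
raw = ('<h2 class="c-headline c-headline--article u-margin-m"><span\n'
       'class="c-overline c-overline--alternate u-uppercase u-letter-spacing '
       'u-margin-m c-overline--article">Wandel kostet Milliarden</span> '
       'SUV und China sollen Audi wieder nach vorne bringen\n</h2>')

# The line as it appears in the processed directory.
processed = ('<h2 class="c-headline"><span class="c-overline">'
             'Wandel kostet Milliarden</span> '
             'SUV und China sollen Audi wieder nach vorne bringen\n</h2>')

print(strict.search(raw))                 # does the recipe pattern hit the website line?
print(strict.sub(add_colon, processed))   # effect on the processed line
print(loose.sub(add_colon, raw))          # effect of the looser variant on the website line
```

If the assumption holds, the pattern written against the processed line would never see that simplified markup, which would explain the unchanged output.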
However, I'm not sure what you mean by
Quote:
Set some print statements in your recipe and pipe all statements in a log file.
This sounds like some manual logging workaround I do not understand.
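My best guess at what is meant is something like the following sketch. The helper log_overlines and the file name recipe.log are hypothetical, and calling it from preprocess_raw_html is only an assumption about where such a print statement would be useful.

```python
import re


def log_overlines(raw_html, url):
    """Print every <span> tag whose attributes mention c-overline; return the count."""
    tags = re.findall(r'<span\s[^>]*c-overline[^>]*>', raw_html, re.IGNORECASE)
    for tag in tags:
        print('DEBUG %s: %r' % (url, tag))
    return len(tags)


# Inside the recipe class the helper could be called from a hook, e.g.:
#
#     def preprocess_raw_html(self, raw_html, url):
#         log_overlines(raw_html, url)
#         return raw_html
#
# and the whole run redirected into a file on the command line:
#
#     ebook-convert <recipe> .mobi ... > recipe.log 2>&1
```

The log file would then show the span tags exactly as the recipe receives them, which could be compared with the pattern in preprocess_regexps.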
Thanks again, anyway.
Hegi.