MobileRead Forums - Recipe for Wirtschaftswoche / Wiwo.de (German Business Weekly)

hegi · 03-18-2018, 02:00 PM

Hi Divingduck,

... now this is really interesting. I ran the recipe with debugging output from the command line as follows:

Code:

ebook-convert ~/.config/calibre/custom_recipes/WirtschaftsWoche\ Online_1014.recipe .mobi \
        --mobi-file-type=new --output-profile=kindle_pw --debug-pipeline calibre-debug

The original HTML line on the website looks like this:

Code:

<h2 class="c-headline c-headline--article u-margin-m"><span
class="c-overline c-overline--alternate u-uppercase u-letter-spacing u-margin-m c-overline--article">Wandel kostet Milliarden</span> SUV und China sollen Audi wieder nach vorne bringen
</h2>

Digging into the debug output, I find the following code in the processed directory:

Code:

<h2 class="c-headline"><span class="c-overline">Wandel kostet Milliarden</span> SUV und China sollen Audi wieder nach vorne bringen
</h2>

This is interesting, because the other class names have been dropped even though none of them is specified in the remove_tags statement. ... OK ...

This led me to change preprocess_regexps as follows:

Code:

    preprocess_regexps = [
        (re.compile(r'(<span class="hcf-location-mark">[^<]*)(</span>)', re.DOTALL | re.IGNORECASE), lambda match: match.group(1) + '. ' + match.group(2)),
        (re.compile(r'(<span class="c-overline">[^<]*)(</span>)', re.DOTALL | re.IGNORECASE), lambda match: match.group(1) + ': ' + match.group(2)),
    ]

But unfortunately this does not change the output in the processed directory:

Code:

<h2 class="c-headline"><span class="c-overline">Wandel kostet Milliarden</span> SUV und China sollen Audi wieder nach vorne bringen
</h2>

I just don't get why this isn't working for these tags ...
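To narrow this down, here is a quick standalone check of the c-overline pattern against both versions of the headline quoted above. It is a sketch that rests on an assumption I have not verified in the calibre source: that preprocess_regexps are applied to the HTML as downloaded, before calibre simplifies the class attributes. If so, the pattern only ever sees the raw line, where the span carries several classes and `<span` is followed by a line break instead of a single space. The names `RAW`, `PROCESSED`, `strict`, `loose` and `add_colon` are hypothetical, and the `loose` pattern is my own guess at a more tolerant match.

```python
import re

# Raw headline as it appears on the website (copied from the post).
RAW = ('<h2 class="c-headline c-headline--article u-margin-m"><span\n'
       'class="c-overline c-overline--alternate u-uppercase u-letter-spacing '
       'u-margin-m c-overline--article">Wandel kostet Milliarden</span> '
       'SUV und China sollen Audi wieder nach vorne bringen\n</h2>')

# Same headline as found in the processed debug directory.
PROCESSED = ('<h2 class="c-headline"><span class="c-overline">Wandel kostet '
             'Milliarden</span> SUV und China sollen Audi wieder nach vorne '
             'bringen\n</h2>')

# Pattern from the recipe: requires exactly one space after <span and
# exactly class="c-overline" with no further classes.
strict = re.compile(r'(<span class="c-overline">[^<]*)(</span>)',
                    re.DOTALL | re.IGNORECASE)

# Hypothetical looser pattern: any whitespace after <span, and any further
# classes after c-overline inside the same attribute value.
loose = re.compile(r'(<span\s+class="c-overline[^"]*">[^<]*)(</span>)',
                   re.DOTALL | re.IGNORECASE)


def add_colon(match):
    # Same replacement as in the recipe: insert ': ' before the closing </span>.
    return match.group(1) + ': ' + match.group(2)


print(strict.search(RAW) is None)            # → True (no match on raw HTML)
print(strict.search(PROCESSED) is not None)  # → True (matches processed HTML)
print(loose.sub(add_colon, RAW))
```

If the assumption holds, the recipe's pattern would match only the cleaned-up markup it never gets to see, and the looser pattern could replace the c-overline entry in preprocess_regexps. This is untested in the actual recipe.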

However, I'm not sure what you mean by

Quote:

Set some print statements in your recipe and pipe all statements in a log file.

This sounds like some manual logging workaround I do not understand.

Thanks again, anyway.

Hegi.