MobileRead Forums - View Single Post - Recipe for Wirtschaftswoche / Wiwo.de (German Business Weekly)

Divingduck · 03-20-2018, 07:39 PM

You're welcome.

I had a bit of time to take a closer look at the problem.

I saw two things.
The first is to keep in mind when a regex is applied. You are using preprocess_regexps, which means the regexes run against the downloaded HTML as source input. So you can check debug\input\ to see what the downloaded HTML looks like to calibre at the moment the file is manipulated, and write your regex against that.
The second problem is that the class you are looking for includes spaces in its name, and that does not work (I think it never worked).

Taking that into account, I would do it slightly differently. I don't bother with the complete class string; I only look at the end of the class name, which is enough for a unique identification:

... c-overline--article"> ... </span> ...

Code:

(re.compile(r'(c-overline--article">[^>]*)(</span>)', re.DOTALL|re.IGNORECASE), lambda match: match.group(1) + ': ' + match.group(2))
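If you want to try the rule outside calibre first, here is a small sketch using only Python's re module. The sample HTML string is made up by me for illustration (the real markup in your debug input folder may look different), and calling pattern.sub directly is just my stand-in for what calibre does with each (pattern, function) pair.

```python
import re

# Same pattern and replacement function as in the recipe entry above.
pattern = re.compile(r'(c-overline--article">[^>]*)(</span>)', re.DOTALL | re.IGNORECASE)
add_colon = lambda match: match.group(1) + ': ' + match.group(2)

# Hypothetical snippet, not copied from wiwo.de; note the class attribute
# holds two space-separated names, but we only match the tail of it.
sample = '<span class="c-overline c-overline--article">Konjunktur</span><h1>Headline</h1>'

# Apply the rule the way a single (pattern, function) pair would be applied.
result = pattern.sub(add_colon, sample)
print(result)
```

The overline text should come out with ': ' inserted just before the closing </span>, and the rest of the HTML should stay untouched.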

I attach an updated version of the recipe.