MobileRead Forums - Recipe for Wirtschaftswoche / Wiwo.de (German Business Weekly)

hegi · 05-18-2013, 07:11 AM

Hey Folks,

I seem to be getting nowhere with my limited attempts at preprocess_html. The results are strange, and I'm having difficulty getting to grips with the Beautiful Soup documentation.

Nevertheless, couldn't I possibly do the trick more easily with preprocess_regexps?

My current status is as follows:

Code:

preprocess_regexps    = [(re.compile(r'(<span class="hcf-location-mark">.+) (</span>)', re.DOTALL|re.IGNORECASE), lambda match: "\1'. '\2")]

But I don't see any change in the output. Could it be that grouping parts of the regexp in parentheses and referring to them with \1 or \2 does not work in this case?
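A likely explanation, sketched in plain Python: calibre documents the second element of each preprocess_regexps tuple as a callable that receives the match object and returns the replacement string, which is then used as-is. So "\1'. '\2" is never treated as a backreference template; in a normal (non-raw) Python string, \1 and \2 are actually octal escapes for the control characters chr(1) and chr(2). On top of that, .+ with re.DOTALL is greedy and can run from the first location mark to the last " </span>" on the page. The sample markup below is made up, and the sketch assumes calibre applies each rule roughly as pattern.sub(callable, html). Since a successful match would have visibly mangled the page, an unchanged output possibly also means the pattern never matched the downloaded HTML (for example, if there is no space before </span>).

```python
import re

# Hypothetical sample markup; the real Wiwo.de HTML may differ.
html = ('<p><span class="hcf-location-mark">Berlin </span>First story.</p>'
        '<p><span class="hcf-location-mark">Bonn </span>Second story.</p>')

# The original callable returns this string literally. In a non-raw string,
# "\1" and "\2" are octal escapes for chr(1) and chr(2), not group references.
original_replacement = "\1'. '\2"

# Greedy ".+" with DOTALL: one single match from the first span to the LAST " </span>".
greedy = re.compile(r'(<span class="hcf-location-mark">.+) (</span>)',
                    re.DOTALL | re.IGNORECASE)

# Corrected rule: non-greedy ".+?" ends each match at the first " </span>",
# and the callable builds the replacement from the match object's groups.
fixed_rule = (
    re.compile(r'(<span class="hcf-location-mark">.+?) (</span>)',
               re.DOTALL | re.IGNORECASE),
    lambda match: match.group(1) + '. ' + match.group(2),
)

pattern, replace = fixed_rule
result = pattern.sub(replace, html)
print(len(greedy.findall(html)), len(pattern.findall(html)))  # 1 2
print(result)
```

With the corrected rule, each location mark ends in ". " (e.g. "Berlin. "), while the greedy version would have treated both spans as one match.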

I found some useful examples for preprocess_regexps here; however, I haven't found any documented way to include the matched text from the search in the replacement.
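If you prefer to keep the \1 / \2 notation, Python's match objects have an expand() method that accepts the same template syntax as re.sub(), so the callable can pass it a raw-string template. The apply_preprocess_regexps helper below is hypothetical: it only imitates how calibre is assumed to run the rules (one pattern.sub(callable, html) per tuple), and the sample text is made up.

```python
import re

def apply_preprocess_regexps(html, rules):
    """Hypothetical stand-in for calibre: run each (pattern, callable) pair in order."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html

preprocess_regexps = [
    # match.expand() fills the raw template r'\1. \2' with this match's groups.
    (re.compile(r'(<span class="hcf-location-mark">.+?) (</span>)',
                re.DOTALL | re.IGNORECASE),
     lambda match: match.expand(r'\1. \2')),
]

sample = '<span class="hcf-location-mark">Frankfurt </span>Report text'
print(apply_preprocess_regexps(sample, preprocess_regexps))
# <span class="hcf-location-mark">Frankfurt. </span>Report text
```

Note the r prefix on the template: without it, '\1' would again become chr(1) before expand() ever sees it.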

Many thanks in advance for any useful hints in this matter.

Hegi.
