Hey Folks,
I seem to be getting nowhere with my few attempts at preprocess_html. The results are strange, and I'm having difficulty getting to grips with the Beautiful Soup documentation.
Couldn't I do this more easily with preprocess_regexps instead?
My current status is as follows:
Code:
preprocess_regexps = [(re.compile(r'(<span class="hcf-location-mark">.+) (</span>)', re.DOTALL|re.IGNORECASE), lambda match: "\1'. '\2")]
But I don't see any change in the output. Could it be that the grouping of the regexp parts in parentheses and the back-references \1 and \2 do not work in this case?
I found some useful examples for preprocess_regexps here; however, I haven't found any documentation on how to include the match from the search in the replacement part.
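For reference, here is a minimal sketch of how the matched groups could be carried into the replacement. It relies on the second tuple element being a callable that receives a match object, so the groups are read with match.group(n) rather than with \1 or \2 (inside a normal, non-raw Python string "\1" is just the control character \x01). The sample HTML, the helper name add_period, and the small loop that applies the rules are assumptions for illustration only. I also made the .+ non-greedy and dropped the literal space before (</span>), on the guess that the real markup has no space there.

```python
import re

# Same span as in the recipe, but non-greedy and without the literal
# space before the closing tag (assumption about the real markup).
pattern = re.compile(
    r'(<span class="hcf-location-mark">.+?)(</span>)',
    re.DOTALL | re.IGNORECASE,
)

def add_period(match):
    # Hypothetical helper: rebuild the text from the captured groups
    # and put ". " in front of the closing tag.
    return match.group(1) + '. ' + match.group(2)

preprocess_regexps = [(pattern, add_period)]

# Stand-alone check outside calibre: apply each rule with re.sub,
# which accepts a callable as the replacement.
sample = '<p><span class="hcf-location-mark">Berlin</span>Text</p>'
result = sample
for regexp, func in preprocess_regexps:
    result = regexp.sub(func, result)

print(result)
```

If the callable is only needed for back-references, match.expand(r'\1. \2') inside the function should give the same text, since expand() does understand \1 and \2 in a raw string.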
Many thanks in advance for any useful hints in this matter.
Hegi.