Figured out the smart-quotes thing with encoding. Now I am trying to work out how to replace text in the feed that is simply wrong. In several places the actual RSS feed contains 'and #8216;' where a single quote should be. preprocess_regexps is the only way I know to make text replacements, and as I understand it, it replaces everything between x and y with z. I tried the following command, to no avail. Is this the right command? Do I have the syntax wrong? I just want to replace the entire string. Do I tell it to replace everything between 'and #8217' and the semicolon with "'" (a single quote embedded in double quotes)?
preprocess_regexps = [(re.compile(r'and #8216.?;', re.DOTALL|re.IGNORECASE), lambda match: '"')]
Also, I am trying to convert '<STRONG>' to '<b>', but it doesn't seem to work. The command I am using is:
preprocess_regexps = [(re.compile(r'<strong.?>', re.DOTALL|re.IGNORECASE), lambda match: '<b>')]
(also doing a similar command for the end tag.) What am I doing wrong?
Last edited by olaf; 09-26-2009 at 11:17 AM.