Thanks, kovidgoyal,
but the solution was this:
Quote:
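import re

preprocess_regexps = [
    # Replace non-breaking space characters (U+00A0) with plain spaces
    (re.compile(u'\xa0'), lambda match: ' '),
    # Replace the &nbsp; entity (assumed: the forum rendered this pattern as a blank)
    (re.compile(r'&nbsp;', re.DOTALL|re.IGNORECASE), lambda match: ' '),
    # Remove paragraphs that contain only whitespace
    (re.compile(r'\s*<p[^>]*>\s*</p>\s*', re.DOTALL|re.IGNORECASE), lambda match: '')
]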
I found it here:
http://stackoverflow.com/questions/1...a0-from-string
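
If anyone wants to check what these rules do outside calibre, here is a minimal sketch. It assumes each rule is applied in order as a plain regex substitution over the page HTML. The `apply_rules` helper and the sample string are made up for illustration, and the second pattern is written as `&nbsp;`, which is my assumption because the pattern shows up blank in the post.

```python
import re

# Same three rules as in the recipe; '&nbsp;' in the second one is an assumption.
preprocess_regexps = [
    (re.compile(u'\xa0'), lambda match: ' '),
    (re.compile(r'&nbsp;', re.IGNORECASE), lambda match: ' '),
    (re.compile(r'\s*<p[^>]*>\s*</p>\s*', re.DOTALL | re.IGNORECASE), lambda match: ''),
]

def apply_rules(html, rules):
    """Hypothetical helper: run each (pattern, function) pair over the HTML in order."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html

# Made-up sample: one real non-breaking space, one entity, and one empty paragraph.
sample = u'<p>Hello\xa0world&NBSP;again</p>\n<p class="x">&nbsp;</p>\n<p>End</p>'
cleaned = apply_rules(sample, preprocess_regexps)
print(cleaned)
```

The order matters: the paragraph that held only `&nbsp;` becomes whitespace-only after the second rule, so the third rule can then remove it.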