Add the preprocess_regexps option:
Code:
# needs "import re" at the top of the recipe
preprocess_regexps = [
    (re.compile(r'</?a[^>]*>'), lambda match: ''),
    (re.compile(r'<span[^>]*article-link-id.*?<br\s*\/?><br\s*\/?>'), lambda match: ''),
]
keep_only_tags = [dict(name='div', attrs={'class':'article'})]
remove_tags = [
    dict(name='p', attrs={'class':'meta links'}),
    dict(name='div', attrs={'class':'float-right'}),
    #dict(name='span', attrs={'class':'article-link-id'})
]
feeds = [
The first one removes all <a> and </a> tags, leaving the text inside, which I think is what you wanted to do with the preprocess_html function. The second, uglier one removes every <span class="article-link-id">blabla</span> together with the two <br /> tags that follow it.
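If you want to check what the two patterns do before running the whole recipe, here is a small standalone sketch. The sample HTML and the apply_regexps helper are made up for illustration; the helper only mimics a substitution pass and is not calibre's actual code.

```python
import re

# Same two patterns as in the recipe.
preprocess_regexps = [
    (re.compile(r'</?a[^>]*>'), lambda match: ''),
    (re.compile(r'<span[^>]*article-link-id.*?<br\s*\/?><br\s*\/?>'), lambda match: ''),
]

def apply_regexps(html):
    # Hypothetical helper: run each (pattern, function) pair over the text in order.
    for pattern, func in preprocess_regexps:
        html = pattern.sub(func, html)
    return html

# Made-up sample, not taken from the real feed.
sample = ('<p>Read <a href="http://example.com/x">this story</a> now.</p>'
          '<span class="article-link-id">12345</span><br /><br />'
          '<p>Next paragraph.</p>')

cleaned = apply_regexps(sample)
print(cleaned)
# prints: <p>Read this story now.</p><p>Next paragraph.</p>
```

One thing to keep in mind: the first pattern matches any tag whose name starts with "a" (for example <abbr>), not only links. That is probably harmless here, but it is worth knowing if the site uses such tags.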
If you want a suggestion, you can add an extra_css option to tweak the final appearance of the article when displayed.
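Something along these lines would do as a starting point. The selectors and values are only guesses, since I don't know what tags the cleaned article ends up with, so adjust them to the real markup.

```python
# Hypothetical styling: the selectors below are assumptions, not taken from the site.
extra_css = '''
    h1 { font-size: x-large; font-weight: bold; }
    p { text-align: justify; margin-top: 0.5em; }
'''
```

In the recipe this goes at the same level as preprocess_regexps and keep_only_tags.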