Hi All,
I'm developing a new recipe for a subscription-only Hungarian website, and I'm at an almost final stage (the feed is generated from the index, and fetching articles works).
I'm using auto_cleanup = True to create readable articles, which works rather well, and I'm happy with the output.
My only remaining issue is that, although I have set up some regex-based removals like this:
Code:
# Needs "import re" at the top of the recipe.
preprocess_regexps = [
    # Strip HTML comments
    (re.compile(r'<!--.*?-->', re.DOTALL), lambda m: ''),
    # Drop the alignment attribute from paragraphs
    (re.compile(r'<p align="left"'), lambda m: '<p'),
    # Remove the site logo link
    (re.compile(r'<a href="/"><img src="images/logo.jpg".*?/></a>'), lambda m: ''),
    # Remove the font size changer links
    (re.compile(r'<a href="javascript:changeFontSize.*?/></a>', re.DOTALL), lambda m: ''),
    # Cut the site name suffix from the page title
    (re.compile(r'\| ÉLET ÉS IRODALOM</title>'), lambda m: '</title>'),
]
It looks like it does not replace anything (especially the last line), and I don't know why.
This matters because I noticed that the article title comes from the page's <title> tag. For some reason, the original <title> tag on the article page contains that unnecessary uppercase text (with a | in front of it). Can someone give me a hint on how to remove it?
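To rule out the pattern itself, here is a minimal standalone sketch that applies a slightly more tolerant version of the last regex to a sample page outside calibre. The sample HTML and the apply_regexps helper are hypothetical (my own stand-ins, not calibre code), and the real page may differ in whitespace, entities, or encoding, so this only shows that the pattern works on plain text like this:

```python
import re

# Hypothetical sample; the real article page's <title> may look different.
sample_html = (
    '<html><head><title>Some article title | ÉLET ÉS IRODALOM</title></head>'
    '<body><p align="left">Text</p></body></html>'
)

# Same idea as the last rule in the recipe, but tolerant of extra
# whitespace around the "|" and before the closing tag, and of letter case.
title_rule = (
    re.compile(r'\s*\|\s*ÉLET ÉS IRODALOM\s*</title>', re.IGNORECASE),
    lambda m: '</title>',
)

def apply_regexps(html, rules):
    """Apply (pattern, replacement) pairs in order (stand-in helper)."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html

cleaned = apply_regexps(sample_html, [title_rule])
print(cleaned)
```

If the suffix disappears here but not in the generated e-book, the pattern is probably fine, and the cause would more likely be in how the raw page differs from the sample or in where the title is actually taken from, which I have not been able to confirm.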