MobileRead Forums - View Single Post - using auto_cleanup and manual clean up together

scissors · 11-06-2011, 10:36 AM

After weeks of tinkering with the Daily Mirror recipe, I went back to the start and found auto_cleanup was doing a really good job, with a couple of exceptions:

1) The article's byline and date text after the headline are erased.
2) The text "Advertisement >>" is left intact.

The article source for the date is

Spoiler:

so I thought using

auto_cleanup_keep = '//a[@class="published"]'

or

auto_cleanup_keep = '//*[@class="published"]'

would mean the date got left in, but it wasn't.
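One thing that may be worth ruling out: an XPath test like @class="published" compares the whole attribute string, so it would miss an element whose class attribute holds more than one name. The sketch below uses only the Python standard library and made-up markup (the real article source is not shown here), and the helper names match_exact_class and match_class_token are hypothetical. It only illustrates the matching rule, not how calibre applies auto_cleanup_keep.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the real article source is not shown in the post.
SAMPLE = """
<div>
  <h1>Headline</h1>
  <a class="published">11-06-2011</a>
  <span class="published date">11-06-2011</span>
</div>
"""

def match_exact_class(root, name):
    # Exact string comparison against the whole class attribute,
    # the same rule as //*[@class="published"].
    return root.findall(f".//*[@class='{name}']")

def match_class_token(root, name):
    # Token-wise comparison: the class attribute is split on whitespace
    # and the name only has to be one of the tokens.
    return [el for el in root.iter() if name in el.get("class", "").split()]

root = ET.fromstring(SAMPLE)
exact = [el.tag for el in match_exact_class(root, "published")]
tokens = [el.tag for el in match_class_token(root, "published")]
print(exact)   # only the element whose class is exactly "published"
print(tokens)  # also includes the element with class="published date"
```

If the date element does carry several class names, a full XPath engine would accept something like //*[contains(@class, "published")] instead of the exact comparison.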

I also tried

preprocess_regexps = [
(re.compile(r'Advertisement >>', re.IGNORECASE | re.DOTALL), lambda match: '')]

to just delete "Advertisement >>" so even if a class was created by calibre it would be empty. Again no success.
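A possible reason the regex never fires, independent of auto_cleanup: preprocess_regexps works on the page's HTML source, and there the ">>" may well be written as the entity "&gt;&gt;" rather than as literal characters. This is an assumption, since the page source is not shown here. The check below runs the same pattern over two made-up snippets, and the tolerant pattern is only a suggested variant.

```python
import re

# Same pattern as in the recipe above.
pattern = re.compile(r'Advertisement >>', re.IGNORECASE | re.DOTALL)

# Hypothetical snippets: the real page source is not shown in the post.
rendered = '<p>Advertisement >></p>'        # literal characters
escaped = '<p>Advertisement &gt;&gt;</p>'   # entity-escaped, common in raw HTML

# Variant that accepts either a literal ">" or the "&gt;" entity, twice.
tolerant = re.compile(r'Advertisement\s*(?:>|&gt;){2}', re.IGNORECASE)

out_rendered = pattern.sub('', rendered)
out_escaped = pattern.sub('', escaped)
out_tolerant = tolerant.sub('', escaped)

print(out_rendered)  # text removed
print(out_escaped)   # unchanged: the original pattern does not match the entity form
print(out_tolerant)  # text removed
```

Checking the downloaded HTML for the exact form of the text would show which of the two patterns applies.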

Is the call being ignored because auto_cleanup is being used?

It would be nice to fix this as the file created is smaller than my butchery and seems formatted in a cleaner way.

Here's the simplified code as it stands

Spoiler: