Old 11-06-2011, 11:06 AM   #3
scissors
Addict
scissors ought to be getting tired of karma fortunes by now.
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
Quote:
Originally Posted by kovidgoyal View Post
auto_cleanup_keep will typically fail if you put it on a low-level element like an <a> tag. Instead, find the <div> the <a> is in and try keeping that.
Hi Kovid.

I tried the <div> above it, its parent, and both together. No good.

Also, I thought that using the * as in
auto_cleanup_keep = '//*[@class="important"]'

meant every element with that class would be kept, regardless of which tag the class is attached to.
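To check what that XPath wildcard selects on its own, here is a minimal sketch using only Python's standard library, not calibre itself. The sample markup and the helper name tags_matching are made up for illustration. It suggests that * does match any tag, but that @class="important" is an exact comparison, so an element with class="important extra" is skipped. It says nothing about what auto cleanup does with the matches afterwards.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample markup (an assumption, not the Daily Mirror page)
SAMPLE = """
<body>
  <div class="important"><a class="important" href="#">link</a></div>
  <p class="important">text</p>
  <span class="important extra">multi-class</span>
  <p class="other">ignored</p>
</body>
""".strip()


def tags_matching(xml_text, class_value):
    """Return the tag names of all descendants whose class attribute
    is exactly class_value, in document order."""
    root = ET.fromstring(xml_text)
    # .//* visits every descendant element regardless of tag name
    return [el.tag for el in root.findall(".//*[@class='%s']" % class_value)]


matched = tags_matching(SAMPLE, "important")
print(matched)  # ['div', 'a', 'p']
```

If the real elements carry more than one class, an expression such as //*[contains(@class, "important")] may be worth trying instead. That is an assumption about the page markup, not something confirmed in this thread.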

Also, is

import re

preprocess_regexps = [
    (re.compile(r'Advertisement >>', re.IGNORECASE | re.DOTALL), lambda match: ''),
]

not deleting instances of "Advertisement >>" because auto cleanup overrides it?
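One thing that can be tested outside calibre is whether the pattern matches the raw HTML at all. The sketch below assumes (this is a guess, not confirmed from the page source) that the site writes the arrows as the entity &gt;&gt; rather than as literal > characters. The names literal, tolerant and strip_ads are hypothetical, and the sample HTML is made up.

```python
import re

# Same pattern as in the post above
literal = re.compile(r'Advertisement >>', re.IGNORECASE | re.DOTALL)

# Hypothetical variant that also accepts entity-encoded or typographic arrows
tolerant = re.compile(r'Advertisement\s*(?:>>|&gt;&gt;|&raquo;|»)', re.IGNORECASE)


def strip_ads(html, pattern):
    """Remove every match of pattern from html."""
    return pattern.sub('', html)


# Made-up raw HTML in which the arrows are entity-encoded
raw = '<p>Advertisement &gt;&gt;</p><p>Story text</p>'

unchanged = strip_ads(raw, literal)   # the literal pattern finds nothing here
cleaned = strip_ads(raw, tolerant)    # the tolerant pattern removes the label
print(unchanged)
print(cleaned)
```

If the tolerant pattern works in the recipe where the literal one did not, the cause is probably the encoding of the text rather than auto cleanup.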

Can you run auto cleanup and then follow it with manual cleanup for any stray elements that get through?

+++++++++++++++++

BTW, the whole reason I've gone down this path is that I discovered the text/paragraph after the first image in an article is being displayed to the right of the image (in the original Daily Mirror recipe). On my prs300 it's getting "displayed" off screen. I can't find a way to insert a line break after the image and before the image caption.
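For the line break, one rough idea is a rule in the same (pattern, replacement) shape as preprocess_regexps that appends a <br /> after each <img> tag. This is only a sketch with made-up markup; br_after_img and apply_rules are hypothetical names, and the loop merely imitates how such rules would be applied to the page text. If the image is floated by CSS, a plain <br /> may not be enough.

```python
import re

# Hypothetical rule: keep the <img> tag and add a line break right after it
br_after_img = (
    re.compile(r'(<img\b[^>]*>)', re.IGNORECASE),
    lambda match: match.group(1) + '<br />',
)


def apply_rules(html, rules):
    """Apply each (compiled pattern, replacement) pair in order."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html


# Made-up article fragment (an assumption about the page structure)
sample = ('<div><img src="photo.jpg" alt="x"/>'
          '<span class="caption">Caption</span><p>First paragraph.</p></div>')

result = apply_rules(sample, [br_after_img])
print(result)
```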