Quote:
Originally Posted by kovidgoyal
auto_cleanup_keep will typically fail if you put it on a low level element like an <a> tag. Instead find the <div> the a is in and try keeping that.
Hi Kovid.
I tried the div above, its parent, and both together. No good.
Also I thought that the use of the * as in
auto_cleanup_keep = '//*[@class="important"]'
meant any element with that class would be kept, regardless of the tag it's attached to.
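To show what I mean by the wildcard, here is a minimal sketch using only the Python standard library. The page fragment is made up, and this only demonstrates which elements the XPath selects; it says nothing about what auto cleanup then does with them. ElementTree needs the relative form './/*', whereas the recipe expression uses '//*'.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment; tag names and text are invented for illustration.
html = """
<div id="page">
  <div class="important"><a href="#">kept link</a></div>
  <p class="important">kept paragraph</p>
  <span class="other">not matched</span>
</div>
""".strip()

root = ET.fromstring(html)

# '*' matches any tag name, so the <div> and the <p> are both selected.
matches = root.findall('.//*[@class="important"]')
matched_tags = [el.tag for el in matches]
print(matched_tags)
```

So the selection itself is tag-independent; if the content still disappears, the cause is presumably somewhere other than the wildcard.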
Also, is
preprocess_regexps = [
(re.compile(r'Advertisement >>', re.IGNORECASE | re.DOTALL), lambda match: '')]
not deleting instances of "Advertisement >>" because auto cleanup overrides it?
Can you do auto cleanup followed by manual cleanup for any stray elements that get through?
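To check the pattern on its own, outside calibre, here is a minimal sketch. It assumes each (pattern, function) pair is applied as a plain substitution on the raw page source; apply_regexps is a hypothetical helper written for this test, not a calibre function, and both sample strings are made up.

```python
import re

preprocess_regexps = [
    (re.compile(r'Advertisement >>', re.IGNORECASE | re.DOTALL), lambda match: ''),
]

def apply_regexps(html, rules):
    # Assumed behaviour: apply each (pattern, function) pair as a substitution, in order.
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html

# Made-up samples: one with literal '>' characters, one with HTML entities.
literal = '<p>Advertisement >></p><p>Story text</p>'
escaped = '<p>Advertisement &gt;&gt;</p><p>Story text</p>'

cleaned_literal = apply_regexps(literal, preprocess_regexps)
cleaned_escaped = apply_regexps(escaped, preprocess_regexps)

print(cleaned_literal)  # the literal form is removed
print(cleaned_escaped)  # the entity form is left untouched
```

If the page source writes the arrows as &gt;&gt;, the pattern would never match, whether or not auto cleanup is involved. I have not confirmed which form the site actually uses.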
+++++++++++++++++
BTW, the whole reason I've gone down this path is that I discovered the text/paragraph after the first image in an article is being displayed to the right of the image (in the original Daily Mirror recipe). On my PRS-300 it ends up "displayed" off screen. I can't find a way to insert a line break (CRLF) after the image / before the image caption.
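One thing I could try is a regexp that appends a <br/> to every <img> tag. This is only a sketch, untested in the actual recipe: the sample markup is invented, and it assumes the preprocess_regexps entries still run when auto cleanup is enabled, which is exactly what I'm unsure about.

```python
import re

# Hypothetical rule in the same (pattern, function) form as preprocess_regexps.
# The pattern is naive: an attribute value containing '>' would break it.
img_break = (
    re.compile(r'(<img\b[^>]*>)', re.IGNORECASE),
    lambda m: m.group(1) + '<br/>',
)

# Made-up fragment standing in for an article image followed by its caption.
sample = '<div><img src="pic.jpg" alt="x"/><span class="caption">Caption</span></div>'

pattern, func = img_break
result = pattern.sub(func, sample)
print(result)
```

The caption should then start on its own line below the image instead of flowing to the right of it, provided the reader honours the <br/>.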