BasicNewsRecipe tag handling
I'm trying to enhance the Globe and Mail news recipe so that it omits the long run of link fluff at the end of each article. I'm not getting the results I expect from remove_tags, remove_tags_after, remove_tags_before and keep_only_tags, and I suspect the cause is their semantics and the order in which they are processed. For example, does remove_tags_after remove only the siblings that follow the specified tag, or every tag that follows it anywhere in the page at the current URL? When is keep_only_tags processed, and how does it interact with remove_tags? If I keep a tag X that contains a tag Y listed in remove_tags, will Y be removed? Conversely, if I remove a tag Y that contains a tag X listed in keep_only_tags, what is the result?
I've been reading the code, but tracing this back is turning into an arduous task. Can anyone enlighten me?
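For concreteness, here is my current reading of the processing order, boiled down to a standalone toy model that uses xml.etree.ElementTree instead of BeautifulSoup. It assumes the order is keep_only_tags, then remove_tags_after, then remove_tags_before, then remove_tags, and that remove_tags_after walks up from the matched tag to body removing later siblings at each level. The helper names (process, remove_beyond, matches) and the simplified spec dicts are illustrative only, not calibre's API, so treat this as a sketch to be confirmed against the actual source.

```python
import xml.etree.ElementTree as ET

# Toy model only: hypothetical helpers, NOT calibre's API. Assumed order:
# keep_only_tags, remove_tags_after, remove_tags_before, remove_tags.
# Specs are simplified dicts: {'name': tag_name, 'attrs': {attr: value}}.


def matches(el, spec):
    if 'name' in spec and el.tag != spec['name']:
        return False
    return all(el.get(k) == v for k, v in spec.get('attrs', {}).items())


def parents_of(root):
    # ElementTree has no parent pointers, so build a child -> parent map.
    return {child: parent for parent in root.iter() for child in parent}


def first_match(root, spec):
    return next((el for el in root.iter() if matches(el, spec)), None)


def remove_beyond(body, tag, after=True):
    # Drop the siblings after (or before) the tag, then repeat for each
    # ancestor up to <body>: everything following (or preceding) the tag
    # in document order is removed, not just the tag's own siblings.
    parents = parents_of(body)
    while tag is not None and tag is not body:
        parent = parents[tag]
        kids = list(parent)
        i = kids.index(tag)
        for sib in (kids[i + 1:] if after else kids[:i]):
            parent.remove(sib)
        tag = parent


def process(body, keep_only=(), remove_after=None, remove_before=None, remove=()):
    if keep_only:
        # Rebuild <body> from the kept tags; their children come along.
        kept = [el for spec in keep_only for el in body.iter() if matches(el, spec)]
        body = ET.Element('body')
        body.extend(kept)
    if remove_after is not None:
        remove_beyond(body, first_match(body, remove_after), after=True)
    if remove_before is not None:
        remove_beyond(body, first_match(body, remove_before), after=False)
    for spec in remove:
        # Runs last, so it also strips matches nested inside kept tags.
        parents = parents_of(body)
        for el in [e for e in body.iter() if matches(e, spec)]:
            if el in parents:
                parents[el].remove(el)
    return body
```

If that reading is right, a Y nested inside a kept X is still removed (remove_tags runs last), while a kept X nested inside a removed Y survives, because keep_only_tags has already lifted X into the new body before remove_tags sees it.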