I am writing a recipe for a somewhat complex website. For one set of "articles", I wish to handle the parsing myself or give it much closer attention than the others.
My recipe generates a list of articles inside parse_index(); most of these have empty content elements and appropriate URLs. But the URLs do not point to print editions (as documented here), so I want to do additional clean-up and munging, and then set the content on some of them. I have to dive into their contents anyhow to correctly extract useful titles, so getting down to a relevant table or div isn't much extra effort, and should eliminate unwanted ads.
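For reference, here is a simplified sketch of the kind of structure I'm returning. The build_index() and article() names, the feed titles, and the example.com URLs are placeholders I made up for this post, not my real recipe; the dict keys are the ones I understand parse_index() articles to use.

```python
def build_index():
    """Simplified stand-in for what my parse_index() returns.

    The result is a list of (feed title, list of article dicts) tuples.
    All titles and URLs below are placeholders.
    """
    def article(title, url):
        # 'content' is left empty, which is the case for most of my articles.
        return {
            'title': title,
            'url': url,
            'date': '',
            'description': '',
            'content': '',
        }

    return [
        ('News', [article('Placeholder headline', 'http://example.com/news/1')]),
        ('Special', [article('Placeholder special', 'http://example.com/special/1')]),
    ]


# Walk the structure the same way I'd inspect it while debugging.
for feed_title, articles in build_index():
    for a in articles:
        print(feed_title, a['title'], a['url'])
```

The 'Special' feed stands for the set of articles whose text I'd like to supply myself.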
Initially I thought I could set the content element of the articles that are returned by parse_index(), but that doesn't work; it looks like it's only used for the nebulous FullContentProfile, which isn't referenced anywhere else.
I'm probably missing a pretty key concept. How can I use the parse_index() processing for most of the "feeds" and yet provide the article text myself for some? (Alternatively, how can I tell in preprocess_html() which feed, i.e. which tuple title, the current article belongs to, if that's really the appropriate solution... though it seems less obvious to soup the page once and then wait for it to be processed again.)
Thanks!