Old 01-06-2012, 07:56 PM   #1
TechnoCat
Zealot
Posts: 131
Karma: 150390
Join Date: Nov 2011
Location: Pacific NorthWest
Device: Kindle Fire
Setting actual content?

I am writing a recipe for a somewhat complex website. For one set of "articles", I wish to handle the parsing myself or give it much closer attention than the others.

My recipe generates a list of articles inside parse_index(); most of these have empty content elements and appropriate URLs. But the URLs do not point to print editions (as documented here), so I want to do additional clean-up and munging, and then set the content on some of them. I have to dive into their contents anyhow to correctly extract useful titles, so getting down to a relevant table or div isn't much extra effort, and should eliminate unwanted ads.
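To make the setup concrete, here is a minimal sketch of the kind of structure parse_index() hands back: a list of (feed title, list of article dicts) tuples. It is plain Python with no calibre import, so it only illustrates the data shape, not a working recipe. The feed names, URLs, and the helpers make_article(), build_index(), and articles_with_content() are all made up for illustration.

```python
def make_article(title, url, content=''):
    """Build one article dict with the keys parse_index() articles carry."""
    return {
        'title': title,
        'url': url,
        'date': '',
        'description': '',
        'content': content,  # left empty for most articles
    }


def build_index():
    """Return a list of (feed_title, [article, ...]) tuples.

    Hypothetical data: one ordinary feed with empty content, and one
    'special' feed where the text was already extracted by hand.
    """
    ordinary = [make_article('Ordinary story', 'http://example.com/news/1')]
    special = [
        make_article(
            'Special story',
            'http://example.com/special/1',
            content='<div>cleaned-up text</div>',
        )
    ]
    return [('News', ordinary), ('Special', special)]


def articles_with_content(index):
    """List the titles of articles whose content element is non-empty."""
    return [
        article['title']
        for _feed_title, articles in index
        for article in articles
        if article['content']
    ]
```

In this sketch only the 'Special' feed carries pre-filled content, which is the case I would like the download step to honor.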

Initially I thought I could set the content element of the articles that are returned by parse_index(), but that doesn't work; it looks like it's only used for the nebulous FullContentProfile, which isn't referenced anywhere else.

I'm probably missing a pretty key concept. How can I use the parse_index() processing for most of the "feeds" and yet provide article text for some? (Alternatively, how can I know what tuple Title I'm looking at in preprocess_html() if that's really the appropriate solution... though it seems less obvious to soup it and then wait for it to be processed again.)
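One way the second idea might work is to remember, while building the index, which feed and title each URL belongs to, and look that up later when the page is being cleaned. The sketch below is plain Python and only shows the bookkeeping; build_url_lookup() and identify() are hypothetical helper names, and the sample URLs are invented. calibre's documentation lists a preprocess_raw_html(raw_html, url) hook which, unlike preprocess_html(soup), is passed the URL; whether that hook is available in a given calibre version is an assumption to check.

```python
from urllib.parse import urldefrag


def build_url_lookup(index):
    """Map each article URL (fragment removed) to (feed_title, article_title).

    `index` has the shape parse_index() returns:
    [(feed_title, [article_dict, ...]), ...].
    """
    lookup = {}
    for feed_title, articles in index:
        for article in articles:
            key = urldefrag(article['url']).url
            lookup[key] = (feed_title, article['title'])
    return lookup


def identify(lookup, url):
    """Return (feed_title, article_title) for a URL, or None if unknown."""
    return lookup.get(urldefrag(url).url)
```

A recipe could build the lookup at the end of parse_index(), store it on self, and consult it from whichever hook receives the URL. Redirects or rewritten URLs would defeat an exact-match lookup like this one, so some normalization may be needed.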

Thanks!

Last edited by TechnoCat; 01-06-2012 at 08:20 PM.