MobileRead Forums - View Single Post

TechnoCat · 01-06-2012, 07:56 PM

I am writing a recipe for a somewhat complex website. For one set of "articles", I wish to handle the parsing myself or give it much closer attention than the others.

My recipe is generating a list of articles inside parse_index(); most of these have empty content elements and appropriate URLs. But the URLs do not point to print editions (as documented here), so I want to do additional clean-up and munging, and then set the content on some of them. I have to dive into their contents anyhow to correctly extract useful titles, so getting down to a relevant table or div isn't much extra effort, and should eliminate unwanted ads.
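
For reference, the structure I am building looks roughly like the sketch below. It assumes the documented parse_index() return shape: a list of (feed title, article list) tuples, where each article is a dict with title, url, date, description and content keys. The example.com URLs, the feed names and the make_article/build_index helpers are made up for illustration.

```python
def make_article(title, url, date="", description="", content=""):
    """Build one article dict in the shape parse_index() is documented to return."""
    return {
        "title": title,
        "url": url,
        "date": date,
        "description": description,
        "content": content,  # left empty here; the page is fetched from 'url'
    }


def build_index():
    """Return a list of (feed title, [article dicts]) tuples."""
    news = [
        make_article("First story", "http://example.com/news/1"),
        make_article("Second story", "http://example.com/news/2"),
    ]
    # Hypothetical feed whose articles need closer attention.
    special = [make_article("Needs custom parsing", "http://example.com/special/1")]
    return [("News", news), ("Special", special)]


# Inside a recipe, parse_index(self) would simply return build_index().
index = build_index()
```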

Initially I thought I could set the content element of the articles that are returned by parse_index(), but that doesn't work; it looks like it's only used for the nebulous FullContentProfile, which isn't referenced anywhere else.
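
One workaround that the parse_index() documentation seems to hint at, as far as I can tell, is to leave content alone, write the cleaned-up HTML to a temporary file, and hand back a file:// URL for that article instead. Below is a minimal sketch using only the standard library. The write_article_file helper and the sample HTML are hypothetical, and I have not confirmed how every calibre version treats such URLs.

```python
import html
import tempfile
from pathlib import Path


def write_article_file(title, body_html):
    """Write a minimal HTML page to a temp file and return its file:// URL."""
    page = "<html><head><title>{0}</title></head><body>{1}</body></html>".format(
        html.escape(title), body_html
    )
    # delete=False so the file still exists when the downloader reads it later.
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".html", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(page)
    return Path(handle.name).resolve().as_uri()


# Hypothetical article whose body was already reduced to the relevant div.
url = write_article_file("Cleaned article", "<p>Only the relevant div.</p>")
article = {
    "title": "Cleaned article",
    "url": url,          # points at the local file rather than the website
    "date": "",
    "description": "",
    "content": "",
}
```

The temporary files are not cleaned up in this sketch; a real recipe would need to remove them once the download has finished.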

I'm probably missing a pretty key concept. How can I use the parse_index() processing for most of the "feeds" and yet provide article text for some? (Alternatively, how can I know which feed title, i.e. which tuple from parse_index(), I'm looking at in preprocess_html(), if that's really the appropriate solution... though it seems less obvious to soup the page myself and then wait for it to be processed again.)
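
On that alternative: as far as I can tell, preprocess_html() only receives the soup, but BasicNewsRecipe also has a preprocess_raw_html(raw_html, url) hook that is passed the URL, so a lookup table built in parse_index() might be enough to identify the article. This is a sketch under that assumption. build_url_lookup, needs_custom_parsing, the sample index and the "Special" feed name are all hypothetical, and redirects could make the URL differ from the one stored in the index.

```python
def build_url_lookup(index):
    """Map each article URL to its (feed title, article title) pair."""
    lookup = {}
    for feed_title, articles in index:
        for article in articles:
            lookup[article["url"]] = (feed_title, article["title"])
    return lookup


def needs_custom_parsing(url, lookup, special_feeds=("Special",)):
    """Return True when the URL belongs to a feed that is handled by hand."""
    feed_title, _ = lookup.get(url, (None, None))
    return feed_title in special_feeds


# Hypothetical index in the shape parse_index() returns.
sample_index = [
    ("News", [{"title": "First story", "url": "http://example.com/news/1"}]),
    ("Special", [{"title": "Needs custom parsing",
                  "url": "http://example.com/special/1"}]),
]
lookup = build_url_lookup(sample_index)

# In a recipe this could be used roughly as follows (not run here):
#   def preprocess_raw_html(self, raw_html, url):
#       if needs_custom_parsing(url, self.lookup):
#           raw_html = my_cleanup(raw_html)  # my_cleanup is a hypothetical helper
#       return raw_html
```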

Thanks!

Post title: Setting actual content?
Last edited by TechnoCat; 01-06-2012 at 08:20 PM.