MobileRead Forums - View Single Post

josepinto · 03-28-2013, 12:32 PM

Quote:

Originally Posted by oneillpt

Replace the keep_only_tags and remove_tags lines with:

Code:

keep_only_tags = [dict(attrs={'class':['hentry article single']})]
remove_tags    = [dict(attrs={'class':['entry-options entry-options-above group','entry-options entry-options-below group', 'module tag-list']})]

This produces a large file. If you want to drop the photos from articles that have them, use the following keep_only_tags line instead (with the same remove_tags line):

Code:

keep_only_tags = [dict(attrs={'class':['entry-header single-header','entry-body']})]

Hi,

Thanks,

The text is extracted now, but the sections "Desporto", "Sociedade", "Ciências" and "Ecosfera" are not downloaded. I don't know whether the feeds are the same or not, so I will search for the relevant feeds.
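
Once the feed addresses for those four sections are found, each one would probably need its own entry in the recipe's feeds list, which calibre accepts as (title, url) pairs. A minimal sketch of how that list could be built follows. The base URL and the slugs are placeholders I made up for illustration, not the site's real feed addresses, and build_feeds is a hypothetical helper, not part of calibre.

```python
# Placeholder base URL: an assumption for illustration only. The real feed
# addresses have to be looked up on the newspaper's site.
FEED_BASE = 'http://example.com/rss/'

# Section titles from the post, paired with assumed (hypothetical) URL slugs.
MISSING_SECTIONS = [
    ('Desporto', 'desporto'),
    ('Sociedade', 'sociedade'),
    ('Ciências', 'ciencias'),
    ('Ecosfera', 'ecosfera'),
]


def build_feeds(sections, base=FEED_BASE):
    """Return (title, url) pairs, the shape a recipe's feeds attribute takes."""
    return [(title, base + slug) for title, slug in sections]


# Entries that could be appended to the recipe's existing feeds list.
extra_feeds = build_feeds(MISSING_SECTIONS)
```

In the recipe itself, the resulting pairs would go alongside the entries already in feeds, for example by writing them out by hand once the real URLs are known.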

José Pinto