Thread: Público.pt
03-28-2013, 11:06 AM   #2
oneillpt
Quote:
Originally Posted by josepinto
Hi,

The Público.pt recipe does not work.
I only get the titles.

Can someone take a look?

Thanks in advance.

José Pinto
Replace the keep_only_tags and remove_tags lines with:
Code:
keep_only_tags = [dict(attrs={'class':['hentry article single']})]
remove_tags    = [dict(attrs={'class':['entry-options entry-options-above group','entry-options entry-options-below group', 'module tag-list']})]
This produces a big file. If you want to drop the photos from articles that have them, use the following keep_only_tags line instead (keeping the same remove_tags line):
Code:
keep_only_tags = [dict(attrs={'class':['entry-header single-header','entry-body']})]
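In case it helps to see both variants side by side, here is a rough sketch of how the lines could sit in the recipe class. The class name PublicoSketch and the DROP_PHOTOS switch are made up for illustration and are not part of the existing recipe; the rest of the recipe (title, feeds and so on) is assumed to stay as it is and is left out here. The import fallback is only there so the snippet can be read or run outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch can be loaded outside calibre (assumption).
    BasicNewsRecipe = object

# Hypothetical switch: set to True to leave out article photos.
DROP_PHOTOS = False


class PublicoSketch(BasicNewsRecipe):  # hypothetical class name
    # With photos: keep the whole article container.
    # Without photos: keep only the header and the body text.
    keep_only_tags = (
        [dict(attrs={'class': ['entry-header single-header', 'entry-body']})]
        if DROP_PHOTOS
        else [dict(attrs={'class': ['hentry article single']})]
    )

    # Same remove_tags line for both variants.
    remove_tags = [dict(attrs={'class': [
        'entry-options entry-options-above group',
        'entry-options entry-options-below group',
        'module tag-list',
    ]})]
```

Flipping DROP_PHOTOS only changes which containers are kept; remove_tags is applied the same way in both cases.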