MobileRead Forums > E-Book Software > Calibre > Recipes
03-28-2013, 07:41 AM   #1
josepinto
Connoisseur
Posts: 50
Karma: 10
Join Date: Apr 2005
Device: Nokia 5320
Público.pt

Hi,

The Público.pt recipe does not work.
I only get the article titles.

Can someone take a look?

Thanks in advance.

José Pinto
03-28-2013, 11:06 AM   #2
oneillpt
Connoisseur
Posts: 62
Karma: 46
Join Date: Feb 2011
Device: Kindle 3 (cracked screen!); PW1; Oasis
Replace the keep_only_tags and remove_tags lines with:
Code:
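# Keep the whole article container (text and photos)
keep_only_tags = [dict(attrs={'class':['hentry article single']})]
# Strip the option bars above and below the article, and the tag list
remove_tags    = [dict(attrs={'class':['entry-options entry-options-above group','entry-options entry-options-below group', 'module tag-list']})]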
This produces a big file. If you want to drop the photos from articles that have them, use the following keep_only_tags line instead (with the same remove_tags line):
Code:
# Keep only the article header and body, leaving out the photos
keep_only_tags = [dict(attrs={'class':['entry-header single-header','entry-body']})]
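To see why the two keep_only_tags variants give different output, here is a rough stand-alone Python sketch (standard library only) of the filtering idea: keep each subtree whose class attribute matches a keep_only_tags entry, then drop any subtree inside it whose class matches a remove_tags entry. This is not Calibre's own code; it is a simplified illustration that compares the whole class attribute as one exact string and only handles simple HTML. The sample page and the names `extract`, `KEEP_FULL` and `KEEP_NO_PHOTOS` are hypothetical, and it assumes, as the post implies, that the photos sit outside the entry-header and entry-body blocks.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID = {"br", "img", "hr", "meta", "link", "input"}


class Node:
    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = dict(attrs)
        self.children = []  # Node objects or text strings


class TreeBuilder(HTMLParser):
    """Builds a minimal element tree from simple HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        # XHTML-style empty tag such as <img .../>: add it, never push it.
        self.stack[-1].children.append(Node(tag, attrs))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def matches(node, classes):
    # Simplification: the whole class string must equal one entry exactly.
    return node.attrs.get("class") in classes


def find_kept(node, keep):
    """Collect the outermost nodes whose class is listed in keep."""
    if isinstance(node, str):
        return []
    if matches(node, keep):
        return [node]
    found = []
    for child in node.children:
        found.extend(find_kept(child, keep))
    return found


def text_without(node, remove):
    """Concatenate the text under node, skipping subtrees listed in remove."""
    if isinstance(node, str):
        return node
    if matches(node, remove):
        return ""
    return "".join(text_without(child, remove) for child in node.children)


def extract(html, keep, remove):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return [text_without(n, remove).strip() for n in find_kept(builder.root, keep)]


# Class lists copied from the post above.
REMOVE = ['entry-options entry-options-above group',
          'entry-options entry-options-below group', 'module tag-list']
KEEP_FULL = ['hentry article single']
KEEP_NO_PHOTOS = ['entry-header single-header', 'entry-body']

# A made-up page using the class names from the thread (not real Público markup).
SAMPLE = """
<div class="hentry article single">
  <div class="entry-options entry-options-above group">Share</div>
  <div class="entry-header single-header"><h1>Title</h1></div>
  <figure><img src="photo.jpg"/><figcaption>Photo caption</figcaption></figure>
  <div class="entry-body"><p>Body text.</p></div>
  <div class="module tag-list">Tags</div>
</div>
"""

print(extract(SAMPLE, KEEP_NO_PHOTOS, REMOVE))  # ['Title', 'Body text.']
```

With the first variant the photo block (here just its caption) survives inside the single kept container; with the second, only the header and body text remain, which is why the file gets smaller.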
03-28-2013, 12:32 PM   #3
josepinto
Hi,

Thanks,

The text is extracted now, but the sections "Desporto" (Sports), "Sociedade" (Society), "Ciências" (Sciences) and "Ecosfera" (Ecosphere) are not downloaded. I don't know whether their feeds are still the same, so I will look for the relevant feeds.

José Pinto
03-28-2013, 06:23 PM   #4
oneillpt
The Sports ("Desporto") section may need a new feed URL. For its current feed page I get:
Quote:
Público is temporarily unavailable
Hopefully the other three sections will also extract once the "new site" problems are resolved. Their feed pages currently return:

Quote:
New site
Note from the Editors: We are working to resolve the technical problems

Editorial Board
The PÚBLICO team is working to resolve, as quickly as possible, the technical problems affecting the new site.
However, I also notice that "Publico.pt - Geral" seems to be the only RSS feed I can find on the Público site.
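Before editing the recipe's feed list, it can help to check which candidate feed URLs return real RSS items and which return a notice page like the ones quoted above. A minimal sketch using only the Python standard library, assuming the feeds are plain RSS 2.0 (`<item>` elements with `<title>` and `<link>` children); `feed_items` and `check_feed` are hypothetical helper names, not part of Calibre.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def feed_items(data):
    """Return (title, link) pairs for each RSS 2.0 <item> in data (str or bytes).

    Returns [] when data is not well-formed XML (e.g. an HTML notice page)
    or is well-formed but contains no <item> elements.
    """
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return []
    return [
        (item.findtext("title", default="").strip(),
         item.findtext("link", default="").strip())
        for item in root.iter("item")
    ]


def check_feed(url, timeout=30):
    """Fetch url and return its RSS items; an empty list means no usable feed."""
    with urlopen(url, timeout=timeout) as resp:
        # Pass raw bytes so ElementTree honours the XML encoding declaration.
        return feed_items(resp.read())
```

Calling `check_feed(url)` with a candidate feed URL gives the list of (title, link) pairs the recipe would see; an empty list suggests the URL currently serves an error or maintenance page rather than a feed.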
All times are GMT -4.