#1 |
Member
Posts: 15
Karma: 14
Join Date: Jun 2011
Location: Brazil
Device: Kindle
|
Fetching taking too much time
Hi buddies!
I've made a recipe that takes too long to process (about 30 minutes). Could you take a look at it? The recipe is below, and the execution log is attached. Code:

    from calibre.web.feeds.news import BasicNewsRecipe


    class PortalR7(BasicNewsRecipe):
        title = 'Noticias R7'
        __author__ = 'Diniz Bortolotto'
        description = 'Noticias Portal R7'
        oldest_article = 2
        max_articles_per_feed = 20
        encoding = 'utf8'
        publisher = 'Rede Record'
        category = 'news, Brazil'
        language = 'pt_BR'
        publication_type = 'newsportal'

        # One RSS feed per section of the portal
        feeds = [
            (u'Brasil', u'http://www.r7.com/data/rss/brasil.xml'),
            (u'Economia', u'http://www.r7.com/data/rss/economia.xml'),
            (u'Internacional', u'http://www.r7.com/data/rss/internacional.xml'),
            (u'Tecnologia e Ci\xeancia', u'http://www.r7.com/data/rss/tecnologiaCiencia.xml')
        ]
        reverse_article_order = True

        # Elements stripped from each article page
        remove_tags = [
            dict(name='ul', attrs={'class':'controles'}),
            dict(name='div', attrs={'class':'materia_banner'}),
            dict(name='ul', attrs={'class':'relacionados'})
        ]

        # Only the article container is kept
        keep_only_tags = [
            dict(name='div', attrs={'class':'materia'})
        ]
#2 | |
Connoisseur
Posts: 58
Karma: 12
Join Date: May 2011
Location: Deland, Florida
Device: Kindle 3
|
See if the attached recipe is of any benefit and let me know. If you have further questions about your recipe, other than the time it takes to download, please submit another post regarding those issues.
#3 |
Connoisseur
Posts: 58
Karma: 12
Join Date: May 2011
Location: Deland, Florida
Device: Kindle 3
|
Updated Recipe
I decided to work a little further and help remove some of the extraneous tags. There are still many more but, as I said, I do not read Spanish, so I cannot tell what is relevant and what is not.
#4 |
Member
Posts: 15
Karma: 14
Join Date: Jun 2011
Location: Brazil
Device: Kindle
|
Thank you! And my recipe version.
Hi BRGriff!!
First of all, I want to say a big "Thank you!!" Based on your first reply, I made a new recipe (below). The new version takes around 4 minutes to fetch the feeds and create the MOBI output, which is much better than the first version. So I believe this new recipe can be useful to everyone who can read Brazilian Portuguese (not Spanish).
The RSS source is a well-known Brazilian news portal called R7.com. It belongs to a broadcasting corporation called "Rede Record". Code:

    import re

    from calibre.web.feeds.news import BasicNewsRecipe


    class PortalR7(BasicNewsRecipe):
        title = 'Noticias R7'
        __author__ = 'Diniz Bortolotto'
        description = 'Noticias Portal R7'
        oldest_article = 2
        max_articles_per_feed = 20
        encoding = 'utf8'
        publisher = 'Rede Record'
        category = 'news, Brazil'
        language = 'pt_BR'
        publication_type = 'newsportal'

        # Fetch the full article pages, but skip stylesheets, scripts and inline styles
        use_embedded_content = False
        no_stylesheets = True
        remove_javascript = True
        remove_attributes = ['style']

        # One RSS feed per section of the portal
        feeds = [
            (u'Brasil', u'http://www.r7.com/data/rss/brasil.xml'),
            (u'Economia', u'http://www.r7.com/data/rss/economia.xml'),
            (u'Internacional', u'http://www.r7.com/data/rss/internacional.xml'),
            (u'Tecnologia e Ci\xeancia', u'http://www.r7.com/data/rss/tecnologiaCiencia.xml')
        ]
        reverse_article_order = True

        # Only the article container is kept
        keep_only_tags = [dict(name='div', attrs={'class':'materia'})]

        # Elements stripped from inside the article container
        remove_tags = [
            dict(id=['espalhe', 'report-erro']),
            dict(name='ul', attrs={'class':'controles'}),
            dict(name='ul', attrs={'class':'relacionados'}),
            dict(name='div', attrs={'class':'materia_banner'}),
            dict(name='div', attrs={'class':'materia_controles'})
        ]

        # Drop everything between the opening of the article container and its header
        preprocess_regexps = [
            (re.compile(r'<div class="materia">.*<div class="materia_cabecalho">',
                        re.DOTALL|re.IGNORECASE),
             lambda match: '<div class="materia"><div class="materia_cabecalho">')
        ]
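For anyone wondering what the `preprocess_regexps` entry above does, here is a minimal standalone sketch that applies the same pattern and replacement to a small HTML fragment using only Python's `re` module. The fragment in `sample_html` is made up for illustration and is not real R7 markup; calibre itself is not needed to run it.

```python
import re

# Same pattern and replacement function as in the recipe above.
pattern = re.compile(
    r'<div class="materia">.*<div class="materia_cabecalho">',
    re.DOTALL | re.IGNORECASE,
)
replace_with = lambda match: '<div class="materia"><div class="materia_cabecalho">'

# Hypothetical page fragment (an assumption, not taken from the real site).
sample_html = (
    '<div class="materia">\n'
    '  <div class="ad">banner</div>\n'
    '  <script>track()</script>\n'
    '  <div class="materia_cabecalho"><h1>Headline</h1></div>\n'
    '  <p>Article text.</p>\n'
    '</div>'
)

# Everything between the container opening and the header is dropped.
cleaned = pattern.sub(replace_with, sample_html)
print(cleaned)
```

Note that `.*` is greedy, so with `re.DOTALL` the match runs up to the last `<div class="materia_cabecalho">` on the page. If a page could contain more than one such header, a non-greedy `.*?` would stop at the first one instead.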
#5 |
Connoisseur
Posts: 58
Karma: 12
Join Date: May 2011
Location: Deland, Florida
Device: Kindle 3
|
It seems to work very well and downloaded in 2 minutes 11 seconds for me. The difference could be due to the time of day, my computer network, or any of a dozen other factors. But even 4 minutes seems a vast improvement. You may wish to publish your recipe in the Recipes forum.
Forgive my lack of knowledge of the language. It is all Greek to me!!! If I have been of any benefit, please do not forget to add to my Karma rating.
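To put the reported times in perspective, here is a rough back-of-the-envelope sketch using only the numbers from this thread: 4 feeds, `max_articles_per_feed = 20`, and the three reported run times. It assumes every feed reaches its cap, so 80 is only an upper bound on the article count, and it ignores the time spent on MOBI conversion; treat the results as rough lower bounds on the average time per article, not as measurements.

```python
# Figures taken from the recipe and the posts in this thread.
feeds = 4
max_articles_per_feed = 20
max_articles = feeds * max_articles_per_feed  # upper bound on articles per run

# Reported wall-clock times, in seconds.
runs = {
    "first recipe (about 30 min)": 30 * 60,
    "new recipe (about 4 min)": 4 * 60,
    "new recipe, other machine (2 min 11 s)": 2 * 60 + 11,
}

# Assumption: every feed hits its cap. With fewer articles, the real
# per-article time would be higher than these values.
per_article = {label: seconds / max_articles for label, seconds in runs.items()}

for label, seconds in per_article.items():
    print(f"{label}: at least {seconds:.1f} s per article")
```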
#6 |
Member
Posts: 15
Karma: 14
Join Date: Jun 2011
Location: Brazil
Device: Kindle
|
#7 |
Connoisseur
Posts: 58
Karma: 12
Join Date: May 2011
Location: Deland, Florida
Device: Kindle 3
|
Below every user's name and profile information is a blue "Karma" button. Clicking it adds to that user's rating, indicating his or her helpfulness within the community.
Best of luck!!! You'll do well as a member of this forum, since it seems you have experience in Python coding and know your way around a recipe.