Old 06-14-2011, 09:55 PM   #1
Bortolotto
Member
Bortolotto began at the beginning.
 
Posts: 15
Karma: 14
Join Date: Jun 2011
Location: Brazil
Device: Kindle
Fetching taking too much time

Hi buddies!

I've written a recipe that takes too long to process (about 30 minutes).
Could you take a look at it?

The recipe is below, and the execution log is attached.

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class PortalR7(BasicNewsRecipe):
    title                  = 'Noticias R7'
    __author__             = 'Diniz Bortolotto'
    description            = 'Noticias Portal R7'
    oldest_article         = 2
    max_articles_per_feed  = 20
    encoding               = 'utf8'
    publisher              = 'Rede Record'
    category               = 'news, Brazil'
    language               = 'pt_BR'
    publication_type       = 'newsportal'
    feeds                  = [
                              (u'Brasil', u'http://www.r7.com/data/rss/brasil.xml'), 
                              (u'Economia', u'http://www.r7.com/data/rss/economia.xml'), 
                              (u'Internacional', u'http://www.r7.com/data/rss/internacional.xml'), 
                              (u'Tecnologia e Ci\xeancia', u'http://www.r7.com/data/rss/tecnologiaCiencia.xml')
                             ]

    reverse_article_order  = True
    remove_tags            = [
                              dict(name='ul', attrs={'class':'controles'}),
                              dict(name='div', attrs={'class':'materia_banner'}),
                              dict(name='ul', attrs={'class':'relacionados'})
                             ]
    keep_only_tags         = [
                              dict(name='div', attrs={'class':'materia'})
                             ]
Attached Files
File Type: txt calibre_fetch.txt (495.1 KB, 182 views)
Old 06-15-2011, 06:49 PM   #2
BRGriff
Connoisseur
BRGriff began at the beginning.
 
Posts: 58
Karma: 12
Join Date: May 2011
Location: Deland, Florida
Device: Kindle 3
Quote:
I've written a recipe that takes too long to process (about 30 minutes).
Could you take a look at it?
I am addressing ONLY your complaint about the time it takes to download your articles! Since I do not read Spanish, I cannot even tell whether my assistance is of any help. I did some minor work on your recipe, and the material downloaded in 3 minutes (with the output converted to .mobi format for a Kindle). There are MANY tags and other extraneous material that should still be removed, but I will leave that to you since, again, I am only addressing the time issue.

See if the attached recipe is of any benefit and let me know. If you have further questions about your recipe, other than the time it takes to download, please submit another post about those issues.
Attached Files
File Type: rtf Noticias R7.rtf (1.9 KB, 165 views)
Old 06-15-2011, 07:59 PM   #3
BRGriff
Updated Recipe

I decided to go a little further and help remove some of the extraneous tags. There are still many more but, as I said, I do not read Spanish, so I cannot tell what is relevant and what is not.
Attached Files
File Type: rtf Noticias R7 Rev.1.rtf (1.9 KB, 142 views)
Old 06-15-2011, 10:49 PM   #4
Bortolotto
Thank you! And my recipe version.

Hi BRGriff!!

First of all, I want to say a big "Thank you!!".

Based on your first reply, I made a new recipe (below).

The new version takes around 4 minutes to fetch the news and create the MOBI output. That is much better than the first version.

So I believe this new recipe can be useful for everyone who can read Brazilian Portuguese (not Spanish).

The RSS source is a well-known Brazilian news portal called R7.com.
It belongs to a broadcasting corporation called "Rede Record".

Code:
import re

from calibre.web.feeds.news import BasicNewsRecipe

class PortalR7(BasicNewsRecipe):
    title                  = 'Noticias R7'
    __author__             = 'Diniz Bortolotto'
    description            = 'Noticias Portal R7'
    oldest_article         = 2
    max_articles_per_feed  = 20
    encoding               = 'utf8'
    publisher              = 'Rede Record'
    category               = 'news, Brazil'
    language               = 'pt_BR'
    publication_type       = 'newsportal'
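    use_embedded_content   = False      # follow each feed link and download the full article
    no_stylesheets         = True       # do not download the site's stylesheets
    remove_javascript      = True       # strip script tags from the downloaded pages
    remove_attributes      = ['style']  # drop inline style attributes from all tags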

    feeds                  = [
                              (u'Brasil', u'http://www.r7.com/data/rss/brasil.xml'), 
                              (u'Economia', u'http://www.r7.com/data/rss/economia.xml'), 
                              (u'Internacional', u'http://www.r7.com/data/rss/internacional.xml'), 
                              (u'Tecnologia e Ci\xeancia', u'http://www.r7.com/data/rss/tecnologiaCiencia.xml')
                             ]
    reverse_article_order  = True

    keep_only_tags         = [dict(name='div', attrs={'class':'materia'})]
    remove_tags            = [
                              dict(id=['espalhe', 'report-erro']),
                              dict(name='ul', attrs={'class':'controles'}),
                              dict(name='ul', attrs={'class':'relacionados'}),
                              dict(name='div', attrs={'class':'materia_banner'}),
                              dict(name='div', attrs={'class':'materia_controles'})
                             ]

    preprocess_regexps     = [
                              (re.compile(r'<div class="materia">.*<div class="materia_cabecalho">',re.DOTALL|re.IGNORECASE),
                              lambda match: '<div class="materia"><div class="materia_cabecalho">')
                             ]
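
To see what the preprocess_regexps entry does in isolation, here is a minimal sketch that applies the same pattern with plain Python, outside calibre. The sample HTML is invented for illustration and is not real R7 markup, and the helper name apply_rule is hypothetical. The non-greedy variant is only an assumption about what might be safer: with the greedy ".*", a page containing a second "materia_cabecalho" div would lose everything up to the last one.

```python
import re

# Pattern and replacement copied from the recipe's preprocess_regexps entry.
GREEDY = re.compile(
    r'<div class="materia">.*<div class="materia_cabecalho">',
    re.DOTALL | re.IGNORECASE,
)
# Non-greedy variant (an assumption for comparison, not part of the posted recipe).
LAZY = re.compile(
    r'<div class="materia">.*?<div class="materia_cabecalho">',
    re.DOTALL | re.IGNORECASE,
)
REPLACEMENT = '<div class="materia"><div class="materia_cabecalho">'

# Invented fragment for illustration; this is NOT real R7 markup.
SAMPLE = (
    '<div class="materia">\n'
    '<ul class="menu"><li>unwanted</li></ul>\n'
    '<div class="materia_cabecalho"><h1>Titulo</h1></div>\n'
    '<p>Texto da materia.</p>\n'
    '</div>'
)
# Same fragment with a second header div appended after the article text.
TWO_HEADERS = SAMPLE + '\n<div class="materia_cabecalho">outro</div>'


def apply_rule(pattern, html):
    """Replace every match of pattern with REPLACEMENT, like a (regex, function) pair."""
    return pattern.sub(lambda match: REPLACEMENT, html)


cleaned = apply_rule(GREEDY, SAMPLE)
greedy_two = apply_rule(GREEDY, TWO_HEADERS)
lazy_two = apply_rule(LAZY, TWO_HEADERS)

print(cleaned)
print('greedy keeps article text:', 'Texto da materia.' in greedy_two)
print('lazy keeps article text:  ', 'Texto da materia.' in lazy_two)
```

On a page with a single header div both patterns give the same result, so the posted recipe is fine as long as that holds for the R7 pages.
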
What do you think about my new recipe?
Old 06-16-2011, 10:48 AM   #5
BRGriff
It seems to work very well and downloaded in 2 minutes 11 seconds for me. The difference could be down to the time of day, my network connection, or any of a dozen other factors. But even 4 minutes is a vast improvement. You may wish to publish your recipe in the Recipes forum.
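
Because a single run can be skewed by network conditions, timing a few runs gives a fairer comparison between recipe versions. The following is only a rough sketch: the helper names time_command and summarize are made up for illustration, and the placeholder command just starts the Python interpreter so the script runs anywhere. For a real measurement you would swap in calibre's ebook-convert with your own recipe file, for example ["ebook-convert", "Noticias R7.recipe", "output.mobi"] (both file names are hypothetical).

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run cmd `runs` times and return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so a failed run is never counted.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to fastest, median, and slowest."""
    return {
        "fastest": min(durations),
        "median": statistics.median(durations),
        "slowest": max(durations),
    }


if __name__ == "__main__":
    # Placeholder command; replace with the ebook-convert call for a real test.
    placeholder_cmd = [sys.executable, "-c", "pass"]
    print(summarize(time_command(placeholder_cmd, runs=3)))
```

Comparing the median of several runs for each recipe version should say more than one run each at different times of day.
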

Forgive my lack of knowledge as to the language. It is all Greek to me!!!

If I have been of any benefit, please do not forget to add to my Karma rating.
Old 06-16-2011, 11:26 AM   #6
Bortolotto
Quote:
Originally Posted by BRGriff
If I have been of any benefit, please do not forget to add to my Karma rating.
Yes, you helped me a lot! But what is a "Karma rating"?
Old 06-16-2011, 12:01 PM   #7
BRGriff
Below every user's name and profile information there is a blue "Karma" button. Clicking it adds to that user's rating, indicating his or her helpfulness within the community.

Best of luck!!! You'll do well as a member of this forum, since it seems you have experience in Python coding and know your way around a recipe.