MobileRead Forums > E-Book Software > Calibre > Recipes

Old 09-25-2010, 10:28 AM   #1
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Scientific American

The current Scientific American recipe is broken. The site changed. I've completely rewritten it:
Spoiler:
Code:
#!/usr/bin/env  python
__license__   = 'GPL v3'

import re
from calibre.web.feeds.news import BasicNewsRecipe

class ScientificAmerican(BasicNewsRecipe):
    title                 = u'Scientific American'
    description           = u'Popular Science. Monthly magazine.'
    category              = 'science'
    __author__            = 'Starson17'
    no_stylesheets        = True
    use_embedded_content  = False
    language              = 'en'
    publisher             = 'Nature Publishing Group'
    remove_empty_feeds    = True
    remove_javascript     = True
    oldest_article        = 30
    max_articles_per_feed = 100

    conversion_options = {'linearize_tables'  : True
                        , 'comment'           : description
                        , 'tags'              : category
                        , 'publisher'         : publisher
                        , 'language'          : language
                        }

    keep_only_tags = [
                dict(name='h2', attrs={'class':'articleTitle'})
                ,dict(name='p', attrs={'id':'articleDek'})
                ,dict(name='p', attrs={'class':'articleInfo'})
                ,dict(name='div', attrs={'id':['articleContent']})
                ,dict(name='img', attrs={'src':re.compile(r'/media/inline/blog/Image/', re.DOTALL|re.IGNORECASE)}) 
                ]

    remove_tags = [dict(name='a', attrs={'class':'tinyCommentCount'})]

    def parse_index(self):
        soup = self.index_to_soup('http://www.scientificamerican.com/sciammag/')
        issuetag = soup.find('p',attrs={'id':'articleDek'})
        self.timefmt = ' [%s]'%(self.tag_to_string(issuetag))
        img = soup.find('img', alt='Scientific American Magazine', src=True)
        if img is not None:
            self.cover_url = img['src']
        features, feeds = [], []
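        # Feature articles are the links titled 'Feature' in the primary column.
        for a in soup.find(attrs={'class':'primaryCol'}).findAll('a',attrs={'title':'Feature'}):
            # The teaser ("dek") sits two levels up from the link; it may be absent.
            s = a.parent.parent.find(attrs={'class':'dek'})
            desc = self.tag_to_string(s) if s is not None else ''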
            article = {
                    'url' : a['href'],
                    'title' : self.tag_to_string(a),
                    'date' : '',
                    'description' : desc,
                    }
            features.append(article)
        feeds.append(('Features', features))
        department = []
        title = None
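        for li in soup.find(attrs={'class':'secondaryCol'}).findAll('li'):
            # A department link starts a new feed; flush the previous one first.
            if li.a is not None and 'department.cfm' in li.a.get('href', ''):
                if department:
                    feeds.append((title, department))
                title = self.tag_to_string(li.a)
                department = []
            # Skip list items that have no <h3><a> article link.
            link = li.h3.a if li.h3 is not None else None
            if link is not None and 'article.cfm' in link.get('href', ''):
                article = {
                        'url' : link['href'],
                        'title' : self.tag_to_string(link),
                        'date': '',
                        'description': self.tag_to_string(li.p) if li.p is not None else '',
                    }
                department.append(article)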
        if department:
            feeds.append((title, department))
        return feeds

    def postprocess_html(self, soup, first_fetch):
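        # Replace links to topic pages with their plain text.
        # href=True skips anchors without an href attribute.
        for item in soup.findAll('a', href=True):
            if 'topic.cfm' in item['href']:
                item.replaceWith(self.tag_to_string(item))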
        return soup

    extra_css = '''
                p{font-weight: normal; font-size:small}
                li{font-weight: normal; font-size:small}
                .headline p{font-size:x-small; font-family:Arial,Helvetica,sans-serif;}
                h2{font-size:large; font-family:Arial,Helvetica,sans-serif;}
                h3{font-size:x-small;font-family:Arial,Helvetica,sans-serif;}
                '''

Last edited by Starson17; 09-25-2010 at 11:13 AM.
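
The department loop in parse_index above is easier to follow without the soup calls. Here is a minimal sketch of the same grouping idea on plain data. The (kind, title) pairs and the department names are hypothetical stand-ins for the site's list items, and unlike the recipe, the sketch treats each item as either a heading or an article, never both.

```python
def group_departments(items):
    """Group a flat list of (kind, title) pairs into (feed_title, articles) tuples.

    `items` is a hypothetical stand-in for the <li> elements the recipe walks:
    kind is 'department' for a heading link and 'article' for an article link.
    """
    feeds = []
    title, department = None, []
    for kind, text in items:
        if kind == 'department':
            # A new heading closes the previous department, if it had articles.
            if department:
                feeds.append((title, department))
            title, department = text, []
        elif kind == 'article':
            department.append({'title': text, 'url': '', 'date': '', 'description': ''})
    # Flush the last department once the loop ends.
    if department:
        feeds.append((title, department))
    return feeds


# Made-up sample data: two departments with two and one articles.
items = [
    ('department', 'Advances'),
    ('article', 'First item'),
    ('article', 'Second item'),
    ('department', 'Forum'),
    ('article', 'Third item'),
]
feeds = group_departments(items)
print([(t, len(a)) for t, a in feeds])
```

Each department heading becomes its own feed, so a recipe built this way should produce one feed per department rather than a single "Departments" feed.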
Old 09-25-2010, 02:18 PM   #2
Starson17
Tony (or anyone else)

If you get a chance, would you check that this recipe is working in 0.7.20? Check the built-in, not the one posted here, and check it inside Calibre, not your test environment. I was seeing some odd behavior. There should be more than 3 feeds, and the name of the epub should have "Issue" inside square brackets, as in "[October 2010 Issue]" and not just "[October 2010]".

Thanks!
Old 09-25-2010, 02:39 PM   #3
TonytheBookworm
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Quote:
Originally Posted by Starson17 View Post
Tony (or anyone else)

If you get a chance, would you check that this recipe is working in 0.7.20? Check the built-in, not the one posted here, and check it inside Calibre, not your test environment. I was seeing some odd behavior. There should be more than 3 feeds, and the name of the epub should have "Issue" inside square brackets, as in "[October 2010 Issue]" and not just "[October 2010]".

Thanks!
Checking it now. Will post results.
Old 09-25-2010, 02:51 PM   #4
TonytheBookworm
I only see three feeds
  1. Departments
  2. Features
  3. Online Exclusives

I also only see "[October 2010]" in the brackets.

I also see a bunch of "extra junk":
  • Subscribe to Digital
  • Subscribe to Print
  • and so on....
Old 09-25-2010, 02:58 PM   #5
Starson17
Quote:
Originally Posted by TonytheBookworm View Post
I only see three feeds
  1. Departments
  2. Features
  3. Online Exclusives

I also only see "[October 2010]" in the brackets.

I also see a bunch of "extra junk":
  • Subscribe to Digital
  • Subscribe to Print
  • and so on....
Would you verify it has this line in it:
Code:
issuetag = soup.find('p',attrs={'id':'articleDek'})
and then test it as a custom recipe inside Calibre, and if that fails, too, would you test it in your test environment, via command line? It may have something to do with your problem of junk that appears when --test is not used. That recipe was working great when I posted it. Something is odd, and I'd like to track it, but I can't seem to reproduce it here.
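
For the command-line check requested above, here is a rough sketch of the invocation wrapped in Python. It assumes calibre's ebook-convert tool is on the PATH. The file names sciam.recipe and sciam.epub are placeholders, and the helper names are made up for illustration.

```python
import subprocess


def build_test_command(recipe_path, output_path, test=True):
    """Return the argument list for running a recipe file through ebook-convert."""
    # -vv asks for verbose output, which helps when comparing runs.
    cmd = ['ebook-convert', recipe_path, output_path, '-vv']
    if test:
        # --test limits the download so a trial run finishes quickly.
        cmd.append('--test')
    return cmd


def run_recipe(recipe_path, output_path, test=True):
    """Run the conversion and return the process exit code (needs calibre installed)."""
    return subprocess.call(build_test_command(recipe_path, output_path, test))


# Only print the command here; call run_recipe() to actually execute it.
print(' '.join(build_test_command('sciam.recipe', 'sciam.epub')))
```

Running once with test=True and once with test=False gives two outputs to compare when the junk only shows up without --test.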
Old 09-25-2010, 03:07 PM   #6
TonytheBookworm
Quote:
Originally Posted by Starson17 View Post
Would you verify it has this line in it:
Code:
issuetag = soup.find('p',attrs={'id':'articleDek'})
and then test it as a custom recipe inside Calibre, and if that fails, too, would you test it in your test environment, via command line? It may have something to do with your problem of junk that appears when --test is not used. That recipe was working great when I posted it. Something is odd, and I'd like to track it, but I can't seem to reproduce it here.
Now you're seeing the weirdness I have been seeing. But yeah, I will test it now... I thought I was going crazy, but I know what I've been seeing, haha.
Old 09-25-2010, 03:10 PM   #7
kovidgoyal
creator of calibre
Posts: 26,131
Karma: 5381911
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Just to note: When you run a builtin recipe in calibre, the latest version of the recipe is fetched from the calibre server. When you run a recipe from the command line by providing the path to a .recipe file, that file is used.

Your problems may be caused by the mismatch between the two recipes.
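
One way to confirm such a mismatch is to diff the two recipe sources once both are in hand. The sketch below uses Python's standard difflib. It does not cover how to obtain the copy calibre fetches, and the helper name and file labels are assumptions. The two sample lines come from the two recipe versions posted in this thread.

```python
import difflib


def recipe_diff(local_src, fetched_src):
    """Return unified-diff lines between two recipe source strings."""
    return list(difflib.unified_diff(
        local_src.splitlines(),
        fetched_src.splitlines(),
        fromfile='local.recipe',      # label only, not a real path
        tofile='fetched.recipe',      # label only, not a real path
        lineterm=''))


# One line from each version of the recipe, used as sample input.
local = "issuetag = soup.find('p',attrs={'id':'articleDek'})\n"
fetched = "month = self.tag_to_string(soup.find('p',attrs={'id':'articleDek'}))\n"

for line in recipe_diff(local, fetched):
    print(line)
```

An empty result means the two sources are identical. Any lines starting with - or + show where the recipe that ran differs from the one that was edited.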
Old 09-25-2010, 03:12 PM   #8
TonytheBookworm
The built-in recipe does not have issuetag in it. Do you want me to put it in and then run the test?
Old 09-25-2010, 03:16 PM   #9
Starson17
Quote:
Originally Posted by TonytheBookworm View Post
Now you're seeing the weirdness I have been seeing. But yeah, I will test it now... I thought I was going crazy, but I know what I've been seeing, haha.
I believe you, because I saw your problem, too. This is similar, but not the same, as it's not dependent on --test. It's almost like another recipe is running. I'm getting 13 feeds, not 3. They start with "Features," and there is no "Departments" feed.
Old 09-25-2010, 03:19 PM   #10
TonytheBookworm
Quote:
Originally Posted by Starson17 View Post
I believe you, because I saw your problem, too. This is similar, but not the same, as it's not dependent on --test. It's almost like another recipe is running. I'm getting 13 feeds, not 3. They start with "Features," and there is no "Departments" feed.
I have a feeling our recipes are different, as Kovid mentioned. Here is the built-in recipe I have.
Spoiler:

Code:
#!/usr/bin/env  python
__license__   = 'GPL v3'
__copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
__docformat__ = 'restructuredtext en'

'''
sciam.com
'''
import re
from calibre.web.feeds.news import BasicNewsRecipe

class ScientificAmerican(BasicNewsRecipe):
    title = u'Scientific American'
    description = u'Popular science. Monthly magazine.'
    __author__ = 'Kovid Goyal'
    language = 'en'
    remove_javascript   = True
    encoding = 'utf-8'

    def print_version(self, url):
        return url + '&print=true'

    def parse_index(self):
        soup = self.index_to_soup('http://www.scientificamerican.com/sciammag/')
        month = self.tag_to_string(soup.find('p',attrs={'id':'articleDek'}))
        self.timefmt = ' [%s]'%(' '.join(month.strip().split()[:2]))
        img = soup.find('img', alt='Scientific American Magazine', src=True)
        if img is not None:
            self.cover_url = img['src']

        feeds = []
        for div in soup.findAll('div', attrs={'class':['primaryCol',
            'secondaryCol']}):
            current_section = None
            for tag in div.findAll(['h2', 'ul']):
                if tag.name == 'h2':
                    current_section = self.tag_to_string(tag).strip()
                    self.log('\tFound section:', current_section)
                elif current_section is not None and tag.name == 'ul':
                    articles = []
                    for li in tag.findAll('li'):
                        t = li.findAll('a',
                                attrs={'class':lambda x: x != 'thumb'},
                                href=lambda x: x and 'article.cfm' in x)
                        if not t:
                            continue
                        t = t[-1]
                        title = self.tag_to_string(t)
                        url = t['href']
                        desc = ''
                        p = li.find(attrs={'class':'dek'})
                        if p is not None:
                            desc = self.tag_to_string(p)
                        articles.append({'title':title, 'url':url,
                            'description':desc, 'date':''})
                        self.log('\t\tFound article:', title, '\n\t\tat', url)
                    if articles:
                        feeds.append((current_section, articles))
                    current_section = None
        return feeds

    def postprocess_html(self, soup, first_fetch):
        if soup is not None:
            for span in soup.findAll('span', attrs={'class':'pagination'}):
                span.extract()
            if not first_fetch:
                div = soup.find('div', attrs={'class':'headline'})
                if div:
                    div.extract()

        return soup

    preprocess_regexps = [
        (re.compile(r'Already a Digital subscriber.*Now</a>', re.DOTALL|re.IGNORECASE), lambda match: ''),
        (re.compile(r'If your institution has site license access, enter.*here</a>.', re.DOTALL|re.IGNORECASE), lambda match: ''),
        (re.compile(r'to subscribe to our.*;.*\}', re.DOTALL|re.IGNORECASE), lambda match: ''),
        (re.compile(r'\)\(jQuery\);.*-->', re.DOTALL|re.IGNORECASE), lambda match: ''),
        ]
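
The bracketed date in the e-book title differs because the two recipes posted in this thread build timefmt differently. The sketch below isolates those two expressions. The dek string 'October 2010 Issue' is an assumption based on the expected title, and the function names are made up for illustration.

```python
def timefmt_rewritten(dek_text):
    """Rewritten recipe: use the whole dek text inside the brackets."""
    return ' [%s]' % dek_text


def timefmt_builtin(dek_text):
    """Older built-in recipe: keep only the first two words of the dek text."""
    return ' [%s]' % ' '.join(dek_text.strip().split()[:2])


dek = 'October 2010 Issue'  # assumed text of the articleDek paragraph
print(repr(timefmt_rewritten(dek)))
print(repr(timefmt_builtin(dek)))
```

A title ending in "[October 2010]" therefore suggests the older built-in code ran, while "[October 2010 Issue]" suggests the rewritten recipe ran.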
Old 09-25-2010, 03:22 PM   #11
Starson17
Quote:
Originally Posted by kovidgoyal View Post
Just to note: When you run a builtin recipe in calibre, the latest version of the recipe is fetched from the calibre server. When you run a recipe from the command line by providing the path to a .recipe file, that file is used.

Your problems may be caused by the mismatch between the two recipes.
Aha. I didn't realize that. I knew statistics kept track of recipe usage, but I didn't realize it pulled the whole current recipe. I think that explains it. What recipe do I get if I ask to modify the builtin? Does it fetch first, or give me the local copy?
Old 09-25-2010, 03:24 PM   #12
kovidgoyal
Quote:
Originally Posted by Starson17 View Post
Aha. I didn't realize that. I knew statistics kept track of recipe usage, but I didn't realize it pulled the whole current recipe. I think that explains it. What recipe do I get if I ask to modify the builtin? Does it fetch first, or give me the local copy?
I don't recall; I think it fetches first.
Old 09-25-2010, 03:26 PM   #13
Starson17
Quote:
Originally Posted by TonytheBookworm View Post
I have a feeling our recipes are different, as Kovid mentioned. Here is the built-in recipe I have.
OK, that's it. That's not my recipe. I was confused between the automatic pull of the recipe, my updated codebase and some other changes. It appears it has nothing to do with your problem, AFAICT.

Thanks for the help.
Old 09-25-2010, 03:37 PM   #14
Starson17
Quote:
Originally Posted by kovidgoyal View Post
I don't recall; I think it fetches first.
OK, here's what's happening. I'm running Calibre 0.7.20 from updated code. If I try to edit the builtin, I see my recipe (Kovid has updated the codebase with my submitted recipe). If I browse to the recipe in resources, I also see my recipe. But, if I try to just run it normally, as a builtin recipe, I get another version (with code that I can't see). I'm sure that wherever it's pulling the recipe from will eventually update to what's in the code base and that will fix this.

Thanks for the help, Tony, and for the answers, Kovid.
All times are GMT -4.

