01-15-2011, 03:50 PM   #1
fisab
Member
fisab began at the beginning.
 
Posts: 14
Karma: 10
Join Date: Dec 2010
Device: Kindle
Sports Illustrated

The wonderful "Sports Illustrated" recipe started failing for me 2 weeks ago.
It worked brilliantly before that, but now I only get 2 blank pages each time.
It's been the same for the last few versions of Calibre.
Is it working for anyone?

Many thanks in advance for any help.
01-15-2011, 07:54 PM   #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.

Posts: 26,359
Karma: 5382313
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
It's likely that the website changed for the new year, so the recipe will have to be modified. While I do not have the time to fix other people's recipes, I had a quick look and committed a partial fix. You should get something if you try it now.

Last edited by kovidgoyal; 01-15-2011 at 08:00 PM.
01-16-2011, 01:58 PM   #3
fisab
Member
fisab began at the beginning.
 
Posts: 14
Karma: 10
Join Date: Dec 2010
Device: Kindle
Thanks a million for that.
I donated a few weeks ago - great support - well worth a more regular donation.
02-18-2011, 08:41 PM   #4
BillD
Junior Member
BillD began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Sep 2010
Device: Kindle
SI fetch still not working for me with 0.7.46 ... any solution out there?

Thanks.
Attached Files
File Type: txt SI_fetch_err.txt (3.2 KB, 87 views)
02-19-2011, 08:54 PM   #5
spedinfargo
Zealot
spedinfargo knows more than wikipedia
 
Posts: 120
Karma: 47540
Join Date: Nov 2010
Device: none
Quote:
Originally Posted by BillD
SI fetch still not working for me with 0.7.46 ... any solution out there?

Thanks.
Looks like they took the current issue off the site and are moving to a paid subscription for a digital edition. When you click on the current issue (which the recipe relies on), it takes you to a new subscription site.

Crappy deal - someone put in a LOT of work on this recipe...

All in the name of progress I guess...
02-20-2011, 10:42 PM   #6
BillD
Junior Member
BillD began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Sep 2010
Device: Kindle
@sped ... thanks for info
03-10-2011, 10:46 AM   #7
jsl21
Member
jsl21 began at the beginning.
 
Posts: 17
Karma: 10
Join Date: May 2010
Device: Kindle
Workaround for Sports Illustrated

It turns out that the old infrastructure is still on the si.com website; it is just difficult to navigate there from the front page.

http://sportsillustrated.cnn.com/vau...1541/index.htm

If you alter the recipe so that

currentIssue='http://sportsillustrated.cnn.com/vault/cover/toc/11541/index.htm'

you will get this issue. I just haven't been able to figure out how to fix the recipe to always get the latest issue. I assume next week I will just need to change the 11541 to 11542 manually.
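Until the recipe can discover the number on its own, the manual edit can at least be reduced to changing one value. A minimal standalone sketch, assuming every issue's TOC follows the URL pattern above; `toc_url` and `TOC_TEMPLATE` are hypothetical names, and the "next week is 11542" step is only the poster's guess:

```python
# Hypothetical helper: build the vault TOC URL for an issue number,
# using the URL pattern quoted in the post above.
TOC_TEMPLATE = 'http://sportsillustrated.cnn.com/vault/cover/toc/{}/index.htm'


def toc_url(issue_number):
    """Return the table-of-contents URL for an SI vault issue number."""
    return TOC_TEMPLATE.format(int(issue_number))


currentIssue = toc_url(11541)
print(currentIssue)  # http://sportsillustrated.cnn.com/vault/cover/toc/11541/index.htm
```

With this in place, the weekly change is just the argument to `toc_url`.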
03-10-2011, 11:22 AM   #8
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by jsl21
If you alter the recipe so that
currentIssue='http://sportsillustrated.cnn.com/vault/cover/toc/11541/index.htm'
you will get this issue.
How did you find this link?
03-11-2011, 11:45 AM   #9
spedinfargo
Zealot
spedinfargo knows more than wikipedia
 
Posts: 120
Karma: 47540
Join Date: Nov 2010
Device: none
Quote:
Originally Posted by jsl21
It turns out that the old infrastructure is still on the si.com website; it is just difficult to navigate there from the front page.

http://sportsillustrated.cnn.com/vau...1541/index.htm

If you alter the recipe so that

currentIssue='http://sportsillustrated.cnn.com/vault/cover/toc/11541/index.htm'

you will get this issue. I just haven't been able to figure out how to fix the recipe to always get the latest issue. I assume next week I will just need to change the 11541 to 11542 manually.
Good find! I wonder if there's a way to add some logic to first go here to get the link:

http://sportsillustrated.cnn.com/vau...home/index.htm

It should always be the first link that looks like this:

<div id="ecomthumb_latest_11541"></div>

Is it possible to do a "two-step" process like this?
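The first step of such a process, pulling the issue number out of the cover page, can be sketched with a plain regular expression over the page's HTML. This assumes the `ecomthumb_latest_` ids always appear in double quotes and that the first one on the page is the newest issue; the sample markup is made up for illustration:

```python
import re

# Issue ids look like "ecomthumb_latest_11541" (format quoted above);
# the capture group keeps only the digits.
LATEST_ID = re.compile(r'id="ecomthumb_latest_(\d+)"')


def latest_issue_numbers(html):
    """Return every 'latest' issue number in the page, in document order."""
    return LATEST_ID.findall(html)


# Made-up sample markup, for illustration only.
sample = ('<div id="ecomthumb_latest_11541"></div>'
          '<div id="ecomthumb_latest_11540"></div>')
numbers = latest_issue_numbers(sample)
current = numbers[0] if numbers else None
print(current)  # 11541
```

The second step would then plug `current` into the `/vault/cover/toc/<number>/index.htm` URL.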
03-11-2011, 12:24 PM   #10
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.

Posts: 26,359
Karma: 5382313
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Yes, it is. You can have as many steps as you like in parse_index.
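Since parse_index is ordinary Python, it can fetch one page, read a link from it, fetch the next page, and fall back to another candidate if needed. A standalone sketch of that flow, with the page fetcher passed in as a function so it runs outside calibre; `find_current_toc` and the fake `pages` dict are hypothetical, and the `siv_noArticleMessage` marker for empty issues is taken from the recipe posted later in this thread:

```python
import re

COVER_HOME = 'http://sportsillustrated.cnn.com/vault/cover/home/index.htm'
TOC_TEMPLATE = 'http://sportsillustrated.cnn.com/vault/cover/toc/{}/index.htm'
LATEST_ID = re.compile(r'id="ecomthumb_latest_(\d+)"')


def find_current_toc(fetch):
    """Return the first TOC URL that has articles, or None.

    `fetch` maps a URL to HTML text (or None). In a real recipe that job
    would fall to self.index_to_soup, which returns a soup, not text.
    """
    cover_html = fetch(COVER_HOME) or ''                # step 1: cover page
    for number in LATEST_ID.findall(cover_html):
        url = TOC_TEMPLATE.format(number)
        toc_html = fetch(url) or ''                     # step 2: issue TOC
        # Empty issues carry this marker; skip them and try the next cover.
        if toc_html and 'siv_noArticleMessage' not in toc_html:
            return url
    return None


# Fake pages standing in for the live site.
pages = {
    COVER_HOME: ('<div id="ecomthumb_latest_11542"></div>'
                 '<div id="ecomthumb_latest_11541"></div>'),
    TOC_TEMPLATE.format(11542): '<div class="siv_noArticleMessage"></div>',
    TOC_TEMPLATE.format(11541): '<div class="siv_artList"></div>',
}
print(find_current_toc(pages.get))
```

Here issue 11542 is skipped because its fake TOC is empty, so the sketch settles on the 11541 TOC URL.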
03-11-2011, 12:28 PM   #11
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by spedinfargo
Good find! I wonder if there's a way to add some logic to first go here to get the link:

http://sportsillustrated.cnn.com/vau...home/index.htm

It should always be the first link that looks like this:

<div id="ecomthumb_latest_11541"></div>

Is it possible to do a "two-step" process like this?
Yes.
Do something like:
Code:
INDEX2 = 'http://sportsillustrated.cnn.com/vault/cover/home/index.htm'
followed by changing
Code:
soup = self.index_to_soup(self.INDEX)
to
Code:
soup = self.index_to_soup(self.INDEX2)
in parse_index
Then change
Code:
        cover = soup.find('div', attrs = {'alt' : 'Read All Articles', 'style' : 'vertical-align:bottom;'})
        if cover:
            currentIssue = cover.parent['href']
to whatever is needed to produce the currentIssue.

Last edited by Starson17; 03-11-2011 at 12:42 PM.
03-14-2011, 12:20 PM   #12
jsl21
Member
jsl21 began at the beginning.
 
Posts: 17
Karma: 10
Join Date: May 2010
Device: Kindle
I saw that index page that has all of the covers, including the row that says Latest, but I can't figure out how to identify the most recent issue, since it is identified by '11541' and that number will change every week.

Does anyone know how to change the script to point to the first cover in the third row of that page, assuming that will always be the location of the most recent issue?
03-14-2011, 12:33 PM   #13
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by jsl21
I saw that index page that has all of the covers, including the row that says Latest, but I can't figure out how to identify the most recent issue, since it is identified by '11541' and that number will change every week.

Does anyone know how to change the script to point to the first cover in the third row of that page, assuming that will always be the location of the most recent issue?
The tags are marked on that page with id or class. I don't have time to do it for you, but read about BeautifulSoup, study the page, and the answer should be clear. Use find() to locate the first occurrence of a marked tag that has what you want. If that's beyond you, you'll have to wait for someone to do it for you.
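The "first occurrence of a marked tag" idea can be shown with Python's standard-library `html.parser` instead of BeautifulSoup, so the sketch runs on its own; inside a recipe, BeautifulSoup's find() with a compiled regex as the `id` value would play this role. It assumes the `ecomthumb_latest_NNNNN` id format quoted earlier in the thread, and the class and function names are hypothetical:

```python
import re
from html.parser import HTMLParser


class FirstLatestCover(HTMLParser):
    """Record the issue number of the first <div id="ecomthumb_latest_NNNNN">."""

    ID_PATTERN = re.compile(r'^ecomthumb_latest_(\d+)$')

    def __init__(self):
        super().__init__()
        self.issue_number = None

    def handle_starttag(self, tag, attrs):
        # Only the first matching <div> matters; ignore everything after it.
        if tag != 'div' or self.issue_number is not None:
            return
        match = self.ID_PATTERN.match(dict(attrs).get('id') or '')
        if match:
            self.issue_number = match.group(1)


def first_latest_issue(html):
    """Return the first 'latest' issue number in the page, or None."""
    parser = FirstLatestCover()
    parser.feed(html)
    parser.close()
    return parser.issue_number
```

For example, `first_latest_issue('<div id="ecomthumb_latest_11541"></div>')` returns `'11541'`. Matching on the id rather than on "first cover in the third row" means the lookup does not depend on the page layout.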
03-16-2011, 03:45 PM   #14
spedinfargo
Zealot
spedinfargo knows more than wikipedia
 
Posts: 120
Karma: 47540
Join Date: Nov 2010
Device: none
OK, I fixed the "getting the correct TOC page" issue. Interestingly enough, I was doing this right when SI was rolling out 6 different versions of the same issue for the NCAA tourney so it was kind of weird to test.

PROBLEM: The print_version is broken now. I think Clickability is doing some things to make it more difficult to pull down from their site. This might be what I've been seeing with other recipes as well. I'm going to start a new thread for that issue, but here's what I have so far.
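One way to tell whether the problem is in the URL being built or on the Clickability side is to rebuild the print URL outside calibre and open it in a browser. A Python 3 sketch of the same construction used in the recipe's print_version (the recipe itself uses Python 2's `urllib.quote`); `print_url` is a hypothetical name, and the endpoint and fixed parameters are copied from the recipe, not independently verified:

```python
from urllib.parse import quote

# Endpoint and fixed parameters copied from the recipe's print_version().
PRINT_BASE = 'http://si.printthis.clickability.com/pt/printThis?clickMap=printThis'


def print_url(article_url):
    """Build the Clickability print-version URL for an article URL."""
    # quote() keeps '/' but escapes ':', '?', '&' and '=' in the article URL.
    return PRINT_BASE + '&fb=Y&partnerID=2356&url=' + quote(article_url)
```

If the article URL passed in is itself malformed, the print URL will be too, so it is worth checking both.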
03-16-2011, 03:46 PM   #15
spedinfargo
Zealot
spedinfargo knows more than wikipedia
 
Posts: 120
Karma: 47540
Join Date: Nov 2010
Device: none
Updated for new logic for pulling current issue URL:

Code:
from calibre.web.feeds.recipes import BasicNewsRecipe
#from calibre.ebooks.BeautifulSoup import BeautifulSoup
from urllib import quote
from urlparse import urljoin
import re

class SportsIllustratedRecipe(BasicNewsRecipe) :
    __author__  = 'kwetal'
    __copyright__ = 'kwetal'
    __license__ = 'GPL v3'
    language = 'en'
    description = 'Sports Illustrated'
    version = 3
    title          = u'Sports Illustrated'

    no_stylesheets = True
    remove_javascript = True
    use_embedded_content   = False

    INDEX = 'http://sportsillustrated.cnn.com/vault/cover/home/index.htm'

    def parse_index(self):
        answer = []
        soup = self.index_to_soup(self.INDEX)

        # The "latest" cover thumbnails have ids like ecomthumb_latest_11541;
        # the digits are the issue number used in the TOC URL.
        latest_id = re.compile(r'ecomthumb_latest_(\d+)')

        # Loop through all of the "latest" covers until we find one that actually has articles
        index = None
        for item in soup.findAll('div', attrs={'id': latest_id}):
            current_issue_number = latest_id.search(item['id']).group(1)
            current_issue_link = 'http://sportsillustrated.cnn.com/vault/cover/toc/' + current_issue_number + '/index.htm'
            self.log('Checking this link for a TOC:  ', current_issue_link)

            toc = self.index_to_soup(current_issue_link)
            if toc and not toc.find('div', 'siv_noArticleMessage'):
                self.log('Found a TOC...  Using this link')
                index = toc
                break
            self.log('No TOC for this one.  Skipping...')

        # If no cover has a usable TOC, there is nothing to download.
        if index is None:
            return answer

        # Find all articles.
        art_list = index.find('div', attrs = {'class' : 'siv_artList'})
        if art_list:
            self.log('found siv_artList')
            articles = []
            # Get all the articles ready for calibre.
            counter = 0
            for headline in art_list.findAll('div', attrs = {'class' : 'headline'}):
                counter = counter + 1
                title = self.tag_to_string(headline.a) + '\n' + self.tag_to_string(headline.findNextSibling('div', attrs = {'class' : 'info'}))
                # Resolve the link against the TOC page; appending it to
                # self.INDEX (which ends in /index.htm) gives a broken URL.
                url = urljoin(current_issue_link, headline.a['href'])
                description = self.tag_to_string(headline.findNextSibling('a').div)
                article = {'title' : title, 'date' : u'', 'url'  : url, 'description' : description}
                articles.append(article)
                # Only the first six articles are kept.
                if counter > 5:
                    break

            # See if we can find a meaningful title
            feedTitle = 'Current Issue'
            hasTitle = index.find('div', attrs = {'class' : 'siv_imageText_head'})
            if hasTitle :
                feedTitle = self.tag_to_string(hasTitle.h1)

            answer.append([feedTitle, articles])

        return answer


    def print_version(self, url) :
        # This is the url and the parameters that work to get the print version.
        printUrl = 'http://si.printthis.clickability.com/pt/printThis?clickMap=printThis'
        printUrl += '&fb=Y&partnerID=2356&url=' + quote(url)
        self.log('PrintURL: ' , printUrl)

        return printUrl

        # However the original javascript also uses the following parameters, but they can be left out:
        #   title : can be some random string
        #   random : some random number, but I think the number of digits is important
        #   expire : no idea what value to use
        # All this comes from the Javascript function that redirects to the print version. It's called PT() and is defined in the file 48.js

    '''def preprocess_html(self, soup):
        header = soup.find('div', attrs = {'class' : 'siv_artheader'})
        homeMadeSoup = BeautifulSoup('<html><head></head><body></body></html>')
        body = homeMadeSoup.body

        # Find the date, title and byline
        temp = header.find('td', attrs = {'class' : 'title'})
        if temp :
            date = temp.find('div', attrs = {'class' : 'date'})
            if date:
                body.append(date)
            if temp.h1:
                body.append(temp.h1)
            if temp.h2 :
                body.append(temp.h2)
            byline = temp.find('div', attrs = {'class' : 'byline'})
            if byline:
                body.append(byline)

        # Find the content
        for para in soup.findAll('div', attrs = {'class' : 'siv_artpara'}) :
            body.append(para)

        return homeMadeSoup
        '''
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.