Old 11-01-2008, 10:17 AM   #1
moosejons_dad
Zealot
moosejons_dad began at the beginning.
 
Posts: 100
Karma: 18
Join Date: Oct 2006
Location: N.J.
Device: Sony Readers PRS-500 exchanged by Sony for PRS-600, PRS-505,IPAD3,mini
Calibre-NY Times problem

Please take a look at the 11/1/08 edition of the NY Times. It downloads
OK onto the reader and shows the titles of all of the newspaper's articles.
However, when you open these titles to read the articles, they are blank.
Thanks for any assistance.
Old 11-01-2008, 11:16 AM   #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.

Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
What is your output format: EPUB or LRF?
Old 11-01-2008, 03:55 PM   #3
moosejons_dad
Zealot
moosejons_dad began at the beginning.
 
Posts: 100
Karma: 18
Join Date: Oct 2006
Location: N.J.
Device: Sony Readers PRS-500 exchanged by Sony for PRS-600, PRS-505,IPAD3,mini
I am using LRF.
Old 11-01-2008, 04:44 PM   #4
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.

Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
The Sunday NY Times doesn't work with LRF. Try using EPUB.
Old 11-01-2008, 10:12 PM   #5
Acey
Member
Acey began at the beginning.
 
Posts: 19
Karma: 10
Join Date: Oct 2008
Device: Sony PRS-505
I have the same problem with today's (Saturday) NY Times. A good portion of the articles show up with only an ad when viewed using the ebook viewer in Calibre. Some of the articles are fine, but most are not. These affected articles show up blank on the reader itself.
Old 11-01-2008, 10:23 PM   #6
moosejons_dad
Zealot
moosejons_dad began at the beginning.
 
Posts: 100
Karma: 18
Join Date: Oct 2006
Location: N.J.
Device: Sony Readers PRS-500 exchanged by Sony for PRS-600, PRS-505,IPAD3,mini
I have a Sony PRS-500 reader, and it is my understanding that EPUB will not work on the 500.
If that is true, am I going to be unable to read the NY Times anymore?
Old 11-01-2008, 10:30 PM   #7
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.

Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I just had a look. Looks like the format of the website has changed. Will be fixed in the next release.
Old 11-01-2008, 10:55 PM   #8
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.

Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Here's the fixed recipe. The pesky NY Times was trying hard to insert more ads into the reader's experience.

Code:
import string, re
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class NYTimes(BasicNewsRecipe):
    
    title       = 'The New York Times'
    __author__  = 'Kovid Goyal'
    description = 'Daily news from the New York Times'
    timefmt = ' [%a, %d %b, %Y]'
    needs_subscription = True
    
    remove_tags_before = dict(name='h1')
    remove_tags_after  = dict(id='footer')
    remove_tags = [dict(attrs={'class':['articleTools', 'post-tools', 'side_tool']}), 
                   dict(id=['footer', 'navigation', 'archive', 'side_search', 'blog_sidebar', 'side_tool', 'side_index']), 
                   dict(name=['script', 'noscript'])]
    encoding = 'cp1252'
    no_stylesheets = True
    extra_css = 'h1 {font: sans-serif large;}\n.byline {font:monospace;}'
    
    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None and self.password is not None:
            br.open('http://www.nytimes.com/auth/login')
            br.select_form(name='login')
            br['USERID']   = self.username
            br['PASSWORD'] = self.password
            br.submit()
        return br
    
    def parse_index(self):
        soup = self.index_to_soup('http://www.nytimes.com/pages/todayspaper/index.html')
        
        def feed_title(div):
            return ''.join(div.findAll(text=True, recursive=False)).strip()
        
        articles = {}
        key = None
        ans = []
        for div in soup.findAll(True, 
            attrs={'class':['section-headline', 'story', 'story headline']}):
            
            if div['class'] == 'section-headline':
                key = string.capwords(feed_title(div))
                articles[key] = []
                ans.append(key)
            
            elif div['class'] in ['story', 'story headline']:
                a = div.find('a', href=True)
                if not a:
                    continue
                url = re.sub(r'\?.*', '', a['href'])
                url += '?pagewanted=print'
                title = self.tag_to_string(a, use_alt=True).strip()
                description = ''
                pubdate = strftime('%a, %d %b')
                summary = div.find(True, attrs={'class':'summary'})
                if summary:
                    description = self.tag_to_string(summary, use_alt=False)
                
                feed = key if key is not None else 'Uncategorized'
                if feed not in articles:
                    articles[feed] = []
                if 'podcasts' not in url:
                    articles[feed].append(
                                  dict(title=title, url=url, date=pubdate, 
                                       description=description,
                                       content=''))
        ans = self.sort_index_by(ans, {'The Front Page':-1, 'Dining In, Dining Out':1, 'Obituaries':2})
        ans = [(key, articles[key]) for key in ans if key in articles]
        return ans
    
    def preprocess_html(self, soup):
        refresh = soup.find('meta', {'http-equiv':'refresh'})
        if refresh is None:
            return soup
        content = refresh.get('content').partition('=')[2]
        raw = self.browser.open('http://www.nytimes.com'+content).read()
        return BeautifulSoup(raw.decode('cp1252', 'replace'))
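
The URL handling in parse_index can be tried on its own, outside calibre. What follows is only a sketch: the helper names print_url and is_wanted and the sample link are made up for illustration, and just mirror the re.sub, '?pagewanted=print' and 'podcasts' lines of the recipe.

```python
import re


def print_url(href):
    """Drop any query string and ask for the print-friendly page."""
    base = re.sub(r'\?.*', '', href)
    return base + '?pagewanted=print'


def is_wanted(url):
    """Skip podcast links, as the recipe does."""
    return 'podcasts' not in url


# Hypothetical link, for illustration only.
example = 'http://www.nytimes.com/2008/11/01/world/example.html?ref=todayspaper'
if is_wanted(example):
    print(print_url(example))
```

Any existing query string is thrown away before the new one is appended, so the result never ends up with two '?' characters.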
Old 11-02-2008, 01:19 AM   #9
lovebeta
Groupie
lovebeta has a complete set of Star Wars action figures.
 
Posts: 176
Karma: 406
Join Date: Jan 2008
Device: Amazon Kindle 2, Amazon Kindle, Sony PRS-505
Kovid, is it possible to use the regular page instead of the print-friendly version to grab the news articles? I understand that the latter is easier to parse, but the printer version doesn't have any of the nice news photos.

Normally pictures on the PRS aren't necessarily a top priority. But believe it or not, I now actually use Calibre to read the NY Times on my computer. It sounds crazy, but the advantage over the web version is that I can cruise linearly through the day's stories. It certainly beats a lot of mouse clicks. Plus there are absolutely no ads.
Old 11-02-2008, 01:32 AM   #10
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.

Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
It's certainly possible, but one would have to write a lot of code to strip junk HTML. I lack the desire to do that since I don't read the NY Times. But patches are welcome.

And yeah, reading a calibre-produced ebook beats reading on the web any day. I do that on my tablet all the time, since I often have that with me and not my reader.
Old 11-02-2008, 11:57 PM   #11
lovebeta
Groupie
lovebeta has a complete set of Star Wars action figures.
 
Posts: 176
Karma: 406
Join Date: Jan 2008
Device: Amazon Kindle 2, Amazon Kindle, Sony PRS-505
OK, I did a quick hack of Kovid's script. Disclaimer: I know nothing about Python; this is strictly a mimic/mod of his script. I also found a bug along the way. Although this profile should theoretically work, I have to manually edit out the "imported css" in the HTML files between the feeds2disk step and the html2epub step. Otherwise html2epub keeps reporting a CSS selector error and consumes as much as 2 GB of memory before it hangs.

Code:
import string, re
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class NYTimes(BasicNewsRecipe):
    
    title       = 'NY Times'
    __author__  = 'Kovid Goyal'
    description = 'Daily news from the New York Times'
    timefmt = ' [%a, %d %b, %Y]'
    needs_subscription = True

    remove_tags_before = dict(id='article')
    remove_tags_after  = dict(id='article')
    remove_tags = [dict(attrs={'class':['articleTools', 'post-tools', 'side_tool', 'nextArticleLink clearfix']}), 
                   dict(id=['footer', 'toolsRight', 'articleInline', 'navigation', 'archive', 'side_search', 'blog_sidebar', 'side_tool', 'side_index']), 
                   dict(name=['script', 'noscript'])]
    encoding = 'cp1252'
    no_stylesheets = True
    extra_css = 'h1 {font: sans-serif large;}\n.byline {font:monospace;}'
    
    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None and self.password is not None:
            br.open('http://www.nytimes.com/auth/login')
            br.select_form(name='login')
            br['USERID']   = self.username
            br['PASSWORD'] = self.password
            br.submit()
        return br
    
    def parse_index(self):
        soup = self.index_to_soup('http://www.nytimes.com/pages/todayspaper/index.html')
        
        def feed_title(div):
            return ''.join(div.findAll(text=True, recursive=False)).strip()
        
        articles = {}
        key = None
        ans = []
        for div in soup.findAll(True, 
            attrs={'class':['section-headline', 'story', 'story headline']}):
            
            if div['class'] == 'section-headline':
                key = string.capwords(feed_title(div))
                articles[key] = []
                ans.append(key)
            
            elif div['class'] in ['story', 'story headline']:
                a = div.find('a', href=True)
                if not a:
                    continue
                url = re.sub(r'\?.*', '', a['href'])
                url += '?pagewanted=all'
                title = self.tag_to_string(a, use_alt=True).strip()
                description = ''
                pubdate = strftime('%a, %d %b')
                summary = div.find(True, attrs={'class':'summary'})
                if summary:
                    description = self.tag_to_string(summary, use_alt=False)
                
                feed = key if key is not None else 'Uncategorized'
                if feed not in articles:
                    articles[feed] = []
                if 'podcasts' not in url:
                    articles[feed].append(
                                  dict(title=title, url=url, date=pubdate, 
                                       description=description,
                                       content=''))
        ans = self.sort_index_by(ans, {'The Front Page':-1, 'Dining In, Dining Out':1, 'Obituaries':2})
        ans = [(key, articles[key]) for key in ans if key in articles]
        return ans
    
    def preprocess_html(self, soup):
        refresh = soup.find('meta', {'http-equiv':'refresh'})
        if refresh is None:
            return soup
        content = refresh.get('content').partition('=')[2]
        raw = self.browser.open('http://www.nytimes.com'+content).read()
        return BeautifulSoup(raw.decode('cp1252', 'replace'))
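
The preprocess_html step follows a meta refresh redirect by taking whatever comes after the first '=' in the tag's content attribute. A small sketch of just that string handling is below. The function name refresh_target and the sample content value are assumptions for illustration; the thread does not show what the site actually sent.

```python
def refresh_target(content, site='http://www.nytimes.com'):
    """Build the absolute URL named in a meta refresh content value."""
    # '0;url=/path' splits into ('0;url', '=', '/path').
    # Index 2 is everything after the first '=', so a query string
    # inside the path survives intact.
    path = content.partition('=')[2]
    return site + path


# Assumed content value, for illustration only.
print(refresh_target('0;url=/2008/11/02/example.html'))
```

If the content value has no '=' at all, partition returns an empty tail and the function falls back to the bare site URL.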
Old 11-03-2008, 12:05 AM   #12
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.

Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Change
Code:
dict(name=['script', 'noscript'])
to
dict(name=['script', 'noscript', 'style'])
and it will work with html2epub as well
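
To see roughly what dropping the style tags does to a page, here is a standalone sketch that removes style blocks from an HTML string. It is only a stand-in: calibre applies remove_tags to the parsed page, not a regular expression, and the name strip_style_blocks and the sample markup are made up for illustration.

```python
import re

# Matches a whole <style ...>...</style> element, across line breaks.
STYLE_BLOCK = re.compile(r'<style\b[^>]*>.*?</style\s*>',
                         re.IGNORECASE | re.DOTALL)


def strip_style_blocks(html):
    """Return the HTML with every style element removed."""
    return STYLE_BLOCK.sub('', html)


# Made-up page with an imported stylesheet, for illustration only.
sample = ('<html><head><style type="text/css">@import url(x.css);</style>'
          '</head><body><p>Story</p></body></html>')
print(strip_style_blocks(sample))
```

The same function could be run over the intermediate HTML files by hand, which is the manual edit described earlier in the thread.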
Old 03-17-2009, 09:37 AM   #13
weatherman
Addict
weatherman ought to be getting tired of karma fortunes by now.

Posts: 385
Karma: 1010052
Join Date: Apr 2008
Device: (previous: Kindle 2, Kindle Fire) Kindle 4 WiFi, K3K, KPW
This is an old thread, but I searched and can't seem to find a more recent one about a problem I've had since I upgraded to .5: every time I download the NYT, it outputs a .1 MB file that doesn't have any content. That's for the non-subscription version. The subscription version outputs a 12 MB+ file and crashes my PRS-500 every time, so I can't use that. Did the recipe change for the NYT, or is it just no longer available?
Old 03-17-2009, 09:51 AM   #14
moosejons_dad
Zealot
moosejons_dad began at the beginning.
 
Posts: 100
Karma: 18
Join Date: Oct 2006
Location: N.J.
Device: Sony Readers PRS-500 exchanged by Sony for PRS-600, PRS-505,IPAD3,mini
Quote:
Originally Posted by weatherman View Post
This is an old thread, but I searched and can't seem to find a more recent one about a problem I've had since I upgraded to .5: every time I download the NYT, it outputs a .1 MB file that doesn't have any content. That's for the non-subscription version. The subscription version outputs a 12 MB+ file and crashes my PRS-500 every time, so I can't use that. Did the recipe change for the NYT, or is it just no longer available?
The subscription NYT works for me. I use a PRS-500 for my reading, and I have also upgraded to the .5 version. The file is a little over 2 MB today.
I would reinstall the .50 version, then download the subscription NYT again and see if that fixes the problem.
Old 03-17-2009, 12:12 PM   #15
weatherman
Addict
weatherman ought to be getting tired of karma fortunes by now.

Posts: 385
Karma: 1010052
Join Date: Apr 2008
Device: (previous: Kindle 2, Kindle Fire) Kindle 4 WiFi, K3K, KPW
Thanks. I'll do that when I get home and see if it works. I was getting so frustrated not being able to get the NY Times that I almost broke down and bought a Kindle.

I still might. But you may have removed my excuse.