06-12-2008, 08:12 PM   #16
ddavtian
Addict
Posts: 271
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
This is based on the published WSJ profile.
I have PM'ed you my login name and password; feel free to use them for testing/reading.


Code:
##    Copyright (C) 2008 Kovid Goyal kovid@kovidgoyal.net
##    This program is free software; you can redistribute it and/or modify
##    it under the terms of the GNU General Public License as published by
##    the Free Software Foundation; either version 2 of the License, or
##    (at your option) any later version.
##
##    This program is distributed in the hope that it will be useful,
##    but WITHOUT ANY WARRANTY; without even the implied warranty of
##    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
##    GNU General Public License for more details.
##
##    You should have received a copy of the GNU General Public License along
##    with this program; if not, write to the Free Software Foundation, Inc.,
##    51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
 

import time
import re
## from libprs500.ebooks.lrf.web.profiles import DefaultProfile
## from libprs500.ebooks.BeautifulSoup import BeautifulSoup
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.lrf.web.profiles import DefaultProfile
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class WallStreetJournalPaper(BasicNewsRecipe):
    import time
    import re
    from calibre.web.feeds.news import BasicNewsRecipe
    from calibre.ebooks.lrf.web.profiles import DefaultProfile
    from calibre.ebooks.BeautifulSoup import BeautifulSoup

    title = 'Wall Street Print Edition'
    __author__ = 'Kovid Goyal'
    simultaneous_downloads = 1
    max_articles_per_feed = 200
    INDEX = 'http://online.wsj.com/page/2_0133.html'
    timefmt  = ' [%a, %b %d, %Y]'
    no_stylesheets = False
    html2lrf_options = [('--ignore-tables')]
    issue_date = time.ctime()
    print issue_date

    ## Don't grab articles more than 7 days old
    oldest_article = 7

    def get_browser(self):
        br = DefaultProfile.get_browser()
        if self.username is not None and self.password is not None:
            br.open('http://online.wsj.com/login')
            br.select_form(name='login_form')
            br['user']   = self.username
            br['password'] = self.password
            br.submit()
        return br

    preprocess_regexps = [(re.compile(i[0], re.IGNORECASE | re.DOTALL), i[1]) for i in
        [
        ## Remove anything before the body of the article.
        (r'<body.*?<!-- article start', lambda match: '<body><!-- article start'),

        ## Remove any insets from the body of the article.
        (r'<div id="inset".*?</div>.?</div>.?<p', lambda match : '<p'),

        ## Remove anything after the end of the article.
        (r'<!-- article end.*?</body>', lambda match : '</body>'),
        ]
    ]

    def parse_index(self):
        articles = []
        soup = self.index_to_soup(self.INDEX)
        issue_date = time.ctime()

        for item in soup.findAll('a', attrs={'class':'bold80'}):
            a = item.find('a')
            if a and a.has_key('href'):
                url = item['href']
                url = 'http://online.wsj.com'+url.replace('/article', '/article_print')
                title = self.tag_to_string(item)
                description = ''
                articles.append({
                    'title':title,
                    'date':issue_date,
                    'url':url,
                    'description':description
                    })

        return {'Todays Paper': articles}
06-12-2008, 08:23 PM   #17
kovidgoyal
creator of calibre
Posts: 43,749
Karma: 22446736
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Code:
return [('Todays newspaper', articles)]
Incidentally, how is the WSJ doing post-Murdoch?
06-12-2008, 10:26 PM   #18
ddavtian
Addict
I started reading it this year (being able to read it on the Sony was a big factor for me), so I can't compare before and after.
07-04-2008, 08:52 PM   #19
ddavtian
Addict
Quote:
Originally Posted by kovidgoyal View Post
post your recipe
Hi Kovid, did you have a chance to look at the recipe I posted? I understand if you don't have time to look at individual recipes.

Thanks for the great software,
David
07-05-2008, 12:39 PM   #20
kovidgoyal
creator of calibre
Your return statement should be:

Code:
return [('Today\'s Paper', articles)]
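As a rough illustration of the shape that return statement produces: parse_index() hands back a list of (section title, list of article dicts) tuples, not a dict keyed by section title. The sketch below is a hypothetical stand-alone checker (check_index and dict_to_index are our own names, not calibre functions), written in Python 3 so it runs without calibre. Which article keys it insists on (title and url) is an assumption based on the dicts the recipe builds.

```python
def check_index(index):
    """Sanity-check the value returned by a parse_index() method.

    Expected shape (assumed from the recipe): a list of
    (section_title, [article_dict, ...]) tuples. Returns the article count.
    """
    if not isinstance(index, list):
        raise TypeError('expected a list of (title, articles) tuples, got %s'
                        % type(index).__name__)
    total = 0
    for entry in index:
        if not (isinstance(entry, tuple) and len(entry) == 2):
            raise ValueError('each section must be a (title, articles) tuple')
        section_title, articles = entry
        for article in articles:
            # 'title' and 'url' are the keys this sketch insists on (assumption).
            missing = {'title', 'url'} - set(article)
            if missing:
                raise ValueError('article in %r lacks %s'
                                 % (section_title, sorted(missing)))
        total += len(articles)
    return total


def dict_to_index(sections):
    """Convert a {section_title: articles} dict into the list-of-tuples form."""
    return [(title, articles) for title, articles in sections.items()]


articles = [{'title': 'Example headline',
             'url': 'http://online.wsj.com/article_print/SB121521047990229423.html',
             'date': '', 'description': ''}]
print(check_index([('Today\'s Paper', articles)]))           # list form is accepted
print(check_index(dict_to_index({'Todays Paper': articles})))  # converted dict form
```

Passing the dict form `{'Todays Paper': articles}` straight to check_index raises TypeError; dict_to_index is the one-line conversion to the list form.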
07-05-2008, 11:33 PM   #21
ddavtian
Addict
Quote:
Originally Posted by kovidgoyal View Post
Your return statement should be:
Code:
return [('Today\'s Paper', articles)]
You had said this 3 weeks ago and I didn't get it then :-(

I tried it and got a new error:
Traceback (most recent call last):
File "convert_from.py", line 61, in <module>
File "convert_from.py", line 42, in main
File "calibre\web\feeds\main.pyo", line 128, in run_recipe
File "calibre\web\feeds\news.pyo", line 825, in __init__
File "calibre\ebooks\lrf\web\profiles\__init__.pyo", line 174, in __init__
File "calibre\ebooks\lrf\web\profiles\__init__.pyo", line 204, in build_index
AttributeError: 'list' object has no attribute 'keys'

I put in a few print statements to trace the flow; it never gets into this loop:
for item in soup.findAll('a', attrs={'class':'bold80'}):


I checked the web page and nothing has changed there; the articles are identified correctly. Here is a link from the page source:
<a class="bold80" href="/article/SB121521047990229423.html?mod=todays_us_page_one">
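To check the link selection offline against a saved copy of the page, something like the following stand-in for soup.findAll('a', attrs={'class':'bold80'}) may help. It is a sketch that uses Python 3's standard html.parser instead of calibre's bundled BeautifulSoup; the class name Bold80LinkCollector, the "Home" link and the headline text are hypothetical, and only the href comes from the page source quoted above. Note that the matched <a> tag itself carries the href.

```python
from html.parser import HTMLParser


class Bold80LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a class="bold80"> tag.

    Hypothetical stand-in for soup.findAll('a', attrs={'class': 'bold80'}).
    """

    def __init__(self):
        super().__init__()
        self.links = []        # finished (href, text) pairs
        self._current = None   # [href, text] of the bold80 link being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if tag == 'a' and 'bold80' in classes:
            # The href lives on this <a> tag itself, not on a nested <a>.
            self._current = [attrs.get('href'), '']

    def handle_data(self, data):
        if self._current is not None:
            self._current[1] += data

    def handle_endtag(self, tag):
        if tag == 'a' and self._current is not None:
            href, text = self._current
            self.links.append((href, text.strip()))
            self._current = None


page = ('<a href="/home">Home</a>'
        '<a class="bold80" href="/article/SB121521047990229423.html'
        '?mod=todays_us_page_one">Example headline</a>')
collector = Bold80LinkCollector()
collector.feed(page)
collector.close()
print(collector.links)
```

Only the bold80 link is collected; the plain "Home" link is ignored.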

Kovid, your help is very much appreciated.
Thanks in advance.
07-06-2008, 12:21 AM   #22
kovidgoyal
creator of calibre
Use the command feeds2lrf, not web2lrf.
07-06-2008, 01:15 AM   #23
ddavtian
Addict
The error is from feeds2lrf (I have calibre 0.4.76):

C:\Temp\News>feeds2lrf --debug wsjNew.py --username=xxx --password=xxx
Fetching feeds...
Sat Jul 05 22:12:09 2008
Traceback (most recent call last):
File "convert_from.py", line 61, in <module>
File "convert_from.py", line 42, in main
File "calibre\web\feeds\main.pyo", line 128, in run_recipe
File "calibre\web\feeds\news.pyo", line 825, in __init__
File "calibre\ebooks\lrf\web\profiles\__init__.pyo", line 174, in __init__
File "calibre\ebooks\lrf\web\profiles\__init__.pyo", line 204, in build_index
AttributeError: 'list' object has no attribute 'keys'
07-06-2008, 11:36 AM   #24
kovidgoyal
creator of calibre
Delete the line
Code:
from calibre.ebooks.lrf.web.profiles import DefaultProfile 

07-06-2008, 07:19 PM   #25
ddavtian
Addict
The same error:

Sun Jul 06 16:14:26 2008
Traceback (most recent call last):
File "convert_from.py", line 61, in <module>
File "convert_from.py", line 42, in main
File "calibre\web\feeds\main.pyo", line 128, in run_recipe
File "calibre\web\feeds\news.pyo", line 825, in __init__
File "calibre\ebooks\lrf\web\profiles\__init__.pyo", line 174, in __init__
File "calibre\ebooks\lrf\web\profiles\__init__.pyo", line 204, in build_index
AttributeError: 'list' object has no attribute 'keys'
07-07-2008, 02:08 PM   #26
kovidgoyal
creator of calibre
The recipe below works for me with the command line:
Code:
feeds2lrf test.py
Recipe:
Code:
##    Copyright (C) 2008 Kovid Goyal kovid@kovidgoyal.net
##    This program is free software; you can redistribute it and/or modify
##    it under the terms of the GNU General Public License as published by
##    the Free Software Foundation; either version 2 of the License, or
##    (at your option) any later version.
##
##    This program is distributed in the hope that it will be useful,
##    but WITHOUT ANY WARRANTY; without even the implied warranty of
##    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
##    GNU General Public License for more details.
##
##    You should have received a copy of the GNU General Public License along
##    with this program; if not, write to the Free Software Foundation, Inc.,
##    51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
 

import time
import re
## from libprs500.ebooks.lrf.web.profiles import DefaultProfile
## from libprs500.ebooks.BeautifulSoup import BeautifulSoup
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class WallStreetJournalPaper(BasicNewsRecipe): 
    import time
    import re
    from calibre.web.feeds.news import BasicNewsRecipe
    from calibre.ebooks.lrf.web.profiles import DefaultProfile
    from calibre.ebooks.BeautifulSoup import BeautifulSoup
    
    title = 'Wall Street Print Edition' 
    __author__ = 'Kovid Goyal'
    simultaneous_downloads = 1    
    max_articles_per_feed = 200
    INDEX = 'http://online.wsj.com/page/2_0133.html'
    timefmt  = ' [%a, %b %d, %Y]' 
    no_stylesheets = False
    html2lrf_options = [('--ignore-tables')]
    issue_date = time.ctime()
    print issue_date




    ## Don't grab articles more than 7 days old 
    oldest_article = 7

    def get_browser(self): 
        br = DefaultProfile.get_browser() 
        if self.username is not None and self.password is not None: 
            br.open('http://online.wsj.com/login') 
            br.select_form(name='login_form') 
            br['user']   = self.username 
            br['password'] = self.password 
            br.submit() 
        return br 
   
    preprocess_regexps = [(re.compile(i[0], re.IGNORECASE | re.DOTALL), i[1]) for i in  
        [ 
        ## Remove anything before the body of the article. 
        (r'<body.*?<!-- article start', lambda match: '<body><!-- article start'), 
 
        ## Remove any insets from the body of the article. 
        (r'<div id="inset".*?</div>.?</div>.?<p', lambda match : '<p'), 
 
        ## Remove anything after the end of the article. 
        (r'<!-- article end.*?</body>', lambda match : '</body>'), 
        ] 
    ] 
 
 
     
    def parse_index(self):
        articles = []
        soup = self.index_to_soup(self.INDEX)
        issue_date = time.ctime()
        
        for item in soup.findAll('a', attrs={'class':'bold80'}):
            a = item.find('a')
            if a and a.has_key('href'):
                url = item['href']
                url = 'http://online.wsj.com'+url.replace('/article', '/article_print')
                title = self.tag_to_string(item)
                description = ''
                articles.append({
                    'title':title,
                    'date':issue_date,
                    'url':url,
                    'description':description
                    })
               
    
        return [('Todays Paper', articles)]
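The three preprocess_regexps in the recipe above trim each downloaded article page before conversion. The Python 3 sketch below applies the same patterns, with the same flags, to a made-up HTML snippet so their combined effect can be seen without fetching anything. The snippet and the preprocess helper are hypothetical, and we assume calibre applies each compiled pattern in order via pattern.sub(function, html).

```python
import re

# Same (pattern, replacement) pairs and flags as the recipe's preprocess_regexps.
PREPROCESS_REGEXPS = [(re.compile(p, re.IGNORECASE | re.DOTALL), f) for p, f in [
    # Remove anything before the body of the article.
    (r'<body.*?<!-- article start', lambda match: '<body><!-- article start'),
    # Remove any insets from the body of the article.
    (r'<div id="inset".*?</div>.?</div>.?<p', lambda match: '<p'),
    # Remove anything after the end of the article.
    (r'<!-- article end.*?</body>', lambda match: '</body>'),
]]


def preprocess(html):
    """Apply each pattern in order (hypothetical helper)."""
    for pattern, replacement in PREPROCESS_REGEXPS:
        html = pattern.sub(replacement, html)
    return html


sample = ('<html><body><div id="nav">menu</div>'
          '<!-- article start --><div id="inset"><div>ad</div></div> '
          '<p>Story text.</p><!-- article end --><div>footer</div>'
          '</body></html>')
print(preprocess(sample))
```

The navigation before the article-start marker, the inset block, and everything after the article-end marker are removed, leaving only the story paragraph inside the body.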
07-08-2008, 02:17 AM   #27
ddavtian
Addict
Thank you Kovid!

Your recipe ran fine from the command line, but the output was an empty file. I think it's related to my login to the site: they block access if several logins are made from different computers. I'll try again tomorrow.
07-09-2008, 10:21 AM   #28
ddavtian
Addict
No luck with WSJ so far.

When I use the posted recipe, I get an empty file. It does find articles (a = item.find('a')), but doesn't pass this condition: "if a and a.has_key('href'):".
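The condition never passes because findAll('a', ...) already returns the <a class="bold80"> tags themselves, and item.find('a') searches only inside each tag, where there is no nested <a>, so it returns None. In the recipe the check would be made on item directly (for example `if item.has_key('href'):`), with the 'date' value taken from issue_date. The Python 3 sketch below shows the rest of that loop on plain (href, title) pairs so it runs without calibre; build_articles and the sample data are hypothetical.

```python
import time

BASE_URL = 'http://online.wsj.com'


def build_articles(links, issue_date=None):
    """Build parse_index() article dicts from (href, title) pairs.

    Each pair comes from an <a class="bold80"> tag, so the href is read from
    that tag directly; links without an href are skipped.
    """
    if issue_date is None:
        issue_date = time.ctime()
    articles = []
    for href, title in links:
        if not href:
            continue
        # Point at the printer-friendly version of the article, as the recipe does.
        url = BASE_URL + href.replace('/article', '/article_print')
        articles.append({
            'title': title,
            'date': issue_date,
            'url': url,
            'description': '',
        })
    return articles


links = [('/article/SB121521047990229423.html?mod=todays_us_page_one', 'Example headline'),
         (None, 'anchor without href')]
for article in build_articles(links, issue_date='Wed Jul  9 10:00:00 2008'):
    print(article['url'])
```

Only the first link produces an article; its URL is rewritten to the /article_print/ path.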

When I remove this condition, it gets articles (I print titles and see all of them from the web page), but fails at the end:

Traceback (most recent call last):
File "convert_from.py", line 61, in <module>
File "convert_from.py", line 42, in main
File "calibre\web\feeds\main.pyo", line 134, in run_recipe
File "calibre\web\feeds\news.pyo", line 472, in download
File "calibre\web\feeds\news.pyo", line 578, in build_index
File "c:\docume~1\davidd~1\locals~1\temp\calibre_0.4.76_j-dnk5_recipes\recipe0.py", line 89, in parse_index
print title
File "encodings\cp437.pyo", line 12, in encode
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2026' in position
5: character maps to <undefined>
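The traceback shows the debugging `print title` failing: the Windows console here uses code page 437, which has no character for U+2026 (HORIZONTAL ELLIPSIS), so encoding the title for output raises UnicodeEncodeError. One workaround while debugging is to encode with the 'replace' error handler before printing (in the Python 2 recipe, something like `print title.encode('cp437', 'replace')`), or simply to drop the print once the titles look right. A Python 3 sketch of the idea, with console_safe as a hypothetical helper:

```python
def console_safe(text, encoding='cp437'):
    """Return text with every character the console encoding cannot represent
    replaced by '?', so printing it cannot raise UnicodeEncodeError."""
    return text.encode(encoding, 'replace').decode(encoding)


title = 'Wall\u2026 Street'
print(console_safe(title))
```

Characters that cp437 does contain, such as 'é', pass through unchanged; only unrepresentable ones become '?'.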
07-09-2008, 11:06 AM   #29
kovidgoyal
creator of calibre
Can you send me your WSJ username and password again? I need them to debug further.
07-09-2008, 12:23 PM   #30
ddavtian
Addict
Quote:
Originally Posted by kovidgoyal View Post
Can you send me your WSJ username and password again? I need them to debug further.
Sent.

I logged out of the site, so you should be able to log in. If I run the calibre recipe a few times in a row, they lock the account, and it then takes 5-6 hours to get access again. That makes testing changes painful.


Thanks in advance.
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Help with calibre recipes | CaptainJSK | Calibre | 1 | 07-11-2010 01:12 AM
Calibre Recipes and iPad/iBooks | jbambridge | Calibre | 8 | 05-16-2010 04:30 PM
Classification of Recipes in Calibre | wayner | Calibre | 3 | 11-27-2009 09:48 AM
Problem with my recipes (Calibre 0.6.2) | MikeBoud | Calibre | 18 | 08-05-2009 10:20 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.