12-03-2007, 08:27 PM   #106
kovidgoyal
creator of calibre
Posts: 43,778
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@JTravers

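Just realized I can't look at the cleanup code, as I don't have a subscription. Try the following to debug:

Code:
def cleanup(self):
    # Replace the placeholder with the URL your cleanup code requests.
    res = self.browser.open('whatever the url was')
    # Print the raw response body to see what the server returned.
    print(res.read())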
12-04-2007, 12:37 AM   #107
JTravers
Groupie
Posts: 182
Karma: 1078201
Join Date: Sep 2007
Device: iPad Air 2
Quote:
Originally Posted by kovidgoyal View Post
@JTravers
match_regexp works on the contents of the href attribute, i.e. the URL itself, not on the <a> tag.
Here's the code I'm using for the link regexp:
Code:
match_regexps = ['http://online.barrons.com/.*?html\?mod=.*?']
But I can see webpages being fetched from entirely different domains than barrons.com. I've attached my profile for Barrons. You should be able to test it (at your convenience, of course) without supplying a username and password, as there are some articles that are available to non-subscribers.
Attached Files
File Type: txt barrons.py.txt (3.6 KB, 417 views)
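A minimal sketch of the point in the quote above: the regexp is tested against the href value only, never against the rest of the `<a>` tag. This is not libprs500's code. `HrefCollector`, `matching_links`, and the sample HTML are hypothetical, and using `re.search` is an assumption about how the patterns are applied.

```python
import re
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)


def matching_links(html, pattern):
    """Return the hrefs in `html` whose URL matches `pattern`."""
    collector = HrefCollector()
    collector.feed(html)
    regexp = re.compile(pattern)
    # Only the URL string is tested; tag attributes and link text are ignored.
    return [href for href in collector.hrefs if regexp.search(href)]
```

In this sketch, a link whose class or link text mentions "barrons" is still rejected unless its URL itself matches the pattern.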
12-04-2007, 01:12 AM   #108
JTravers
Groupie
Quote:
Originally Posted by kovidgoyal View Post
@JTravers

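Just realized I can't look at the cleanup code, as I don't have a subscription. Try the following to debug:

Code:
def cleanup(self):
    # Replace the placeholder with the URL your cleanup code requests.
    res = self.browser.open('whatever the url was')
    # Print the raw response body to see what the server returned.
    print(res.read())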
Still hangs -- both when I log in and when I don't. If you have the time to check, you should be able to test even without logging in. You can use my profile from the prior post.
12-04-2007, 04:17 PM   #109
kovidgoyal
creator of calibre
Hmm, another regression was preventing match_regexps from working. Fixed in svn. Note that in your case match_regexps should be:

match_regexps = ['http://online.barrons.com/.*?html\?mod=.*?|file://.*']
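A quick way to see which URLs the combined pattern accepts, as a sketch only. It assumes each entry is applied to the link URL with `re.search`, which is a guess about libprs500's internals; `link_allowed` and the sample URLs in the usage note are hypothetical.

```python
import re

# Pattern copied from the post above, written as a raw string so the
# backslash in \? reaches the regexp engine unchanged.
MATCH_REGEXPS = [r'http://online.barrons.com/.*?html\?mod=.*?|file://.*']


def link_allowed(url, patterns=MATCH_REGEXPS):
    """Return True if any pattern is found in the URL (re.search assumed)."""
    return any(re.search(pattern, url) for pattern in patterns)
```

With this sketch, an article URL such as `http://online.barrons.com/article/a.html?mod=home` and any `file://` URL are accepted, while a link to another domain is rejected.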

As for the cleanup hanging, it seems to be following a long redirect chain.

Use the following code to see the HTTP responses being sent by the server:

Code:
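def cleanup(self):
    try:
        # Dump every HTTP response that the mechanize browser receives.
        self.browser.set_debug_responses(True)
        import sys, logging
        # Route mechanize's log messages to stdout.
        logger = logging.getLogger("mechanize")
        logger.addHandler(logging.StreamHandler(sys.stdout))
        logger.setLevel(logging.INFO)

        res = self.browser.open('http://online.barrons.com/logout')
    except Exception:
        # Show the full traceback instead of failing silently.
        import traceback
        traceback.print_exc()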
You may find the documentation at http://wwwsearch.sourceforge.net/mechanize/ useful for understanding how the browser object works.
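To picture what a long or circular redirect chain looks like, here is a small pure-Python simulation. It does not use mechanize or contact any server; `follow_redirects`, the `redirects` mapping, and the `max_hops` limit are all hypothetical names chosen for illustration.

```python
def follow_redirects(start, redirects, max_hops=10):
    """Follow `redirects` (a dict mapping url -> next url) from `start`.

    Returns the list of URLs visited. Raises RuntimeError when another
    redirect is pending after `max_hops` hops, instead of looping forever.
    """
    chain = [start]
    while chain[-1] in redirects:
        if len(chain) > max_hops:
            raise RuntimeError('gave up after %d redirects' % max_hops)
        chain.append(redirects[chain[-1]])
    return chain
```

A chain that ends returns normally, while two pages that redirect to each other hit the limit and raise, which is roughly the difference between a slow logout and one that appears to hang.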
12-04-2007, 04:41 PM   #110
JTravers
Groupie
Thanks for all of your help, Kovid.

I'll take a look at the code and link you recommended and see if I can come up with a solution.

Once that's all worked out, the profiles I made for WSJ.com and Barrons.com should be pretty much done.

I'll probably start working on other finance/investment sites after that. (The WSJ.com blogs should be pretty easy to implement -- and they're free, too!).
12-05-2007, 03:21 AM   #111
JTravers
Groupie
Error

What does the following error mean?

Code:
Traceback (most recent call last):
  File "convert_from.py", line 187, in <module>
  File "convert_from.py", line 181, in main
  File "convert_from.py", line 123, in process_profile
  File "libprs500\ebooks\lrf\web\profiles\__init__.pyo", line 92, in __init__
  File "libprs500\ebooks\lrf\web\profiles\__init__.pyo", line 104, in build_index
  File "libprs500\ebooks\lrf\web\profiles\__init__.pyo", line 159, in parse_feeds
ValueError: too many values to unpack
I get it when trying to process the following feed:
http://feeds.portfolio.com/portfolio/businessspin

Thanks.
12-05-2007, 03:39 AM   #112
kovidgoyal
creator of calibre
That means the get_feeds function is not returning a correct sequence.
12-05-2007, 03:44 AM   #113
JTravers
Groupie
I'm trying to set up profiles for some full-content feeds, in which I go no further than listing the articles with their descriptions (since the descriptions in the feed contain the full content). However, I noticed that linked text in a feed description is removed.

I know html2lrf had a regression that removed linked text completely (which you have already fixed), so I thought this might be a regression, too. If not, perhaps you could set it up so that it strips only the links from the descriptions but keeps the text in place.

Thanks.
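The requested behaviour (drop the link, keep its text) could be sketched as below. This is only an illustration with a simple regexp, not how html2lrf or web2lrf handle it; `strip_links` and `ANCHOR_TAG` are hypothetical names.

```python
import re

# Matches opening and closing <a> tags only; the \b keeps tags whose
# names merely start with "a" (such as <abbr>) intact.
ANCHOR_TAG = re.compile(r'</?a\b[^>]*>', re.IGNORECASE)


def strip_links(html):
    """Remove <a ...> and </a> tags but keep the text between them."""
    return ANCHOR_TAG.sub('', html)
```

A regexp is enough for a sketch like this, although a real HTML parser would be more robust against unusual markup.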
12-05-2007, 03:47 AM   #114
JTravers
Groupie
Quote:
Originally Posted by kovidgoyal View Post
That means the get_feeds function is not returning a correct sequence.
User error on my part. I forgot a comma between the feed title and URL.
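For anyone hitting the same traceback, the missing comma can be reproduced in isolation. The sketch below assumes feeds are listed as (title, url) pairs; the feed title and the helper names `unpack_feeds` and `try_unpack` are made up for illustration.

```python
# Correct entry: a two-item tuple of (title, url).
good_feeds = [('Business Spin', 'http://feeds.portfolio.com/portfolio/businessspin')]

# Missing comma: Python joins the two adjacent string literals into ONE
# string, and the parentheses no longer create a tuple at all.
bad_feeds = [('Business Spin' 'http://feeds.portfolio.com/portfolio/businessspin')]


def unpack_feeds(feeds):
    """Unpack every entry as a (title, url) pair, as a feed parser might."""
    return [(title, url) for title, url in feeds]


def try_unpack(feeds):
    """Return the unpacked pairs, or the error text if unpacking fails."""
    try:
        return unpack_feeds(feeds)
    except ValueError as err:
        return 'ValueError: %s' % err
```

Unpacking the bad entry tries to split one long string, character by character, into two names, which is what produces "too many values to unpack".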
12-05-2007, 11:53 AM   #115
kovidgoyal
creator of calibre
Can you give me an example of such a feed so I can debug?
12-05-2007, 04:17 PM   #116
JTravers
Groupie
Quote:
Originally Posted by kovidgoyal View Post
Can you give me an example of such a feed so I can debug?
Here's one from the profile I was working on.
http://feeds.portfolio.com/portfolio/businessspin

I've attached the lrf generated from the profile, so you can see the results.
Attached Files
File Type: lrf Portfolio [Wed, Dec 05, 2007].lrf (277.3 KB, 412 views)
12-05-2007, 04:28 PM   #117
kovidgoyal
creator of calibre
Ah, OK, this should be fixed in svn; let me know if it still gives you trouble.
12-05-2007, 10:09 PM   #118
JTravers
Groupie
max_recursions error

Whenever I set max_recursions to 0 or 1 in a profile, I get the following error after the lrf is generated:
Code:
Exception exceptions.WindowsError: WindowsError(32, 'The process cannot access 
the file because it is being used by another process') in <bound method Portfolio.__del__ of 
<portfolio.Portfolio object at 0x00FCFCF0>> ignored
If I then set max_recursions to 2 or more, the error goes away.

Last edited by JTravers; 12-05-2007 at 10:12 PM.
12-05-2007, 10:35 PM   #119
kovidgoyal
creator of calibre
That error can be safely ignored; all it means is that some temporary file was not deleted.
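As a rough illustration of why such an error is harmless, here is a sketch of cleanup code that reports a failed delete instead of raising. It is not libprs500's implementation; `TempWorkspace` is a hypothetical stand-in for a profile object that owns a temporary directory.

```python
import os
import shutil
import tempfile


class TempWorkspace(object):
    """Hypothetical object that owns a temporary directory."""

    def __init__(self):
        self.path = tempfile.mkdtemp(prefix='profile_')

    def cleanup(self):
        """Try to delete the directory; return False if that fails."""
        try:
            shutil.rmtree(self.path)
            return True
        except OSError:
            # On Windows a file still held open by another process raises
            # WindowsError, which is an OSError. The only consequence is a
            # leftover temporary file.
            return False
```

Calling `cleanup()` explicitly, rather than relying on `__del__`, also avoids the "Exception ... ignored" message, since exceptions raised inside `__del__` are only printed, never propagated.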
12-06-2007, 03:19 AM   #120
JTravers
Groupie
New Profiles

Just to let everyone know, I posted profiles for the Wall Street Journal, Barron's, and Portfolio.com on Kovid's wiki.
https://libprs500.kovidgoyal.net/wiki/UserProfiles

Subscribers to WSJ and Barron's should be able to get all the content using the --username and --password options in web2lrf. Non-subscribers will get the free articles only.

Be aware that because of the peculiarities of how concurrent logins are handled at the WSJ and Barron's sites, you may get locked out of your account for a short period of time using the WSJ and Barrons profiles. You would probably have to run the profiles (with login credentials) multiple times before this happens, though. So if you're only running it once within a reasonable period of time, you should be safe.
Tags
libprs500, web2lrf

All times are GMT -4.