|  12-03-2007, 08:27 PM | #106 | 
| creator of calibre            Posts: 45,600 Karma: 28548974 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			@JTravers Just realized I can't look at the cleanup code, as I don't have a subscription. Try the following to debug it:

Code:
def cleanup(self):
    # Replace the placeholder with the URL your cleanup code actually opens
    res = self.browser.open('whatever the url was')
    print(res.read())
|  12-04-2007, 12:37 AM | #107 | |
| Groupie            Posts: 182 Karma: 1078201 Join Date: Sep 2007 Device: iPad Air 2 |
Quote:
Code:
match_regexps = [r'http://online.barrons.com/.*?html\?mod=.*?']
|  12-04-2007, 01:12 AM | #108 | 
| Groupie            Posts: 182 Karma: 1078201 Join Date: Sep 2007 Device: iPad Air 2 | 
			
			Still hangs, both when I log in and when I don't. If you have the time to check, you should be able to test it even without logging in. You can use my profile from the prior post.
		 | 
|  12-04-2007, 04:17 PM | #109 | 
| creator of calibre            Posts: 45,600 Karma: 28548974 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			Hmm, another regression was preventing match_regexps from working. Fixed in svn. Note that in your case match_regexps should be:

match_regexps = [r'http://online.barrons.com/.*?html\?mod=.*?|file://.*']

As for the cleanup hanging, it seems to be following a long redirect chain. Use the following code to see the HTTP responses being sent by the server:

Code:
def cleanup(self):
    import sys, logging, traceback
    try:
        # Log every HTTP response, and every redirect followed, to stdout
        self.browser.set_debug_responses(True)
        self.browser.set_debug_redirects(True)
        logger = logging.getLogger("mechanize")
        logger.addHandler(logging.StreamHandler(sys.stdout))
        logger.setLevel(logging.INFO)
        res = self.browser.open('http://online.barrons.com/logout')
    except Exception:
        traceback.print_exc()
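To check which links the match_regexps pattern suggested above accepts, here is a small sketch. It assumes the profile tests each link with re.match (anchored at the start of the URL); the helper name is_followed and the example URLs are made up for illustration.

```python
import re

# Pattern from the post above; the raw string keeps "\?" as a literal "?" match
MATCH_REGEXPS = [r'http://online.barrons.com/.*?html\?mod=.*?|file://.*']

def is_followed(url, patterns=MATCH_REGEXPS):
    """Hypothetical helper: True if url matches any pattern.

    Assumes re.match semantics, i.e. each pattern must match at the
    start of the URL.
    """
    return any(re.match(p, url) for p in patterns)

# Hypothetical URLs, for illustration only
print(is_followed('http://online.barrons.com/article/example.html?mod=home'))  # True
print(is_followed('file:///tmp/feed_index.html'))                              # True
print(is_followed('http://online.barrons.com/article/example.html'))           # False
```

The `|file://.*` alternative means local file:// links are accepted as well as Barron's article links carrying a `?mod=` parameter.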
|  12-04-2007, 04:41 PM | #110 | 
| Groupie            Posts: 182 Karma: 1078201 Join Date: Sep 2007 Device: iPad Air 2 | 
			
			Thanks for all of your help, Kovid. I'll take a look at the code and link you recommended and see if I can come up with a solution. Once that's all worked out, the profiles I made for WSJ.com and Barrons.com should be pretty much done. I'll probably start working on other finance/investment sites after that. (The WSJ.com blogs should be pretty easy to implement -- and they're free, too!). | 
|  12-05-2007, 03:21 AM | #111 | 
| Groupie            Posts: 182 Karma: 1078201 Join Date: Sep 2007 Device: iPad Air 2 | 
				
				Error
			 
			
			What does the following error mean?

Code:
Traceback (most recent call last):
  File "convert_from.py", line 187, in <module>
  File "convert_from.py", line 181, in main
  File "convert_from.py", line 123, in process_profile
  File "libprs500\ebooks\lrf\web\profiles\__init__.pyo", line 92, in __init__
  File "libprs500\ebooks\lrf\web\profiles\__init__.pyo", line 104, in build_index
  File "libprs500\ebooks\lrf\web\profiles\__init__.pyo", line 159, in parse_feeds
ValueError: too many values to unpack

http://feeds.portfolio.com/portfolio/businessspin

Thanks.
|  12-05-2007, 03:39 AM | #112 | 
| creator of calibre            Posts: 45,600 Karma: 28548974 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			That means the get_feeds function is not returning a correctly structured sequence; parse_feeds fails ("too many values to unpack") while unpacking it.
		 | 
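A minimal sketch of how this kind of error arises, assuming (hypothetically) that parse_feeds unpacks each item returned by get_feeds into a (title, url) pair. If get_feeds returns bare URL strings instead, Python tries to unpack each string character by character and raises the same ValueError. The parse_feeds below is a stand-in, not libprs500's code, and the feed title is made up.

```python
# Assumption: the profile code unpacks each get_feeds() item into exactly
# two values, e.g. a (title, url) pair.

def parse_feeds(feeds):
    """Hypothetical stand-in for the unpacking step in parse_feeds."""
    return [(title, url) for title, url in feeds]

good = [('Business Spin', 'http://feeds.portfolio.com/portfolio/businessspin')]
bad = ['http://feeds.portfolio.com/portfolio/businessspin']  # bare URL strings

print(parse_feeds(good))
try:
    parse_feeds(bad)
except ValueError as err:
    # Unpacking a long string into two names fails
    print('ValueError:', err)
```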
|  12-05-2007, 03:44 AM | #113 | 
| Groupie            Posts: 182 Karma: 1078201 Join Date: Sep 2007 Device: iPad Air 2 | 
			
			I'm trying to set up profiles for some full-content feeds, in which I go no further than listing the articles with descriptions (since the descriptions in the feed contain the full content). However, I noticed that linked text in a feed description is removed. I know html2lrf had a regression that removed linked text completely (which you have already fixed), so I thought this might be a regression, too. If not, perhaps you could set it up so that it just strips the links from the descriptions but keeps the text in place. Thanks.
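If the feed descriptions are HTML, one way to drop the links while keeping their text is to remove only the anchor tags. This is a regex-based sketch, not what html2lrf actually does; strip_links is a hypothetical helper, and it assumes well-formed tags with no `>` inside attribute values.

```python
import re

# Matches opening <a ...> and closing </a> tags only; the \b keeps tags
# such as <abbr> or <address> from matching.
_LINK_TAG = re.compile(r'</?a\b[^>]*>', re.IGNORECASE)

def strip_links(description):
    """Remove anchor tags from an HTML fragment but keep the linked text."""
    return _LINK_TAG.sub('', description)

print(strip_links('Read <a href="http://example.com/story">the full story</a> here.'))
# Read the full story here.
```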
|  12-05-2007, 03:47 AM | #114 | 
| Groupie            Posts: 182 Karma: 1078201 Join Date: Sep 2007 Device: iPad Air 2 | |
|  12-05-2007, 11:53 AM | #115 | 
| creator of calibre            Posts: 45,600 Karma: 28548974 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			Can you give me an example of such a feed, so I can debug it?
		 | 
|  12-05-2007, 04:17 PM | #116 | 
| Groupie            Posts: 182 Karma: 1078201 Join Date: Sep 2007 Device: iPad Air 2 | 
			
			Here's one from the profile I was working on. http://feeds.portfolio.com/portfolio/businessspin I've attached the lrf generated from the profile, so you can see the results. | 
|  12-05-2007, 04:28 PM | #117 | 
| creator of calibre            Posts: 45,600 Karma: 28548974 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			Ah, OK. Should be fixed in svn; let me know if it still gives you trouble.
		 | 
|  12-05-2007, 10:09 PM | #118 | 
| Groupie            Posts: 182 Karma: 1078201 Join Date: Sep 2007 Device: iPad Air 2 | 
				
				max_recursions error
			 
			
			Whenever I set max_recursions to 0 or 1 in a profile, I get the following error after the lrf is generated:

Code:
Exception exceptions.WindowsError: WindowsError(32, 'The process cannot access the file because it is being used by another process') in <bound method Portfolio.__del__ of <portfolio.Portfolio object at 0x00FCFCF0>> ignored

Last edited by JTravers; 12-05-2007 at 10:12 PM.
|  12-05-2007, 10:35 PM | #119 | 
| creator of calibre            Posts: 45,600 Karma: 28548974 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			That error can be safely ignored; all it means is that some temporary file was not deleted.
		 | 
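Since the failure is only in deleting temporary files, cleanup of this kind can be made best-effort. A minimal sketch, assuming a hypothetical class that owns a temporary download directory (this is not libprs500's actual Portfolio.__del__): shutil.rmtree with ignore_errors=True skips anything it cannot remove, such as a file another process still has open on Windows, instead of raising.

```python
import shutil
import tempfile

class TempWorkspace(object):
    """Hypothetical holder for a profile's temporary download directory."""

    def __init__(self):
        self.path = tempfile.mkdtemp(prefix='web2lrf_')

    def cleanup(self):
        # ignore_errors=True: files that cannot be deleted (e.g. still open
        # elsewhere, WindowsError 32) are left behind instead of raising.
        shutil.rmtree(self.path, ignore_errors=True)

ws = TempWorkspace()
ws.cleanup()
```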
|  12-06-2007, 03:19 AM | #120 | 
| Groupie            Posts: 182 Karma: 1078201 Join Date: Sep 2007 Device: iPad Air 2 | 
				
				New Profiles
			 
			
			Just to let everyone know, I posted profiles for the Wall Street Journal, Barron's, and Portfolio.com on Kovid's wiki: https://libprs500.kovidgoyal.net/wiki/UserProfiles

Subscribers to WSJ and Barron's should be able to get all the content using the --username and --password options of web2lrf. Non-subscribers will get the free articles only.

Be aware that, because of the peculiarities of how concurrent logins are handled at the WSJ and Barron's sites, you may get locked out of your account for a short period when using the WSJ and Barron's profiles. You would probably have to run the profiles (with login credentials) several times before this happens, though, so if you run each one only once within a reasonable period of time, you should be safe.
| Tags | 
| libprs500, web2lrf | 