Thread: web2lrf
12-03-2007, 07:03 PM   #104
JTravers
Groupie
Posts: 182
Karma: 1078201
Join Date: Sep 2007
Device: iPad Air 2
Quote:
Originally Posted by kovidgoyal View Post
Yeah that should do it, no need to return anything though.

Use
Code:
html2lrf_options = ['--ignore-tables']
When I try the cleanup code, web2lrf hangs right after generating the LRF file. I used the following code:
Code:
        def cleanup(self): 
                self.browser.open('http://online.barrons.com/logout')
For Barron's, I have to set max recursions to 3 because some articles are split into two parts (even in the print versions). Doing this, however, makes web2lrf follow a lot of other links that turn out to be garbage and lead it away from the Barron's website. Is there a way to restrict the links that web2lrf follows? I've tried the following, but it didn't seem to work:

Code:
        match_regexps = ['<a.*?mod=.*?>']
and I also tried:
Code:
        match_regexps = ['<a.*?online.barrons.com.*?>']
It doesn't seem like either is having an effect. I know I'm probably misusing these options, so any guidance would be appreciated.
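
One possible explanation, which I have not confirmed: if web2lrf tests each pattern against the link's URL rather than against the HTML of the anchor tag, then a pattern starting with '<a' can never match. Here is a minimal sketch of that idea using plain Python re. The sample URLs, the follows() helper, and the URL-only pattern are all made up for illustration and are not part of web2lrf:

```python
import re

# Hypothetical sample links, invented for this illustration.
article = 'http://online.barrons.com/article/SB1234.html?mod=b_this_weeks_magazine'
offsite = 'http://www.example.com/ads/landing.html'

# The two patterns from my recipe: they describe an HTML anchor tag.
tag_patterns = ['<a.*?mod=.*?>', '<a.*?online.barrons.com.*?>']

# An alternative that describes only the URL (assumption: this is what gets matched).
url_patterns = [r'online\.barrons\.com/.*mod=']

def follows(patterns, link):
    """Return True if any pattern is found somewhere in the link text."""
    return any(re.search(p, link) for p in patterns)

print(follows(tag_patterns, article))  # a bare URL contains no '<a', so no match
print(follows(url_patterns, article))  # the URL-only pattern matches the article
print(follows(url_patterns, offsite))  # and rejects the off-site link
```

If that assumption holds, the URL-only form would be the one to put in match_regexps.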

Finally, I tried html2lrf_options before (and again now), and it doesn't seem to produce the same output as specifying --ignore-tables on the command line. I'm not sure why.