Quote:
Originally Posted by kovidgoyal
Yeah that should do it, no need to return anything though.
Use
Code:
html2lrf_options = ['--ignore-tables']
When trying the cleanup code, web2lrf hangs right after generating the lrf. I used the following code:
Code:
def cleanup(self):
    self.browser.open('http://online.barrons.com/logout')
For Barron's, I have to set max recursions to 3 because some articles are split into two parts (even in the print versions). Doing this, however, causes web2lrf to follow a bunch of other links that turn out to be garbage and lead it off the Barron's website. Is there a way to restrict the links that web2lrf follows? I've tried the following, but it didn't seem to work:
Code:
match_regexps = ['<a.*?mod=.*?>']
and I also tried:
Code:
match_regexps = ['<a.*?online.barrons.com.*?>']
It doesn't seem like either is having an effect. I know I'm probably misusing these options, so any guidance would be appreciated.
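One guess on my part, which I have not confirmed: if match_regexps is applied to each link's URL rather than to the HTML of the anchor tag, then patterns that start with <a could never match. The standalone sketch below uses only Python's re module, not web2lrf itself. The sample URL and the URL-style patterns are made up for illustration.

```python
import re

# Hypothetical sample link, for illustration only (not a real article URL)
url = 'http://online.barrons.com/article/example.html?mod=sample'
tag = '<a href="%s">' % url

# The two patterns I tried, written against the anchor tag markup
tag_style = ['<a.*?mod=.*?>', '<a.*?online.barrons.com.*?>']

# Hypothetical alternatives written against the bare URL instead
url_style = [r'online\.barrons\.com', r'mod=']

def any_match(patterns, text):
    """Return True if any pattern is found anywhere in text."""
    return any(re.search(p, text) for p in patterns)

tag_patterns_on_tag = any_match(tag_style, tag)  # patterns vs. full <a ...> markup
tag_patterns_on_url = any_match(tag_style, url)  # patterns vs. URL only
url_patterns_on_url = any_match(url_style, url)  # URL-style patterns vs. URL
```

If the URL is all the option ever sees, only the URL-style patterns would have a chance of restricting the links.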
Finally, I tried using html2lrf_options before (and again now), and it doesn't seem to give the same output that is generated when specifying --ignore-tables on the command line. Not sure why.