MobileRead Forums - View Single Post

JTravers · 12-03-2007, 08:03 PM

Quote:

Originally Posted by kovidgoyal

Yeah that should do it, no need to return anything though.

Use

Code:

html2lrf_options = ['--ignore-tables']

When I try the cleanup code, web2lrf hangs right after generating the LRF file. I used the following code:

Code:

        def cleanup(self): 
                self.browser.open('http://online.barrons.com/logout')

For Barron's, I have to set max recursions to 3 because some articles are split into two parts (even in the print versions). Doing this, however, makes web2lrf follow a bunch of other links that turn out to be garbage and lead it off the Barron's website. Is there a way to restrict which links web2lrf follows? I've tried the following, but it didn't seem to work:

Code:

        match_regexps = ['<a.*?mod=.*?>']

and I also tried:

Code:

        match_regexps = ['<a.*?online.barrons.com.*?>']

Neither seems to have any effect. I know I'm probably misusing these options, so any guidance would be appreciated.
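One possible explanation, which I have not confirmed, is that match_regexps is applied to each link's URL rather than to the HTML of the <a> tag. The standalone check below is only a sketch under that assumption: it uses plain Python `re`, not web2lrf itself, and the sample URLs, the `url_patterns` list and the `followed` helper are made up for illustration. It shows that my tag-style patterns can never match a bare URL, while URL-only patterns would.

```python
import re

# Patterns from my recipe attempts (written as if matching an <a> tag).
tag_patterns = [r'<a.*?mod=.*?>', r'<a.*?online.barrons.com.*?>']

# Hypothetical alternatives written against the URL only (an assumption,
# not something taken from the web2lrf documentation).
url_patterns = [r'mod=', r'online\.barrons\.com']

# Made-up sample links, for illustration only.
urls = [
    'http://online.barrons.com/article/example.html?mod=example',
    'http://www.example.com/ads/landing.html',
]


def followed(url, patterns):
    """Return True if any pattern matches somewhere in the URL."""
    return any(re.search(p, url) for p in patterns)


# A bare URL never contains '<a', so the tag-style patterns match nothing.
tag_results = [followed(u, tag_patterns) for u in urls]
# The URL-only patterns pick out the Barron's link and skip the other one.
url_results = [followed(u, url_patterns) for u in urls]

print(tag_results)
print(url_results)
```

If that assumption holds, it would explain why both of my attempts had no effect, but I would still like confirmation of what match_regexps is actually compared against.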

Finally, I tried html2lrf_options earlier (and again just now), and it doesn't seem to produce the same output as passing --ignore-tables on the command line. I'm not sure why.