MobileRead Forums - Fetching multi-page articles

lrui · 08-15-2012, 09:48 AM

Quote:

Originally Posted by Steven630

Thanks. Yet again, it failed.

After I started downloading, nothing indicated that Calibre had found "span" or "div" etc. I suspect this method won't work however hard we try. That is, it's not the two class="a1" attributes or other mistakes that led to the failure, but the method itself. (Yes, there are two class="a1" elements, but when you use find() in BeautifulSoup only the first match counts, so the second class="a1" is ignored once the first one is found.) And in theory at least, your method of finding "span" and so on should work, but it didn't. What do you think?

As for match_regexps, that didn't work either, although I'm not sure whether simply adding "match_regexps" and "recursion" to the recipe is enough. Wait, it seems that match_regexps isn't meant for multi-page articles in the first place...
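On the match_regexps question in the quote: in calibre's BasicNewsRecipe, match_regexps is a list of regular expressions that decides which links get followed, and it can only matter when recursions (note the plural attribute name) is greater than 0, because recursions sets how many levels of links are followed. The sketch below only emulates that link filter with Python's standard re module, assuming a link is wanted when any pattern matches somewhere in its URL. The pattern, the URLs, and the helper link_matches are hypothetical (link_matches is not calibre's own is_link_wanted method), and the sketch says nothing about how calibre stores the pages it fetches.

```python
import re

# Hypothetical pattern for the "next page" links of a single article.
match_regexps = [r"article\?id=\d+&page=\d+"]


def link_matches(url, patterns):
    """Hypothetical helper: True if any regex in patterns matches somewhere in url."""
    return any(re.search(p, url) for p in patterns)


# Hypothetical links found on an article page.
links = [
    "http://example.com/article?id=42&page=2",
    "http://example.com/about",
    "http://example.com/article?id=42&page=3",
]

# Keep only the links a match_regexps-style filter would follow.
wanted = [u for u in links if link_matches(u, match_regexps)]
print(wanted)  # ['http://example.com/article?id=42&page=2', 'http://example.com/article?id=42&page=3']
```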

pagenum = soup.findAll('span')

Change soup.find to soup.findAll.

Try it again?
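A small sketch of the find/findAll difference discussed in this thread, assuming the bs4 package (BeautifulSoup 4), where find_all() is the current name and findAll() is still accepted as an alias. The markup and URLs are hypothetical, modelled on the two class="a1" elements and the page-number spans mentioned above. It also shows that find() returns None when nothing matches, which is a quick way to check whether a lookup found anything, and that findAll returns a list, so code that used pagenum as a single tag has to loop over it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two divs with class="a1", page-number links inside <span> tags.
html = """
<div class="a1">
  <span><a href="/article?page=2">2</a></span>
  <span><a href="/article?page=3">3</a></span>
</div>
<div class="a1"><p>second a1 block</p></div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns only the first match, so the second class="a1" div is skipped.
first_a1 = soup.find("div", attrs={"class": "a1"})
print(first_a1.p)  # None: the <p> lives in the second a1 div

# find() returns None when nothing matches at all.
print(soup.find("table"))  # None

# find_all() (findAll() in older code) returns every matching tag as a list.
pagenum = soup.find_all("span")
print(len(pagenum))  # 2

# pagenum is a list, so read each tag in a loop; pagenum.a would raise AttributeError.
page_links = [span.a["href"] for span in pagenum if span.a is not None]
print(page_links)  # ['/article?page=2', '/article?page=3']
```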