MobileRead Forums - View Single Post

horsegoalie · 12-15-2009, 09:25 PM · Python/Calibre question

The website I am trying to parse has lists on it, and they are tripping up my Python script. I only get the first item of what I expected to receive: in the following example I get "Page one" and "TextName1" correctly, but I do not get "TextName2" or "TextName3". Later on I also need to be able to tell whether each item is a sectionHeader or a linklist. Thanks for any help!

The Python is:

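# Match every tag whose class is 'sectionHeader' or 'linklist'
for div in soup.findAll(True,
        attrs={'class': ['sectionHeader', 'linklist']}, recursive=True):
    ...  # loop body not shown in the post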

The basic structure of the HTML is:

<h3 class="sectionHeader">Page one</h3>
<ul class=linklist>
<li><a href="......">TextName1</a> <span class="attr">More Text</span></li>
<li><a href="......">TextName2</a> <span class="attr">More Text</span></li>
<li><a href="......">TextName3</a> <span class="attr">More Text</span></li>
</ul>
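
A minimal sketch of the intended result, written against the sample HTML above: one entry for the section header and one entry per link in the list, each tagged with its kind. It uses the standard library's html.parser instead of BeautifulSoup so that it runs on its own. The class name SectionLinkParser and the (kind, text) tuple layout are made up for illustration and are not part of Calibre or BeautifulSoup.

```python
from html.parser import HTMLParser

# Sample markup copied from the post (note the unquoted class on <ul>).
HTML = """
<h3 class="sectionHeader">Page one</h3>
<ul class=linklist>
<li><a href="......">TextName1</a> <span class="attr">More Text</span></li>
<li><a href="......">TextName2</a> <span class="attr">More Text</span></li>
<li><a href="......">TextName3</a> <span class="attr">More Text</span></li>
</ul>
"""


class SectionLinkParser(HTMLParser):
    """Hypothetical helper: collects (kind, text) pairs in document order."""

    def __init__(self):
        super().__init__()
        self.items = []            # collected (kind, text) tuples
        self._in_linklist = False  # True while inside <ul class="linklist">
        self._capture = None       # (kind, tag) of the element being read
        self._text = []            # text fragments of that element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "sectionHeader" in classes:
            self._capture, self._text = ("sectionHeader", tag), []
        elif tag == "ul" and "linklist" in classes:
            self._in_linklist = True
        elif tag == "a" and self._in_linklist:
            # Every link in the list starts a new entry, not just the first.
            self._capture, self._text = ("linklist", tag), []

    def handle_data(self, data):
        if self._capture:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._capture and tag == self._capture[1]:
            kind = self._capture[0]
            self.items.append((kind, "".join(self._text).strip()))
            self._capture = None
        elif tag == "ul":
            self._in_linklist = False


parser = SectionLinkParser()
parser.feed(HTML)
items = parser.items

for kind, text in items:
    print(kind, text)
```

In the BeautifulSoup loop the whole <ul> is presumably returned as a single match, so the equivalent step there would be to loop over the <a> tags inside it (for example with div.findAll('a')) rather than reading only its first link. The kind stored with each entry is what lets later code decide between a sectionHeader and a linklist item.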
