
MobileRead Forums > E-Book Software > Calibre


12-15-2009, 08:25 PM   #1
horsegoalie
Junior Member
Posts: 9
Karma: 10
Join Date: Dec 2009
Device: Nook
Python/Calibre question

The website I am trying to parse contains lists, and they are tripping up my Python script. I get only the first item of what I expected to receive. In the following example I correctly get "Page one" and "TextName1", but I do not get "TextName2" or "TextName3". Later on I also need to be able to tell whether each item is a sectionHeader or a linklist. Thanks for any help!

The Python is:

for div in soup.findAll(True,
        attrs={'class': ['sectionHeader', 'linklist']}, recursive=True):

The basic structure of the HTML is:

<h3 class="sectionHeader">Page one</h3>
<ul class="linklist">
<li><a href="......">TextName1</a> <span class="attr">More Text</span></li>
<li><a href="......">TextName2</a> <span class="attr">More Text</span></li>
<li><a href="......">TextName3</a> <span class="attr">More Text</span></li>
</ul>
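
A likely explanation, given the structure above: the loop yields one element per matching tag, so the whole <ul class="linklist"> arrives as a single item, and taking its link text once gives only "TextName1". Each <a> inside the list has to be visited separately. The sketch below illustrates that grouping logic only. It is not the original script: it swaps BeautifulSoup for Python's standard-library html.parser so it runs without extra packages, and the names SectionLinkParser, SAMPLE and sections are made up for the illustration.

```python
from html.parser import HTMLParser

# Sample markup copied from the post; the href values are elided there too.
SAMPLE = """
<h3 class="sectionHeader">Page one</h3>
<ul class=linklist>
<li><a href="......">TextName1</a> <span class="attr">More Text</span></li>
<li><a href="......">TextName2</a> <span class="attr">More Text</span></li>
<li><a href="......">TextName3</a> <span class="attr">More Text</span></li>
</ul>
"""


class SectionLinkParser(HTMLParser):
    """Hypothetical helper: collect (header text, [link texts]) pairs."""

    def __init__(self):
        super().__init__()
        self.sections = []        # list of (header_text, [link_text, ...])
        self._header_tag = None   # tag name of the sectionHeader being read
        self._in_list = False     # inside a <ul class="linklist">
        self._in_link = False     # inside an <a> within that list
        self._text = []           # text pieces of the current header or link

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "sectionHeader" in classes:
            self._header_tag = tag
            self._text = []
        elif tag == "ul" and "linklist" in classes:
            self._in_list = True
        elif tag == "a" and self._in_list:
            self._in_link = True
            self._text = []

    def handle_data(self, data):
        if self._header_tag is not None or self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._header_tag is not None and tag == self._header_tag:
            # A header closes: start a new section with an empty link list.
            self.sections.append(("".join(self._text).strip(), []))
            self._header_tag = None
        elif tag == "a" and self._in_link:
            # One entry per <a>, not one per <ul>.
            self._in_link = False
            if not self.sections:
                self.sections.append((None, []))
            self.sections[-1][1].append("".join(self._text).strip())
        elif tag == "ul":
            self._in_list = False


parser = SectionLinkParser()
parser.feed(SAMPLE)
sections = parser.sections
for header, links in sections:
    print(header, links)
```

With BeautifulSoup the same idea would presumably be an inner loop over the links of each linklist element (for example div.findAll('a')), with the element's class attribute telling a sectionHeader apart from a linklist. That has not been checked against the actual page.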
