12-17-2010, 10:09 AM   #1
hiperlink
Device: Kindle 3 Wifi only
Can't extract article title in parse index

Hi All!

First of all, please note that I'm not an expert (Python) programmer, and I've only just started with Calibre recipes.

I want to create a recipe for a static site.

The site does not provide RSS, so as far as I could figure out, I have to use parse_index.

I was able to extract the article links from:

Code:
<!-- section title -->
<a href="/publicisztika" class="rovat">Publicisztika</a>

          <div class="separator"></div>
          <ul>
<li>KOVÁCS ZOLTÁN: <a href="/2010-12-08_vallunkra-helyezi-josagos-tenyeret">Vállunkra helyezi jóságos tenyerét</a></li>

<li> Megyesi Gusztáv: <a href="/2010-12-08_vissza-a-partpenzt">Vissza a pártpénzt</a></li>

<li>FALUSY ZSIGMOND: <a href="/2010-12-08_rgek">Ürgék</a></li>
<!-- some more li cut out -->
</ul>
Via this code:

Code:
            for post in section.findAll('li'):
                h = post.find('li')
                title = self.tag_to_string(h)
                self.log('\t * TITLE IS: ', title)
                a = post.find('a', href=True)
                url = a['href']
But for some reason the title is never set.
What I expect (or would like to get) is:
  • title: "KOVÁCS ZOLTÁN: Vállunkra helyezi jóságos tenyerét"
  • a: "/2010-12-08_vallunkra-helyezi-josagos-tenyeret"
  • title: "Megyesi Gusztáv: Vissza a pártpénzt"
  • a: "/2010-12-08_vissza-a-partpenzt"
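
To make the expected result concrete, here is a minimal standalone sketch of the mapping I'm after. It is not recipe code: it assumes Python 3 and uses only the standard library's html.parser instead of the BeautifulSoup objects Calibre hands to parse_index, and the names ListItemCollector, extract_articles and SAMPLE are made up for illustration.

```python
from html.parser import HTMLParser


class ListItemCollector(HTMLParser):
    """Hypothetical helper: collect one title/url pair per <li> element."""

    def __init__(self):
        super().__init__()
        self.articles = []
        self._text = None  # text pieces of the current <li>; None outside one
        self._url = None   # href of the first <a> inside the current <li>

    def handle_starttag(self, tag, attrs):
        if tag == 'li':
            self._text, self._url = [], None
        elif tag == 'a' and self._text is not None and self._url is None:
            self._url = dict(attrs).get('href')

    def handle_data(self, data):
        # Keep all text inside the <li>, including the link text.
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'li' and self._text is not None:
            # Join the pieces and collapse runs of whitespace.
            title = ' '.join(''.join(self._text).split())
            if self._url:
                self.articles.append({'title': title, 'url': self._url})
            self._text = None


def extract_articles(html):
    """Return a list of {'title': ..., 'url': ...} dicts, one per <li>."""
    parser = ListItemCollector()
    parser.feed(html)
    parser.close()
    return parser.articles


# Two of the list items from the page snippet above.
SAMPLE = '''<ul>
<li>KOVÁCS ZOLTÁN: <a href="/2010-12-08_vallunkra-helyezi-josagos-tenyeret">Vállunkra helyezi jóságos tenyerét</a></li>
<li> Megyesi Gusztáv: <a href="/2010-12-08_vissza-a-partpenzt">Vissza a pártpénzt</a></li>
</ul>'''

for article in extract_articles(SAMPLE):
    print(article['title'], '->', article['url'])
```

The point of the sketch is only that the title is the whole text of the li (author name plus link text), while the url comes from the a inside it.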

Could someone please help me figure out how to do that in parse_index?

Thanks in advance!