MobileRead Forums - View Single Post - Can't extract article title in parse index

hiperlink · 12-17-2010, 11:09 AM

Hi All!

First of all, please note that I'm not an expert (Python) programmer, and I've only just started with Calibre recipes.

I want to create a recipe for a static site.

The site does not provide RSS, so as far as I could figure out, I have to use parse_index.

I was able to extract the article links from:

Code:

<!-- section title -->
<a href="/publicisztika" class="rovat">Publicisztika</a>

          <div class="separator"></div>
          <ul>
<li>KOVÁCS ZOLTÁN: <a href="/2010-12-08_vallunkra-helyezi-josagos-tenyeret">Vállunkra helyezi jóságos tenyerét</a></li>

<li> Megyesi Gusztáv: <a href="/2010-12-08_vissza-a-partpenzt">Vissza a pártpénzt</a></li>

<li>FALUSY ZSIGMOND: <a href="/2010-12-08_rgek">Ürgék</a></li>
<!-- some more li cut out -->
</ul>

Via this code:

Code:

            for post in section.findAll('li'):
                h = post.find('li')
                title = self.tag_to_string(h)
                self.log('\t * TITLE IS: ', title)
                a = post.find('a', href=True)
                url = a['href']

But for some reason the title is never set.
What I expect (or would like to get) is:

title: "KOVÁCS ZOLTÁN: Vállunkra helyezi jóságos tenyerét"
a: "/2010-12-08_vallunkra-helyezi-josagos-tenyeret"
title: "Megyesi Gusztáv: Vissza a pártpénzt"
a: "/2010-12-08_vissza-a-partpenzt"
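
To make the expected result concrete, here is a minimal standalone sketch that produces it from the markup above. It uses only Python's standard html.parser, not Calibre's BeautifulSoup, and the names LiCollector and extract_articles are made up for illustration. It takes the whole text of each li as the title and the first href inside it as the URL. In the recipe loop the analogous change would presumably be to call self.tag_to_string(post) on the li itself, since post.find('li') searches for a nested li that does not exist and so returns None.

```python
from html.parser import HTMLParser

# Two of the list items from the page, copied from the snippet above.
SNIPPET = """
<ul>
<li>KOVÁCS ZOLTÁN: <a href="/2010-12-08_vallunkra-helyezi-josagos-tenyeret">Vállunkra helyezi jóságos tenyerét</a></li>
<li> Megyesi Gusztáv: <a href="/2010-12-08_vissza-a-partpenzt">Vissza a pártpénzt</a></li>
</ul>
"""


class LiCollector(HTMLParser):
    """Hypothetical helper: collects (title, url) for every <li>."""

    def __init__(self):
        super().__init__()
        self.articles = []
        self._text = None  # list of text chunks while inside an <li>, else None
        self._url = None   # first href seen inside the current <li>

    def handle_starttag(self, tag, attrs):
        if tag == 'li':
            self._text, self._url = [], None
        elif tag == 'a' and self._text is not None and self._url is None:
            self._url = dict(attrs).get('href')

    def handle_data(self, data):
        # Gather author text and link text alike while inside an <li>.
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'li' and self._text is not None:
            # Join the chunks and collapse runs of whitespace.
            title = ' '.join(''.join(self._text).split())
            if self._url:
                self.articles.append((title, self._url))
            self._text = None


def extract_articles(html):
    parser = LiCollector()
    parser.feed(html)
    parser.close()
    return parser.articles


articles = extract_articles(SNIPPET)
for title, url in articles:
    print('title:', title)
    print('a:', url)
```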

Could someone please help me work out how to do that?

Thanks in advance!
