Hi All!
First of all, please note that I'm not an expert (Python) programmer and have only just started with Calibre recipes.
I want to create a recipe for a static site.
The site does not provide RSS, so, as far as I could figure out, I have to use
parse_index.
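As far as I understand the Calibre recipe documentation, parse_index() must return a list of (section title, list of articles) tuples, where each article is a dict with at least a 'title' and a 'url'. Below is a minimal sketch that builds that structure from already-extracted (title, href) pairs, without needing Calibre installed. The names build_index and BASE_URL are hypothetical, and BASE_URL is only a placeholder because the real site address is not given here. The relative hrefs from the page are resolved to absolute URLs, on the assumption that Calibre needs full URLs to download each article.

```python
from urllib.parse import urljoin

# Hypothetical placeholder: the real site address is not given in the post.
BASE_URL = "https://example.com"


def build_index(sections, base_url=BASE_URL):
    """Build the [(section_title, [article_dict, ...]), ...] list that a
    Calibre parse_index() is expected to return.

    sections: list of (section_title, [(title, href), ...]) pairs, where
    href may be relative, as it is in the site's HTML.
    """
    index = []
    for section_title, links in sections:
        articles = []
        for title, href in links:
            articles.append({
                "title": title,
                # Resolve relative hrefs such as "/2010-12-08_..." against the site root.
                "url": urljoin(base_url, href),
                "description": "",
                "date": "",
            })
        index.append((section_title, articles))
    return index
```

In a recipe, parse_index(self) would scrape each section's list first and then return something like build_index(...) built from the results.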
I was able to extract the article links from:
Code:
<!-- section title -->
<a href="/publicisztika" class="rovat">Publicisztika</a>
<div class="separator"></div>
<ul>
<li>KOVÁCS ZOLTÁN: <a href="/2010-12-08_vallunkra-helyezi-josagos-tenyeret">Vállunkra helyezi jóságos tenyerét</a></li>
<li> Megyesi Gusztáv: <a href="/2010-12-08_vissza-a-partpenzt">Vissza a pártpénzt</a></li>
<li>FALUSY ZSIGMOND: <a href="/2010-12-08_rgek">Ürgék</a></li>
<!-- some more li cut out -->
</ul>
Using this code:
Code:
for post in section.findAll('li'):
    h = post.find('li')
    title = self.tag_to_string(h)
    self.log('\t * TITLE IS: ', title)
    a = post.find('a', href=True)
    url = a['href']
But for some reason the title is never set.
What I expect (or would like to get) is:
- title: "KOVÁCS ZOLTÁN: Vállunkra helyezi jóságos tenyerét"
- a: "/2010-12-08_vallunkra-helyezi-josagos-tenyeret"
- title: "Megyesi Gusztáv: Vissza a pártpénzt"
- a: "/2010-12-08_vissza-a-partpenzt"
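A likely cause of the empty title: `post` is already the `<li>`, and BeautifulSoup's `find()` only searches inside a tag, so `post.find('li')` looks for a nested `<li>`, finds none and returns `None`. Presumably `self.tag_to_string(post)` on the `<li>` itself is what yields "AUTHOR: link text". The sketch below shows the same idea with only Python's standard `html.parser`, so it runs without Calibre or BeautifulSoup. `LiLinkParser` and `extract_li_links` are hypothetical names, and this is not how Calibre parses pages internally.

```python
from html.parser import HTMLParser


class LiLinkParser(HTMLParser):
    """Collect (title, href) for every <li> that contains a link.

    The title is the whole text of the <li> (author prefix plus link
    text); the href comes from the first <a> inside that <li>.
    """

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_li = False
        self._text = []
        self._href = None

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li = True
            self._text = []
            self._href = None
        elif tag == "a" and self._in_li and self._href is None:
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        # Text inside the <a> is collected too, because we are still in the <li>.
        if self._in_li:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._in_li:
            # Collapse whitespace, e.g. the leading space in "<li> Megyesi ...".
            title = " ".join("".join(self._text).split())
            if self._href:
                self.items.append((title, self._href))
            self._in_li = False


def extract_li_links(html):
    """Return a list of (title, href) pairs, one per linked <li>."""
    parser = LiLinkParser()
    parser.feed(html)
    parser.close()
    return parser.items
```

Fed the HTML snippet above, this should give pairs such as ("KOVÁCS ZOLTÁN: Vállunkra helyezi jóságos tenyerét", "/2010-12-08_vallunkra-helyezi-josagos-tenyeret"). The section heading link is skipped because it is not inside an `<li>`.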
Could someone please help me figure out how to do that?
Thanks in advance!