Quote:
Originally Posted by gambarini
http://bugs.calibre-ebook.com/wiki/recipeGuide_advanced
In this link I can find an example of parse_index. It's a good way to create a feed, i.e. a complete list of articles.
So now I'm trying to use parse_index in two different ways:
-) to override only the title (because it's missing from the feed, while the other fields (description, url, date) are correct).
-) to create a complete feed with all the real front-page articles of the newspaper.
The second way is now clear, but the first one actually isn't at all.
So what have you tried? The page you reference explains how parse_index works. You create your own set of feeds. Each feed has a title and a set of articles. The set of feeds is created in the line:
feeds.append((title, articles))
of parse_index. The "title" there is the feed title.
The articles for each feed are created in nz_parse_section of the example in this line:
current_articles.append({'title': title, 'url': url, 'description':'', 'date':''})
The "title" there is the article title.
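To make the two "title"s concrete, here is a minimal sketch of the value parse_index hands back, as described above: a list of (feed title, articles) tuples, where each article is a dict with 'title', 'url', 'description' and 'date' keys. The feed name, article titles and example.com URLs are made up for illustration; this is plain Python, not calibre code.

```python
def build_index():
    """Return data shaped like parse_index's return value (illustrative only)."""
    feeds = []

    # One feed's articles; each dict's 'title' is an ARTICLE title.
    current_articles = []
    current_articles.append({'title': 'First article', 'url': 'http://example.com/a1',
                             'description': '', 'date': ''})
    current_articles.append({'title': 'Second article', 'url': 'http://example.com/a2',
                             'description': '', 'date': ''})

    # 'Front page' here is the FEED title.
    feeds.append(('Front page', current_articles))
    return feeds


for feed_title, articles in build_index():
    print(feed_title)
    for article in articles:
        print('  ' + article['title'])
```

Changing the article titles means changing the dicts in current_articles; the string passed to feeds.append only names the section.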
It appears you want to control the article titles, not the feed title. I'd do it this way:
First, I'd use parse_index to process each RSS feed I want (you may only need one). parse_index will treat each RSS feed page as an ordinary web page, so you can grab what you want from it using BeautifulSoup. I'd use a modified version of nz_parse_section to find the {'title': title, 'url': url, 'description':'', 'date':''} data for each article on the page being processed. As I grab that data for each article, I'd test the title to see whether it's what I want to appear. You said they are usually OK. If they aren't, you'll need to either create a title (if you can) or go to the article's URL and get a title from that page (again using BeautifulSoup to grab the info you want). Once you are happy with the data for the article, append it to the current_articles list.
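The approach above can be sketched roughly as follows. So that it runs outside calibre, this swaps in the standard library's xml.etree.ElementTree for BeautifulSoup and takes the RSS text as a string. In a real recipe you would fetch the page inside parse_index (e.g. with index_to_soup) and walk it with BeautifulSoup instead. The function names, the "first eight words of the description" fallback, and the fetch_title callback are all hypothetical choices for illustration, not part of calibre's API.

```python
import xml.etree.ElementTree as ET


def parse_rss_section(rss_text, fetch_title=None):
    """Build article dicts from one RSS page, repairing missing titles.

    fetch_title is a hypothetical callable(url) -> str used when an item has
    no title; in a real recipe it might fetch the article page and pull the
    title out with BeautifulSoup.
    """
    root = ET.fromstring(rss_text)
    current_articles = []
    for item in root.iter('item'):
        # findtext returns None if the tag is absent, '' if it is empty.
        title = (item.findtext('title') or '').strip()
        url = (item.findtext('link') or '').strip()
        description = (item.findtext('description') or '').strip()
        date = (item.findtext('pubDate') or '').strip()

        # Test the title; if it's not usable, try to build a better one.
        if not title:
            if fetch_title is not None:
                title = fetch_title(url)
            elif description:
                # Hypothetical fallback: first few words of the description.
                title = ' '.join(description.split()[:8])
            else:
                title = url

        current_articles.append({'title': title, 'url': url,
                                 'description': description, 'date': date})
    return current_articles


def build_index(sections):
    """sections: list of (feed_title, rss_text) pairs -> parse_index-style feeds."""
    feeds = []
    for feed_title, rss_text in sections:
        articles = parse_rss_section(rss_text)
        if articles:
            feeds.append((feed_title, articles))
    return feeds
```

Keeping the title repair inside the per-section function mirrors the suggestion to do the check while each article's data is being collected, before it is appended.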
When you're done with the page, control returns to parse_index, and your article titles will be as you want them.
It sounds like a lot of trouble, but I don't see any other way to do it.