06-01-2010, 10:20 AM #2019
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by gambarini
http://bugs.calibre-ebook.com/wiki/recipeGuide_advanced

In this link I can find an example of parse_index, and it is a good method for creating a feed, i.e. a complete list of articles.
So now I am trying to use parse_index in two different ways:

-) to override only the title (because it is missing from the feed, while the other fields (description, url, date) are correct).
-) to create a complete feed with all the real front-page articles of the newspaper.

The second way is now clear, but the first is not clear at all.

So what have you tried? The page you reference explains how parse_index works. You create your own set of feeds. Each feed has a title and a set of articles. The set of feeds is created in the line:
feeds.append((title, articles))
of parse_index. The "title" there is the feed title.
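As a plain-Python illustration of the structure parse_index hands back to calibre (a list of (feed title, list of article dicts) tuples), here is a sketch outside any recipe class. The helper name build_feeds, the feed titles and the example.com URLs are made up for illustration, and skipping empty sections is just a choice made in this sketch.

```python
def build_feeds(sections):
    """Assemble the list of (feed_title, articles) tuples parse_index returns.

    `sections` is a hypothetical list of (feed_title, articles) pairs, where
    each article is a dict with title/url/description/date keys.
    """
    feeds = []
    for title, articles in sections:
        # Choice of this sketch: leave out sections that produced no articles.
        if articles:
            feeds.append((title, articles))
    return feeds


sections = [
    ("Front Page", [{"title": "Example story", "url": "http://example.com/a1",
                     "description": "", "date": ""}]),
    ("Sport", []),
]
feeds = build_feeds(sections)
print(feeds[0][0])  # Front Page
print(len(feeds))   # 1
```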

In the example, the articles for each feed are created in nz_parse_section, in this line:

current_articles.append({'title': title, 'url': url, 'description':'', 'date':''})

The "title" there is the article title.
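A small sketch of building one article entry with the same four keys as that line. make_article is a hypothetical helper, and stripping stray whitespace from the title is an assumption about what you might want, not part of the example recipe.

```python
def make_article(title, url, description="", date=""):
    """Build one article entry with the keys used in nz_parse_section."""
    # Hypothetical helper: strips surrounding whitespace from the title.
    return {"title": title.strip(), "url": url,
            "description": description, "date": date}


current_articles = []
current_articles.append(make_article("  Example story ", "http://example.com/a1"))
print(current_articles[0]["title"])  # Example story
```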

It appears you want to control the article titles, not the feed title. I'd do it this way:

First, I'd use parse_index to process each RSS feed you want (you may only need one). In parse_index, each RSS feed page is treated as an ordinary web page, so you can grab what you want from it using BeautifulSoup. I'd use a modified version of nz_parse_section to extract {'title': title, 'url': url, 'description':'', 'date':''} for each article on the page being processed. As I grab that data for each article, I'd test the title to see whether it's what I want to appear. You said the titles are usually OK. When one isn't, you'll need to either construct a title yourself, if you can, or follow the URL and take a title from that page (again, using BeautifulSoup to grab the info you want). Once you are happy with an article's data, append it to the current_articles list.
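A rough sketch of that approach in plain Python, so it runs outside calibre. It swaps in the standard library's xml.etree.ElementTree and html.parser for BeautifulSoup, assumes the feed is plain RSS 2.0 (item/title/link/description/pubDate), treats "title is non-empty" as the only test, and uses a hypothetical fetch_page callable instead of downloading the article page. In a real recipe you would fetch pages with the recipe's index_to_soup and search the soup instead. The names parse_rss_section, page_title, TitleGrabber and fetch_page are illustrative only, not part of calibre.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TitleGrabber(HTMLParser):
    """Collect the text of the first <title> element in an HTML page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def page_title(html):
    """Return the stripped <title> text of an HTML string ('' if none)."""
    grabber = TitleGrabber()
    grabber.feed(html)
    grabber.close()
    return grabber.title.strip()


def parse_rss_section(rss_xml, fetch_page):
    """Build article dicts for every <item> in an RSS 2.0 document.

    fetch_page is a hypothetical callable (url -> HTML string or None),
    consulted only when an item's own title is missing or empty.
    """
    current_articles = []
    for item in ET.fromstring(rss_xml).iter("item"):
        url = (item.findtext("link") or "").strip()
        title = (item.findtext("title") or "").strip()
        if not title:
            # Title missing from the feed: take it from the article page,
            # and fall back to the URL if the page has no <title> either.
            title = page_title(fetch_page(url) or "") or url
        current_articles.append({
            "title": title,
            "url": url,
            "description": (item.findtext("description") or "").strip(),
            "date": (item.findtext("pubDate") or "").strip(),
        })
    return current_articles


RSS = """<rss version="2.0"><channel>
<item><title>Good title</title><link>http://example.com/1</link>
<description>First</description><pubDate>Tue, 01 Jun 2010 10:00:00 GMT</pubDate></item>
<item><title></title><link>http://example.com/2</link>
<description>Second</description></item>
</channel></rss>"""

# Stand-in for downloading article pages; a real recipe would fetch them.
PAGES = {"http://example.com/2":
         "<html><head><title>Recovered title</title></head></html>"}

articles = parse_rss_section(RSS, PAGES.get)
for article in articles:
    print(article["title"])
# Good title
# Recovered title
```

Passing the page fetcher in as a parameter keeps the title-fixing logic separate from how pages are downloaded, so the same function can be exercised with canned HTML as above.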

When you're done with the page, your section function returns its list of articles to parse_index, and your titles will be as you want them.

It sounds like a lot of trouble, but I don't see any other way to do it.