05-31-2010, 07:34 PM | #2011 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
New recipe for Bosnian portal sarajevo-x.com:
|
05-31-2010, 10:30 PM | #2012 |
Zealot
Posts: 125
Karma: 314
Join Date: Apr 2010
Location: Canada, Eh!
Device: Kobo
|
Are there any recipes specifically for the FIFA 2010 World Cup feeds? A couple on fifa.com that would be nice are:
Latest News: http://www.fifa.com/rss/index.xml
2010 FIFA World Cup South Africa: http://www.fifa.com/worldcup/news/rss.xml |
05-31-2010, 10:54 PM | #2013 |
Enthusiast
Posts: 49
Karma: 10
Join Date: Aug 2009
Device: none
|
Can anyone help me with a recipe for this magazine?
http://www.foodprocessing360.com/ind...ate=12/05/2009 |
05-31-2010, 11:53 PM | #2014 | |
Junior Member
Posts: 3
Karma: 10
Join Date: May 2010
Location: Calgary, AB, Canada
Device: iPad
|
Quote:
I am pretty wimpy when it comes to python. I can sed and perl pretty well, can do a bit with awk, but python just makes my brain hurt. That's part of why I like sitescooper so much and the simplicity of their .site files. Thanks for your suggestion, though! Cheers! |
|
06-01-2010, 12:02 AM | #2015 | |
Junior Member
Posts: 3
Karma: 10
Join Date: May 2010
Location: Calgary, AB, Canada
Device: iPad
|
Quote:
Cheers! |
|
06-01-2010, 02:09 AM | #2016 |
Enthusiast
Posts: 33
Karma: 10
Join Date: May 2010
Device: Bookeen Cybook Gen3 Gold
|
|
06-01-2010, 09:51 AM | #2017 |
Connoisseur
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
|
http://bugs.calibre-ebook.com/wiki/recipeGuide_advanced
At this link I found an example of parse_index, and it's a good method to create a feed with a complete list of articles. So now I'm trying to use parse_index in two different ways: 1) to override only the titles (because they are missing from the feed, while the other fields (description, url, date) are correct); 2) to create a complete feed from the real front page of the newspaper. The second way is now clear to me, but the first is not at all. |
06-01-2010, 09:56 AM | #2018 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Clearly, I wasn't clear that what you wrote wasn't clear to me. To clear things up, I have to ask you to be more clear. I'm sure that it is now clear that your thanks are premature.
(To rephrase the above: why don't you repost your questions, in greater detail, if you still have any? I really couldn't figure out what help you were asking for.) Edit: I see you did that while I was writing my comment. Last edited by Starson17; 06-01-2010 at 09:58 AM. |
06-01-2010, 10:20 AM | #2019 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
feeds.append((title, articles)) of parse_index. The "title" there is the feed title. The articles for each feed are created in nz_parse_section of the example, in this line:

current_articles.append({'title': title, 'url': url, 'description':'', 'date':''})

The "title" there is the article title. It appears you want to control the article titles, not the feed title. I'd do it this way:

First, I'd use parse_index to process each RSS feed I want (you may only need one). parse_index will treat each RSS feed page as a web page, and you can grab what you want from that page using BeautifulSoup. I'd use a modified version of nz_parse_section to find the {'title': title, 'url': url, 'description':'', 'date':''} data for each article on the page being processed.

As I grab that data for each article, I'd test the title to see if it's what I want to appear. You said they are usually OK. If a title isn't OK, you'll need to either create one, if you can, or go to the URL and get a title from that page (again, BeautifulSoup is used to grab the info you want). Once you are happy with the data for the article, append it to the current_articles list. When you're done with the page, control returns to parse_index and your titles will be as you want them.

It sounds like a lot of trouble, but I don't see any other way to do it. |
|
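A minimal sketch of the title-fixing step described above, in plain Python (no calibre or BeautifulSoup imports, so it runs standalone). The fallback heuristic — deriving a title from the last URL path segment — is my own assumption, not something from the thread; a real recipe might instead fetch the article page and grab its headline with BeautifulSoup, as the post suggests:

```python
def fix_title(article, fallback='Untitled'):
    """Return a copy of the article dict with a usable 'title'.

    If the title scraped from the feed page is empty, fall back to a title
    derived from the last path segment of the article URL (a hypothetical
    heuristic; adapt to the site being scraped).
    """
    title = (article.get('title') or '').strip()
    if not title:
        # e.g. '.../news/world-cup-opener' -> 'World Cup Opener'
        slug = article['url'].rstrip('/').rsplit('/', 1)[-1]
        title = slug.replace('-', ' ').replace('_', ' ').title() or fallback
    return {**article, 'title': title}
```

In a recipe, this would be called on each dict just before appending it to current_articles.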
06-01-2010, 10:30 AM | #2020 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
|
|
06-01-2010, 05:55 PM | #2021 | |
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
Quote:
Newsweek: Bizarre - Guy stung in rear by numerous bees ends up harboring a honeycomb in his rectum. ...and then it linked me to some story. I'll look at the RSS feed; that's probably what I need anyway. Thanks for the help though. |
|
06-01-2010, 09:59 PM | #2022 | |
Member
Posts: 16
Karma: 10
Join Date: May 2010
Location: Southern California
Device: JetBook-Lite
|
Quote:
I have 2 sites that I'm trying to get the multi-page code working on, pcper.com and tweaktown.com. Both sites have similar layouts, though tweaktown.com's source code seems a bit better to learn with, so I've been working with that one. I'm kinda stuck: when I add the append_page code, the test HTML only contains the feed description and date; without it I get the 1st page, so I'm screwing it up somewhere. Here's what I have for tweaktown.com: Code:
class AdvancedUserRecipe1273795663(BasicNewsRecipe):
    title = u'TweakTown Latest Tech'
    description = 'TweakTown Latest Tech'
    __author__ = 'KidTwisted'
    publisher = 'TweakTown'
    category = 'PC Articles, Reviews and Guides'
    use_embedded_content = False
    max_articles_per_feed = 1
    oldest_article = 7
    timefmt = ' [%Y %b %d ]'
    no_stylesheets = True
    language = 'en'
    #recursion = 10
    remove_javascript = True
    conversion_options = {'linearize_tables': True}
    # reverse_article_order = True
    #INDEX = u'http://www.tweaktown.com'

    html2lrf_options = [
        '--comment', description,
        '--category', category,
        '--publisher', publisher
    ]

    html2epub_options = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"'

    keep_only_tags = [dict(name='div', attrs={'id':['article']})]

    feeds = [
        (u'Articles Reviews', u'http://feeds.feedburner.com/TweaktownArticlesReviewsAndGuidesRss20?format=xml')
    ]

    def get_article_url(self, article):
        return article.get('guid', None)

    def append_page(self, soup, appendtag, position):
        pager = soup.find('a', attrs={'class':'next'})
        if pager:
            nexturl = pager.a['href']
            soup2 = self.index_to_soup(nexturl)
            texttag = soup2.find('div', attrs={'id':'article'})
            for it in texttag.findAll(style=True):
                del it['style']
            newpos = len(texttag.contents)
            self.append_page(soup2, texttag, newpos)
            texttag.extract()
            appendtag.insert(position, texttag)

    def preprocess_html(self, soup):
        mtag = '<meta http-equiv="Content-Language" content="en-US"/>\n<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>'
        soup.head.insert(0, mtag)
        for item in soup.findAll(style=True):
            del item['style']
        self.append_page(soup, soup.body, 3)
        pager = soup.find('a', attrs={'class':'next'})
        if pager:
            pager.extract()
        return soup
Last edited by kidtwisted; 06-01-2010 at 11:36 PM. |
|
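One thing that stands out in the posted recipe: soup.find('a', attrs={'class':'next'}) already returns the anchor tag itself, so pager.a['href'] then searches for an <a> nested *inside* that anchor, finds nothing, and the recipe likely fails right there; the one-line fix is probably nexturl = pager['href']. A standalone stdlib sketch of the same extraction (hypothetical markup, no BeautifulSoup), just to illustrate that the href lives on the matched tag itself:

```python
from html.parser import HTMLParser

class NextLinkFinder(HTMLParser):
    """Records the href of the first <a class="next"> element seen."""
    def __init__(self):
        super().__init__()
        self.next_url = None

    def handle_starttag(self, tag, attrs):
        if self.next_url is None and tag == 'a':
            d = dict(attrs)
            if d.get('class') == 'next':
                # the href is an attribute of this very tag,
                # not of some tag nested inside it
                self.next_url = d.get('href')

def find_next_url(html):
    parser = NextLinkFinder()
    parser.feed(html)
    return parser.next_url
```

In the recipe itself the equivalent change would be replacing pager.a['href'] with pager['href'] inside append_page.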
06-01-2010, 11:17 PM | #2023 |
Member
Posts: 16
Karma: 10
Join Date: May 2010
Location: Southern California
Device: JetBook-Lite
|
Just a side thought to my previous post: both of those sites use article-index drop-down boxes that contain links to all the pages of the article.
example source code from pcper.com: Code:
<form method="post" action="/article.php">
  <b>Review Index:</b><br>
  <select style="font-size: 75%;" onchange="location.href=form.url.options[form.url.selectedIndex].value" name="url">
    <option select=""> - Select - </option>
    <option value="article.php?aid=926&type=expert&pid=1" select="">A complete lineup</option>
    <option value="article.php?aid=926&type=expert&pid=2" select="">FirePro V7800 and V4800 Cards</option>
    <option value="article.php?aid=926&type=expert&pid=3" select="">Testing Methodology, System Setup and CineBench 11/10</option>
    <option value="article.php?aid=926&type=expert&pid=4" select="">SPECviewperf 10</option>
    <option value="article.php?aid=926&type=expert&pid=5" select="">SPECviewperf 10 - Multisample Testing</option>
    <option value="article.php?aid=926&type=expert&pid=6" select="">SPECviewperf 10 - Multithreaded testing</option>
    <option value="article.php?aid=926&type=expert&pid=7" select="">3DMark Vantage</option>
    <option value="article.php?aid=926&type=expert&pid=8" select="">Power Consumption and Conclusions</option>
  </select>
</form> |
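Since that drop-down lists every page of the article up front, a recipe could skip next-link chasing entirely and just collect the <option> values. A minimal stdlib sketch of that idea (no BeautifulSoup; in a real recipe the relative values would still need to be joined against the site's base URL):

```python
from html.parser import HTMLParser

class OptionCollector(HTMLParser):
    """Collects the value attribute of every <option> in the page index."""
    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == 'option':
            value = dict(attrs).get('value')
            if value:  # skips the "- Select -" placeholder, which has no value
                self.values.append(value)
```

Each collected value (e.g. 'article.php?aid=926&type=expert&pid=2') identifies one page of the article; the recipe would fetch each one and append its article div, instead of recursing through "next" links.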
06-02-2010, 03:07 AM | #2024 | |
Connoisseur
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
|
Quote:
First I must process the feed and try to find the title, description, date and url, and then use these values to override the automatic calibre values. It is not so simple (for me) to understand the correct way to do that, or the correct sequence for every step of the process. I am not so familiar with object-oriented languages... Creating a whole new feed is actually clearer in my mind. Edit: the nzherald approach doesn't work. Last edited by gambarini; 06-02-2010 at 05:01 AM. |
|
06-02-2010, 04:48 AM | #2025 | |
Connoisseur
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
|
Quote:
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class LaStampaParseIndex(BasicNewsRecipe):
    title = u'Debug Parse Index'
    cover_url = 'http://www.lastampa.it/edicola/PDF/1.pdf'
    remove_javascript = True
    no_stylesheets = True

    def nz_parse_section(self, url):
        soup = self.index_to_soup(url)
        head = soup.find(attrs={'class': 'entry'})
        descr = soup.find(attrs={'class': 'feedEntryConteny'})
        dt = soup.find(attrs={'class': 'lastUpdated'})
        current_articles = []
        a = head.find('a', href=True)
        title = self.tag_to_string(a)
        url = a.get('href', False)
        description = self.tag_to_string(descr)
        date = self.tag_to_string(dt)
        self.log('title ', title)
        self.log('url ', url)
        self.log('description ', description)
        self.log('date ', date)
        current_articles.append({'title': title, 'url': url, 'description': description, 'date': date})
        return current_articles

    keep_only_tags = [
        dict(attrs={'class':['boxocchiello2','titoloRub','titologir','catenaccio','sezione','articologirata']}),
        dict(name='div', attrs={'id':'corpoarticolo'})
    ]

    remove_tags = [
        dict(name='div', attrs={'id':'menutop'}),
        dict(name='div', attrs={'id':'fwnetblocco'}),
        dict(name='table', attrs={'id':'strumenti'}),
        dict(name='table', attrs={'id':'imgesterna'}),
        dict(name='a', attrs={'class':'linkblu'}),
        dict(name='a', attrs={'class':'link'}),
        dict(name='span', attrs={'class':['boxocchiello','boxocchiello2','sezione']})
    ]

    def parse_index(self):
        feeds = []
        for title, url in [
            (u'Politica', u'http://www.lastampa.it/redazione/cmssezioni/politica/rss_politica.xml'),
            (u'Torino', u'http://rss.feedsportal.com/c/32418/f/466938/index.rss')
        ]:
            articles = self.nz_parse_section(url)
            if articles:
                feeds.append((title, articles))
        return feeds |
|
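One caveat with the recipe above: soup.find(attrs={'class': 'entry'}) returns only the *first* matching entry, so each feed will contain at most one article. To get them all, every entry block needs to be iterated (findAll in BeautifulSoup). A stdlib sketch of that iteration, assuming hypothetical flat <div class="entry"> blocks each wrapping one link:

```python
from html.parser import HTMLParser

class EntryCollector(HTMLParser):
    """Builds a calibre-style article dict for every class="entry" block."""
    def __init__(self):
        super().__init__()
        self.articles = []
        self.in_entry = False
        self.href = None
        self.text = []

    def handle_starttag(self, tag, attrs):
        d = dict(attrs)
        if d.get('class') == 'entry':
            self.in_entry = True
        elif self.in_entry and tag == 'a':
            self.href = d.get('href')

    def handle_data(self, data):
        if self.href is not None:  # accumulating the link text
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self.href is not None:
            self.articles.append({'title': ''.join(self.text).strip(),
                                  'url': self.href,
                                  'description': '', 'date': ''})
            self.href, self.text = None, []
        elif tag == 'div' and self.in_entry:
            self.in_entry = False
```

The BeautifulSoup equivalent inside nz_parse_section would be a for-loop over soup.findAll(attrs={'class': 'entry'}), appending one dict per entry instead of returning after the first.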
|