
MobileRead Forums > E-Book Software > Calibre > Recipes

11-16-2010, 09:53 AM   #1
Nexus
Member
Nexus began at the beginning.
 
Posts: 11
Karma: 10
Join Date: Nov 2010
Location: France
Device: PRS-600
Recipes for RDS.ca, TSN.ca and TheHockeynews.com

Hey guys and gals,

I thought I could sort it out with all the FAQs and great tutorials around here (I read a lot before posting), but I just can't build my own recipes once things get a bit complicated. I've been working on these three sites for a couple of days, and I just can't find a way to retrieve the news correctly. If someone could help me, I'd be forever obliged. Thanks in advance.

http://www.rds.ca/hockey/fildepresse_rds.xml

http://www.tsn.ca/datafiles/rss/Stories.xml

http://www.thehockeynews.com/rss/all_categories.xml
11-17-2010, 01:09 AM   #2
somedayson
Member
somedayson began at the beginning.
 
Posts: 13
Karma: 10
Join Date: Sep 2010
Device: K3
Not much help, but I use this to get the daily hockey headlines from THN:

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1283848394(BasicNewsRecipe):
    title = u'Hockey News'
    oldest_article = 1
    max_articles_per_feed = 100

    feeds = [(u'Hockey News', u'http://www.thehockeynews.com/rss/9-Headlines.xml')]
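The two numbers in this recipe are limits: in calibre, `oldest_article` is a maximum age in days and `max_articles_per_feed` caps how many entries are taken from each feed. The stdlib-only sketch below models that filtering on plain dicts so the effect of `oldest_article = 1` is easy to see. It is an illustration of the idea, not calibre's own code, and the function name and sample stories are made up.

```python
from datetime import datetime, timedelta


def select_articles(entries, oldest_article, max_articles_per_feed, now):
    """Keep entries no older than `oldest_article` days, in feed order,
    stopping once `max_articles_per_feed` entries have been kept."""
    cutoff = now - timedelta(days=oldest_article)
    kept = []
    for entry in entries:
        if len(kept) >= max_articles_per_feed:
            break
        if entry["published"] >= cutoff:
            kept.append(entry)
    return kept


if __name__ == "__main__":
    now = datetime(2010, 11, 17, 12, 0)
    entries = [
        {"title": "Story A", "published": now - timedelta(hours=3)},
        {"title": "Story B", "published": now - timedelta(days=2)},
        {"title": "Story C", "published": now - timedelta(hours=20)},
    ]
    # With oldest_article = 1, the two-day-old story is dropped.
    print([e["title"] for e in select_articles(entries, 1, 100, now)])
```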
11-17-2010, 11:35 AM   #3
Nexus
Thanks, somedayson. For whatever reason, the RSS link I tried the other day for THN didn't work, but it's working fine now. I've worked on it a bit and got some results, except that I'm having a really hard time removing the tags that come after the article. I tried things like remove_tags_after/remove_tags_before, with no success, unfortunately.

Here's the recipe, not looking pretty I know, but that's all I could come up with with my knowledge. You have no idea how long it took me...

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1289990851(BasicNewsRecipe):
    title          = u'THE HOCKEY NEWS'
    oldest_article = 7
    max_articles_per_feed = 5
    no_stylesheets = True
    remove_tags = [dict(name='div', attrs={'class':'article_info'}),
                   dict(name='div', attrs={'class':'photo_details'}),
                   dict(name='div', attrs={'id':'comments_container'}),
                   dict(name='div', attrs={'id':'add_comment'}),
                   dict(name='div', attrs={'id':'legal_info'}),
                   dict(name='div', attrs={'id':'breadcrumb'}),
                   dict(name='div', attrs={'id':'site_header'}),
                   dict(name='div', attrs={'id':'site_navigation'}),
                   dict(name='div', attrs={'id':'advertisement'}),
                   dict(name='div', attrs={'class':'tool_menu'})]

    feeds          = [(u'THN', u'http://www.thehockeynews.com/rss/all_categories.xml')]
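Since remove_tags_after came up above: as we understand calibre's documentation, it takes a tag spec in the same dict form as remove_tags and drops everything that comes after the first matching tag. The sketch below models that behaviour with the standard library's ElementTree on a small, well-formed page. It is an illustration under that assumption, not calibre's implementation, and the ids and class names are invented.

```python
import xml.etree.ElementTree as ET


def find_first(root, name, attrs):
    """Return the first element named `name` whose attributes equal `attrs`."""
    for el in root.iter(name):
        if all(el.get(k) == v for k, v in attrs.items()):
            return el
    return None


def remove_tags_after(root, name, attrs):
    """Delete everything after the matched element in document order: its
    following siblings, then the following siblings of each ancestor."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = find_first(root, name, attrs)
    if node is None:
        return root  # no match: leave the page untouched
    while node in parent_of:
        node.tail = None  # text right after this element goes too
        parent = parent_of[node]
        siblings = list(parent)
        for sibling in siblings[siblings.index(node) + 1:]:
            parent.remove(sibling)
        node = parent
    return root


body = ET.fromstring(
    '<body><div class="box"><div id="story"><p>Article text</p></div>'
    '<span>Share this</span></div><div id="comments">Comments</div></body>'
)
remove_tags_after(body, "div", {"id": "story"})
print(ET.tostring(body, encoding="unicode"))
```

Note that the cut happens at every level of nesting, so a matched tag deep inside a wrapper still removes the wrapper's later content and everything after the wrapper.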
11-17-2010, 12:11 PM   #4
Nexus
By the way, here's the error message I get for TSN (I used the basic recipe):

Spoiler:
Quote:
Parsing index.html ...
Parsing feed_0/article_4/index.html ...
Referenced file '/nhl/teams/players/%3fname%3djordan%2bleopold' not found
Referenced file '/nhl/teams/players/%3fname%3dviktor%2bstalberg' not found
Referenced file '/nhl/teams/players/%3fname%3dalexander%2bmogilny' not found
Referenced file '/nhl/teams/players/%3fname%3dtyler%2bmyers' not found
Referenced file '/tsn_fantasy' not found
Referenced file '/nhl/teams/players/%3fname%3djay%2bbouwmeester' not found
Referenced file '/nhl/teams/players/%3fname%3dalexander%2bedler' not found
Referenced file '/nhl/teams/players/%3fname%3ddaniel%2bsedin' not found
Referenced file '/nhl/teams/players/%3fname%3dfrancis%2bbouillon' not found
Referenced file '/nhl/teams/players/%3fname%3drandy%2bjones' not found
Referenced file '/nhl/teams/players/%3fname%3dsean%2bavery' not found
Referenced file '/nhl/teams/players/%3fname%3dray%2bwhitney' not found
Referenced file '/nhl/teams/players/%3fname%3dcory%2bschneider' not found
Referenced file '/nhl/teams/players/%3fname%3dtheo%2bpeckham' not found
Referenced file '/nhl/teams/players/%3fname%3dbrett%2bclark' not found
Referenced file '/nhl/teams/players/%3fname%3dcam%2bward' not found
Referenced file '/nhl/teams/players/%3fname%3ddaniel%2balfredsson' not found
Referenced file '/nhl/teams/players/%3fname%3djonas%2bholos' not found
Referenced file '/nhl/teams/players/%3fname%3dmarc%2bstaal' not found
Referenced file '/nhl/teams/players/%3fname%3derik%2bkarlsson' not found
Referenced file '/nhl/teams/players/%3fname%3dalex%2bkovalev' not found
Referenced file '/nhl/teams/players/%3fname%3dmarc%2bmethot' not found
Referenced file '/nhl/teams/players/%3fname%3dryan%2bjones' not found
Referenced file '/nhl/teams/players/%3fname%3dmikael%2bsamuelsson' not found
Referenced file '/nhl/teams/players/%3fname%3dkris%2bletang' not found
Referenced file '/nhl/teams/players/%3fname%3dmarc-andre%2bfleury' not found
Referenced file '/nhl/teams/players/%3fname%3dmark%2bgiordano' not found
Referenced file '/nhl/teams/players/%3fname%3dduncan%2bkeith' not found
Referenced file '/nhl/teams/players/%3fname%3ddustin%2bbyfuglien' not found
Referenced file '/nhl/teams/players/%3fname%3dtomas%2bplekanec' not found
Referenced file '/nhl/teams/players/%3fname%3dladislav%2bsmid' not found
Referenced file '/fantasy_news' not found
Referenced file '/nhl/teams/players/%3fname%3dcolin%2bfraser' not found
Referenced file '/nhl/teams/players/%3fname%3dhenrik%2blundqvist' not found
Referenced file '/nhl/teams/players/%3fname%3dryan%2bcallahan' not found
Referenced file '/nhl/teams/players/%3fname%3dvictor%2bhedman' not found
Referenced file '/nhl/teams/players/%3fname%3ddan%2bhamhuis' not found
Referenced file '/nhl/teams/players/%3fname%3dchris%2bkunitz' not found
Referenced file '/nhl/teams/players/%3fname%3dshawn%2bhorcoff' not found
Referenced file '/nhl/teams/players/%3fname%3dsergei%2bfedorov' not found
Referenced file '/twitter' not found
Referenced file '/nhl/teams/players/%3fname%3dmatt%2bcooke' not found
Referenced file '/nhl/teams/players/%3fname%3dtoni%2blydman' not found
Referenced file '/nhl/teams/players/%3fname%3djarome%2biginla' not found
Referenced file '/nhl/teams/players/%3fname%3ddarroll%2bpowe' not found
Referenced file '/nhl/teams/players/%3fname%3dmike%2bcammalleri' not found
Referenced file '/nhl/teams/players/%3fname%3dluke%2brichardson' not found
Referenced file '/nhl/teams/players/%3fname%3dmike%2bweaver' not found
Referenced file '/nhl/teams/players/%3fname%3dbrandon%2bdubinsky' not found
Referenced file '/nhl/teams/players/%3fname%3droberto%2bluongo' not found
Referenced file '/nhl/teams/players/%3fname%3dhenrik%2bsedin' not found
Referenced file 'feed_1/index.html' not found
Referenced file '/nhl/teams/players/%3fname%3dmiikka%2bkiprusoff' not found
Referenced file '/nhl/teams/players/%3fname%3dmike%2brichards' not found
Referenced file '/nhl/teams/players/%3fname%3dsteve%2bmontador' not found
Referenced file '/nhl/teams/players/%3fname%3dsidney%2bcrosby' not found
Referenced file '/nhl/teams/players/%3fname%3dtom%2bkostopoulos' not found
Referenced file '/nhl/teams/players/%3fname%3dvernon%2bfiddler' not found
Referenced file '/nhl/teams/players/%3fname%3djeff%2bhalpern' not found
Referenced file '/nhl/teams/players/%3fname%3danton%2bbabchuk' not found
Referenced file '/nhl/teams/players/%3fname%3djason%2bspezza' not found
Referenced file '/nhl/teams/players/%3fname%3dbrian%2belliott' not found
Referenced file '/nhl/teams/players/%3fname%3dtyler%2bkennedy' not found
Referenced file '/nhl/teams/players/%3fname%3dnikolai%2bkhabibulin' not found
Reading TOC from NCX...
Merging user specified metadata...
Detecting structure...
Flattening CSS and remapping font sizes...
Python function terminated unexpectedly
(Error Code: 1)
Traceback (most recent call last):
File "site.py", line 103, in main
File "site.py", line 85, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 90, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 21, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 816, in run
File "site-packages\calibre\ebooks\oeb\transforms\flatcss.py" , line 122, in __call__
File "site-packages\calibre\ebooks\oeb\transforms\flatcss.py" , line 146, in stylize_spine
File "site-packages\calibre\ebooks\oeb\stylizer.py", line 173, in __init__
File "site-packages\calibre\ebooks\oeb\stylizer.py", line 96, in __init__
File "site-packages\lxml-2.2.2-py2.6-win32.egg\lxml\cssselect.py", line 522, in css_to_xpath
File "site-packages\lxml-2.2.2-py2.6-win32.egg\lxml\cssselect.py", line 476, in xpath
File "site-packages\lxml-2.2.2-py2.6-win32.egg\lxml\cssselect.py", line 247, in xpath
File "site-packages\lxml-2.2.2-py2.6-win32.egg\lxml\cssselect.py", line 257, in _xpath_root
NotImplementedError


For RDS, I may have an idea; I'll look into it. Anyway, thanks in advance for the help.

Last edited by Nexus; 11-17-2010 at 05:20 PM.
11-18-2010, 10:37 AM   #5
Nexus
OK, so here's a nicer recipe for THN:

Spoiler:
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1289990851(BasicNewsRecipe):
    title          = u'THE HOCKEY NEWS'
    oldest_article = 7
    max_articles_per_feed = 25
    no_stylesheets = True
    remove_tags = [dict(name='div', attrs={'class':'article_info'}),
                   dict(name='div', attrs={'class':'photo_details'}),
                   dict(name='div', attrs={'class':'tool_menu'}),
                   dict(name='div', attrs={'id':'comments_container'}),
                   dict(name='div', attrs={'id':'wrapper'})]
    keep_only_tags = [dict(name='h1', attrs={'class':['headline']}),
                      dict(name='div', attrs={'class':['box_container']})]

    feeds          = [(u'THN', u'http://www.thehockeynews.com/rss/all_categories.xml')]
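This version combines keep_only_tags with remove_tags. We believe calibre applies keep_only_tags first (keeping only the matching elements) and then strips remove_tags matches from what is left, which would explain why removing the `wrapper` div does not wipe out the article. The stdlib sketch below models that order with a simplified matcher: exact string match, or membership when the value is a list, as in `{'class':['headline']}`. BeautifulSoup's real matching supports more options, so treat this as an approximation; the sample markup is invented.

```python
import xml.etree.ElementTree as ET


def matches(el, spec):
    """Simplified tag-spec test: 'name' must equal the tag name, and every
    attribute must equal the given string or be one of the strings in a list."""
    if "name" in spec and el.tag != spec["name"]:
        return False
    for key, wanted in spec.get("attrs", {}).items():
        value = el.get(key)
        if isinstance(wanted, list):
            if value not in wanted:
                return False
        elif value != wanted:
            return False
    return True


def strip_page(root, keep_only_tags, remove_tags):
    """Return the elements matching any keep_only_tags spec, after deleting
    every descendant of theirs that matches a remove_tags spec."""
    kept = [el for el in root.iter() if any(matches(el, s) for s in keep_only_tags)]
    for el in kept:
        parent_of = {child: parent for parent in el.iter() for child in parent}
        doomed = [c for c in el.iter()
                  if c is not el and any(matches(c, s) for s in remove_tags)]
        for c in doomed:
            parent_of[c].remove(c)
    return kept


page = ET.fromstring(
    '<html><body><div id="wrapper">'
    '<h1 class="headline">Title</h1>'
    '<div class="box_container"><p>Body</p><div class="tool_menu">Share</div></div>'
    '</div></body></html>'
)
kept = strip_page(
    page,
    keep_only_tags=[dict(name='h1', attrs={'class': ['headline']}),
                    dict(name='div', attrs={'class': ['box_container']})],
    remove_tags=[dict(name='div', attrs={'class': 'tool_menu'}),
                 dict(name='div', attrs={'id': 'wrapper'})],
)
print([ET.tostring(el, encoding="unicode") for el in kept])
```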



And I got the RDS one too.

Spoiler:
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1290013720(BasicNewsRecipe):
    title          = u'RDS'
    oldest_article = 7
    max_articles_per_feed = 25
    no_stylesheets = True
    remove_tags = [dict(name='div', attrs={'id':'rdsWrap'}),
                   dict(name='table', attrs={'id':'aVoir'}),
                   dict(name='div', attrs={'id':'imageChronique'})]
    keep_only_tags = [dict(name='div', attrs={'id':['enteteChronique']}),
                      dict(name='div', attrs={'id':['contenuChronique']})]

    feeds          = [(u'RDS', u'http://www.rds.ca/hockey/fildepresse_rds.xml')]


TSN remains a mystery...
11-19-2010, 09:56 AM   #6
Nexus
This TSN stuff is going to drive me nuts, lol. I can't figure out how to build the recipe. I should have said in my first post that I have no knowledge of Python at all. I can identify HTML tags and I can "guess" what they correspond to on a web page, but that's pretty much all.

Starting from this page (http://tsn.ca/nhl/story/?id=nhl), I understand I have to use the parse_index command in my recipe, but I don't know what to do with it. Python is just too much for me. If someone is kind enough to give me a hint, that would be greatly appreciated. I'm not even asking for the full recipe; I'd like to understand the process, but after reading tutorial after tutorial and guide after guide, I just can't figure out where to start. It's beyond my comprehension.

Thanks.
11-19-2010, 10:21 AM   #7
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Nexus View Post
Starting from this page (http://tsn.ca/nhl/story/?id=nhl), I understand I have to use the parse_index command in my recipe,
Correct.

Quote:
someone is kind enough to give me a hint
Look at some samples. These all use parse_index:
Code:
DrawAndCook.recipe' :
akter.recipe' :
atlantic.recipe' :
auto_prove.recipe' :
axxon_magazine.recipe' :
billorielly.recipe' :
borba.recipe' :
brand_eins.recipe' :
businessworldin.recipe' :
bwmagazine.recipe' :
calgary_herald.recipe' :
comics_com.recipe' :
cynewslive.recipe' :
cyprus_weekly.recipe' :
dani.recipe' :
daum_net.recipe' :
deredactie.recipe' :
economist.recipe' :
economist_free.recipe' :
edmonton_journal.recipe' :
el_cultural.recipe' :
elpais_impreso.recipe' :
elpais_semanal.recipe' :
eluniversalimpresa.recipe' :
entrepeneur.recipe' :
financial_times_uk.recipe' :
fokkeensukke.recipe' :
foreignaffairs.recipe' :
fstream.recipe' :
glas_srpske.recipe' :
go_comics.recipe' :
guardian.recipe' :
haaretz_en.recipe' :
harpers_full.recipe' :
hbr.recipe' :
hbr_blogs.recipe' :
hindu.recipe' :
houston_chronicle.recipe' :
ieeespectrum.recipe' :
inc.recipe' :
india_today.recipe' :
instapaper.recipe' :
johm.recipe' :
joop.recipe' :
kellog_faculty.recipe' :
kidney.recipe' :
lamujerdemivida.recipe' :
laprensa_ni.recipe' :
lemonde_dip.recipe' :
lenta_ru.recipe' :
losservatoreromano_it.recipe' :
lrb_payed.recipe' :
macleans.recipe' :
malaysian_mirror.recipe' :
milenio.recipe' :
ming_pao.recipe' :
monitor.recipe' :
montreal_gazette.recipe' :
national_post.recipe' :
ncrnext.recipe' :
nejm.recipe' :
new_york_review_of_books.recipe' :
new_york_review_of_books_no_sub.recipe' :
newsweek.recipe' :
newsweek_polska.recipe' :
nin.recipe' :
nymag.recipe' :
Since I wrote it, and it's first on the list, let's look at the relevant parts of DrawAndCook:
Code:
    def parse_index(self):
        feeds = []
        # One (feed title, index page URL) pair per feed to build
        for title, url in [
                            ("They Draw and Cook", "http://www.theydrawandcook.com/")
                            ]:
            articles = self.make_links(url)
            if articles:
                feeds.append((title, articles))
        print 'feeds are: ', feeds
        return feeds

    def make_links(self, url):
        date = ''
        current_articles = []
        # Fetch and parse the index page once
        soup = self.index_to_soup(url)
        # Each post is found through its <div class="date-outer"> wrapper
        recipes = soup.findAll('div', attrs={'class': 'date-outer'})
        for recipe in recipes:
            # Title and link come from the <h3><a href="..."> inside the post
            title = recipe.h3.a.string
            page_url = recipe.h3.a['href']
            current_articles.append({'title': title, 'url': page_url, 'description':'', 'date':date})
        return current_articles
The parse_index method needs to return a list of feeds, where each feed is a (feed title, list of articles) tuple. The structure above is set up for multiple feeds but only builds a single one, and that's what you want to do, too (unless you want to build multiple feeds).
The hard part is the list of articles, and that's done in make_links. You need to find a title and a url for each article. The date and description can be left blank, or filled in, as you prefer.
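The return shape described above can be written out with plain Python data. The helper names below are hypothetical and the check is only a sanity test of the structure used in this thread (a list of `(feed_title, articles)` tuples, each article a dict with `title`, `url`, `description` and `date`); it is not something calibre asks you to call, and the example URL is a placeholder.

```python
def make_article(title, url, description="", date=""):
    """Build one article entry in the dict shape used throughout this thread."""
    return {"title": title, "url": url, "description": description, "date": date}


def check_feeds(feeds):
    """Sanity-check a parse_index-style result: a list of
    (feed_title, [article_dict, ...]) tuples."""
    for feed_title, articles in feeds:
        if not feed_title:
            raise ValueError("every feed needs a title")
        for article in articles:
            if not article.get("title") or not article.get("url"):
                raise ValueError("article without title or url in feed %r" % (feed_title,))
    return True


feeds = [
    ("They Draw and Cook", [
        make_article("Example recipe", "http://example.com/example-recipe"),
    ]),
]
print(check_feeds(feeds))
```

Leaving `description` and `date` as empty strings matches the recipes in this thread, where only the title and URL are filled in.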

You can find the url and title for each article on your page (http://tsn.ca/nhl/story/?id=nhl). Just change the feed title and URL in parse_index to those of your page, then modify make_links so that findAll finds all your links and the for loop finds the title and page_url for each.

Simple.
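For anyone who wants to see what make_links is doing without calibre at hand, here is a rough stdlib equivalent. It uses Python 3's `html.parser` instead of calibre's BeautifulSoup, so it is a sketch of the same idea, not the recipe's code: it collects the text and `href` of links inside `<h3>` headings within `<div class="date-outer">`, and returns them as article dicts. The class and function names and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class DateOuterLinks(HTMLParser):
    """Collect (title, href) for <h3><a> links inside <div class="date-outer">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0   # > 0 while inside a date-outer div
        self.in_h3 = False
        self.href = None     # href of the <a> currently open, if any
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.div_depth:
                self.div_depth += 1
            elif attrs.get("class") == "date-outer":
                self.div_depth = 1
        elif tag == "h3" and self.div_depth:
            self.in_h3 = True
        elif tag == "a" and self.in_h3:
            self.href = attrs.get("href")
            self.text = []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            self.links.append(("".join(self.text).strip(), self.href))
            self.href = None
        elif tag == "h3":
            self.in_h3 = False
        elif tag == "div" and self.div_depth:
            self.div_depth -= 1


def make_links_from_html(html):
    parser = DateOuterLinks()
    parser.feed(html)
    return [{"title": t, "url": u, "description": "", "date": ""} for t, u in parser.links]


sample = ('<div class="date-outer"><h3><a href="http://example.com/1">First</a></h3>'
          '<p><a href="x">not a title</a></p></div>'
          '<div class="sidebar"><h3><a href="y">elsewhere</a></h3></div>')
print(make_links_from_html(sample))
```

The two things you adapt for another site are the same two things Starson17 mentions: which container marks an article, and where the title link sits inside it.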

Last edited by Starson17; 11-19-2010 at 10:39 AM.
11-19-2010, 12:19 PM   #8
Nexus
Thanks for the help, Starson17. It's not for lack of goodwill on my part, but Python is mumbo jumbo to me.


I think this is the tricky part for me, I'm not sure what to do.

Code:
            
    def make_links(self, url):
        soup = self.index_to_soup(url)
        title = ''
        date = ''
        current_articles = []
        soup = self.index_to_soup(url)
        recipes = soup.findAll('div', attrs={'class': 'date-outer'})
I'm having trouble seeing which tag I have to use and, above all, where to take it from: the index page (http://tsn.ca/nhl/story/?id=nhl) or the article page? <div id="tsnColWrap"> and <div id="tsnMain"> appear on both pages, and <div class="feature"> appears only on the "main page" (...story/?id=nhl).

Code:
for recipe in recipes:
            title = recipe.h3.a.string
            page_url = recipe.h3.a['href']
            current_articles.append({'title': title, 'url': page_url, 'description':'', 'date':date})
        return current_articles
And there goes my mental health...

I have to modify the "title" and "page_url" lines, right? But as above, I'm not sure where to look and what to put there. I tried different things and got error messages each time. By the way, I added "from calibre.ebooks.BeautifulSoup import BeautifulSoup" at the beginning of the recipe; I think I need it to make this work.

I'm a lost cause...
11-19-2010, 12:32 PM   #9
Starson17
Quote:
Originally Posted by Nexus View Post
I'm a lost cause...
I'll walk you through it (if no one else does it first), but I'm a bit busy now, so I'll give it to you in dribs and drabs.

Start with parse_index. I looked at your page, and I think I was wrong when I said you want one feed. I'd use one feed per day, then put the articles for that day under that feed. Let's do this: you put together as much of the recipe as you can and post it, and I'll look it over. You should have enough to do just the parse_index part. Post that, with the rest of your recipe. Then I'll help with make_links. Post your best shot at that, too.

You may want to install Firebug in Firefox if you haven't done so yet. Yes, you needed to import BeautifulSoup.
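"One feed per day" boils down to grouping articles by their day before returning them from parse_index. The stdlib sketch below shows that grouping, assuming each article dict already carries a day label in its `date` field; the labels and URLs are invented, and on the TSN page the day might instead come from a section heading.

```python
from collections import OrderedDict


def feeds_by_day(articles):
    """Group article dicts into one (day, [articles]) feed per day,
    keeping days in the order they first appear."""
    days = OrderedDict()
    for article in articles:
        days.setdefault(article["date"], []).append(article)
    return list(days.items())


articles = [
    {"title": "Story A", "url": "http://example.com/a", "description": "", "date": "Nov 19"},
    {"title": "Story B", "url": "http://example.com/b", "description": "", "date": "Nov 18"},
    {"title": "Story C", "url": "http://example.com/c", "description": "", "date": "Nov 19"},
]
for day, day_articles in feeds_by_day(articles):
    print(day, [a["title"] for a in day_articles])
```

The result already has the `(feed_title, articles)` shape parse_index returns, with the day label as the feed title.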
11-19-2010, 03:09 PM   #10
Starson17
Here's a start:
Spoiler:
Code:
    INDEX = 'http://tsn.ca/nhl/story/?id=nhl'

    def parse_index(self):
        feeds = []
        soup = self.index_to_soup(self.INDEX)
        # Each block of stories on the index page is a <div class="feature">
        feed_parts = soup.findAll('div', attrs={'class': 'feature'})
        for feed_part in feed_parts:
            articles = []
            # The block's <h2> becomes the feed title
            feed_title = feed_part.h2.string
            print 'feed_title is: ', feed_title
            article_parts = feed_part.findAll('a')
            print 'article_parts is: ', article_parts
            for article_part in article_parts:
                # TODO: build one article dict per link and append it to articles
                pass
            if articles:
                feeds.append((feed_title, articles))
        print 'feeds are: ', feeds
        return feeds

This will get the feed title and a soup ("feed_part") that has links and titles for all the articles for that feed.
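The same split (one feed per `<div class="feature">`, titled by its `<h2>`, with the links inside as articles) can be reproduced with Python 3's `html.parser` to check a saved copy of the page outside calibre. This is a stdlib sketch, not the recipe itself; like the finished parse_index later in the thread, it skips blocks with no `<h2>` or no links. The sample markup and hrefs are hypothetical.

```python
from html.parser import HTMLParser


class FeatureSections(HTMLParser):
    """Build (h2 text, [(link text, href), ...]) for each <div class="feature">."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # nesting depth inside the current feature div
        self.section = None   # [title, links] for the open feature div
        self.in_h2 = False
        self.href = None
        self.text = []
        self.sections = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth:
                self.depth += 1
            elif attrs.get("class") == "feature":
                self.depth = 1
                self.section = [None, []]
        elif self.depth and tag == "h2":
            self.in_h2 = True
            self.text = []
        elif self.depth and tag == "a":
            self.href = attrs.get("href")
            self.text = []

    def handle_data(self, data):
        if self.in_h2 or self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "h2" and self.in_h2:
            self.section[0] = "".join(self.text).strip()
            self.in_h2 = False
        elif tag == "a" and self.href is not None:
            self.section[1].append(("".join(self.text).strip(), self.href))
            self.href = None
        elif tag == "div":
            self.depth -= 1
            # Keep the block only if it had a title and at least one link
            if not self.depth and self.section[0] and self.section[1]:
                self.sections.append(tuple(self.section))


sample = ('<div class="feature"><h2>Nov 19</h2><ul>'
          '<li><a href="/nhl/story/?id=1">Leafs win</a></li>'
          '<li><a href="/nhl/story/?id=2">Habs lose</a></li></ul></div>'
          '<div class="feature"><p><a href="/ad">Ad</a></p></div>')
parser = FeatureSections()
parser.feed(sample)
for title, links in parser.sections:
    print(title, links)
```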

Last edited by Starson17; 11-19-2010 at 03:21 PM.
11-19-2010, 04:24 PM   #11
Starson17
Quote:
Originally Posted by Starson17 View Post
Here's a start:
I had a few minutes to finish parse_index:

Code:
    INDEX = 'http://tsn.ca/nhl/story/?id=nhl'    

    def parse_index(self):
        feeds = []
        soup = self.index_to_soup(self.INDEX)
        feed_parts = soup.findAll('div', attrs={'class': 'feature'})
        for feed_part  in feed_parts:
            articles = []
            if not feed_part.h2:
                continue
            feed_title = feed_part.h2.string
            article_parts = feed_part.findAll('a')
            for article_part in article_parts:
                article_title = article_part.string
                article_date = ''
                article_url = 'http://tsn.ca/' + article_part['href']
                articles.append({'title': article_title, 'url': article_url, 'description':'', 'date':article_date})
            if articles:
                feeds.append((feed_title, articles))
        return feeds
All you need to do now is remove the junk.
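One detail worth checking in the line `article_url = 'http://tsn.ca/' + article_part['href']`: if the index page's hrefs start with a slash (as the paths in the earlier error log do, though we have not checked the index page itself), the result contains a double slash, and an absolute href would be mangled. Resolving each href against the index URL avoids both. The sketch uses Python 3's `urllib.parse.urljoin`; in the Python 2 that calibre used at the time, the same function lived in the `urlparse` module. The hrefs below are hypothetical.

```python
from urllib.parse import urljoin

INDEX = "http://tsn.ca/nhl/story/?id=nhl"


def article_url(href):
    """Resolve an href found on the index page against the index URL."""
    return urljoin(INDEX, href)


for href in ["/nhl/story/?id=123", "http://www.tsn.ca/nhl/story/?id=456"]:
    print("concatenated:", "http://tsn.ca/" + href)
    print("urljoin:     ", article_url(href))
```

urljoin resolves links the way a browser would (assuming the page has no `<base>` tag), so a bare relative href such as `story/?id=7` ends up under `/nhl/story/` rather than the site root; compare against a real link on the page before switching.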
11-20-2010, 07:09 AM   #12
Nexus
Wow. Not a chance I could have come up with something like that. I removed the junk and it works just fine. Thanks a lot, Starson.

Spoiler:
Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class AdvancedUserRecipe1289990851(BasicNewsRecipe):
    title          = u'TSN'
    oldest_article = 7
    max_articles_per_feed = 50
    no_stylesheets = True
    INDEX = 'http://tsn.ca/nhl/story/?id=nhl'    
    keep_only_tags = [dict(name='div', attrs={'id':['tsnColWrap']}),
                             dict(name='div', attrs={'id':['tsnStory']})]
    remove_tags = [dict(name='div', attrs={'id':'tsnRelated'}),
                          dict(name='div', attrs={'class':'textSize'})]

    def parse_index(self):
        feeds = []
        soup = self.index_to_soup(self.INDEX)
        feed_parts = soup.findAll('div', attrs={'class': 'feature'})
        for feed_part  in feed_parts:
            articles = []
            if not feed_part.h2:
                continue
            feed_title = feed_part.h2.string
            article_parts = feed_part.findAll('a')
            for article_part in article_parts:
                article_title = article_part.string
                article_date = ''
                article_url = 'http://tsn.ca/' + article_part['href']
                articles.append({'title': article_title, 'url': article_url, 'description':'', 'date':article_date})
            if articles:
                feeds.append((feed_title, articles))
        return feeds
11-22-2010, 11:53 AM   #13
Starson17
Quote:
Originally Posted by Nexus View Post
Wow. Not a chance I could came up with something like that. Thanks a lot Starson.
You're welcome. Thank you for posting the finished recipe.
11-23-2010, 08:47 AM   #14
Nexus
The least I can do...

Similar Threads
Thread Thread Starter Forum Replies Last Post
Where my recipes are kept? bthoven Calibre 6 02-26-2010 12:20 AM
Best Newspaper Recipes geneaber Calibre 1 11-28-2009 11:10 AM
NY Times Recipes geneaber Calibre 0 11-08-2009 10:16 PM
how to remove recipes reup Calibre 2 08-31-2009 10:26 AM
Help with RSS recipes fmma Calibre 1 06-15-2009 11:51 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.