Old 11-20-2009, 08:51 AM   #871
dhiru
Connoisseur
dhiru began at the beginning.
 
Posts: 83
Karma: 10
Join Date: Aug 2009
Device: iphone, Irex iliad, sony prs950, kindle Dx, Ipad
Hi kiklop74,
could you please make a recipe from the moneycontrol.com RSS feeds:
http://www.moneycontrol.com/rss/latestnews.xml
http://www.moneycontrol.com/rss/allstories.xml

Thanks
Old 11-20-2009, 09:44 AM   #872
kiklop74
Guru
kiklop74 can program the VCR without an owner's manual.
 
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Quote:
Originally Posted by dhiru View Post
Hi kiklop74,
could you please make a recipe from the moneycontrol.com RSS feeds:
http://www.moneycontrol.com/rss/latestnews.xml
http://www.moneycontrol.com/rss/allstories.xml

Thanks
This site is complicated. I just don't have time to fight with badly formed HTML.
Old 11-20-2009, 10:09 PM   #873
dhiru
Connoisseur
dhiru began at the beginning.
 
Posts: 83
Karma: 10
Join Date: Aug 2009
Device: iphone, Irex iliad, sony prs950, kindle Dx, Ipad
Quote:
Originally Posted by kiklop74 View Post
This site is complicated. I just don't have time to fight with badly formed HTML.
OK, whenever you have time, kindly help. Thanks.
Old 11-21-2009, 07:00 AM   #874
evanmaastrigt
Connoisseur
evanmaastrigt doesn't litter
 
Posts: 78
Karma: 192
Join Date: Nov 2009
Device: Sony PRS-600
Fokke en Sukke v2

Kovid was kind enough to add the recipe for the 'Fokke en Sukke' cartoons to the latest version of Calibre. Unfortunately, something went wrong in the conversion from tabs to spaces, breaking the recipe (my bad really, I should not have used tabs in the first place).

Here is the corrected version:

fokkeensukke.zip
Old 11-21-2009, 07:19 AM   #875
JayCeeEll
Connoisseur
JayCeeEll doesn't litter
 
Posts: 87
Karma: 204
Join Date: Dec 2007
Location: Exeter, Devon, UK
Device: PRS-300
remove_tags not removing tags

I am working on some new recipes and I am having trouble with the remove_tags pre-processing routine.

The following script should download just the blog entry and comments, but I am also getting the sidebar contents. What am I doing wrong?

An example article is http://www.badscience.net/2009/11/oh-that-was-quick/

PHP Code:
__license__   = 'GPL v3'
__copyright__ = '2009, JayCeeEll'

from calibre.web.feeds.news import BasicNewsRecipe

class BadScience(BasicNewsRecipe):
    title                 = u'Bad Science'
    language              = 'en'
    __author__            = 'JayCeeEll'
    description           = 'Bad science in the media'
    author                = 'Ben Goldacre'
    publisher             = 'Ben Goldacre'
    category              = 'blog, skepticism'
    oldest_article        = 7
    max_articles_per_feed = 100
    no_stylesheets        = True
    encoding              = 'utf8'
    remove_javascript     = True
    use_embedded_content  = False

    keep_only_tags = [dict(name='div', attrs={'class':'padded'})]

    remove_tags = [
                   dict(name='p', attrs={'class':'meta'})
                  ,dict(name='div', attrs={'id':'respond'})
                  ,dict(name='div', attrs={'id':'sidebar_right'})
                  ]

    feeds = [(u'Bad Science', u'http://www.badscience.net/feed/')]
Old 11-21-2009, 08:00 AM   #876
evanmaastrigt
Connoisseur
evanmaastrigt doesn't litter
 
Posts: 78
Karma: 192
Join Date: Nov 2009
Device: Sony PRS-600
The div with id='sidebar_right', which you want to remove, contains a div with class='padded', which you want to keep. I think this confuses Calibre a little.
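
The conflict can be illustrated with a toy model. This is not calibre's code: it assumes that keep_only_tags is applied before remove_tags and that it collects every matching tag, nested ones included. Node, keep_only and remove are made-up names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Minimal stand-in for an HTML tag."""
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    text: str = ''


def matches(node, spec):
    """True if node has the tag name and every attribute listed in spec."""
    if node.name != spec['name']:
        return False
    return all(node.attrs.get(k) == v for k, v in spec.get('attrs', {}).items())


def find_all(node, spec):
    """Depth-first search for every descendant matching spec."""
    found = []
    for child in node.children:
        if matches(child, spec):
            found.append(child)
        found.extend(find_all(child, spec))
    return found


def keep_only(body, specs):
    """Build a new body holding only the matched nodes, wherever they were nested."""
    kept = []
    for spec in specs:
        kept.extend(find_all(body, spec))
    return Node('body', children=kept)


def remove(node, specs):
    """Drop, in place, every descendant matching any spec."""
    node.children = [c for c in node.children
                     if not any(matches(c, s) for s in specs)]
    for child in node.children:
        remove(child, specs)


def all_text(node):
    """Collect the text of node and its descendants in document order."""
    texts = [node.text] if node.text else []
    for child in node.children:
        texts.extend(all_text(child))
    return texts


page = Node('body', children=[
    Node('div', {'class': 'padded'}, text='article'),
    Node('div', {'id': 'sidebar_right'}, children=[
        Node('div', {'class': 'padded'}, text='sidebar links'),
    ]),
])
keep = [dict(name='div', attrs={'class': 'padded'})]
drop = [dict(name='div', attrs={'id': 'sidebar_right'})]

body = keep_only(page, keep)   # the sidebar's inner div is kept, its wrapper is not
remove(body, drop)             # nothing left with id='sidebar_right' to remove
texts = all_text(body)
```

In this model the sidebar wrapper is discarded by the keep step, so the later remove step has nothing to match, while the wrapper's inner 'padded' div has already been kept. If calibre behaves the same way, a keep_only_tags entry that matches only the article's own div would avoid the problem.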
Old 11-22-2009, 04:52 AM   #877
tranqui69
Junior Member
tranqui69 began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Nov 2009
Device: Sony PRS-505
First of all: you're awesome!

Could you please make recipes from the RSS feeds of these two Spanish newspapers?

http://www.levante-emv.com/
http://www.publico.es/

They have RSS feeds, but I can't do it myself.

Thank you so much!
Old 11-22-2009, 08:01 AM   #878
evanmaastrigt
Connoisseur
evanmaastrigt doesn't litter
 
Posts: 78
Karma: 192
Join Date: Nov 2009
Device: Sony PRS-600
parse_index() question

I am working on a recipe consisting of a couple of RSS feeds and one webpage that needs custom parsing. Articles from both sources have the same structure, so they can all be parsed with the same preprocess_html().
So I thought I would be clever and did something along the lines of this pseudo-code:
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class MyRecipe(BasicNewsRecipe):
    INDEX = u'http://example.com'
    feeds = [(u'example', u'http://example.com/rss')]

    def parse_index(self):
        # raise Exception('spam', 'eggs')  # This is always raised
        answer = super(MyRecipe, self).parse_index()
        # raise Exception('spam', 'bacon and eggs')  # This is never raised, but the feeds _are_ parsed

        # Do my thing with self.INDEX . . .

        answer.insert(0, [myTitle, myArticles])

        return answer
But this does not work. The call to super.parse_index() never returns, whereas I expected it to return a result in the same format as my own parse_index().

What am I missing, and is there a workaround?
Old 11-22-2009, 09:20 AM   #879
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 45,385
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
IIRC, parse_index in the base class is not implemented at all. It will just raise an exception.
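
A sketch of why the code after the super() call would never run, using stand-in classes rather than calibre's own. It assumes, following the recollection above, that the base parse_index() raises NotImplementedError, and additionally that the caller reacts to that exception by falling back to the RSS feeds, which would explain why the feeds are still parsed. BaseRecipe, build_index and the reached flag are illustrative names only.

```python
class BaseRecipe:
    """Stand-in for BasicNewsRecipe; not calibre's real class."""
    feeds = []

    def parse_index(self):
        # Assumed behaviour: the base class offers no index parser.
        raise NotImplementedError

    def parse_feeds(self):
        # Placeholder for the real RSS download.
        return [(title, 'articles from ' + url) for title, url in self.feeds]

    def build_index(self):
        # Assumed fallback: no custom index means use the RSS feeds.
        try:
            return self.parse_index()
        except NotImplementedError:
            return self.parse_feeds()


class MyRecipe(BaseRecipe):
    feeds = [('example', 'http://example.com/rss')]
    reached = False

    def parse_index(self):
        answer = super().parse_index()   # raises, so nothing below runs
        self.reached = True
        answer.insert(0, ('custom', []))
        return answer


recipe = MyRecipe()
index = recipe.build_index()
```

Under these assumptions the exception raised inside super().parse_index() passes straight through the subclass method and is caught by the caller, so the feeds appear in the result while the custom section never does. Calling the feed parser directly from parse_index(), instead of the base parse_index(), would be one way around it if the two result formats can be reconciled.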
Old 11-22-2009, 02:57 PM   #880
evanmaastrigt
Connoisseur
evanmaastrigt doesn't litter
 
Posts: 78
Karma: 192
Join Date: Nov 2009
Device: Sony PRS-600
farodevigo.es

Quote:
Originally Posted by fortunados View Post
There are links like this one "http://www.farodevigo.es//elementosInt/rss/2" that I can open in firefox and read them as RSS.

snip...

To the point...
I can open and see the RSS with Firefox, but there is no way to do it with calibre; it just says the feed failed and nothing else.
Here is what I did: I opened the first link in this RSS feed in my browser and was presented with a 'nice' flash movie. You can click this away or sit it out and only after that you can read the article. Any subsequent link from the same feed opens without that flash movie.
Next I destroyed all my cookies and clicked 'reload'. There was the flash movie again.

Nice...

Now, as far as I understand Calibre uses only one instance of a browser to parse pages; and that browser supports cookies. So a possible workaround is to parse the feeds by hand, open the first article manually, ignore the result and let Calibre proceed. As cookies are now set, it should work. Or maybe not, I don't know.

Maybe Kovid can tell if this is feasible.
Old 11-22-2009, 03:03 PM   #881
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 45,385
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You can turn cookies off/on and override get_article_url to avoid the flash movie.
Old 11-22-2009, 05:14 PM   #882
evanmaastrigt
Connoisseur
evanmaastrigt doesn't litter
 
Posts: 78
Karma: 192
Join Date: Nov 2009
Device: Sony PRS-600
Quote:
Originally Posted by kovidgoyal View Post
IIRC, parse_index in the base class is not implemented at all. It will just raise an exception.
Weird...

If I can reproduce the behavior I observed, should I open a ticket? Because I think it is a nice-to-have feature. It will open the whole can of worms of backwards compatibility, but hey, is that my problem? ;-)
Old 11-22-2009, 10:27 PM   #883
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 45,385
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
The way to do this is to override get_article_url.

In get_article_url, you fetch the actual page using index_to_soup and check whether the flash movie is on it; if so, return the URL of the actual page.
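
A minimal sketch of that approach. The markup details are assumptions, since the interstitial page is not shown in this thread: the movie is assumed to sit in an object or embed tag, and the link with id 'skip' is purely hypothetical. The class name is made up as well; only get_article_url and index_to_soup come from the description above.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:  # allows reading and testing the sketch outside calibre
    BasicNewsRecipe = object


class FlashSkippingRecipe(BasicNewsRecipe):
    title = u'Faro de Vigo (sketch)'
    feeds = [(u'RSS 2', u'http://www.farodevigo.es//elementosInt/rss/2')]

    def get_article_url(self, article):
        url = article.get('link', None)
        if not url:
            return None
        # Fetch the page once; this also lets the site set its cookies.
        soup = self.index_to_soup(url)
        # Assumption: the interstitial embeds its movie via <object> or <embed>.
        if soup.find('object') is None and soup.find('embed') is None:
            return url  # no movie found, so this is already the article
        # Hypothetical markup: a "skip" link pointing at the real article.
        skip = soup.find('a', attrs={'id': 'skip'})
        return skip['href'] if skip is not None else url
```

If no onward link is found, the sketch returns the original URL, on the assumption that the cookie set by the first fetch lets the next download through.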
Old 11-23-2009, 04:36 AM   #884
Spankypoo
Enthusiast
Spankypoo ought to be getting tired of karma fortunes by now.
 
Posts: 29
Karma: 499348
Join Date: Jun 2009
Device: Myriad
Anyone know of a way to select articles for inclusion/exclusion based on their title?

E.g., I'd like to only pull articles containing the phrase "Calibre r0x0rz" from an RSS feed, and have it exclude the others.

Thanks!
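
One possible approach, sketched under assumptions rather than taken from this thread: override parse_feeds, let the base class download the feeds, then drop the articles whose title does not contain the phrase. It assumes parse_feeds returns feed objects with an articles list whose items have a title attribute. keep_matching, the class name and the example.com feed are placeholders.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:  # allows reading and testing the helper outside calibre
    BasicNewsRecipe = object


def keep_matching(feeds, phrase):
    """Drop, in place, every article whose title lacks phrase (case-insensitive)."""
    wanted = phrase.lower()
    for feed in feeds:
        feed.articles = [a for a in feed.articles
                         if wanted in (a.title or '').lower()]
    return feeds


class TitleFiltered(BasicNewsRecipe):
    title = u'Title-filtered feed'
    feeds = [(u'example', u'http://example.com/rss')]  # placeholder feed

    def parse_feeds(self):
        # Let the base class fetch everything, then filter by title.
        feeds = BasicNewsRecipe.parse_feeds(self)
        return keep_matching(feeds, u'Calibre r0x0rz')
```

Inverting the condition in keep_matching would turn the same helper into an exclusion filter.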
Old 11-23-2009, 04:50 AM   #885
fortunados
Junior Member
fortunados began at the beginning.
 
Posts: 6
Karma: 10
Join Date: Oct 2009
Device: PRS-505
Well, I cannot see any Flash in the articles I have read; actually, this is not the problem at all if you check the other articles.

I am just trying to get something into calibre, but I cannot get anything with the address

http://www.farodevigo.es//elementosInt/rss/2

But I can see the RSS page and its code when I open it in Firefox, so I don't know if it is related to Flash.

I have no clue, but I imagine that there is something on the server that checks the browser, or something with Java, and sends the page or not; I am just guessing, though.

If anyone could cook up a recipe or just give me some hints, I would appreciate it.

Regards.

Quote:
Originally Posted by evanmaastrigt View Post
Here is what I did: I opened the first link in this RSS feed in my browser and was presented with a 'nice' flash movie. You can click this away or sit it out and only after that you can read the article. Any subsequent link from the same feed opens without that flash movie.
Next I destroyed all my cookies and clicked 'reload'. There was the flash movie again.

Nice...

Now, as far as I understand Calibre uses only one instance of a browser to parse pages; and that browser supports cookies. So a possible workaround is to parse the feeds by hand, open the first article manually, ignore the result and let Calibre proceed. As cookies are now set, it should work. Or maybe not, I don't know.

Maybe Kovid can tell if this is feasible.