Custom recipes (archive, read-only) - Page 18

Emm3t · 02-20-2009, 01:16 PM

Thanks Kovid,

I downloaded the new .py version of the feed ('cos I don't do well with cut'n'paste) and it all works well.

Many thanks for this and have a great weekend,

Emmet

kovidgoyal · 02-20-2009, 02:30 PM

Quote:

Originally Posted by Sydney's Mom

Any luck with Chicago Tribune? Thanks, Debra

The next release of calibre will have a recipe for the Chicago Tribune.

crAss · 02-21-2009, 04:54 AM

Could you add a recipe for the English rss feed of Al Jazeera?
The address is
http://english.aljazeera.net/Service...31105943979989

It is the only RSS feed I have been unable to create a recipe from on my own. When I add it it just downloads and creates the first page, but no articles.
Thank you in advance!

kiklop74 · 02-21-2009, 08:16 AM

I'm afraid they have some protection system that detects scraping: after one or two downloads that work OK, the server starts to reject requests.

You could try running the recipe from some other IP address and placing this in your code:

Code:

    simultaneous_downloads = 1
    delay                  = 4
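
For anyone unsure where those two lines belong, here is a minimal sketch of a recipe that carries them. The class name and feed URL are placeholders, not something from this thread, and the try/except fallback is only there so the snippet can be loaded outside calibre. Both settings are class attributes of BasicNewsRecipe, and delay is the pause in seconds between consecutive downloads.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback so this sketch can be imported without calibre installed.
    BasicNewsRecipe = object


class ThrottledFeed(BasicNewsRecipe):
    # Hypothetical title and feed URL, used only for illustration.
    title = u'Throttled feed sketch'
    feeds = [(u'Example feed', u'http://example.com/rss')]

    # Fetch one page at a time instead of several in parallel.
    simultaneous_downloads = 1
    # Wait 4 seconds between consecutive downloads.
    delay = 4
```

Whether this is enough to get past a given site's scraping detection depends on the site, so treat the values as a starting point.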

XanthanGum · 02-21-2009, 10:07 AM

Quote:

Originally Posted by kiklop74

Apparently the people at Harper's Magazine have decided to completely remove the text versions of their print-edition articles, leaving only PDF and image versions. The change applies as of the March 2009 edition. This means that the recipe for the printed edition will stop working.

I will see if there is any chance of manipulating the PDF format, but since I know how tough a format it is, I do not expect much. However, the recipe might be modified so that it at least allows older issues to be downloaded.

Is there interest in such a thing?

kiklop74,

I would like to see such a recipe. Thanks.

XG

XanthanGum · 02-21-2009, 10:18 AM

Quote:

Originally Posted by XanthanGum

kiklop74,

I would like to see such a recipe. Thanks.

XG

kiklop74,

Please ignore my earlier Harper's request. I download and read your other recipe, the one that doesn't require a login. I get a sufficient number of articles from that recipe.

XG

XanthanGum · 02-21-2009, 10:40 AM

Quote:

Originally Posted by kiklop74

I'm afraid they have some protection system that detects scraping: after one or two downloads that work OK, the server starts to reject requests.

You could try running the recipe from some other IP address and placing this in your code:

Code:

    simultaneous_downloads = 1
    delay                  = 4

kiklop74,

I have a similar problem with Aljazeera English.

I, too, would like to have a recipe for this service. They provide good world news coverage.

Thanks if possible...

XG

kiklop74 · 02-21-2009, 10:42 AM

This behaviour is by design. When you specify --test, it means "download only two articles from a feed". To download everything, do not use the --test option. Science News and Spiegel work correctly.

kiklop74 · 02-21-2009, 10:57 AM

Quote:

Originally Posted by luqmaninbmore

I would like to have recipes created for the following journals/magazines:

New Left Review
www.newleftreview.org

Hidden City Quarterly
www.hcquarterly.com

Radical Philosophy
www.radicalphilosophy.com

The Ghazal Page
http://www.ghazalpage.net/

New Left Review - articles are in PDF. Making this recipe is too time-consuming for me.

Hidden City Quarterly - because of the complicated layout of the site, this is also a complicated recipe (though this one I could actually do).

Radical Philosophy - this one is doable and will be done in the next 10-15 days, when I find the time.

The Ghazal Page - this one is also doable and will be done in the next 10-15 days, when I find the time.

kiklop74 · 02-21-2009, 10:59 AM

New recipe for Serbian news portal E-novine:

kiklop74 · 02-21-2009, 01:57 PM

Al Jazeera in English

Code:

#!/usr/bin/env python

__license__   = 'GPL v3'
__copyright__ = '2009, Darko Miletic <darko.miletic at gmail.com>'

'''
aljazeera.net
'''

from calibre.web.feeds.news import BasicNewsRecipe

class AlJazeera(BasicNewsRecipe):
    title                  = 'Al Jazeera in English'
    __author__             = 'Darko Miletic'
    description            = 'News from Middle East'
    publisher              = 'Al Jazeera'
    category               = 'news, politics, middle east'
    # Throttle requests: the server starts rejecting rapid downloads
    simultaneous_downloads = 1
    delay                  = 4
    oldest_article         = 1
    max_articles_per_feed  = 100
    no_stylesheets         = True
    encoding               = 'iso-8859-1'
    remove_javascript      = True
    use_embedded_content   = False

    html2lrf_options = [
        '--comment', description,
        '--category', category,
        '--publisher', publisher,
        '--ignore-tables',
    ]

    html2epub_options = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"\nlinearize_table=True'

    # Keep only the main content container of each article page
    keep_only_tags = [dict(name='div', attrs={'id':'ctl00_divContent'})]

    remove_tags = [
        dict(name=['object','link']),
        dict(name='td', attrs={'class':['MostActiveDescHeader','MostActiveDescBody']}),
    ]

    feeds = [(u'AL JAZEERA ENGLISH (AJE)', u'http://english.aljazeera.net/Services/Rss/?PostingId=2007731105943979989')]

    def preprocess_html(self, soup):
        # Drop inline style and font-face attributes from every tag
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll(face=True):
            del item['face']
        return soup

Hypernova · 02-21-2009, 09:01 PM

Quote:

Originally Posted by kovidgoyal

It should be doable by using the postprocess_html method, which allows you to perform arbitrary manipulations on the downloaded html just before it is saved.

So what you will need to do is, for each such image, figure out the corresponding text and add it in a <p> after the image.

The postprocess_html method is passed two parameters: a BeautifulSoup instance and a boolean indicating whether the HTML is the first page of the article. You can use the soup parameter to perform the manipulations. See the documentation of the BeautifulSoup package to understand how to use it.
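
For readers who do want to try this, the idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the IMAGE_TEXT table, the text_for_image helper and the class name are hypothetical, and the soup calls (find_all, new_tag, insert_after) assume a bs4-style soup as used by recent calibre versions; the BeautifulSoup bundled with calibre in 2009 had a different API for creating tags. The try/except fallback just lets the snippet load outside calibre.

```python
import posixpath
from urllib.parse import urlparse

try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback so this sketch can be imported without calibre installed.
    BasicNewsRecipe = object

# Hypothetical lookup table: image file name -> the text it stands for.
IMAGE_TEXT = {
    'rating_4.gif': 'Rating: 4 out of 5',
    'rating_5.gif': 'Rating: 5 out of 5',
}


def text_for_image(src):
    """Return the text for an image URL, or None if the image is unknown."""
    name = posixpath.basename(urlparse(src).path)
    return IMAGE_TEXT.get(name)


class ImageTextRecipe(BasicNewsRecipe):
    title = u'Image text sketch'

    def postprocess_html(self, soup, first_fetch):
        # soup is the downloaded page; first_fetch is True for the
        # first page of an article.
        for img in soup.find_all('img', src=True):
            text = text_for_image(img['src'])
            if text:
                # Add the text in a new <p> directly after the image.
                p = soup.new_tag('p')
                p.string = text
                img.insert_after(p)
        return soup
```

Keeping the lookup in a separate helper means the mapping can be checked on its own, without downloading anything.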

Thank you for your help, but I think I'll pass on that. I know it's not that hard, but I don't think I should spend that much time on the recipe; I'd rather start reading instead.

Anyway, here's the recipe for Paul Thurrott's SuperSite for Windows

howsey · 02-22-2009, 01:48 AM

Only just got a Sony Reader and started using the Calibre software. The idea of being able to convert RSS feeds to an ebook is really appealing. I've attempted to create a custom news source for 'The Register' (http://www.theregister.co.uk/headlines.atom). The feed downloads OK and a book is produced but it only contains the feeds and not any content from the associated web page. My initial thought was that Calibre does not handle Atom feeds but the website does mention support for Atom. Any suggestions?

The code is as follows:

class AdvancedUserRecipe1235238489(BasicNewsRecipe):
    title = u'The Register'
    oldest_article = 7
    max_articles_per_feed = 100
    use_embedded_content = False

    feeds = [(u'The Register', u'http://www.theregister.co.uk/headlines.atom')]

Note. I added use_embedded_content = False and the file size did increase, so I assume some extra content was included but the first few pages that I checked still only contained the Feed information.

ilovejedd · 02-22-2009, 02:33 AM

Try adding:

Code:

def print_version(self, url):
	return url + 'print.html'
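
To show where that method goes, here is a sketch of howsey's recipe with print_version added as a method of the class. The class name TheRegister is just a readable stand-in for the auto-generated one, and the try/except fallback is only there so the snippet can be loaded outside calibre. It assumes the article URLs in the feed end with a slash, so that appending 'print.html' gives the printer-friendly page; the example URL in the usage note is made up.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback so this sketch can be imported without calibre installed.
    BasicNewsRecipe = object


class TheRegister(BasicNewsRecipe):
    title = u'The Register'
    oldest_article = 7
    max_articles_per_feed = 100
    use_embedded_content = False

    feeds = [(u'The Register', u'http://www.theregister.co.uk/headlines.atom')]

    def print_version(self, url):
        # Called for each article URL from the feed; the returned URL is
        # the page that actually gets downloaded.
        # Assumes url ends with '/', e.g. '.../example_story/'.
        return url + 'print.html'
```

With that in place, an article URL such as http://www.theregister.co.uk/2009/02/20/example_story/ would be fetched from the same address with print.html appended.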

howsey · 02-22-2009, 05:25 AM

Quote:

Originally Posted by ilovejedd

Try adding:

Code:

def print_version(self, url):
	return url + 'print.html'

Thanks for that. I've now got it working reasonably well. The next issue is that the articles contain hyperlinks. The default processing seems to be to replace these with the element text and then include the URL in brackets afterwards. Is there a way to stop the URL coming out? My initial thought was to try the pre/post processing functions, but this appears to filter out way too early.
