Old 02-20-2009, 01:16 PM   #256
Emm3t
Junior Member
Emm3t began at the beginning.
 
Posts: 6
Karma: 10
Join Date: Feb 2009
Location: Spain
Device: Sony PRS-505
Thanks Kovid,

I downloaded the new .py version of the feed ('cos I don't do well with cut'n'paste) and it all works well.

Many thanks for this and have a great weekend,

Emmet
Old 02-20-2009, 02:30 PM   #257
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.

Posts: 45,398
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by Sydney's Mom View Post
Any luck with Chicago Tribune? Thanks, Debra
The next release of calibre will have a recipe for the Chicago Tribune.
Old 02-21-2009, 04:54 AM   #258
crAss
Connoisseur
crAss began at the beginning.
 
Posts: 68
Karma: 20
Join Date: Jan 2009
Location: Athens, Greece
Device: Cybook Gen3
Could you add a recipe for the English RSS feed of Al Jazeera?
The address is
http://english.aljazeera.net/Service...31105943979989

It is the only RSS feed I have been unable to create a recipe for on my own. When I add it, calibre just downloads and creates the first page, but no articles.
Thank you in advance!
Old 02-21-2009, 08:16 AM   #259
kiklop74
Guru
kiklop74 can program the VCR without an owner's manual.

Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
I'm afraid they have some protection system that detects scraping: after one or two downloads that work OK, the server starts to reject requests.

You could try running the recipe from some other IP address and placing this in your code:

Code:
    simultaneous_downloads = 1
    delay                  = 4
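
To make the effect of those two settings concrete: with simultaneous_downloads = 1 and delay = 4, articles are fetched one at a time with a pause between consecutive requests. The sketch below only imitates that pattern in plain Python; it is not calibre's own downloader, and the names fetch_politely, fetch and sleep are made up for the illustration.

```python
import time


def fetch_politely(urls, fetch, delay=4, sleep=time.sleep):
    """Fetch URLs strictly one after another, pausing between requests.

    `fetch` is any callable that takes a URL and returns the page
    (hypothetical; supplied by the caller). `sleep` is injectable so the
    pause can be replaced when experimenting.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)  # wait before every request except the first
        results.append(fetch(url))
    return results
```

Passing a fake fetch and a fake sleep shows the pattern without touching the network: three URLs produce three fetches and two pauses of 4 seconds each.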
Old 02-21-2009, 10:07 AM   #260
XanthanGum
Connoisseur
XanthanGum began at the beginning.
 
Posts: 51
Karma: 10
Join Date: Dec 2008
Location: Germany
Device: SONY PRS-500
Harper Magazine

Quote:
Originally Posted by kiklop74 View Post
Apparently the people at Harper's Magazine have decided to completely remove the text version of their printed-edition articles, leaving only PDF and image versions. That change applies as of the March 2009 edition. This means that the recipe for the printed edition will stop working.

I will see if there is any chance of manipulating the PDF format, but since I know how tough that format is, I do not expect much. However, the recipe might be modified in such a way as to at least enable downloading older issues.

Is there interest in such a thing?
kiklop74,

I would like to see such a recipe. Thanks.

XG
Old 02-21-2009, 10:18 AM   #261
XanthanGum
Connoisseur
XanthanGum began at the beginning.
 
Posts: 51
Karma: 10
Join Date: Dec 2008
Location: Germany
Device: SONY PRS-500
Cancel My Request for Harper's

Quote:
Originally Posted by XanthanGum View Post
kiklop74,

I would like to see such a recipe. Thanks.

XG
kiklop74,

Please ignore my earlier Harper's request. I download and read your other recipe, the one that doesn't require a login. I get a sufficient number of articles from that recipe.

XG
Old 02-21-2009, 10:40 AM   #262
XanthanGum
Connoisseur
XanthanGum began at the beginning.
 
Posts: 51
Karma: 10
Join Date: Dec 2008
Location: Germany
Device: SONY PRS-500
Quote:
Originally Posted by kiklop74 View Post
I'm afraid they have some protection system that detects scraping: after one or two downloads that work OK, the server starts to reject requests.

You could try running the recipe from some other IP address and placing this in your code:

Code:
    simultaneous_downloads = 1
    delay                  = 4
kiklop74,

I have a similar problem with Aljazeera English.

I, too, would like to have a recipe for this service. They provide good world news coverage.

Thanks if possible...

XG

Last edited by XanthanGum; 02-21-2009 at 11:00 AM. Reason: Confirm that kiklop74 is right
Old 02-21-2009, 10:42 AM   #263
kiklop74
Guru
kiklop74 can program the VCR without an owner's manual.

Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
This behaviour is by design. Specifying --test means "download only two articles from each feed". To download everything, do not use the --test option. Science News and Spiegel work correctly.
Old 02-21-2009, 10:57 AM   #264
kiklop74
Guru
kiklop74 can program the VCR without an owner's manual.

Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Quote:
Originally Posted by luqmaninbmore View Post
I would like to have recipes created for the following journals/magazines:

New Left Review
www.newleftreview.org

Hidden City Quarterly
www.hcquarterly.com

Radical Philosophy
www.radicalphilosophy.com

The Ghazal Page
http://www.ghazalpage.net/

New Left Review - the articles are in PDF. Making this recipe would be too time-consuming for me.

Hidden City Quarterly - because of the site's complicated layout, this recipe is also complicated (though this one I could actually do).

Radical Philosophy - this one is doable; it will be done in the next 10-15 days, when I find the time.

The Ghazal Page - this one is also doable; it will be done in the next 10-15 days, when I find the time.
Old 02-21-2009, 10:59 AM   #265
kiklop74
Guru
kiklop74 can program the VCR without an owner's manual.

Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
New recipe for Serbian news portal E-novine:
Attached Files
File Type: zip e-novine.zip (1.4 KB, 325 views)
Old 02-21-2009, 01:57 PM   #266
kiklop74
Guru
kiklop74 can program the VCR without an owner's manual.

Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Al Jazeera in English

Code:
#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = '2009, Darko Miletic <darko.miletic at gmail.com>'

'''
aljazeera.net
'''

from calibre.web.feeds.news import BasicNewsRecipe

class AlJazeera(BasicNewsRecipe):
    title                  = 'Al Jazeera in English'
    __author__             = 'Darko Miletic'
    description            = 'News from Middle East'
    publisher              = 'Al Jazeera'
    category               = 'news, politics, middle east'
    # The server rejects rapid requests: fetch one article at a time,
    # with a 4 second pause between downloads.
    simultaneous_downloads = 1
    delay                  = 4
    oldest_article         = 1
    max_articles_per_feed  = 100
    no_stylesheets         = True
    encoding               = 'iso-8859-1'
    remove_javascript      = True
    use_embedded_content   = False

    html2lrf_options = [
                          '--comment', description
                        , '--category', category
                        , '--publisher', publisher
                        , '--ignore-tables'
                        ]

    html2epub_options = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"\nlinearize_table=True'

    # Keep only the div that holds the article content.
    keep_only_tags = [dict(name='div', attrs={'id':'ctl00_divContent'})]

    # Drop embedded objects, <link> tags and the "most active" side boxes.
    remove_tags = [
                     dict(name=['object','link'])
                    ,dict(name='td', attrs={'class':['MostActiveDescHeader','MostActiveDescBody']})
                  ]

    feeds = [(u'AL JAZEERA ENGLISH (AJE)', u'http://english.aljazeera.net/Services/Rss/?PostingId=2007731105943979989' )]

    def preprocess_html(self, soup):
        # Remove inline styling (style and face attributes) from every tag.
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll(face=True):
            del item['face']
        return soup
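
For anyone who wants to see what the preprocess_html step does to a page without running the recipe inside calibre, here is a rough stand-alone imitation. The recipe itself works on the BeautifulSoup object calibre passes in; this sketch swaps in Python's standard html.parser instead, so the class AttributeStripper and the function strip_presentation are purely illustrative names and not part of calibre.

```python
from html import escape
from html.parser import HTMLParser


class AttributeStripper(HTMLParser):
    """Re-emit HTML while dropping the unwanted attributes from every tag.

    Comments and doctype declarations are not re-emitted by this sketch.
    """

    def __init__(self, unwanted=('style', 'face')):
        super().__init__(convert_charrefs=False)
        self.unwanted = set(unwanted)
        self.parts = []

    def _open_tag(self, tag, attrs, closing='>'):
        kept = []
        for name, value in attrs:
            if name in self.unwanted:
                continue  # this is the attribute being stripped
            if value is None:
                kept.append(' %s' % name)  # bare attribute, e.g. "disabled"
            else:
                kept.append(' %s="%s"' % (name, escape(value)))
        return '<%s%s%s' % (tag, ''.join(kept), closing)

    def handle_starttag(self, tag, attrs):
        self.parts.append(self._open_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self._open_tag(tag, attrs, ' />'))

    def handle_endtag(self, tag):
        self.parts.append('</%s>' % tag)

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append('&%s;' % name)

    def handle_charref(self, name):
        self.parts.append('&#%s;' % name)


def strip_presentation(html):
    """Return html with all style and face attributes removed."""
    parser = AttributeStripper()
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts)
```

Feeding it a paragraph with an inline style and a font tag with a face attribute returns the same markup with only those two attributes gone; other attributes such as size are left alone.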
Old 02-21-2009, 09:01 PM   #267
Hypernova
Hyperreader
Hypernova solves Fermat’s last theorem while doing the crossword.
 
Posts: 130
Karma: 28678
Join Date: Feb 2009
Device: Current: Boox Leaf2 (broken) Past: H2O, Kindle PW1, DXG;Pocketbook 360
Quote:
Originally Posted by kovidgoyal View Post
It should be doable using the postprocess_html method, which allows you to perform arbitrary manipulations on the downloaded HTML just before it is saved.

So what you will need to do is, for each such image, figure out the corresponding text and add it in a <p> after the image.

The postprocess_html method is passed two parameters: a BeautifulSoup instance and a boolean indicating whether the HTML is the first page of the article. You can use the soup parameter to perform the manipulations. See the documentation of the BeautifulSoup package to understand how to use it.
Thank you for your help, but I think I'll pass on that. I know it's not that hard, but I don't think I should spend that much time on the recipe; I'd rather start reading instead.

Anyway, here's the recipe for Paul Thurrott's SuperSite for Windows
Attached Files
File Type: zip Winsupersite.zip (595 Bytes, 329 views)
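
For readers who do want to try the postprocess_html approach quoted above, the core manipulation (put a <p> with the matching text right after each image) could look roughly like the sketch below. It makes two loud assumptions: the "corresponding text" is taken from each image's alt attribute, and the page is well-formed XHTML so that the standard xml.etree module can parse it. In a real recipe the same idea would be written against the BeautifulSoup object inside postprocess_html(self, soup, first_fetch); the function name add_captions is made up here.

```python
import xml.etree.ElementTree as ET


def add_captions(xhtml):
    """Insert <p>alt text</p> immediately after every <img> that has alt text.

    Assumes well-formed XHTML and that the alt attribute holds the text
    to show (both are assumptions made for this illustration).
    """
    root = ET.fromstring(xhtml)
    # Snapshot the elements first so newly inserted <p> nodes are not revisited.
    for parent in list(root.iter()):
        # Walk children back to front so insert positions stay valid.
        for index, child in reversed(list(enumerate(parent))):
            if child.tag == 'img' and child.get('alt'):
                caption = ET.Element('p')
                caption.text = child.get('alt')
                # Text that followed the image now follows the caption.
                caption.tail = child.tail
                child.tail = None
                parent.insert(index + 1, caption)
    return ET.tostring(root, encoding='unicode')
```

Called on a div that contains an image with alt="Figure 1" followed by some text, it returns the div with a new paragraph "Figure 1" sitting between the image and that text.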
Old 02-22-2009, 01:48 AM   #268
howsey
Junior Member
howsey began at the beginning.
 
Posts: 6
Karma: 10
Join Date: Feb 2009
Device: Sony Reader
'The Register' recipe

Only just got a Sony Reader and started using the Calibre software. The idea of being able to convert RSS feeds to an ebook is really appealing. I've attempted to create a custom news source for 'The Register' (http://www.theregister.co.uk/headlines.atom). The feed downloads OK and a book is produced but it only contains the feeds and not any content from the associated web page. My initial thought was that Calibre does not handle Atom feeds but the website does mention support for Atom. Any suggestions?

The code is as follows:

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1235238489(BasicNewsRecipe):
    title = u'The Register'
    oldest_article = 7
    max_articles_per_feed = 100
    use_embedded_content = False

    feeds = [(u'The Register', u'http://www.theregister.co.uk/headlines.atom')]

Note: I added use_embedded_content = False and the file size did increase, so I assume some extra content was included, but the first few pages that I checked still contained only the feed information.
Old 02-22-2009, 02:33 AM   #269
ilovejedd
hopeless n00b
ilovejedd ought to be getting tired of karma fortunes by now.

Posts: 5,110
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
Try adding:
Code:
def print_version(self, url):
	return url + 'print.html'
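
In case it helps to see what that method does: calibre calls print_version with each article's URL and downloads whatever URL comes back instead. Below is a stand-alone sketch of the same mapping (no self, so it can be tried outside a recipe). The trailing-slash guard is an extra that is not in the snippet above, and the article URL in the usage note is a made-up example.

```python
def print_version(url):
    """Map an article URL to its printer-friendly page.

    The slash check is an added safeguard (an assumption, not part of the
    suggestion above) in case a feed link arrives without a trailing slash.
    """
    if not url.endswith('/'):
        url += '/'
    return url + 'print.html'
```

With a hypothetical link such as http://www.theregister.co.uk/2009/02/20/example_story/ this returns the same address with print.html appended.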
Old 02-22-2009, 05:25 AM   #270
howsey
Junior Member
howsey began at the beginning.
 
Posts: 6
Karma: 10
Join Date: Feb 2009
Device: Sony Reader
Remove <a> tags in body of article but keep element text

Quote:
Originally Posted by ilovejedd View Post
Try adding:
Code:
def print_version(self, url):
	return url + 'print.html'
Thanks for that. I've now got it working reasonably well. The next issue is that the articles contain hyperlinks. The default processing seems to replace each link with its element text and then include the URL in brackets afterwards. Is there a way to stop the URL coming out? My initial thought was to try the pre/post processing functions, but these appear to run way too early.