Old 05-01-2010, 11:51 AM   #1876
swmkdr
Okay, so I tried to make an AVClub website recipe by customising the bbc one. It seems to work fine, but I need some help removing all the extra stuff - headers, sidebar, images etc. This is what the recipe looks like at the moment:

Code:
#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'
'''
bbc.co.uk
'''

from calibre.web.feeds.news import BasicNewsRecipe

class BBC(BasicNewsRecipe):
    title          = u'The Onion AV Club'
    __author__     = 'Kovid Goyal'
    description    = 'Film, Television and Music Reviews'
    oldest_article        = 2    
    max_articles_per_feed = 100   
    no_stylesheets = True
    use_embedded_content  = False
    encoding              = 'utf-8'
    remove_javascript     = True

    remove_tags    = [dict(name='div', attrs={'class':'footer'})]
    extra_css      = '.headline {font-size: x-large;} \n .fact { padding-top: 10pt  }' 

    feeds          = [
                      ('Interviews', 'http://www.avclub.com/feed/interview/'), 
                      ('Features', 'http://www.avclub.com/feed/features/'),
                      ('Film', 'http://www.avclub.com/feed/film/'),
                      ('Music', 'http://www.avclub.com/feed/music/'),
                      ('DVD', 'http://www.avclub.com/feed/dvd/'),
                      ('Books', 'http://www.avclub.com/feed/books/'),
                      ('Games', 'http://www.avclub.com/feed/games/'),
                      ('AV Club Daily', 'http://www.avclub.com/feed/daily/'),
                    ]
Any help would be appreciated. Thanks in advance.

Old 05-01-2010, 04:16 PM   #1877
Starson17
Quote:
Originally Posted by swmkdr
Okay, so I tried to make an AVClub website recipe by customising the bbc one. It seems to work fine, but I need some help removing all the extra stuff - headers, sidebar, images etc. This is what the recipe looks like at the moment:
...

Any help would be appreciated. Thanks in advance.
Replace your remove_tags with this, which also adds keep_only_tags:
Code:
    keep_only_tags     = [dict(name='div', attrs={'id':'content'})
                          ]

    remove_tags    = [dict(name='div', attrs={'class':['footer','tools_horizontal']}),
                      dict(name='div', attrs={'id':['tool_holder','elsewhere_on_avclub']})
                      ]
This was only tested on one article, so you'll need to test the others.

As an aside, when someone has tried to make the recipe, and posts the recipe with feeds, etc., it makes it easier to help. Further, I'm more inclined to try to help if they've done as much as they can. In this case, as in many cases, all that was needed was to run Firefox on the article, then use Firebug to identify the class, div, id, etc. for elements that should be kept or removed.
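Put together with the feeds from the original post, the whole recipe might look like the sketch below. The class name AVClub is just an illustrative choice, and the try/except stub is only there so the file can be checked outside calibre. The selectors are the ones above, tested on a single article.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the file can be checked outside calibre (assumption).
    class BasicNewsRecipe(object):
        pass


class AVClub(BasicNewsRecipe):  # class name is illustrative
    title                 = u'The Onion AV Club'
    description           = 'Film, Television and Music Reviews'
    oldest_article        = 2
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    encoding              = 'utf-8'
    remove_javascript     = True

    # Keep only the main article container ...
    keep_only_tags = [dict(name='div', attrs={'id': 'content'})]

    # ... then strip the leftovers inside it.
    remove_tags = [
        dict(name='div', attrs={'class': ['footer', 'tools_horizontal']}),
        dict(name='div', attrs={'id': ['tool_holder', 'elsewhere_on_avclub']}),
    ]

    feeds = [
        ('Interviews', 'http://www.avclub.com/feed/interview/'),
        ('Features', 'http://www.avclub.com/feed/features/'),
        ('Film', 'http://www.avclub.com/feed/film/'),
        ('Music', 'http://www.avclub.com/feed/music/'),
        ('DVD', 'http://www.avclub.com/feed/dvd/'),
        ('Books', 'http://www.avclub.com/feed/books/'),
        ('Games', 'http://www.avclub.com/feed/games/'),
        ('AV Club Daily', 'http://www.avclub.com/feed/daily/'),
    ]
```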

Old 05-01-2010, 04:29 PM   #1878
Starson17
Quote:
Originally Posted by lmittell
I'd love to cook up my own recipe (sorry about the pun), but there's no coherent explanation of how to do so that a mere graduate engineer can get a grip on. Of course, I may not be looking in the right place.
Read this and this and this and this.

Old 05-01-2010, 04:38 PM   #1879
swmkdr
Thanks a lot for the help - that worked perfectly. I'm a complete novice, but I took a look at Firebug and it seems easy enough for me to use in the future.

For others that want it, here's the Onion AV Club recipe:


onionavclub.zip

Old 05-01-2010, 05:31 PM   #1880
Starson17
Quote:
Originally Posted by swmkdr
I'm a complete novice, but I took a look at Firebug and it seems easy enough for me to use in the future.
Just right click on an element you want to remove (while viewing it in Firefox at the site) and select "Inspect Element." You can then figure out how to remove it from the popup that appears.
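If you would rather see every div at once than click through them, a few lines of plain Python 3 (run outside calibre) can list the id and class of each one. This is only a rough sketch: the sample markup is made up, loosely modelled on the ids and classes named earlier in the thread, and DivLister is a name invented here.

```python
from html.parser import HTMLParser


class DivLister(HTMLParser):
    """Collect the (id, class) pair of every <div> start tag."""

    def __init__(self):
        super().__init__()
        self.divs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'div':
            a = dict(attrs)
            self.divs.append((a.get('id'), a.get('class')))


# Made-up markup for illustration; paste in a saved article page instead.
sample = '''
<div id="content">
  <div class="tools_horizontal">share</div>
  <p>Article text</p>
  <div id="tool_holder">print</div>
</div>
'''

lister = DivLister()
lister.feed(sample)
for div_id, div_class in lister.divs:
    print(div_id, div_class)
```

Each pair maps straight onto a recipe entry: a div with id tool_holder becomes dict(name='div', attrs={'id':'tool_holder'}) in remove_tags.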

Old 05-02-2010, 03:31 AM   #1881
gambarini
New recipe for Il Messaggero, an Italian daily newspaper.
Attached Files
File Type: zip IlMessaggero.zip (942 Bytes, 292 views)

Old 05-02-2010, 04:46 AM   #1882
gambarini
New recipe for www.adnkronos.it, an Italian news agency.
Attached Files
File Type: zip ADNKRONOS.zip (974 Bytes, 273 views)

Old 05-03-2010, 02:02 AM   #1883
mobilewilier
Dear Forumites

Can anyone please help me create a custom recipe for an English-language Hong Kong newspaper? It does need a username and password.

http://www.scmp.com/portal/site/SCMP...ervices&ss=RSS

If you could please get me started with a template I would be absolutely indebted to you....

Old 05-03-2010, 03:11 PM   #1884
Tumaini
Swedish news

EDIT: Double post mistake

Last edited by Tumaini; 05-04-2010 at 08:25 PM.

Old 05-03-2010, 03:21 PM   #1885
Tumaini
Here are recipes for two Swedish news networks:

Ekot (NOTE - Ekot changed their format so this script probably won't work):
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class Ekot_SE(BasicNewsRecipe):
    title                 = 'Ekot'
    __author__            = 'Joakim Lindskog'
    description           = 'Nyheter från Ekot'
    publisher             = 'Ekot'
    category              = 'news, politics, Sweden'
    oldest_article        = 7
    delay                 = 1
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    encoding              = 'utf-8'
    language              = 'sv'

    conversion_options = {
                          'comment'   : description
                        , 'tags'      : category
                        , 'publisher' : publisher
                        , 'language'  : language
                        }

    keep_only_tags = [dict(name='h1', attrs={'class':'newsH2'}),
                               dict(name='div', attrs={'class':'articleTop'}),
                               dict(name='div', attrs={'class':'newsIntro'}),
                               dict(name='div', attrs={'class':'newsText'})]
    remove_tags = [
                     dict(name=['object','link','base'])
                    ,dict(name='span',attrs={'class':'relLink'})
                  ]

    feeds          = [(u'Ekot', u'http://api.sr.se/api/rssfeed/rssfeed.aspx?rssfeed=83'),
                          (u'Utrikes', u'http://api.sr.se/api/rssfeed/rssfeed.aspx?rssfeed=3304'),
                          (u'Radiosporten', u'http://api.sr.se/api/rssfeed/rssfeed.aspx?rssfeed=179')]

    def print_version(self, url):
        return url.replace('http://sverigesradio.se/cgi-bin/ekot/artikel.asp', 'http://sverigesradio.se/cgi-bin/isidorpub/PrinterFriendlyArticle.asp')+'&ProgramID=83'
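To see what print_version does to a feed link, the same string rewrite can be run on its own, outside calibre. This is just a sketch: the article link is hypothetical (the query string is invented), and since Ekot changed their format the resulting URL is not expected to resolve.

```python
OLD = 'http://sverigesradio.se/cgi-bin/ekot/artikel.asp'
NEW = 'http://sverigesradio.se/cgi-bin/isidorpub/PrinterFriendlyArticle.asp'


def print_version(url):
    # Same rewrite as in the recipe, written as a plain function:
    # swap the article script for the printer-friendly one and
    # append the programme id.
    return url.replace(OLD, NEW) + '&ProgramID=83'


# Hypothetical article link; the query string is made up for illustration.
article = OLD + '?artikel=1234567'
print(print_version(article))
```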
Fria Tidningen (all categories, works great):
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class FriaTidningen_SE(BasicNewsRecipe):
    title          = u'Fria Tidningen'
    __author__            = 'Joakim Lindskog'
    description           = 'Nyheter från Fria Tidningen'
    publisher             = 'Fria Tidningen'
    category              = 'news, politics, Sweden'
    oldest_article        = 7
    delay                 = 1
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    encoding              = 'utf-8'
    language              = 'sv'

    conversion_options = {
                          'comment'   : description
                        , 'tags'      : category
                        , 'publisher' : publisher
                        , 'language'  : language
                        }

    keep_only_tags = [dict(name='div', attrs={'id':'content-area'})]
    remove_tags_before = dict(name='div', attrs={'id':'content-area'})
    remove_tags_after = dict(name='div',attrs={'id':'byline'})
    remove_tags = [
                     dict(name=['object','link','base']),
                     dict(name='div', attrs={'id':'comments'}),
                     dict(name='div', attrs={'id':'block-block-21'}),
                     dict(name='div', attrs={'id':'block-block-22'}),
                     dict(name='div', attrs={'id':'block-block-23'}),
                     dict(name='div', attrs={'id':'block-block-24'}),
                     dict(name='div', attrs={'id':'block-block-25'}),
                     dict(name='div', attrs={'id':'block-block-26'}),
                     dict(name='div', attrs={'id':'block-block-27'}),
                     dict(name='div', attrs={'id':'block-block-28'}),
                     dict(name='div', attrs={'id':'block-block-29'}),
                     dict(name='div', attrs={'id':'block-block-30'}),
                     dict(name='div', attrs={'id':'block-block-40'})
                  ]

    feeds          = [(u'Allt', u'http://www.fria.nu/feed'),
                          (u'Nyheter', u'http://www.fria.nu/taxonomy/term/13/feed/feed'),
                          (u'Inrikes', u'http://www.fria.nu/taxonomy/term/14/0/feed'),
                          (u'Utrikes', u'http://www.fria.nu/taxonomy/term/15/0/feed'),
                          (u'Ekonomi', u'http://www.fria.nu/taxonomy/term/27047/0/feed'),
                          (u'Opinion', u'http://www.fria.nu/taxonomy/term/22/0/feed'),
                          (u'Inledaren', u'http://www.fria.nu/taxonomy/term/24/0/feed'),
                          (u'Argument', u'http://www.fria.nu/taxonomy/term/23/0/feed'),
                          (u'Synpunkten', u'http://www.fria.nu/taxonomy/term/26/0/feed'),
                          (u'Debatt', u'http://www.fria.nu/taxonomy/term/25/0/feed'),
                          (u'Kultur', u'http://www.fria.nu/taxonomy/term/19/0/feed'),
                          (u'Kulturnyheter', u'http://www.fria.nu/taxonomy/term/24534/0/feed'),
                          (u'Recensioner', u'http://www.fria.nu/taxonomy/term/24535/0/feed'),
                          (u'BAK', u'http://www.fria.nu/taxonomy/term/27/0/feed'),
                          (u'Sport & Hälsa', u'http://www.fria.nu/taxonomy/term/27215/0/feed'),
                          (u'Sport', u'http://www.fria.nu/taxonomy/term/20/0/feed'),
                          (u'Hälsa', u'http://www.fria.nu/taxonomy/term/21/0/feed'),
                          (u'Fördjupning', u'http://www.fria.nu/taxonomy/term/24994/0/feed'),
                          (u'Fokus', u'http://www.fria.nu/taxonomy/term/24864/0/feed'),
                          (u'Samtal', u'http://www.fria.nu/taxonomy/term/28/0/feed'),
                          (u'Stockholm', u'http://www.fria.nu/taxonomy/term/122/0/feed'),
                          (u'Göteborg', u'http://www.fria.nu/taxonomy/term/73/0/feed'),
                          (u'Uppsala', u'http://www.fria.nu/taxonomy/term/27324/0/feed'),
                          (u'Malmö', u'http://www.fria.nu/taxonomy/term/28031/0/feed')]
Many thanks to Darko Miletic, from whose recipes I borrowed code, and of course to Kovid Goyal!
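The long run of block-block-NN entries in the Fria Tidningen remove_tags could probably be shortened. A possible sketch, assuming attrs accepts a list of values the way the AV Club remove_tags earlier in the thread does (not run against fria.nu):

```python
# Build 'block-block-21' ... 'block-block-30' plus 'block-block-40'.
block_ids = ['block-block-%d' % n for n in list(range(21, 31)) + [40]]

# Intended as a drop-in for the remove_tags attribute of the recipe class.
remove_tags = [
    dict(name=['object', 'link', 'base']),
    dict(name='div', attrs={'id': ['comments'] + block_ids}),
]
```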

Last edited by Tumaini; 05-05-2010 at 09:25 AM.

Old 05-04-2010, 01:07 PM   #1886
kiklop74
New recipe for The Indian Express in English:
Attached Files
File Type: zip indian_express.zip (1.8 KB, 271 views)

Old 05-04-2010, 09:09 PM   #1887
mobilewilier
Quote:
Originally Posted by kiklop74
New recipe for The Indian Express in English:
Hi Kiklop

I notice you have been really helpful to many fellow readers.

Would you be so kind as to start me off with a recipe for the South China Morning Post?

www.scmp.com

http://www.scmp.com/portal/site/SCMP...ervices&ss=RSS

It does require a password and username.

Many thanks
WL

Old 05-05-2010, 04:33 AM   #1888
Sischa
Quote:
Originally Posted by Starson17
Read this and this and this and this.
Thx a lot, these links were a real help!

Old 05-05-2010, 08:08 AM   #1889
Starson17
Quote:
Originally Posted by mobilewilier
It does require a password and username.
It is very hard to write one without the password and username. You will probably have to give someone your password and username, or try writing it yourself. It's often not that hard, as you can follow another recipe that works, then tweak.
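For anyone who wants to try it themselves, a login recipe usually follows the pattern sketched below: set needs_subscription and log in from get_browser. The login URL, form name and field names here are placeholders, not the real SCMP ones; you would have to read them off the site's login page. The try/except stub is only there so the file can be checked outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the file can be checked outside calibre (assumption).
    class BasicNewsRecipe(object):
        username = None
        password = None


class LoginSketch(BasicNewsRecipe):  # hypothetical recipe name
    title              = u'Subscription site (sketch)'
    needs_subscription = True  # calibre then asks for username/password

    # Placeholder values: replace with the real login page and form details.
    LOGIN_URL  = 'http://www.example.com/login'
    FORM_NAME  = 'login'
    USER_FIELD = 'username'
    PASS_FIELD = 'password'

    def get_browser(self):
        # Start from calibre's default browser, then submit the login form.
        br = BasicNewsRecipe.get_browser(self)
        if self.username is not None and self.password is not None:
            br.open(self.LOGIN_URL)
            br.select_form(name=self.FORM_NAME)
            br[self.USER_FIELD] = self.username
            br[self.PASS_FIELD] = self.password
            br.submit()
        return br
```

The feeds, keep_only_tags and remove_tags then go into the class exactly as in a recipe for a site without a login.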

Old 05-05-2010, 08:12 AM   #1890
Starson17
Quote:
Originally Posted by Sischa
Thx a lot, these links were a real help!
You're welcome. I've read all those pages multiple times. Each time I have a problem, I go back to them. Feel free to come back here for help. This thread is sort of a mixture of people who don't feel they can do it themselves, and those who want to get their hands dirty and tackle the recipe, but need some guidance. Kiklop is a true expert, but for simpler problems, there are others here who can help.