Custom recipes (archive, read-only) - Page 127

Tumaini · 05-05-2010, 09:29 AM

Arbetaren (Swedish socialist newspaper, works great!)

Code:

from calibre.web.feeds.news import BasicNewsRecipe

class Arbetaren_SE(BasicNewsRecipe):
    title          = u'Arbetaren'
    __author__            = 'Joakim Lindskog'
    description           = 'Nyheter från Arbetaren'
    publisher             = 'Arbetaren'
    category              = 'news, politics, socialism, Sweden'
    oldest_article        = 7
    delay                 = 1
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    encoding              = 'utf-8'
    language              = 'sv'

    conversion_options = {
                          'comment'   : description
                        , 'tags'      : category
                        , 'publisher' : publisher
                        , 'language'  : language
                        }

    keep_only_tags = [dict(name='div', attrs={'id':'article'})]
    remove_tags_before = dict(name='div', attrs={'id':'article'})
    remove_tags_after = dict(name='p',attrs={'id':'byline'})
    remove_tags = [
                     dict(name=['object','link','base']),
                     dict(name='p', attrs={'class':'print'}),
                     dict(name='a', attrs={'class':'addthis_button_compact'}),
                     dict(name='script')
                  ]

    feeds          = [(u'Nyheter', u'http://www.arbetaren.se/rss/arbetaren.rss?rev=123')]

kiklop74 · 05-05-2010, 10:38 AM

Quote:

Originally Posted by mobilewilier

Hi Kiklop

Would you be so kind as to start me off with a recipe for the South China Morning Post?

www.scmp.com

This site has a very complicated logon procedure. I have no time to work on that, so here is a starting point for you. You only need to resolve the logon; the rest is done.

mobilewilier · 05-05-2010, 09:27 PM

Quote:

Originally Posted by kiklop74

This site has a very complicated logon procedure. I have no time to work on that, so here is a starting point for you. You only need to resolve the logon; the rest is done.

Thanks so much... it works!!! many many thanks

WL

Krittika Goyal · 05-05-2010, 11:33 PM

I had this request on Facebook. If someone can do it, please go ahead, because I am a little busy right now.
The East Bay Express. http://www.eastbayexpress.com/ebx/Home

Thanks

smargo · 05-07-2010, 09:11 AM

Hi, I am trying to make a simple recipe for the best Russian-language newspaper, Kommersant.
Here is the recipe:

Code:

from calibre.web.feeds.news import BasicNewsRecipe
class AdvancedUserRecipe1272297716(BasicNewsRecipe):
    title          = u'Kommersant'
    oldest_article = 7
    max_articles_per_feed = 100

    feeds          = [(u'Kommersant', u'http://feeds.kommersant.ru/RSS_Export/RU/daily.xml')]

def print_version(self,url):

        segments = url.split('=')
        article_id = segments[1]
        newurl = 'http://www.kommersant.ru/doc-rss.aspx?DocsID=' + article_id + '&print=true'

        return newurl

but it fails with the error log below.

What am I doing wrong? Thanks!

Code:

ERROR: Conversion Error: <b>Failed</b>: Fetch news from Kommersant

It seems to download the articles fine:

Code:

Fetching http://www.kommersant.ru/doc-rss.aspx?DocsID=1365119
Downloaded article: Tencent разложила DST на активы // Mail.ru, "Вконтакте" и "Одноклассникам" прописали мультипликаторы from http://www.kommersant.ru

but then fails:

Code:

lxml.etree.XMLSyntaxError: Failed to parse QName 'font-size:', line 33, column 3710

olaf · 05-07-2010, 10:14 AM

When running a job to create a Kindle file from a recipe, I often look at the job details to see what progress is being made. Is there any way to save the column widths of the Job Details screen? Each time I go in, I need to expand the columns to see the detail I'm looking at. It would be nice to customize the column widths and have them stay fixed after that. (The total screen size of that panel as well)

smargo · 05-07-2010, 12:03 PM

OK, now it's generally working,

Code:

from calibre.web.feeds.news import BasicNewsRecipe
class KommersantRecipe(BasicNewsRecipe):
    title          = u'Kommersant'
    oldest_article = 7
    max_articles_per_feed = 100
    feeds          = [(u'Kommersant', u'http://feeds.kommersant.ru/RSS_Export/RU/daily.xml')]

    def print_version(self,url):
       segments = url.split('=')
       article_id = segments[1]
       newurl = 'http://www.kommersant.ru/doc.aspx?DocsID=' + article_id + '&print=true'
       return newurl

but when I read it on the Kindle, pagination does not work. When I am on the first page and press "Next page", the article skips to the last page. What could be the problem?
Thanks all!
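A side note on the print_version above: url.split('=') picks up the article id only while DocsID is the sole query parameter. A minimal sketch of a more tolerant variant follows, assuming Python 3's urllib.parse (calibre in 2010 ran on Python 2, where the same functions live in the urlparse module). The function name kommersant_print_url and the extra "from=rss" parameter in the usage line are hypothetical, for illustration only.

```python
from urllib.parse import urlparse, parse_qs


def kommersant_print_url(url):
    """Build the print-view URL from a Kommersant feed link."""
    # parse_qs maps each query key to a list of values;
    # the feed links in this thread use the key 'DocsID'.
    query = parse_qs(urlparse(url).query)
    article_id = query['DocsID'][0]
    return 'http://www.kommersant.ru/doc.aspx?DocsID=' + article_id + '&print=true'


# Link taken from the fetch log earlier in the thread.
print(kommersant_print_url('http://www.kommersant.ru/doc-rss.aspx?DocsID=1365119'))
# Hypothetical link with a trailing parameter; split('=') would return '1365119&from' here.
print(kommersant_print_url('http://www.kommersant.ru/doc-rss.aspx?DocsID=1365119&from=rss'))
```

Inside a recipe, the body of this function could replace the three lines of print_version; whether the feed ever adds extra parameters is not known from this thread.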

kovidgoyal · 05-07-2010, 12:15 PM

@smargo: Use

conversion_options = {'linearize_tables':True}
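For anyone unsure where that line goes: it is a class attribute of the recipe. A minimal sketch, reusing the recipe fields posted earlier in this thread, is below. The try/except stand-in class is an assumption added only so the snippet can be loaded outside calibre; inside calibre the real BasicNewsRecipe is imported. The linearize_tables option makes the conversion pull content out of layout tables and present it linearly.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the snippet can be loaded outside calibre (illustration only).
    class BasicNewsRecipe(object):
        pass


class KommersantRecipe(BasicNewsRecipe):
    title = u'Kommersant'
    oldest_article = 7
    max_articles_per_feed = 100
    feeds = [(u'Kommersant', u'http://feeds.kommersant.ru/RSS_Export/RU/daily.xml')]

    # Passed to the conversion step: flatten table-based page layouts.
    conversion_options = {'linearize_tables': True}

    def print_version(self, url):
        # Same logic as the recipe posted above.
        article_id = url.split('=')[1]
        return 'http://www.kommersant.ru/doc.aspx?DocsID=' + article_id + '&print=true'
```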

smargo · 05-07-2010, 12:45 PM

@kovidgoyal

Thanks! It works!

Some cosmetic work remains to be done, but I am happy.

Raoul O'Malley · 05-07-2010, 12:53 PM

Is there a way to get a recipe for the IHT (International Herald Tribune) Euro edition?

thanks so much

PaxtonReader · 05-08-2010, 04:01 AM

When I send a book back to my Kindle, is there a way to keep the original file name, rather than a shortened version with the author's name embedded?

Starson17 · 05-08-2010, 12:49 PM

Multipage implemented for multiple page articles,
New feeds,
Miscellaneous advertising and junk removed.

Code:

#!/usr/bin/env  python
__license__   = 'GPL v3'
__copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
__docformat__ = 'restructuredtext en'

'''
discovermagazine.com
'''

import re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup, Tag

class DiscoverMagazine(BasicNewsRecipe):

    title = u'Discover Magazine'
    description = u'Science, Technology and the Future' 
    __author__ = 'Starson17' 
    language = 'en'

    oldest_article = 33
    max_articles_per_feed = 20
    no_stylesheets = True
    remove_javascript = True
    use_embedded_content  = False
    encoding = 'utf-8'
    extra_css = '.headline {font-size: x-large;} \n .fact {padding-top: 10pt}'
    
    remove_tags = [
                   dict(name='div', attrs={'id':['searchModule', 'mainMenu', 'tool-box']}),
                   dict(name='div', attrs={'id':['footer','teaser','already-subscriber','teaser-suite','related-articles']}),
                   dict(name='div', attrs={'class':['column']}),
                   dict(name='img', attrs={'src':'http://discovermagazine.com/onebyone.gif'})]

    remove_tags_after = [dict(name='div', attrs={'class':'listingBar'})]
   
    def append_page(self, soup, appendtag, position):
        # Follow the "next" pager link and splice the following page's
        # article body into the current page.
        pager = soup.find('span',attrs={'class':'next'})
        if pager:
           nexturl = pager.a['href']
           soup2 = self.index_to_soup(nexturl)
           texttag = soup2.find('div', attrs={'class':'articlebody'})
           newpos = len(texttag.contents)
           # Recurse first, so any later pages are already appended to texttag.
           self.append_page(soup2,texttag,newpos)
           texttag.extract()
           appendtag.insert(position,texttag)
    
    def preprocess_html(self, soup):
        mtag = '<meta http-equiv="Content-Language" content="en-US"/>\n<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>'
        soup.head.insert(0,mtag)    
        self.append_page(soup, soup.body, 3)
        pager = soup.find('div',attrs={'class':'listingBar'})
        if pager:
           pager.extract()        
        return soup
        
    def postprocess_html(self, soup, first_fetch):
        for tag in soup.findAll(text=re.compile('^This article is a sample')):
            tag.parent.extract()
        for tag in soup.findAll(['table', 'tr', 'td']):
            tag.name = 'div'
        for tag in soup.findAll('div', attrs={'class':'discreet advert'}):
            tag.extract()
        for tag in soup.findAll('hr', attrs={'size':'1'}):
            tag.extract()
        for tag in soup.findAll('br'):
            tag.extract()
        return soup        
 
    feeds = [
             (u'Technology', u'http://discovermagazine.com/topics/technology/rss.xml'), 
             (u'Health - Medicine', u'http://discovermagazine.com/topics/health-medicine/rss.xml'), 
             (u'Mind Brain', u'http://discovermagazine.com/topics/mind-brain/rss.xml'), 
             (u'Space', u'http://discovermagazine.com/topics/space/rss.xml'), 
             (u'Human Origins', u'http://discovermagazine.com/topics/human-origins/rss.xml'), 
             (u'Living World', u'http://discovermagazine.com/topics/living-world/rss.xml'), 
             (u'Environment', u'http://discovermagazine.com/topics/environment/rss.xml'), 
             (u'Physics & Math', u'http://discovermagazine.com/topics/physics-math/rss.xml'), 
             (u"20 Things you didn't know about...", u'http://discovermagazine.com/columns/20-things-you-didnt-know/rss.xml'), 
             (u'Fuzzy Math', u'http://discovermagazine.com/columns/fuzzy-math/rss.xml'), 
             (u'The Brain', u'http://discovermagazine.com/columns/the-brain/rss.xml'), 
             (u'What is This', u'http://discovermagazine.com/columns/what-is-this/rss.xml'),
             (u'Vital Signs', u'http://discovermagazine.com/columns/vital-signs/rss.xml'), 
             (u'Think Tech', u'http://discovermagazine.com/columns/think-tech/rss.xml'),
             (u'Future Tech', u'http://discovermagazine.com/columns/future-tech/rss.xml'),
             (u'Discover Interview', u'http://discovermagazine.com/columns/discover-interview/rss.xml'),
            ]

kiklop74 · 05-10-2010, 09:32 AM

Russian news pack:
Kommersant
Izvestia
Ria Novosti
Argumenti & fakti

smargo · 05-10-2010, 12:30 PM

@kiklop74

Your Kommersant recipe is great, thanks for this Russian pack!

Small wish: at the bottom of certain articles in Kommersant, there are links to additional pages (of the same issue; they are not available as links from the RSS feed). For example, on the page http://www.kommersant.ru/doc-rss.aspx?DocsID=1366511 there are links to page "2" - http://www.kommersant.ru/doc.aspx?DocsID=1366459 and page "3" - http://www.kommersant.ru/doc.aspx?DocsID=1366462. Is there any way to include these additional pages?

kiklop74 · 05-10-2010, 12:52 PM

There is always a way. But right now I have no spare time nor will I have it in the foreseeable future. You are on your own on this one.
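For anyone who picks this up, a possible first step is collecting the extra page links from the article HTML. The sketch below uses only Python 3's standard html.parser and assumes the additional pages are plain anchors whose href contains doc.aspx?DocsID=, as in the two example links above. The class and function names, the sample markup, and the about.aspx link are all hypothetical; the real page layout was not checked, so a real recipe would probably need to restrict the search to the pager element.

```python
from html.parser import HTMLParser


class DocLinkCollector(HTMLParser):
    """Collect unique hrefs that point at doc.aspx?DocsID=... pages."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href') or ''
        # Keep first-seen order and skip duplicates.
        if 'doc.aspx?DocsID=' in href and href not in self.links:
            self.links.append(href)


def extra_page_links(html):
    collector = DocLinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical markup; only the two DocsID links come from the post above.
sample = ('<p>Pages: '
          '<a href="http://www.kommersant.ru/doc.aspx?DocsID=1366459">2</a> '
          '<a href="http://www.kommersant.ru/doc.aspx?DocsID=1366462">3</a> '
          '<a href="http://www.kommersant.ru/about.aspx">About</a></p>')
print(extra_page_links(sample))
```

Inside a recipe, each collected link could then be fetched with self.index_to_soup and appended to the first page, along the lines of append_page in the Discover Magazine recipe earlier on this page.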


