Old 01-21-2010, 12:54 PM   #1216
XanthanGum
Connoisseur
Posts: 51
Karma: 10
Join Date: Dec 2008
Location: Germany
Device: SONY PRS-500
Ordering of Recipes in Calibre's Add A Custom News Source

Kovid,

Is it possible to sort the list of news sources in the "Add a custom news source" dialog alphabetically? If that option is already built in, how can I enable it?

My list is not currently alphabetized, and it would be much nicer to have those recipes listed in alphabetical order.

Thanks...

XG
Old 01-21-2010, 01:03 PM   #1217
XanthanGum
Connoisseur
Posts: 51
Karma: 10
Join Date: Dec 2008
Location: Germany
Device: SONY PRS-500
Recipes in Custom List Not Found in Main English List

Kovid,

I found two recipes in my Custom list that are not listed in the English list. They are:

- NRC International
- Politiken - English

Both are great recipes written by kwetal. They work fine, and I have them scheduled for download.

But why don't they appear in the English list? Have I somehow moved them to the Custom list, or were they never in the English list to begin with?

I ask because I was working on creating both recipes when I discovered that they had already been done. (Thanks kwetal.)

Any help or explanation would be greatly appreciated.

Bye...

XG

PS

Is it possible to extend the login session? Quite often I'm experimenting with recipes, and when I come back here I have to log back in.
Old 01-21-2010, 01:21 PM   #1218
kovidgoyal
creator of calibre
Posts: 45,386
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Custom recipes are always listed only in the custom recipes section. Only builtin recipes are listed in the language sections.
Old 01-21-2010, 01:21 PM   #1219
tnronin
Zealot
Posts: 118
Karma: 210
Join Date: Jan 2010
Location: Mid-Tennessee
Device: PRS-300
HUGE thanks for this one!!

Quote:
Originally Posted by cix3
Hello,

Here's my first stab at a recipe for The New Republic (www.tnr.com). It aggregates all articles and blogs, minus the images. Enjoy!

Code:
class The_New_Republic(BasicNewsRecipe):
    title = 'The New Republic'
    __author__ = 'cix3'
    description = 'Intelligent, stimulating and rigorous examination of American politics, foreign policy and culture'
    timefmt = ' [%b %d, %Y]'

    oldest_article = 7
    max_articles_per_feed = 100

    remove_tags = [
        dict(name='div', attrs={'class':['print-logo', 'print-site_name', 'img-left', 'print-source_url']}),
        dict(name='hr', attrs={'class':'print-hr'}),
        dict(name='img')
    ]

    feeds = [
        ('Politics', 'http://www.tnr.com/rss/articles/Politics'),
        ('Books and Arts', 'http://www.tnr.com/rss/articles/Books-and-Arts'),
        ('Economy', 'http://www.tnr.com/rss/articles/Economy'),
        ('Environment and Energy', 'http://www.tnr.com/rss/articles/Environment-%2526-Energy'),
        ('Health Care', 'http://www.tnr.com/rss/articles/Health-Care'),
        ('Urban Policy', 'http://www.tnr.com/rss/articles/Urban-Policy'),
        ('World', 'http://www.tnr.com/rss/articles/World'),
        ('Film', 'http://www.tnr.com/rss/articles/Film'),
        ('Books', 'http://www.tnr.com/rss/articles/books'),
        ('The Plank', 'http://www.tnr.com/rss/blogs/The-Plank'),
        ('The Treatment', 'http://www.tnr.com/rss/blogs/The-Treatment'),
        ('The Spine', 'http://www.tnr.com/rss/blogs/The-Spine'),
        ('The Stash', 'http://www.tnr.com/rss/blogs/The-Stash'),
        ('The Vine', 'http://www.tnr.com/rss/blogs/The-Vine'),
        ('The Avenue', 'http://www.tnr.com/rss/blogs/The-Avenue'),
        ('William Galston', 'http://www.tnr.com/rss/blogs/William-Galston'),
        ('Simon Johnson', 'http://www.tnr.com/rss/blogs/Simon-Johnson'),
        ('Ed Kilgore', 'http://www.tnr.com/rss/blogs/Ed-Kilgore'),
        ('Damon Linker', 'http://www.tnr.com/rss/blogs/Damon-Linker'),
        ('John McWhorter', 'http://www.tnr.com/rss/blogs/John-McWhorter')
            ]

    def print_version(self, url):
        return url.replace('http://www.tnr.com/', 'http://www.tnr.com/print/')
Old 01-21-2010, 01:30 PM   #1220
XanthanGum
Connoisseur
Posts: 51
Karma: 10
Join Date: Dec 2008
Location: Germany
Device: SONY PRS-500
NRC and Politiken Recipes from kwetal

Kovid,

Thanks for the clarification.

Can you add kwetal's two recipes to the English list? The NRC International news source is from the Netherlands and the Politiken news source is from Denmark. See the zipped attachment.

Thanks...

XG
Attached Files
File Type: zip nrcAndPolitiken.zip (1.7 KB, 173 views)
Old 01-21-2010, 01:34 PM   #1221
tnronin
Zealot
Posts: 118
Karma: 210
Join Date: Jan 2010
Location: Mid-Tennessee
Device: PRS-300
Would it be possible to get a recipe from this location? http://www.hillsdale.edu/news/imprimis.asp

Thanks.
Old 01-21-2010, 01:53 PM   #1222
kovidgoyal
creator of calibre
Posts: 45,386
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
NRC is under Netherlands; use the search to find it.
Old 01-21-2010, 02:18 PM   #1223
nickredding
onlinenewsreader.net
Posts: 328
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
re: Problem with Wall Street Journal (free) recipe

I should know better than to post in forums before I've finished my coffee. I was coding a solution to the date-locale problem when Kovid posted his suggestion. Here is the fixed recipe, which manually decodes the WSJ US-locale dateline for comparison. evanmaastrigt, if you could test this in your locale I'd appreciate it.

Code:
#!/usr/bin/env  python

__license__   = 'GPL v3'

'''
online.wsj.com
'''
import string, re
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup, Tag, NavigableString
from datetime import timedelta, datetime, date

class WSJ(BasicNewsRecipe):
    # formatting adapted from original recipe by Kovid Goyal and Sujata Raman
    title          = u'Wall Street Journal (free)'
    __author__     = 'Nick Redding'
    language = 'en'
    description = ('All the free content from the Wall Street Journal (business, financial and political news)')
 
    no_stylesheets = True
    timefmt = ' [%b %d]'

    # customization notes: delete sections you are not interested in
    # set omit_paid_content to False if you want the paid content article snippets
    # set oldest_article to the maximum number of days back from today to include articles
    sectionlist = [
                        ['/home-page','Front Page'],
                        ['/public/page/news-opinion-commentary.html','Commentary'],
                        ['/public/page/news-global-world.html','World News'],
                        ['/public/page/news-world-business.html','US News'],
                        ['/public/page/news-business-us.html','Business'],
                        ['/public/page/news-financial-markets-stock.html','Markets'],
                        ['/public/page/news-tech-technology.html','Technology'],
                        ['/public/page/news-personal-finance.html','Personal Finance'],
                        ['/public/page/news-lifestyle-arts-entertainment.html','Life & Style'],
                        ['/public/page/news-real-estate-homes.html','Real Estate'],
                        ['/public/page/news-career-jobs.html','Careers'],
                        ['/public/page/news-small-business-marketing.html','Small Business']
                    ]
    oldest_article = 2
    omit_paid_content = True
    
    extra_css   = '''h1{font-size:large; font-family:Times,serif;}
                    h2{font-family:Times,serif; font-size:small; font-style:italic;}
                    .subhead{font-family:Times,serif; font-size:small; font-style:italic;}
                    .insettipUnit {font-family:Times,serif;font-size:xx-small;}
                    .targetCaption{font-size:x-small; font-family:Times,serif; font-style:italic; margin-top: 0.25em;}
                    .article{font-family:Times,serif; font-size:x-small;}
                    .tagline { font-size:xx-small;}
                    .dateStamp {font-family:Times,serif;}
                    h3{font-family:Times,serif; font-size:xx-small;}
                    .byline {font-family:Times,serif; font-size:xx-small; list-style-type: none;}
                    .metadataType-articleCredits {list-style-type: none;}
                    h6{font-family:Times,serif; font-size:small; font-style:italic;}
                    .paperLocation{font-size:xx-small;}'''


    remove_tags_before = dict({'class':re.compile('^articleHeadlineBox')})
    remove_tags = [ dict({'id':re.compile('^articleTabs_tab_')}),
                    #dict(id=["articleTabs_tab_article", "articleTabs_tab_comments",
                    #         "articleTabs_tab_interactive","articleTabs_tab_video",
                    #         "articleTabs_tab_map","articleTabs_tab_slideshow"]),
                    {'class': ['footer_columns','network','insetCol3wide','interactive','video','slideshow','map',
                               'insettip','insetClose','more_in', "insetContent",
                               #'articleTools_bottom','articleTools_bottom mjArticleTools',
                               'aTools', 'tooltip',
                               'adSummary', 'nav-inline','insetFullBracket']},
                    dict({'class':re.compile('^articleTools_bottom')}),
                    dict(rel='shortcut icon')
                  ]
    remove_tags_after = [dict(id="article_story_body"), {'class':"article story"}]
    
    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        return br


    def preprocess_html(self,soup):

        def decode_us_date(datestr):
            udate = datestr.strip().lower().split()
            m = ['january','february','march','april','may','june','july','august','september','october','november','december'].index(udate[0])+1
            d = int(udate[1])
            y = int(udate[2])
            return date(y,m,d)
        
        # check if article is paid content
        if self.omit_paid_content:
            divtags = soup.findAll('div','tooltip')
            if divtags:
                for divtag in divtags:
                    if divtag.find(text="Subscriber Content"):
                        return None
                    
        # check if article is too old
        datetag = soup.find('li',attrs={'class' : re.compile("^dateStamp")})
        if datetag:
            dateline_string = self.tag_to_string(datetag,False)
            date_items = dateline_string.split(',')
            datestring = date_items[0]+date_items[1]
            article_date = decode_us_date(datestring)
            earliest_date = date.today() - timedelta(days=self.oldest_article)
            if article_date < earliest_date:
                self.log("Skipping article dated %s" % datestring)
                return None
            datetag.parent.extract()

            # place dateline in article heading
            
            bylinetag = soup.find('h3','byline')
            if bylinetag:
                h3bylinetag = bylinetag
            else:
                bylinetag = soup.find('li','byline')
                if bylinetag:
                    h3bylinetag = bylinetag.h3
                    if not h3bylinetag:
                        h3bylinetag = bylinetag
                    bylinetag = bylinetag.parent
            if bylinetag:
                if h3bylinetag.a:
                    bylinetext = 'By '+self.tag_to_string(h3bylinetag.a,False)
                else:
                    bylinetext = self.tag_to_string(h3bylinetag,False)
                h3byline = Tag(soup,'h3',[('class','byline')])
                if bylinetext.isspace() or (bylinetext == ''):
                    h3byline.insert(0,NavigableString(date_items[0]+','+date_items[1]))
                else:
                    h3byline.insert(0,NavigableString(bylinetext+u'\u2014'+date_items[0]+','+date_items[1]))
                bylinetag.replaceWith(h3byline)
            else:                  
                headlinetag = soup.find('div',attrs={'class' : re.compile("^articleHeadlineBox")})
                if headlinetag:
                    dateline = Tag(soup,'h3', [('class','byline')])
                    dateline.insert(0,NavigableString(date_items[0]+','+date_items[1]))
                    headlinetag.insert(len(headlinetag),dateline)
        else: # if no date tag, don't process this page--it's not a news item
            return None
        # This gets rid of the annoying superfluous bullet symbol preceding columnist bylines
        ultag = soup.find('ul',attrs={'class' : 'cMetadata metadataType-articleCredits'})
        if ultag:
            a = ultag.h3
            if a:
                ultag.replaceWith(a)
        return soup

    def parse_index(self):

        articles = {}
        key = None
        ans = []

        def parse_index_page(page_name,page_title):

            def article_title(tag):
                atag = tag.find('h2') # title is usually in an h2 tag
                if not atag: # if not, get text from the a tag
                    atag = tag.find('a',href=True)
                    if not atag:
                        return ''
                    t = self.tag_to_string(atag,False)
                    if t == '':
                        # sometimes the title is in the second a tag
                        atag.extract()
                        atag = tag.find('a',href=True)
                        if not atag:
                            return ''
                        return self.tag_to_string(atag,False)
                    return t
                return self.tag_to_string(atag,False)

            def article_author(tag):
                atag = tag.find('strong') # author is usually in a strong tag
                if not atag:
                    atag = tag.find('h4') # if not, look for an h4 tag
                    if not atag:
                        return ''
                return self.tag_to_string(atag,False)

            def article_summary(tag):
                atag = tag.find('p')
                if not atag:
                    return ''
                subtag = atag.strong
                if subtag:
                    subtag.extract()
                return self.tag_to_string(atag,False)

            def article_url(tag):
                atag = tag.find('a',href=True)
                if not atag:
                    return ''
                url = re.sub(r'\?.*', '', atag['href'])
                return url

            def handle_section_name(tag):
                # turns a tag into a section name with special processing
                # for What's News, U.S., World & U.S. and World
                s = self.tag_to_string(tag,False)
                if ("What" in s) and ("News" in s):
                    s = "What's News"
                elif (s == "U.S.") or (s == "World & U.S.") or (s == "World"):
                    s = s + " News"
                return s

                

            mainurl = 'http://online.wsj.com'
            pageurl = mainurl+page_name
            #self.log("Page url %s" % pageurl)
            soup = self.index_to_soup(pageurl)
            # Find each instance of div with class including "headlineSummary"
            for divtag in soup.findAll('div',attrs={'class' : re.compile("^headlineSummary")}):
                # divtag contains all article data as ul's and li's
                # first, check if there is an h3 tag which provides a section name
                stag = divtag.find('h3')
                if stag:
                    if stag.parent['class'] == 'dynamic':
                        # a carousel of articles is too complex to extract a section name
                        # for each article, so we'll just call the section "Carousel"
                        section_name = 'Carousel'
                    else:
                        section_name = handle_section_name(stag)
                else:
                    section_name = "What's News"
                #self.log("div Section %s" % section_name)
                # find each top-level ul in the div
                # we don't restrict to class = newsItem because the section_name
                # sometimes changes via a ul tag inside the div
                for ultag in divtag.findAll('ul',recursive=False):
                    stag = ultag.find('h3')
                    if stag:
                        if stag.parent.name == 'ul':
                            # section name has changed
                            section_name = handle_section_name(stag)
                            #self.log("ul Section %s" % section_name)
                            # delete the h3 tag so it doesn't get in the way
                            stag.extract()
                    # find each top level li in the ul
                    for litag in ultag.findAll('li',recursive=False):
                        stag = litag.find('h3')
                        if stag:
                            # section name has changed
                            section_name = handle_section_name(stag)
                            #self.log("li Section %s" % section_name)
                            # delete the h3 tag so it doesn't get in the way
                            stag.extract()
                        # if there is a ul tag inside the li it is superfluous;
                        # it is probably a list of related articles
                        utag = litag.find('ul')
                        if utag:
                            utag.extract()
                        # now skip paid subscriber articles if desired
                        subscriber_tag = litag.find(text="Subscriber Content")
                        if subscriber_tag:
                            if self.omit_paid_content:
                                continue
                            # delete the tip div so it doesn't get in the way
                            tiptag = litag.find("div", { "class" : "tipTargetBox" })
                            if tiptag:
                                tiptag.extract()
                        h1tag = litag.h1
                        # if there's an h1 tag, its parent is a div which should replace
                        # the li tag for the analysis
                        if h1tag:
                            litag = h1tag.parent                  
                        h5tag = litag.h5
                        if h5tag:
                            # section name has changed
                            section_name = self.tag_to_string(h5tag,False)
                            #self.log("h5 Section %s" % section_name)
                            # delete the h5 tag so it doesn't get in the way
                            h5tag.extract()
                        url = article_url(litag)
                        if url == '':
                            continue
                        if url.startswith("/article"):
                            url = mainurl+url
                        if not url.startswith("http://online.wsj.com"):
                            continue
                        if not url.endswith(".html"):
                            continue
                        if 'video' in url:
                            continue
                        title = article_title(litag)
                        if title == '':
                            continue
                        #self.log("URL %s" % url)
                        #self.log("Title %s" % title)
                        pubdate = ''
                        #self.log("Date %s" % pubdate)
                        author = article_author(litag)
                        if author == '':
                            author = section_name
                        elif author == section_name:
                            author = ''
                        else:
                            author = section_name+': '+author
                        #if not author == '':
                        #    self.log("Author %s" % author)
                        description = article_summary(litag)
                        #if not description == '':
                        #    self.log("Description %s" % description)
                        if not articles.has_key(page_title):
                            articles[page_title] = []
                        articles[page_title].append(dict(title=title,url=url,date=pubdate,description=description,author=author,content=''))

    
        for page_name,page_title in self.sectionlist:
            parse_index_page(page_name,page_title)
            ans.append(page_title)

        ans = [(key, articles[key]) for key in ans if articles.has_key(key)]
        return ans
Old 01-21-2010, 02:22 PM   #1224
XanthanGum
Connoisseur
Posts: 51
Karma: 10
Join Date: Dec 2008
Location: Germany
Device: SONY PRS-500
NRC next vs. NRC International

Quote:
Originally Posted by kovidgoyal
Custom recipes are always listed only in the custom recipes section. Only builtin recipes are listed in the language sections.
Kovid,

The nrc entry under the Dutch section is in the Dutch language. The NRC International service is in English, and I can't find it listed in any of Calibre's English lists.

XG
Old 01-21-2010, 02:24 PM   #1225
XanthanGum
Connoisseur
Posts: 51
Karma: 10
Join Date: Dec 2008
Location: Germany
Device: SONY PRS-500
the nrc recipes

Quote:
Originally Posted by kovidgoyal
NRC is under netherlands, use the search to find it.
Kovid,

Sorry, I meant to reply to this post of yours. I'm not sure what happened.

Anyway... the nrc entry under the Dutch section is in the Dutch language. The NRC International service is in English, and I can't find it listed in any of Calibre's English lists.

XG
Old 01-21-2010, 02:59 PM   #1226
evanmaastrigt
Connoisseur
Posts: 78
Karma: 192
Join Date: Nov 2009
Device: Sony PRS-600
Quote:
Originally Posted by kovidgoyal
NRC is under netherlands, use the search to find it.
Well, I never noticed before, but it isn't. What is there is 'nrcnext', the news blog of the sister publication of 'NRC Handelsblad'. 'NRC International' offers the most interesting articles of the latter in English translation.

I never made a recipe for 'NRC Handelsblad' because they offer a DRM-free subscription for an electronic version (ePub, Mobi or PDF) for 84 euros/year. A bargain for what is more or less the New York Times of the Netherlands.

In addition there is also the 'Fokke en Sukke' recipe that combines the cartoons published both in 'nrcnext' and 'NRC Handelsblad'.

(And yes, we have 25 political parties as well :-)

Edwin
Old 01-21-2010, 03:07 PM   #1227
kovidgoyal
creator of calibre
Posts: 45,386
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@evanmaastrigt: I'm happy to support their efforts to provide an ebook version. I have been planning to write a subclass of BasicNewsRecipe that allows downloading news published in EPUB format (via the subscription) and outputs it as the OPF+HTML needed by the conversion system.
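
A rough sketch of what this might look like (illustrative only; the class name, the build_index hook and every detail below are guesses, since nothing like this exists in calibre yet):

Code:
# Illustrative sketch only: an EPUB is a zip archive containing an OPF
# manifest plus HTML files, so a recipe could download the publisher's
# EPUB and unpack it into the OPF+HTML layout the conversion system uses.
# None of the names or hooks below are existing calibre API.
import os
import zipfile

from calibre.web.feeds.news import BasicNewsRecipe

class EpubNewsRecipe(BasicNewsRecipe):

    # a subclass would point this at the subscription download URL
    epub_url = None

    def build_index(self):
        # fetch the EPUB with the recipe's browser; a subclass can handle
        # the subscription login by overriding get_browser, as paid-site
        # recipes already do
        br = self.get_browser()
        raw = br.open(self.epub_url).read()
        epub_path = os.path.join(self.output_dir, 'issue.epub')
        f = open(epub_path, 'wb')
        f.write(raw)
        f.close()
        # unpack the archive, leaving the OPF and HTML files on disk
        zf = zipfile.ZipFile(epub_path)
        zf.extractall(self.output_dir)
        zf.close()
        # return the path to the OPF so the conversion pipeline can take over
        for dirpath, dirnames, filenames in os.walk(self.output_dir):
            for name in filenames:
                if name.endswith('.opf'):
                    return os.path.join(dirpath, name)
        raise Exception('no OPF found in the downloaded EPUB')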
Old 01-21-2010, 06:04 PM   #1228
cix3
Member
Posts: 14
Karma: 10
Join Date: Aug 2009
Device: Kindle 2
Quote:
Originally Posted by tnronin
HUGE thanks for this one!!
You're welcome. Enjoy.

I've been meaning to come back to this recipe for some time. The text shows up as a lighter shade of grey (rather than black) on my Kindle 2. I imagine a quick edit to the recipe's styles will fix it; I'll get around to it eventually.
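
The style edit in question would presumably be an extra_css override on the recipe (untested guess; the selector list is arbitrary):

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class The_New_Republic(BasicNewsRecipe):
    title = 'The New Republic'
    # ... the rest of the recipe exactly as posted above ...

    # untested guess at the fix: extra_css is injected into every
    # downloaded article, so forcing the text colour to black should
    # override whatever grey the site's stylesheet sets
    extra_css = 'body, p, div, span { color: black; }'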
Old 01-22-2010, 06:39 AM   #1229
hallo.amt
Junior Member
Posts: 1
Karma: 10
Join Date: Jan 2010
Device: Sony PRS-505
Recipe for fr-online.de

Hi,

I wrote a recipe for fr-online.de, the website of the German newspaper "Frankfurter Rundschau".

Code:
__license__   = 'GPL v3'
__copyright__ = '2009, Justus Bisser <justus.bisser at gmail.com>'
'''
fr-online.de
'''
import re

from calibre.web.feeds.news import BasicNewsRecipe

class FrankfurterRundschau(BasicNewsRecipe):
    title                 = 'Frankfurter Rundschau'
    __author__            = 'Justus Bisser'
    description           = "Dies ist die Online-Ausgabe der Frankfurter Rundschau. Um die abgerufenen Feeds individuell einzustellen, bearbeiten Sie die Liste im erweiterten Modus. Die Feeds findet man auf http://www.fr-online.de/verlagsservice/fr_newsreader/?em_cnt=574255"
    publisher             = 'Druck- und Verlagshaus Frankfurt am Main GmbH'
    category              = 'FR Online, Frankfurter Rundschau, Nachrichten, News,Dienste, RSS, RSS, Feedreader, Newsfeed, iGoogle, Netvibes, Widget'
    oldest_article        = 7
    max_articles_per_feed = 100
    language              = 'de'
    lang                  = 'de-DE'
    no_stylesheets        = True
    use_embedded_content  = False
    #encoding              = 'cp1252'

    conversion_options = {
                          'comment'          : description
                        , 'tags'             : category
                        , 'publisher'        : publisher
                        , 'language'         : lang
                        }

    recursions = 0
    #keep_only_tags = [dict(name='div', attrs={'class':'text'})]
    #tags_remove = [dict(name='div', attrs={'style':'text-align: left; margin: 4px 0px 0px 4px; width: 200px; float: right;'})]
    remove_attributes = ['style']
    feeds = []
    #remove_tags_before = [dict(name='div', attrs={'style':'padding-left: 0px;'})]
    #remove_tags_after = [dict(name='div', attrs={'class':'box_head_text'})]
        
    # enable for all news
    allNews = 0
    if allNews:
        feeds = [(u'Frankfurter Rundschau', u'http://www.fr-online.de/rss/sport/index.xml')]
    else:
        #select the feeds you like
        feeds = [(u'Nachrichten', u'http://www.fr-online.de/rss/politik/index.xml')]
        feeds.append((u'Kommentare und Analysen', u'http://www.fr-online.de/rss/meinung/index.xml'))
        feeds.append((u'Dokumentationen', u'http://www.fr-online.de/rss/dokumentation/index.xml'))
        feeds.append((u'Deutschlandtrend', u'http://www.fr-online.de/rss/deutschlandtrend/index.xml'))
        feeds.append((u'Wirtschaft', u'http://www.fr-online.de/rss/wirtschaft/index.xml'))
        feeds.append((u'Sport', u'http://www.fr-online.de/rss/sport/index.xml'))
        feeds.append((u'Feuilleton', u'http://www.fr-online.de/rss/feuilleton/index.xml'))
        feeds.append((u'Panorama', u'http://www.fr-online.de/rss/panorama/index.xml'))
        feeds.append((u'Rhein Main und Hessen', u'http://www.fr-online.de/rss/hessen/index.xml'))
        feeds.append((u'Fitness und Gesundheit', u'http://www.fr-online.de/rss/fit/index.xml'))
        feeds.append((u'Multimedia', u'http://www.fr-online.de/rss/multimedia/index.xml'))
        feeds.append((u'Wissen und Bildung', u'http://www.fr-online.de/rss/wissen/index.xml'))
    
    def get_article_url(self, article):
        # the feed links encode the numeric article id after a '0C' marker;
        # pull it out and use it to build the printer-friendly version URL
        url = article.link
        regex = re.compile("0C[0-9]{6,8}0A?")
        liste = regex.findall(url)
        string = liste.pop(0)
        # strip the '0C' prefix and the trailing delimiter character
        string = string[2:len(string)-1]
        return "http://www.fr-online.de/_em_cms/_globals/print.php?em_cnt=" + string
Old 01-22-2010, 11:43 AM   #1230
geneaber
Connoisseur
Posts: 82
Karma: 118
Join Date: Dec 2005
Device: Kindle 2
I would like a recipe for The Week. The RSS feeds can be found at http://www.theweek.com/home/sitemap. Can anyone help?
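
A minimal skeleton would be the usual starting point (sketch only; the feeds list below is a placeholder to be filled in from the sitemap page above):

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class TheWeek(BasicNewsRecipe):
    title                 = 'The Week'
    oldest_article        = 7
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False

    feeds = [
        # placeholder: copy the real (section, feed url) pairs from
        # http://www.theweek.com/home/sitemap
        #(u'Section name', u'http://www.theweek.com/...'),
    ]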