Old 05-21-2011, 12:53 PM   #1
xXxXxXxXxXx
Enthusiast
xXxXxXxXxXx began at the beginning.
 
Posts: 37
Karma: 10
Join Date: Apr 2011
Device: none
Recipe for Focus (DE)

Code:
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1305567197(BasicNewsRecipe):
    title = u'Focus (DE)'
    __author__ = 'xXxXxXxXxXx'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    remove_javascript = True

    def print_version(self, url):
        # Request the print version of the article
        return url + '?drucken=1'

    # Keep only the main article container
    keep_only_tags = [
        dict(name='div', attrs={'id': ['article']}),
    ]

    # Drop sidebars, comment blocks, ads and related-article blocks
    remove_tags = [
        dict(name='div', attrs={'class': 'sidebar'}),
        dict(name='div', attrs={'class': 'commentForm'}),
        dict(name='div', attrs={'class': 'comment clearfix oid-3534591 open'}),
        dict(name='div', attrs={'class': 'similarityBlock'}),
        dict(name='div', attrs={'class': 'footer'}),
        dict(name='div', attrs={'class': 'getMoreComments'}),
        dict(name='div', attrs={'class': 'moreComments'}),
        dict(name='div', attrs={'class': 'ads'}),
        dict(name='div', attrs={'class': 'articleContent'}),
    ]

    remove_tags_after = [
        dict(name='div', attrs={'class': ['commentForm', 'title', 'actions clearfix']}),
    ]

    feeds = [
        (u'Eilmeldungen', u'http://rss2.focus.de/c/32191/f/533875/index.rss'),
        (u'Auto-News', u'http://rss2.focus.de/c/32191/f/443320/index.rss'),
        (u'Digital-News', u'http://rss2.focus.de/c/32191/f/443315/index.rss'),
        (u'Finanzen-News', u'http://rss2.focus.de/c/32191/f/443317/index.rss'),
        (u'Gesundheit-News', u'http://rss2.focus.de/c/32191/f/443314/index.rss'),
        (u'Immobilien-News', u'http://rss2.focus.de/c/32191/f/443318/index.rss'),
        (u'Kultur-News', u'http://rss2.focus.de/c/32191/f/443321/index.rss'),
        (u'Panorama-News', u'http://rss2.focus.de/c/32191/f/533877/index.rss'),
        (u'Politik-News', u'http://rss2.focus.de/c/32191/f/443313/index.rss'),
        (u'Reisen-News', u'http://rss2.focus.de/c/32191/f/443316/index.rss'),
        (u'Sport-News', u'http://rss2.focus.de/c/32191/f/443319/index.rss'),
        (u'Wissen-News', u'http://rss2.focus.de/c/32191/f/533876/index.rss'),
    ]
Old 05-21-2011, 01:50 PM   #2
schuster
Zealot
schuster doesn't litter
 
Posts: 119
Karma: 100
Join Date: Jan 2011
Location: Germany / NRW /Köln
Device: prs-650 / prs-350 /kindle 3
Hi,
sorry, but this needs to be a multipage recipe (see, for example, the article "Astronomie: Der erdähnlichste Exoplanet").
Old 05-21-2011, 04:05 PM   #3
xXxXxXxXxXx
Unfortunately, I don't know how to write a recipe for multipage articles; it is too complicated for me. Maybe the author of calibre could change something in the calibre API to make it a lot easier,
or someone could write a (very easy) tutorial.
Old 05-23-2011, 09:45 AM   #4
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by xXxXxXxXxXx
Unfortunately, I don't know how to write a recipe for multipage articles; it is too complicated for me. Maybe the author of calibre could change something in the calibre API to make it a lot easier,
or someone could write a (very easy) tutorial.
They are not hard, but you need to grab a sample, read it for what you can understand, then ask questions. Basically, a multipage recipe does this:
1) it finds the link to the "next page"
2) it goes to that page and gets everything on it that the recipe author wants.
3) it pastes that stuff into the first page.
4) it repeats 1-3 until there is no "next page" link.

The recursion is tricky to understand, but the multipage code is easy to copy because it is already set up. Almost every multipage recipe reuses the same bit of multipage code. Only two parts change: the tag used to find the "next page" link, and the part of the next page to keep.
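
The four steps above can be sketched in plain Python, without calibre. This is only an illustration of the recursion, not code from any real recipe: the PAGES dictionary, fetch() and build_article() are hypothetical stand-ins for downloading a page and locating its "next page" link.

```python
# Hypothetical in-memory "site": url -> (text on that page, next-page url or None).
PAGES = {
    'page1': ('First part.', 'page2'),
    'page2': ('Second part.', 'page3'),
    'page3': ('Third part.', None),
}


def fetch(url):
    """Stand-in for downloading a page and parsing it."""
    return PAGES[url]


def build_article(url):
    """Return the text of every page of the article, in reading order."""
    # Steps 1 and 2: read this page's text and look for a "next page" link.
    text, next_url = fetch(url)
    # Step 4: no "next page" link means this is the last page.
    if next_url is None:
        return [text]
    # Step 3: join this page's text with everything gathered from the
    # following pages (the recursive call repeats steps 1 to 3).
    return [text] + build_article(next_url)


if __name__ == '__main__':
    print(' '.join(build_article('page1')))
```

In a recipe, the same shape usually appears as a method that calls itself on the soup of the next page and then inserts the collected tag into the first page.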

Start with the Adventure Gamer recipe, copy the whole thing here, then ask questions.
Old 05-23-2011, 02:53 PM   #5
schuster
Hi Starson, I hope I can also ask questions here?

You are right, but I don't understand it.
I'm experimenting without success.

Code:
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1305567197(BasicNewsRecipe):
    title = u'Focus - test'
    __author__ = 'for_test'
    oldest_article = 20
    max_articles_per_feed = 10
    no_stylesheets = True
    use_embedded_content = False
    remove_javascript = True

    def get_article_url(self, article):
        return article.get('id', article.get('guid', None))

    def append_page(self, soup, appendtag, position):
        # The pager: link to the next page of a multipage article
        pager = soup.find('a', attrs={'class': 'nextPage greyButton'})
        if pager:
            nexturl = self.INDEX + pager.a['href']
            soup2 = self.index_to_soup(nexturl)
            # The article text on the next page
            texttag = soup2.find('div', attrs={'class': 'textBlock'})
            for it in texttag.findAll(style=True):
                del it['style']
            newpos = len(texttag.contents)
            self.append_page(soup2, texttag, newpos)
            texttag.extract()
            appendtag.insert(position, texttag)

    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        # The element that appears before the text block
        for item in soup.findAll('span', attrs={'class': 'overhead'}):
            item.extract()
        self.append_page(soup, soup.body, 3)
        # The page counter shown on the following pages
        pager = soup.find('div', attrs={'class': 'pageCounter'})
        if pager:
            pager.extract()
        return self.adeify_images(soup)

    feeds = [
        (u'Eilmeldungen', u'http://rss2.focus.de/c/32191/f/533875/index.rss'),
        (u'Wissen-News', u'http://rss2.focus.de/c/32191/f/533876/index.rss'),
    ]

# Example of a multipage article in the "Wissen-News" feed:
# Ozonloch-Studie - Zwischen Euphorie und Hysterie
Is this right? I have had no luck grabbing it.
It grabs only the normal pages; the extra pages of multipage articles are lost.

Greetings
Old 05-24-2011, 01:55 PM   #6
xXxXxXxXxXx
I hope that someone finally creates a recipe for this website, because the best way to learn is from examples.

So maybe you, Starson17, could create this recipe?
Old 05-26-2011, 01:34 PM   #7
Starson17
Quote:
Originally Posted by schuster
Hi Starson, I hope I can also ask questions here?
Is this right?
I do not have much time, but post a link to a multipage article, and I will look at it. I did not see any in my brief look.

Is "pager" ever found? In other words, is this if block ever entered?
Code:
if pager:
Old 05-26-2011, 01:51 PM   #8
schuster
Hi Starson,
here is a link to an article that uses multipage.

Code:
http://rss2.focus.de/c/32191/f/533876/s/151d269a/l/0L0Sfocus0Bde0Cwissen0Cwissenschaft0Cmeteorologie0Ctid0E224240Ctornados0Edie0Eden0Esturm0Ejagen0Iaid0I630A10A40Bhtml/story01.htm
Quote:
Is "pager" ever found?
I think so, but the re-insertion does not seem to work.
Old 05-26-2011, 03:45 PM   #9
Starson17
I see 2 problems.
1) You use self.INDEX in your recipe, but it is not defined.
2) I ran the recipe with that removed, and it found instances of pager:
Code:
pager = soup.find('a',attrs={'class':'nextPage greyButton'})
where there was no <a> element with an href attribute inside it, so this lookup fails:
Code:
pager.a['href']
Until these are fixed, it won't work. The pager is a tag that includes what you need for building a link to the next page. It must only be found on pages that are multipage. You must find or create the link to next page (using INDEX plus href attribute or whatever) from pager. Pager must never be found on the last page of the multipage article (this tells it when it is done building the entire article).
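
As a rough illustration of the pager rules above, the following stand-alone Python 3 sketch looks for a "next page" link and builds an absolute URL from it. It uses only the standard library instead of calibre's soup. The example.com URLs, the NextPageFinder class and the next_page_url() function are assumptions made up for this example; only the class name "nextPage" is taken from the recipe under discussion.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextPageFinder(HTMLParser):
    """Remember the href of the first <a> whose class list contains 'nextPage'."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != 'a' or self.href is not None:
            return
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        # Accept the link only if it really carries an href attribute.
        if 'nextPage' in classes and attrs.get('href'):
            self.href = attrs['href']


def next_page_url(page_url, html):
    """Return the absolute URL of the next page, or None on the last page."""
    finder = NextPageFinder()
    finder.feed(html)
    if finder.href is None:
        return None
    # urljoin handles both absolute paths and relative links.
    return urljoin(page_url, finder.href)


if __name__ == '__main__':
    # Hypothetical markup, not copied from the real site.
    first = '<div><a class="nextPage greyButton" href="/wissen/story_2.html">weiter</a></div>'
    last = '<div><a class="nextPage greyButton">weiter</a></div>'
    print(next_page_url('http://example.com/wissen/story.html', first))
    print(next_page_url('http://example.com/wissen/story_2.html', last))
```

Returning None whenever no usable href is found gives the recursion a clear stopping condition, whether the pager is absent or present without a link.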

I cannot read German, so can only guess at how to do this.
Old 05-08-2016, 04:24 AM   #10
Aimylios
Member
Aimylios began at the beginning.
 
Posts: 16
Karma: 10
Join Date: Apr 2016
Device: Tolino Vision 3HD
Hi,

the addresses of the focus.de RSS feeds have been changed. Here's an updated version of the focus_de.recipe.

Code:
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function

'''
focus.de
'''

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1305567197(BasicNewsRecipe):
    title       = 'Focus (DE)'
    __author__  = 'Anonymous'
    description = 'RSS-Feeds von Focus.de'
    language    = 'de'

    oldest_article            = 7
    max_articles_per_feed     = 100
    no_stylesheets            = True
    remove_javascript         = True
    use_embedded_content      = False
    remove_empty_feeds        = True
    ignore_duplicate_articles = {'title', 'url'}

    feeds = [
        ('Politik', 'http://rss.focus.de/politik/'),
        ('Finanzen', 'http://rss.focus.de/finanzen/'),
        ('Gesundheit', 'http://rss.focus.de/gesundheit/'),
        ('Panorama', 'http://rss.focus.de/panorama/'),
        ('Digital', 'http://rss.focus.de/digital/'),
        ('Reisen', 'http://rss.focus.de/reisen/')
    ]

    keep_only_tags = [
        dict(name='div', attrs={'id':'article'})
    ]

    remove_tags = [
        dict(name='div', attrs={'class':['inimagebuttons',
                                         'kolumneHead clearfix']})
    ]

    remove_attributes = ['width', 'height']

    extra_css = 'h1 {font-size: 1.6em; text-align: left; margin-top: 0em} \
                 h2 {font-size: 1em; text-align: left} \
                 .overhead {margin-bottom: 0em} \
                 .caption {font-size: 0.6em}'

    def print_version(self, url):
        return url + '?drucken=1'

    def preprocess_html(self, soup):
        # remove useless references to videos
        for item in soup.findAll('h2'):
            if item.string:
                txt = item.string.upper()
                if txt.startswith('IM VIDEO:') or txt.startswith('VIDEO:'):
                    item.extract()
        return soup