05-21-2011, 12:53 PM | #1 |
Enthusiast
Posts: 37
Karma: 10
Join Date: Apr 2011
Device: none
Recipe for Focus (DE)
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1305567197(BasicNewsRecipe):
    title = u'Focus (DE)'
    __author__ = 'xXxXxXxXxXx'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    remove_javascript = True

    def print_version(self, url):
        # download the printer-friendly version of each article
        return url + '?drucken=1'

    keep_only_tags = [
        dict(name='div', attrs={'id':['article']})
    ]
    remove_tags = [
        dict(name='div', attrs={'class':'sidebar'}),
        dict(name='div', attrs={'class':'commentForm'}),
        dict(name='div', attrs={'class':'comment clearfix oid-3534591 open'}),
        dict(name='div', attrs={'class':'similarityBlock'}),
        dict(name='div', attrs={'class':'footer'}),
        dict(name='div', attrs={'class':'getMoreComments'}),
        dict(name='div', attrs={'class':'moreComments'}),
        dict(name='div', attrs={'class':'ads'}),
        dict(name='div', attrs={'class':'articleContent'}),
    ]
    remove_tags_after = [
        dict(name='div', attrs={'class':['commentForm', 'title', 'actions clearfix']})
    ]
    feeds = [
        (u'Eilmeldungen', u'http://rss2.focus.de/c/32191/f/533875/index.rss'),
        (u'Auto-News', u'http://rss2.focus.de/c/32191/f/443320/index.rss'),
        (u'Digital-News', u'http://rss2.focus.de/c/32191/f/443315/index.rss'),
        (u'Finanzen-News', u'http://rss2.focus.de/c/32191/f/443317/index.rss'),
        (u'Gesundheit-News', u'http://rss2.focus.de/c/32191/f/443314/index.rss'),
        (u'Immobilien-News', u'http://rss2.focus.de/c/32191/f/443318/index.rss'),
        (u'Kultur-News', u'http://rss2.focus.de/c/32191/f/443321/index.rss'),
        (u'Panorama-News', u'http://rss2.focus.de/c/32191/f/533877/index.rss'),
        (u'Politik-News', u'http://rss2.focus.de/c/32191/f/443313/index.rss'),
        (u'Reisen-News', u'http://rss2.focus.de/c/32191/f/443316/index.rss'),
        (u'Sport-News', u'http://rss2.focus.de/c/32191/f/443319/index.rss'),
        (u'Wissen-News', u'http://rss2.focus.de/c/32191/f/533876/index.rss'),
    ]
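`print_version` appends `?drucken=1` ("drucken" is German for "print") so that calibre fetches the printer-friendly page. Plain concatenation assumes the article URL has no query string yet. Here is a hedged, standalone sketch of a variant that uses `&` when a query string already exists; the helper name `make_print_url` and the example URLs are made up, and it is an assumption that focus.de accepts `drucken=1` alongside other parameters.

```python
from urllib.parse import urlsplit, urlunsplit


def make_print_url(url):
    """Add drucken=1 to the query string, joining with '&' if a query already exists."""
    parts = urlsplit(url)
    query = parts.query + ('&' if parts.query else '') + 'drucken=1'
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


print(make_print_url('http://www.focus.de/politik/artikel_aid_1.html'))
# http://www.focus.de/politik/artikel_aid_1.html?drucken=1
print(make_print_url('http://www.focus.de/politik/artikel_aid_1.html?ref=rss'))
# http://www.focus.de/politik/artikel_aid_1.html?ref=rss&drucken=1
```

Inside the recipe, `print_version` would then just `return make_print_url(url)`.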
05-21-2011, 01:50 PM | #2 |
Zealot
Posts: 119
Karma: 100
Join Date: Jan 2011
Location: Germany / NRW /Köln
Device: prs-650 / prs-350 /kindle 3
Hi,
sorry, but this needs to be a multipage recipe (see the article "Astronomie: Der erdähnlichste Exoplanet", i.e. "Astronomy: The most Earth-like exoplanet").
05-21-2011, 04:05 PM | #3 |
Enthusiast
Posts: 37
Karma: 10
Join Date: Apr 2011
Device: none
Unfortunately, I don't know how to write a recipe for multipage articles; it is too complicated for me. Maybe the author of calibre could change something in the calibre API to make it a lot easier,
or someone could write a (very easy) tutorial.
05-23-2011, 09:45 AM | #4 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
The multipage code works like this:
1) It finds the link to the "next page".
2) It goes to that page and gets everything on it that the recipe author wants.
3) It pastes that content into the first page.
4) It repeats steps 1-3 until there is no "next page" link.
The recursion is tricky to understand, but the multipage code is easy to copy because it is already set up: almost every multipage recipe copies the same bit of code. The only parts that change are the tag used to find the "next page" link and the part of each next page to keep. Start with the Adventure Gamer recipe, copy the whole thing here, then ask questions.
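The four steps can be modelled without calibre as a small recursive function. This is a simplified, hypothetical sketch: `PAGES` stands in for the website, `fetch` stands in for downloading and parsing a page (the job `index_to_soup` does in a real recipe), and each page is reduced to a list of text pieces plus an optional next-page URL. The shape follows the recipe pattern: recurse into the next page first, then paste everything collected into the current page.

```python
# Hypothetical stand-in for a website: each page has some text and maybe a "next page" link.
PAGES = {
    'p1': {'text': ['Part 1'], 'next': 'p2'},
    'p2': {'text': ['Part 2'], 'next': 'p3'},
    'p3': {'text': ['Part 3'], 'next': None},
}


def fetch(url):
    """Stand-in for downloading and parsing a page."""
    return PAGES[url]


def append_page(page, appendtag):
    """Follow 'next' links recursively, pasting each later page's text into appendtag."""
    next_url = page['next']            # step 1: look for the "next page" link
    if next_url:                       # step 4: stop when there is no link
        page2 = fetch(next_url)        # step 2: go to that page
        texttag = list(page2['text'])  # keep only the part we want
        append_page(page2, texttag)    # repeat for the pages after it
        appendtag.extend(texttag)      # step 3: paste it into the current page


def build_article(start_url):
    """Start from the first page and append all following pages to its body."""
    first = fetch(start_url)
    body = list(first['text'])
    append_page(first, body)
    return ' '.join(body)


print(build_article('p1'))  # Part 1 Part 2 Part 3
```

On a real site, a pager that links back to an earlier page would make this recurse forever, so a maximum page count is a sensible extra guard.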
05-23-2011, 02:53 PM | #5 |
Zealot
Posts: 119
Karma: 100
Join Date: Jan 2011
Location: Germany / NRW /Köln
Device: prs-650 / prs-350 /kindle 3
Hi Starson, I hope I can ask my questions here too?
You are right, but I don't understand it. I'm experimenting without success. Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1305567197(BasicNewsRecipe):
    title = u'Focus - test'
    __author__ = 'for_test'
    oldest_article = 20
    max_articles_per_feed = 10
    no_stylesheets = True
    use_embedded_content = False
    remove_javascript = True

    def get_article_url(self, article):
        return article.get('id', article.get('guid', None))

    def append_page(self, soup, appendtag, position):
        pager = soup.find('a', attrs={'class':'nextPage greyButton'})  # here is the pager
        if pager:
            nexturl = self.INDEX + pager.a['href']
            soup2 = self.index_to_soup(nexturl)
            texttag = soup2.find('div', attrs={'class':'textBlock'})  # here is the text
            for it in texttag.findAll(style=True):
                del it['style']
            newpos = len(texttag.contents)
            self.append_page(soup2, texttag, newpos)
            texttag.extract()
            appendtag.insert(position, texttag)

    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll('span', attrs={'class':'overhead'}):  # this comes before the text block
            item.extract()
        self.append_page(soup, soup.body, 3)
        pager = soup.find('div', attrs={'class':'pageCounter'})  # this is the pager on the following pages
        if pager:
            pager.extract()
        return self.adeify_images(soup)

    feeds = [
        (u'Eilmeldungen', u'http://rss2.focus.de/c/32191/f/533875/index.rss'),
        (u'Wissen-News', u'http://rss2.focus.de/c/32191/f/533876/index.rss')]
    # feed with a multipage article in "Wissen-News":
    # Ozonloch-Studie - Zwischen Euphorie und Hysterie (ozone hole study: between euphoria and hysteria)
It grabs only the normal pages; the extra pages of multipage articles are lost.
Greetings
05-24-2011, 01:55 PM | #6 |
Enthusiast
Posts: 37
Karma: 10
Join Date: Apr 2011
Device: none
I hope that someone finally creates a recipe for this website, because the best way of learning is learning from examples.
So maybe you, Starson17, could create this recipe?
05-26-2011, 01:34 PM | #7 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Is "pager" ever found? In other words, is this if-block ever entered? Code:
if pager:
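One way to answer that question outside calibre is to save an article page's HTML and check whether the link the recipe searches for is actually there. Below is a standalone sketch using Python's standard `html.parser`. The function name `find_pager_href` and the sample HTML are made up; only the class string 'nextPage greyButton' comes from the recipe. This sketch compares the whole class attribute exactly.

```python
from html.parser import HTMLParser


class PagerFinder(HTMLParser):
    """Records the href of the first <a> whose class attribute equals wanted_class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.href is None and tag == 'a' and attrs.get('class') == self.wanted_class:
            self.href = attrs.get('href')


def find_pager_href(html, wanted_class='nextPage greyButton'):
    """Return the pager link's href, or None if no matching link exists."""
    finder = PagerFinder(wanted_class)
    finder.feed(html)
    finder.close()
    return finder.href


sample = '<div><a class="nextPage greyButton" href="artikel_aid_2.html">weiter</a></div>'
print(find_pager_href(sample))
```

Inside the recipe itself, a print or log statement placed in the `if pager:` branch answers the same question.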
05-26-2011, 01:51 PM | #8 | |
Zealot
Posts: 119
Karma: 100
Join Date: Jan 2011
Location: Germany / NRW /Köln
Device: prs-650 / prs-350 /kindle 3
Hi Starson,
here is a link to an article that spans multiple pages. Code:
http://rss2.focus.de/c/32191/f/533876/s/151d269a/l/0L0Sfocus0Bde0Cwissen0Cwissenschaft0Cmeteorologie0Ctid0E224240Ctornados0Edie0Eden0Esturm0Ejagen0Iaid0I630A10A40Bhtml/story01.htm
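The link above is an rss2.focus.de redirect whose path appears to encode the real article address. Judging from this one URL (an inference, not something stated in the thread), characters are replaced by two-character codes starting with `0`: `0L` = `http://`, `0S` = `www.`, `0B` = `.`, `0C` = `/`, `0E` = `-`, `0I` = `_`, and `0A` = a literal `0`. Here is a small decoder sketch under that assumption. Other codes may exist and are not handled. It can help show which focus.de page the recipe actually lands on.

```python
# Codes inferred from the single example URL in this thread (assumption).
CODES = {
    'L': 'http://', 'S': 'www.', 'B': '.', 'C': '/',
    'E': '-', 'I': '_', 'A': '0',
}


def decode_segment(encoded):
    """Expand '0X' codes; every other character is copied through unchanged."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == '0' and i + 1 < len(encoded) and encoded[i + 1] in CODES:
            out.append(CODES[encoded[i + 1]])
            i += 2
        else:
            out.append(encoded[i])
            i += 1
    return ''.join(out)


def target_url(feed_url):
    """Decode the part between '/l/' and the final '/story01.htm'."""
    encoded = feed_url.split('/l/', 1)[1].rsplit('/', 1)[0]
    return decode_segment(encoded)


link = ('http://rss2.focus.de/c/32191/f/533876/s/151d269a/l/0L0Sfocus0Bde0Cwissen0Cwissenschaft'
        '0Cmeteorologie0Ctid0E224240Ctornados0Edie0Eden0Esturm0Ejagen0Iaid0I630A10A40Bhtml/story01.htm')
print(target_url(link))
# http://www.focus.de/wissen/wissenschaft/meteorologie/tid-22424/tornados-die-den-sturm-jagen_aid_630104.html
```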
05-26-2011, 03:45 PM | #9 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
I see two problems:
1) You use self.INDEX in your recipe, but it is not defined.
2) I ran the recipe with that removed, and it did find instances of pager with: Code:
pager = soup.find('a',attrs={'class':'nextPage greyButton'})
The second problem is in this line: Code:
pager.a['href']
I can't read German, so I can only guess at how to do this.
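On problem 2, here is one reading of the two snippets: `soup.find('a', ...)` already returns the `<a>` element itself, so `pager.a` searches for another link nested inside that link. It finds none, and indexing `None` with `['href']` raises a TypeError. The href can be read directly as `pager['href']`. On problem 1, a common pattern is a class attribute such as `INDEX = 'http://www.focus.de'`. That value is an assumption here, since the thread never shows what the pager's href looks like. Resolving the URL with `urljoin` instead of `self.INDEX + href` also copes with hrefs that are already absolute. Below is a standalone sketch of that URL step, with a hypothetical helper name.

```python
from urllib.parse import urljoin

# Assumed site root; the thread never defines INDEX, so this value is a guess.
INDEX = 'http://www.focus.de'


def next_page_url(href, base=INDEX):
    """Turn the pager's href into a full URL.

    urljoin resolves site-relative hrefs ('/wissen/...') against base and
    leaves absolute ones unchanged, whereas INDEX + href would corrupt an
    absolute href.
    """
    return urljoin(base, href)


print(next_page_url('/wissen/artikel_aid_2.html'))
```

In the recipe's `append_page`, the problem line would then become something like `nexturl = urljoin(self.INDEX, pager['href'])`, with `INDEX` defined on the class. The import would be `from urllib.parse import urljoin`, or `from urlparse import urljoin` under the Python 2 calibre of that era.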
05-08-2016, 04:24 AM | #10 |
Member
Posts: 16
Karma: 10
Join Date: Apr 2016
Device: Tolino Vision 3HD
Hi,
the addresses of the focus.de RSS feeds have been changed. Here's an updated version of the focus_de.recipe. Code:
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function

'''
focus.de
'''

from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1305567197(BasicNewsRecipe):
    title = 'Focus (DE)'
    __author__ = 'Anonymous'
    description = 'RSS-Feeds von Focus.de'
    language = 'de'

    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    remove_javascript = True
    use_embedded_content = False
    remove_empty_feeds = True
    ignore_duplicate_articles = {'title', 'url'}

    feeds = [
        ('Politik', 'http://rss.focus.de/politik/'),
        ('Finanzen', 'http://rss.focus.de/finanzen/'),
        ('Gesundheit', 'http://rss.focus.de/gesundheit/'),
        ('Panorama', 'http://rss.focus.de/panorama/'),
        ('Digital', 'http://rss.focus.de/digital/'),
        ('Reisen', 'http://rss.focus.de/reisen/')
    ]

    keep_only_tags = [
        dict(name='div', attrs={'id':'article'})
    ]

    remove_tags = [
        dict(name='div', attrs={'class':['inimagebuttons', 'kolumneHead clearfix']})
    ]

    remove_attributes = ['width', 'height']

    extra_css = 'h1 {font-size: 1.6em; text-align: left; margin-top: 0em} \
        h2 {font-size: 1em; text-align: left} \
        .overhead {margin-bottom: 0em} \
        .caption {font-size: 0.6em}'

    def print_version(self, url):
        return url + '?drucken=1'

    def preprocess_html(self, soup):
        # remove useless references to videos
        for item in soup.findAll('h2'):
            if item.string:
                txt = item.string.upper()
                if txt.startswith('IM VIDEO:') or txt.startswith('VIDEO:'):
                    item.extract()
        return soup
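The setting `ignore_duplicate_articles = {'title', 'url'}` asks calibre to skip an article when one with the same title or the same URL was already listed in another section, which matters when the same story appears in more than one feed. The following is only a plain-Python illustration of that rule, not calibre's actual implementation. The `drop_duplicates` name and the sample data are made up, and keeping the first occurrence is this sketch's own choice.

```python
def drop_duplicates(feeds, keys=('title', 'url')):
    """Drop any article whose title or URL matches one seen in an earlier position.

    feeds: list of (feed_title, [article_dict, ...]) pairs; each article
    dict has 'title' and 'url'. The first occurrence of an article is kept.
    """
    seen = {key: set() for key in keys}
    result = []
    for feed_title, articles in feeds:
        kept = []
        for article in articles:
            if any(article[key] in seen[key] for key in keys):
                continue  # same title or same URL as an article already kept
            for key in keys:
                seen[key].add(article[key])
            kept.append(article)
        result.append((feed_title, kept))
    return result


feeds = [
    ('Politik', [{'title': 'A', 'url': 'u1'}, {'title': 'B', 'url': 'u2'}]),
    ('Panorama', [{'title': 'A', 'url': 'u3'}, {'title': 'C', 'url': 'u2'},
                  {'title': 'D', 'url': 'u4'}]),
]
for name, articles in drop_duplicates(feeds):
    print(name, [a['title'] for a in articles])
```

Here "Panorama" keeps only article D: A repeats a title and C repeats a URL from "Politik".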
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post |
Still losing focus | JKenP | Calibre | 4 | 05-27-2011 08:17 AM |
Focus on First Wave of E-book Marketing | DMcCunney | News | 6 | 12-18-2010 07:45 PM |
Focus annoyance | edbro | Calibre | 2 | 10-05-2010 06:07 PM |
Focus not properly shifting on links | JSWolf | Feedback | 9 | 08-14-2010 06:12 PM |
Focus the reply message bo | kovidgoyal | Feedback | 9 | 02-11-2009 03:30 AM |