#1 |
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
|
Enhanced brand eins recipe
Hi,
I took the liberty of enhancing the existing brand eins recipe. Here is my changelog:
NEW: The issue to download can be selected via the username field.
NEW: Add cover image.
NEW: Prevent the conversion date from being appended to the title.
NEW: Remove the "This article was downloaded by calibre from..." section from the bottom of each page.
FIXED: "brand eins" is written in lowercase.
And here is the recipe:
Code:

    #!/usr/bin/env python
    # -*- coding: utf-8 mode: python -*-

    __license__ = 'GPL v3'
    __copyright__ = '2010, Constantin Hofstetter <consti at consti.de>, Steffen Siebert <calibre at steffensiebert.de>'
    __version__ = '0.96'

    ''' http://brandeins.de - Wirtschaftsmagazin '''
    import re
    import string
    from calibre.web.feeds.recipes import BasicNewsRecipe
    from calibre.web.feeds.templates import Template, CLASS
    from lxml.html.builder import HTML, HEAD, TITLE, STYLE, DIV, BODY, BR, A, HR, UL


    class MyNavBarTemplate(Template):
        """
        Same as calibre.web.feeds.templates.NavBarTemplate but without the
        'This article was downloaded by calibre from...' text at the bottom.
        """

        def _generate(self, bottom, feed, art, number_of_articles_in_feed,
                      two_levels, url, __appname__, prefix='', center=True,
                      extra_css=None, style=None):
            head = HEAD(TITLE('navbar'))
            if style:
                head.append(STYLE(style, type='text/css'))
            if extra_css:
                head.append(STYLE(extra_css, type='text/css'))

            if prefix and not prefix.endswith('/'):
                prefix += '/'
            align = 'center' if center else 'left'

            navbar = DIV(CLASS('calibre_navbar', 'calibre_rescale_70',
                               style='text-align:'+align))
            if bottom:
                if not url.startswith('file://'):
                    navbar.append(HR())
            else:
                next = 'feed_%d'%(feed+1) if art == number_of_articles_in_feed - 1 \
                        else 'article_%d'%(art+1)
                up = '../..' if art == number_of_articles_in_feed - 1 else '..'
                href = '%s%s/%s/index.html'%(prefix, up, next)
                navbar.text = '| '
                navbar.append(A('Next', href=href))
            href = '%s../index.html#article_%d'%(prefix, art)
            navbar.iterchildren(reversed=True).next().tail = ' | '
            navbar.append(A('Section Menu', href=href))
            href = '%s../../index.html#feed_%d'%(prefix, feed)
            navbar.iterchildren(reversed=True).next().tail = ' | '
            navbar.append(A('Main Menu', href=href))
            if art > 0 and not bottom:
                href = '%s../article_%d/index.html'%(prefix, art-1)
                navbar.iterchildren(reversed=True).next().tail = ' | '
                navbar.append(A('Previous', href=href))
            navbar.iterchildren(reversed=True).next().tail = ' | '
            if not bottom:
                navbar.append(HR())

            self.root = HTML(head, BODY(navbar))


    class BrandEins(BasicNewsRecipe):

        title = u'brand eins'
        __author__ = 'Constantin Hofstetter'
        description = u'Wirtschaftsmagazin'
        publisher ='brandeins.de'
        category = 'politics, business, wirtschaft, Germany'
        use_embedded_content = False
        lang = 'de-DE'
        no_stylesheets = True
        encoding = 'utf-8'
        language = 'de'
        publication_type = 'magazine'
        needs_subscription = True
        # Prevent that conversion date is appended to title
        timefmt = ''

        # 2 is the last full magazine (default)
        # 1 is the newest (but not full)
        # 3 is one before 2 etc.
        # This value can be set via the username field.
        default_issue = 2

        keep_only_tags = [dict(name='div', attrs={'id':'theContent'}),
                          dict(name='div', attrs={'id':'sidebar'}),
                          dict(name='div', attrs={'class':'intro'}),
                          dict(name='p', attrs={'class':'bodytext'}),
                          dict(name='div', attrs={'class':'single_image'})]

        ''' brandeins.de '''

        def __init__(self, options, log, progress_reporter):
            """ Constructor. """
            BasicNewsRecipe.__init__(self, options, log, progress_reporter)
            self.navbar = MyNavBarTemplate()

        def postprocess_html(self, soup,first):

            # Move the image of the sidebar right below the h3
            first_h3 = soup.find(name='div', attrs={'id':'theContent'}).find('h3')
            for imgdiv in soup.findAll(name='div', attrs={'class':'single_image'}):
                if len(first_h3.findNextSiblings('div', {'class':'intro'})) >= 1:
                    # first_h3.parent.insert(2, imgdiv)
                    first_h3.findNextSiblings('div', {'class':'intro'})[0].parent.insert(4, imgdiv)
                else:
                    first_h3.parent.insert(2, imgdiv)

            # Now, remove the sidebar
            soup.find(name='div', attrs={'id':'sidebar'}).extract()

            # Remove the rating-image (stars) from the h3
            for img in first_h3.findAll(name='img'):
                img.extract()

            # Mark the intro texts as italic
            for div in soup.findAll(name='div', attrs={'class':'intro'}):
                for p in div.findAll('p'):
                    content = self.tag_to_string(p)
                    new_p = "<p><i>"+ content +"</i></p>"
                    p.replaceWith(new_p)

            return soup

        def get_cover(self, soup):
            cover_url = None
            cover_item = soup.find('div', attrs = {'class': 'cover_image'})
            if cover_item:
                cover_url = 'http://www.brandeins.de/' + cover_item.img['src']
            return cover_url

        def parse_index(self):
            feeds = []

            archive = "http://www.brandeins.de/archiv.html"

            issue = self.default_issue
            if self.username:
                try:
                    issue = int(self.username)
                except:
                    pass

            soup = self.index_to_soup(archive)
            latest_jahrgang = soup.findAll('div', attrs={'class': re.compile(r'\bjahrgang-latest\b') })[0].findAll('ul')[0]
            pre_latest_issue = latest_jahrgang.findAll('a')[len(latest_jahrgang.findAll('a'))-issue]
            url = pre_latest_issue.get('href', False)
            # Get the title for the magazin - build it out of the title of the cover - take the issue and year;
            self.title = "brand eins "+ re.search(r"(?P<date>\d\d\/\d\d\d\d)", pre_latest_issue.find('img').get('title', False)).group('date')
            url = 'http://brandeins.de/'+url

            # url = "http://www.brandeins.de/archiv/magazin/tierisch.html"
            titles_and_articles = self.brand_eins_parse_latest_issue(url)
            if titles_and_articles:
                for title, articles in titles_and_articles:
                    feeds.append((title, articles))
            return feeds

        def brand_eins_parse_latest_issue(self, url):
            soup = self.index_to_soup(url)
            self.cover_url = self.get_cover(soup)
            article_lists = [soup.find('div', attrs={'class':'subColumnLeft articleList'}),
                             soup.find('div', attrs={'class':'subColumnRight articleList'})]

            titles_and_articles = []
            current_articles = []
            chapter_title = "Editorial"
            self.log('Found Chapter:', chapter_title)

            # Remove last list of links (thats just the impressum and the 'gewinnspiel')
            article_lists[1].findAll('ul')[len(article_lists[1].findAll('ul'))-1].extract()

            for article_list in article_lists:
                for chapter in article_list.findAll('ul'):
                    if len(chapter.findPreviousSiblings('h3')) >= 1:
                        new_chapter_title = string.capwords(self.tag_to_string(chapter.findPreviousSiblings('h3')[0]))
                        if new_chapter_title != chapter_title:
                            titles_and_articles.append([chapter_title, current_articles])
                            current_articles = []
                            self.log('Found Chapter:', new_chapter_title)
                            chapter_title = new_chapter_title
                    for li in chapter.findAll('li'):
                        a = li.find('a', href = True)
                        if a is None:
                            continue
                        title = self.tag_to_string(a)
                        url = a.get('href', False)
                        if not url or not title:
                            continue
                        url = 'http://brandeins.de/'+url
                        if len(a.parent.findNextSiblings('p')) >= 1:
                            description = self.tag_to_string(a.parent.findNextSiblings('p')[0])
                        else:
                            description = ''

                        self.log('\t\tFound article:', title)
                        self.log('\t\t\t', url)
                        self.log('\t\t\t', description)

                        current_articles.append({'title': title, 'url': url, 'description': description, 'date':''})
            titles_and_articles.append([chapter_title, current_articles])
            return titles_and_articles

Steffen
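The two small conventions the recipe relies on (an issue number typed into the username field, and a title built from the "MM/YYYY" part of the cover title) can be tried outside calibre. This is only a minimal sketch: `resolve_issue` and `title_from_cover` are hypothetical helper names that do not appear in the recipe, and the sample cover title is invented for illustration.

```python
import re

DEFAULT_ISSUE = 2  # same default as the recipe: the last full magazine


def resolve_issue(username, default=DEFAULT_ISSUE):
    """Return the issue number typed into the username field, or the default."""
    try:
        return int(username)
    except (TypeError, ValueError):
        # Empty field (None or '') or non-numeric text: use the default.
        return default


def title_from_cover(cover_title):
    """Build the e-book title from a cover title containing 'MM/YYYY'."""
    match = re.search(r"(?P<date>\d\d/\d\d\d\d)", cover_title)
    if match is None:
        # No date found: fall back to the plain magazine name.
        return "brand eins"
    return "brand eins " + match.group("date")


if __name__ == "__main__":
    print(resolve_issue("3"))
    print(resolve_issue(None))
    print(title_from_cover("Cover 11/2010"))  # invented sample title
```

Unlike the recipe, the sketch falls back to a plain title when no date is found instead of failing on a missing regex match; whether that is desirable is a design choice.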
#2 |
Junior Member
Posts: 7
Karma: 10
Join Date: Sep 2010
Device: Kindle
|
Hi Steffen!
Thanks for the info. I've pushed your changes into the repository.
@all: The newest version of the script (including Steffen's changes!) can be found here: https://github.com/consti/BrandEins-...andeins.recipe
#3 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
I haven't looked at your site or recipe, but you should be aware that this feature (the link back to the source article at the bottom of each page) is used by many people who have readers that can access the web. Removing it often decreases the value of a recipe.
#4 | |
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
|
Quote:
Perhaps it makes more sense for other sites, but the brand eins recipe fetches the monthly print magazine from the online archive, and the EPUB contains all the relevant content of the web pages. So I see no point in having a link on every single page, and I doubt that brand eins would include them if they provided an EPUB version of their magazine (which they currently don't).
I would prefer a single notice with a link at the beginning and/or the end of the EPUB file to give credit to calibre and refer to the source. It would be perfect if a recipe could easily switch between "link on every page" and "link at beginning and end of EPUB" behavior.
Ciao, Steffen
Last edited by siebert; 11-21-2010 at 11:28 AM.
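The switch between the two behaviors could look roughly like this. It is purely a sketch of the idea: `wants_source_link` and its `mode` values are hypothetical and do not correspond to any existing calibre option.

```python
def wants_source_link(page_index, page_count, mode="every_page"):
    """Decide whether a page should carry the credit/source link.

    `mode` is a hypothetical switch, not an existing calibre option:
    "every_page" mirrors the current behavior, "first_and_last" puts
    the notice only on the first and the last page.
    """
    if mode == "every_page":
        return True
    if mode == "first_and_last":
        return page_index in (0, page_count - 1)
    raise ValueError("unknown mode: %r" % (mode,))


if __name__ == "__main__":
    # Which of five pages would carry the notice in each mode?
    for mode in ("every_page", "first_and_last"):
        print(mode, [wants_source_link(i, 5, mode) for i in range(5)])
```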
#5 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
2) In addition to removing advertisements, many recipes remove related links. I remove them when I write a recipe, but I may want to look at them for some articles.
3) I'm not always connected to the web.
#6 | ||
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
|
Quote:
Apart from that I would consider the removal of ads as a feature.
Quote:
Fortunately it's rather easy, as all content of back issues is available as HTML pages in their online archive. The EPUB should be self-contained, with all relevant content of the web pages (which hopefully have all relevant content of the printed magazine). What the recipe removes is just the web framework for navigation etc., which is shown on the brand eins webpage but not in the printed magazine, so it's neither necessary nor wanted in the EPUB.
If everything interesting is included in the EPUB, there is no point in having a link to the source webpage; I wouldn't follow it because there is nothing to gain. Of course the generated EPUB should include some credit to calibre plus a link to the index web page we used to fetch the content, but only once at the beginning and/or the end of the EPUB, not on every single page.
Ciao, Steffen
#7 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
I won't try to convince you of my viewpoint, if you'll grant me the same. The issue isn't what you or I think is best, it's what is consistent and expected by other recipe users who run a Calibre builtin recipe. We can always customize the recipe to any result we like, and we can offer that customization to others by including the needed code and a note in the description/recipe comments of how to use it.
#8 |
Junior Member
Posts: 7
Karma: 10
Join Date: Sep 2010
Device: Kindle
|
I've reverted Steffen's changes until further notice.
I need to look into the changes; sorry for including them so fast. I am in Beijing right now, so I'll look into it as soon as I am back home. @steffen: Sorry for reverting the changes. Let's talk about it as soon as I am back (should be in a week or so).
#9 |
Junior Member
Posts: 8
Karma: 10
Join Date: Apr 2011
Device: PRS-650B
|
Hi Steffen and Consti,
thanks for putting this recipe together. Unfortunately, I am facing problems using it. Whenever I try to pull the articles from the website using Calibre, I get the following error log. Can you advise?

    File "site-packages\calibre\web\feeds\news.py", line 872, in build_index
    File "c:\users\f\appdata\local\temp\calibre_0.7.53_tmp_uy9v4j\calibre_0.7.53_qykifp_recipes\recipe0.py", line 103, in parse_index
    issue_list = soup.findAll('div', attrs={'class': 'tx-brandeinsmagazine-pi1'})[0].findAll('a')
    IndexError: list index out of range

Thanks and Best Regards!
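An `IndexError` like this one is raised when `[0]` is applied to an empty result list, that is, when the expected `div` is not present in the page that was actually fetched. A guard along the following lines would turn it into a more readable message. This is only a sketch: `first_match` is a hypothetical helper, not part of calibre or BeautifulSoup, and plain Python lists stand in for `findAll` results.

```python
def first_match(matches, description):
    """Return the first element of a findAll-style result list.

    Raises ValueError with a descriptive message instead of a bare
    IndexError when nothing matched.
    """
    if not matches:
        raise ValueError(
            "Nothing matched %s; did the page load correctly?" % description)
    return matches[0]


if __name__ == "__main__":
    # A non-empty result behaves like matches[0].
    print(first_match(["<div>"], "div.tx-brandeinsmagazine-pi1"))
    # An empty result produces a readable error instead of IndexError.
    try:
        first_match([], "div.tx-brandeinsmagazine-pi1")
    except ValueError as err:
        print(err)
```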
#10 |
Junior Member
Posts: 7
Karma: 10
Join Date: Sep 2010
Device: Kindle
|
Hello FritziFratz!
I'll take a look into the BrandEins Recipe tomorrow/today (this Sunday). I've not checked the recipe for a long time. The source is available here: https://github.com/consti/BrandEins-Recipe https://github.com/consti/BrandEins-...andeins.recipe
I'll let you know what I find.
-- Consti
Last edited by Consti; 04-09-2011 at 07:23 PM.
#11 |
Junior Member
Posts: 7
Karma: 10
Join Date: Sep 2010
Device: Kindle
|
I just managed to find time to test the BrandEins Recipe:
It works for me. Maybe the problem was that there wasn't a previous issue available: the current issue is only partially available, so by default we select the previous issue, but if that is not available (e.g., in January) it might break.
I've now (again, sorry for keeping you waiting, Steffen!) officially included Steffen's changes in the recipe. I can live without the links at the bottom of each page (I've never noticed them on the Kindle-formatted ebooks anyway). Thanks for your contributions (@Steffen), they really made the whole recipe a lot better!
@FritziFratz Let me know if the recipe works for you now. I am using the latest version of calibre and the version of the recipe bundled with it.
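If a missing previous issue really is the cause, one way to make the lookup robust is to clamp the position counted from the end of the list. The following is a sketch under assumptions, not the recipe's code: `pick_issue` is a hypothetical helper, and a list of strings stands in for the `<a>` tags of the latest year, ordered oldest to newest.

```python
def pick_issue(links, issue):
    """Pick the issue counted from the end of `links` (1 = newest).

    Falls back to the oldest available entry when `issue` reaches
    further back than the list goes, and returns None for an empty list.
    """
    if not links:
        return None
    index = len(links) - issue
    # Clamp so that a too-large or too-small `issue` can neither wrap
    # around (negative index) nor run off the end of the list.
    index = max(0, min(index, len(links) - 1))
    return links[index]


if __name__ == "__main__":
    print(pick_issue(["09/2010", "10/2010", "11/2010"], 2))
    print(pick_issue(["01/2011"], 2))
    print(pick_issue([], 2))
```

Without the clamp, plain `links[len(links) - issue]` either raises `IndexError` or silently wraps around through Python's negative indexing, depending on how many entries are missing.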
#12 |
Junior Member
Posts: 8
Karma: 10
Join Date: Apr 2011
Device: PRS-650B
|
Hi Consti,
thanks a lot for checking. In parallel to this thread, I also posted my question in another thread of this forum. Steffen already helped me and the issue is resolved. The problem was a setting of my desktop firewall :-( Sorry that I bugged you with this. See: https://www.mobileread.com/forums/sho...d.php?t=114128
Thanks for your work on putting this recipe together and for your quick reply, Consti. Have a great Sunday!
#13 | |
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
|
Quote:
Ciao, Steffen |