MobileRead Forums > E-Book Software > Calibre > Recipes

11-21-2010, 08:37 AM   #1
siebert
Developer
Posts: 137
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 3 WiFi / Nexus 4 (Android)
Enhanced brand eins recipe

Hi,

I took the liberty of enhancing the existing brand eins recipe.

Here is my changelog:
NEW: The issue to download can be selected via the username field.
NEW: Added cover image.
NEW: Prevent the conversion date from being appended to the title.
NEW: Remove the "This article was downloaded by calibre from..." section from the bottom of each page.
FIXED: "brand eins" is now written in lowercase.
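The username trick and the lowercase title from the changelog come down to two small pieces of logic in parse_index. The plain-Python sketch below reproduces them outside calibre so they can be tested on their own; `issue_from_username`, `build_title` and the sample cover title are hypothetical. Unlike the recipe, `build_title` falls back to plain "brand eins" when no date is found instead of raising an error.

```python
import re

DEFAULT_ISSUE = 2  # as in the recipe: 1 = newest (incomplete) issue, 2 = last complete one

# The recipe's pattern for the "issue/year" part of the cover title.
ISSUE_DATE_RE = re.compile(r"(?P<date>\d\d/\d\d\d\d)")


def issue_from_username(username, default=DEFAULT_ISSUE):
    """Interpret the (repurposed) username field as an issue number."""
    if not username:
        return default
    try:
        return int(username)
    except ValueError:
        return default  # not a number: keep the default issue


def build_title(cover_title):
    """Return e.g. 'brand eins 05/2010', keeping the magazine name lowercase."""
    match = ISSUE_DATE_RE.search(cover_title or "")
    if match is None:
        return "brand eins"  # the recipe would raise an AttributeError here
    return "brand eins " + match.group("date")


if __name__ == "__main__":
    print(issue_from_username("3"))    # 3
    print(issue_from_username("abc"))  # 2
    print(build_title("brand eins 05/2010 Titelthema"))  # brand eins 05/2010
```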

And here is the recipe:
Code:
#!/usr/bin/env python
# -*- coding: utf-8; mode: python -*-

__license__   = 'GPL v3'
__copyright__ = '2010, Constantin Hofstetter <consti at consti.de>, Steffen Siebert <calibre at steffensiebert.de>'
__version__   = '0.96'

''' http://brandeins.de - Wirtschaftsmagazin '''
import re
import string
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.web.feeds.templates import Template, CLASS
from lxml.html.builder import HTML, HEAD, TITLE, STYLE, DIV, BODY, BR, A, HR, UL

class MyNavBarTemplate(Template):
  """
  Same as calibre.web.feeds.templates.NavBarTemplate but without the
  'This article was downloaded by calibre from...'
  text at the bottom.
  """

  def _generate(self, bottom, feed, art, number_of_articles_in_feed,
                two_levels, url, __appname__, prefix='', center=True,
                extra_css=None, style=None):
    head = HEAD(TITLE('navbar'))
    if style:
      head.append(STYLE(style, type='text/css'))
    if extra_css:
      head.append(STYLE(extra_css, type='text/css'))

    if prefix and not prefix.endswith('/'):
      prefix += '/'
    align = 'center' if center else 'left'

    navbar = DIV(CLASS('calibre_navbar', 'calibre_rescale_70',
                       style='text-align:'+align))
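    if bottom:
      if not url.startswith('file://'):
        navbar.append(HR())
      # Always add at least one child; otherwise the
      # iterchildren(reversed=True).next() calls below raise StopIteration
      navbar.append(BR())
    else:
      # 'next_art' instead of 'next' to avoid shadowing the builtin
      next_art = 'feed_%d'%(feed+1) if art == number_of_articles_in_feed - 1 \
          else 'article_%d'%(art+1)
      up = '../..' if art == number_of_articles_in_feed - 1 else '..'
      href = '%s%s/%s/index.html'%(prefix, up, next_art)
      navbar.text = '| '
      navbar.append(A('Next', href=href))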
    href = '%s../index.html#article_%d'%(prefix, art)
    navbar.iterchildren(reversed=True).next().tail = ' | '
    navbar.append(A('Section Menu', href=href))
    href = '%s../../index.html#feed_%d'%(prefix, feed)
    navbar.iterchildren(reversed=True).next().tail = ' | '
    navbar.append(A('Main Menu', href=href))
    if art > 0 and not bottom:
      href = '%s../article_%d/index.html'%(prefix, art-1)
      navbar.iterchildren(reversed=True).next().tail = ' | '
      navbar.append(A('Previous', href=href))
    navbar.iterchildren(reversed=True).next().tail = ' | '
    if not bottom:
      navbar.append(HR())

    self.root = HTML(head, BODY(navbar))

class BrandEins(BasicNewsRecipe):

  title = u'brand eins'
  __author__ = 'Constantin Hofstetter'
  description = u'Wirtschaftsmagazin'
  publisher ='brandeins.de'
  category = 'politics, business, wirtschaft, Germany'
  use_embedded_content = False
  lang = 'de-DE'
  no_stylesheets = True
  encoding = 'utf-8'
  language = 'de'
  publication_type = 'magazine'
  needs_subscription = True
  # Prevent that conversion date is appended to title
  timefmt = ''

  # 2 is the last full magazine (default)
  # 1 is the newest (but not full)
  # 3 is one before 2 etc.
  # This value can be set via the username field.
  default_issue = 2

  keep_only_tags = [
    dict(name='div', attrs={'id':'theContent'}),
    dict(name='div', attrs={'id':'sidebar'}),
    dict(name='div', attrs={'class':'intro'}),
    dict(name='p', attrs={'class':'bodytext'}),
    dict(name='div', attrs={'class':'single_image'}),
  ]

  '''
  brandeins.de
  '''

  def __init__(self, options, log, progress_reporter):
    """ Constructor. """
    BasicNewsRecipe.__init__(self, options, log, progress_reporter)
    self.navbar = MyNavBarTemplate()
  
  def postprocess_html(self, soup, first):

    # Move the images of the sidebar up into the article, near the first h3
    first_h3 = soup.find(name='div', attrs={'id':'theContent'}).find('h3')
    for imgdiv in soup.findAll(name='div', attrs={'class':'single_image'}):
      intros = first_h3.findNextSiblings('div', {'class':'intro'})
      if intros:
        intros[0].parent.insert(4, imgdiv)
      else:
        first_h3.parent.insert(2, imgdiv)

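    # Now, remove the sidebar (the guard avoids an AttributeError on pages without one)
    sidebar = soup.find(name='div', attrs={'id':'sidebar'})
    if sidebar is not None:
      sidebar.extract()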

    # Remove the rating-image (stars) from the h3
    for img in first_h3.findAll(name='img'):
      img.extract()

    # Mark the intro texts as italic
    for div in soup.findAll(name='div', attrs={'class':'intro'}):
      for p in div.findAll('p'):
        content = self.tag_to_string(p)
        new_p = "<p><i>"+ content +"</i></p>"
        p.replaceWith(new_p)

    return soup

  def get_cover(self, soup):
    cover_url = None
    cover_item = soup.find('div', attrs = {'class': 'cover_image'})
    if cover_item:
      cover_url = 'http://www.brandeins.de/' + cover_item.img['src']
    return cover_url

  def parse_index(self):
    feeds = []

    archive = "http://www.brandeins.de/archiv.html"

    issue = self.default_issue
    if self.username:
      try:
        issue = int(self.username)
      except ValueError:
        # Not a number: keep the default issue
        pass

    soup = self.index_to_soup(archive)
    latest_jahrgang = soup.findAll('div', attrs={'class': re.compile(r'\bjahrgang-latest\b')})[0].findAll('ul')[0]
    # The issue links are ordered oldest first, so issue 1 (the newest) is the last link
    issue_links = latest_jahrgang.findAll('a')
    pre_latest_issue = issue_links[len(issue_links)-issue]
    url = pre_latest_issue.get('href', False)
    # Build the title of the magazine from the title of the cover image: take the issue and year (NN/YYYY)
    cover_title = pre_latest_issue.find('img').get('title', '')
    self.title = "brand eins " + re.search(r"(?P<date>\d\d/\d\d\d\d)", cover_title).group('date')
    url = 'http://brandeins.de/'+url

    # url = "http://www.brandeins.de/archiv/magazin/tierisch.html"
    titles_and_articles = self.brand_eins_parse_latest_issue(url)
    if titles_and_articles:
      for title, articles in titles_and_articles:
        feeds.append((title, articles))
    return feeds

  def brand_eins_parse_latest_issue(self, url):
    soup = self.index_to_soup(url)
    self.cover_url = self.get_cover(soup)
    article_lists = [soup.find('div', attrs={'class':'subColumnLeft articleList'}), soup.find('div', attrs={'class':'subColumnRight articleList'})]

    titles_and_articles = []
    current_articles = []
    chapter_title = "Editorial"
    self.log('Found Chapter:', chapter_title)

    # Remove the last list of links (that's just the impressum and the 'gewinnspiel')
    article_lists[1].findAll('ul')[-1].extract()

    for article_list in article_lists:
      for chapter in article_list.findAll('ul'):
        if len(chapter.findPreviousSiblings('h3')) >= 1:
          new_chapter_title = string.capwords(self.tag_to_string(chapter.findPreviousSiblings('h3')[0]))
          if new_chapter_title != chapter_title:
            titles_and_articles.append([chapter_title, current_articles])
            current_articles = []
            self.log('Found Chapter:', new_chapter_title)
          chapter_title = new_chapter_title
        for li in chapter.findAll('li'):
          a = li.find('a', href = True)
          if a is None:
            continue
          title = self.tag_to_string(a)
          url = a.get('href', False)
          if not url or not title:
            continue
          url = 'http://brandeins.de/'+url
          if len(a.parent.findNextSiblings('p')) >= 1:
            description = self.tag_to_string(a.parent.findNextSiblings('p')[0])
          else:
            description = ''

          self.log('\t\tFound article:', title)
          self.log('\t\t\t', url)
          self.log('\t\t\t', description)

          current_articles.append({'title': title, 'url': url, 'description': description, 'date':''})
    titles_and_articles.append([chapter_title, current_articles])
    return titles_and_articles
Ciao,
Steffen
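The chapter grouping in brand_eins_parse_latest_issue can be tested without calibre or BeautifulSoup. In the sketch below, each `<ul>` of the article lists is modelled as a (heading, links) pair, where heading stands for the nearest preceding `<h3>` (None if there is none); the function name, this input format and the sample data are hypothetical. One deliberate difference: the recipe can emit a chapter with no articles (for example an empty "Editorial" when the first list already has a heading), whereas the sketch drops empty chapters.

```python
import string

BASE_URL = "http://brandeins.de/"  # prefix the recipe puts in front of relative hrefs


def group_articles(lists, first_chapter="Editorial"):
    """Group article links into (chapter_title, articles) pairs.

    lists: one (heading, links) pair per <ul>, in document order. heading is
    the text of the nearest preceding <h3> or None; links is a list of
    (title, href, description) tuples.
    """
    chapters = []
    chapter_title = first_chapter
    current = []
    for heading, links in lists:
        if heading is not None:
            new_title = string.capwords(heading)
            if new_title != chapter_title:
                # A new chapter starts: flush the articles collected so far
                # (unlike the recipe, empty chapters are not emitted).
                if current:
                    chapters.append((chapter_title, current))
                current = []
            chapter_title = new_title
        for title, href, description in links:
            if not title or not href:
                continue  # the recipe also skips links without text or target
            current.append({"title": title, "url": BASE_URL + href,
                            "description": description or "", "date": ""})
    if current:
        chapters.append((chapter_title, current))
    return chapters


# Hypothetical sample: four <ul> lists, the first one without a heading.
sample = [
    (None, [("Editorial", "archiv/editorial.html", "")]),
    ("SCHWERPUNKT", [("A", "archiv/a.html", "Intro A"), ("B", "archiv/b.html", "")]),
    ("SCHWERPUNKT", [("C", "archiv/c.html", "")]),
    ("wirtschaft", [("", "archiv/x.html", ""), ("D", "archiv/d.html", "")]),
]

if __name__ == "__main__":
    for chapter, articles in group_articles(sample):
        print(chapter, [a["title"] for a in articles])
    # Editorial ['Editorial']
    # Schwerpunkt ['A', 'B', 'C']
    # Wirtschaft ['D']
```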

11-21-2010, 09:22 AM   #2
Consti
Junior Member
Posts: 7
Karma: 10
Join Date: Sep 2010
Device: Kindle
Hi Steffen!

Thanks for the info. I've pushed your changes into the repository.

@all: The newest version of the script (including Steffen's changes!) can be found here:
https://github.com/consti/BrandEins-...andeins.recipe

11-21-2010, 11:24 AM   #3
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by siebert
NEW: Remove "This article was downloaded by calibre from..." section from bottom of each page.

I haven't looked at your site or recipe, but you should be aware that this feature is used by many people who have readers that can access the web. Removing it often decreases the value of a recipe.

11-21-2010, 12:22 PM   #4
siebert
Developer
Posts: 137
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 3 WiFi / Nexus 4 (Android)
Quote:
Originally Posted by Starson17
I haven't looked at your site or recipe, but you should be aware that this feature is used by many people who have readers that can access the web. Removing it often decreases the value of a recipe.

I don't get why you bother to create an offline copy of the content via calibre if you want to read it online via a browser?

Perhaps it makes more sense for other sites, but the brand eins recipe fetches the monthly print magazine from the online archive, and the EPUB contains all the relevant content of the web pages. So I see no point in having a link on every single page, and I doubt that brand eins would include them if they provided an EPUB version of their magazine (which they currently don't).

I would prefer a single notice with a link at the beginning and/or the end of the EPUB file that gives credit to calibre and refers to the source. Ideally, a recipe could easily switch between "link on every page" and "link at beginning and end of EPUB" behavior.

Ciao,
Steffen

Last edited by siebert; 11-21-2010 at 12:28 PM.
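The switch proposed in this post could look roughly like the plain-Python sketch below. It is not calibre's template API: the mode names, `notice_pages`, `source_notice`, `footer_html` and the example URLs are hypothetical, and only the notice wording is modelled on the line calibre appends. In "once" mode the notice appears on the first and last page and points to the issue index instead of the individual article.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical mode names for the proposed switch.
EVERY_PAGE = "every_page"  # current behaviour: notice at the bottom of every article
ONCE = "once"              # notice only on the first and the last page of the book


def notice_pages(num_pages, mode):
    """Return the indices of the pages that should carry the source notice."""
    if num_pages <= 0:
        return set()
    if mode == EVERY_PAGE:
        return set(range(num_pages))
    if mode == ONCE:
        return {0, num_pages - 1}
    raise ValueError("unknown mode: %r" % (mode,))


def source_notice(url, app_name="calibre"):
    """HTML modelled on the 'This article was downloaded by ... from ...' line."""
    return ('<p>This article was downloaded by <strong>%s</strong> from <a href=%s>%s</a></p>'
            % (escape(app_name), quoteattr(url), escape(url)))


def footer_html(page_index, article_urls, index_url, mode=EVERY_PAGE):
    """Footer for one page; an empty string if this page gets no notice."""
    if page_index not in notice_pages(len(article_urls), mode):
        return ""
    # Per-article link in EVERY_PAGE mode, link to the issue index in ONCE mode.
    url = article_urls[page_index] if mode == EVERY_PAGE else index_url
    return source_notice(url)


if __name__ == "__main__":
    urls = ["http://example.com/a.html", "http://example.com/b.html", "http://example.com/c.html"]
    for i in range(len(urls)):
        print(i, footer_html(i, urls, "http://example.com/index.html", mode=ONCE) or "(no notice)")
```

Keeping the page selection separate from the HTML makes it easy to default to calibre's usual behaviour (every page) while letting a single recipe opt into the other mode.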

11-21-2010, 02:02 PM   #5
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by siebert
I don't get why you bother to create an offline copy of the content via calibre if you want to read it online via a browser?

1) You do realize that the recipe removes advertisements and other less relevant content, don't you?
2) In addition to removing advertisements, many recipes remove related links. I remove them when I write a recipe, but I may want to look at them for some articles.
3) I'm not always connected to the web.

11-22-2010, 05:20 AM   #6
siebert
Developer
Posts: 137
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 3 WiFi / Nexus 4 (Android)
Quote:
Originally Posted by Starson17
1) You do realize that the recipe removes advertisements and other less relevant content, don't you?

I don't remember seeing any ads in the brand eins archive, but it's possible that Adblock Plus just hides them from me.

Apart from that, I would consider the removal of ads a feature.

Quote:
2) In addition to removing advertisements, many recipes remove related links. I remove them when I write a recipe, but I may want to look at them for some articles.

My goal for the brand eins recipe is to create a substitute for the official EPUB version of the brand eins magazine, which doesn't exist yet (they only sell the printed magazine).

Fortunately, this is rather easy, as all content of back issues is available as HTML pages in their online archive.

The EPUB should be self-contained, including all relevant content of the web pages (which hopefully contain all relevant content of the printed magazine). What the recipe removes is just the website's navigation framework, which is shown on the brand eins web pages but not in the printed magazine, so it's neither necessary nor wanted in the EPUB either.

If everything interesting is included in the EPUB, there is no point in having a link to the source webpage, as I wouldn't follow it because there is nothing to gain.

Of course, the generated EPUB should include some credit to calibre plus a link to the index web page we used to fetch the content, but this should appear only once, at the beginning and/or the end of the EPUB, not on every single page.

Ciao,
Steffen

11-22-2010, 12:45 PM   #7
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by siebert
If everything interesting is included in the EPUB, there is no point in having a link to the source webpage, as I wouldn't follow it because there is nothing to gain.

I won't try to convince you of my viewpoint, if you'll grant me the same. The issue isn't what you or I think is best; it's what is consistent and expected by other users who run a calibre built-in recipe. We can always customize the recipe to any result we like, and we can offer that customization to others by including the needed code and a note in the description or recipe comments explaining how to use it.

11-26-2010, 01:56 AM   #8
Consti
Junior Member
Posts: 7
Karma: 10
Join Date: Sep 2010
Device: Kindle
I've reverted Steffen's changes until further notice.
I have to look into the changes; sorry for including them so fast.

I am in Beijing right now, so I'll look into it as soon as I am back home.

@steffen: Sorry for reverting the changes. Let's talk about it as soon as I am back (should be in a week or so).

04-07-2011, 03:39 PM   #9
fritzifratz
Junior Member
Posts: 8
Karma: 10
Join Date: Apr 2011
Device: PRS-650B
Hi Steffen and Consti,

thanks for putting this recipe together. Unfortunately, I am facing problems using it. Whenever I try to pull the articles from the website using Calibre, I get the following error log. Can you advise?

  File "site-packages\calibre\web\feeds\news.py", line 872, in build_index
  File "c:\users\f\appdata\local\temp\calibre_0.7.53_tmp_uy9v4j\calibre_0.7.53_qykifp_recipes\recipe0.py", line 103, in parse_index
    issue_list = soup.findAll('div', attrs={'class': 'tx-brandeinsmagazine-pi1'})[0].findAll('a')
IndexError: list index out of range

Thanks and Best Regards!

04-09-2011, 07:38 PM   #10
Consti
Junior Member
Posts: 7
Karma: 10
Join Date: Sep 2010
Device: Kindle
Hello FritziFratz!

I'll take a look at the BrandEins recipe tomorrow/today (this Sunday).
I haven't checked the recipe for a long time.
The source is available here:
https://github.com/consti/BrandEins-Recipe

https://github.com/consti/BrandEins-...andeins.recipe

I'll let you know what I find.

--
Consti

Last edited by Consti; 04-09-2011 at 08:23 PM.

04-09-2011, 07:58 PM   #11
Consti
Junior Member
Posts: 7
Karma: 10
Join Date: Sep 2010
Device: Kindle
I just managed to find time to test the BrandEins Recipe:
It works for me. Maybe the problem was that there wasn't a previous issue available: the current issue is only partially available, so by default we select the previous issue, but if that is not available (e.g., in January) it might break.

I've now (again, sorry for keeping you waiting, Steffen!) officially included his changes in the recipe. I can live without the links at the bottom of each page (I've never noticed them in the Kindle-formatted e-books anyway).

Thanks for your contributions (@Steffen), they really made the whole recipe a lot better!

@FritziFratz Let me know if the recipe works for you now. I am using the latest version of calibre and the version of the recipe bundled with it.
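The January problem comes from the index arithmetic in parse_index, which only looks at the latest volume (`jahrgang-latest`). In Python, `links[len(links) - issue]` with a single listed issue and the default issue 2 evaluates to `links[-1]`, so it silently returns the newest, incomplete issue; with an empty list it raises IndexError. The sketch below counts back across volumes instead and fails loudly when the requested issue does not exist. It is a hypothetical illustration, not the fix that went into the official recipe; `pick_issue` and the sample data are made up, and scraping the older volumes from the archive page is assumed and not shown.

```python
def pick_issue(volumes, issue):
    """Select an issue counted back from the newest one (1 = newest).

    volumes: lists of issue links, newest volume first; within each volume
    the links are ordered oldest first, as in the archive's issue list.
    """
    newest_first = []
    for volume in volumes:
        newest_first.extend(reversed(volume))
    if not 1 <= issue <= len(newest_first):
        raise ValueError("issue %d requested, but only %d issue(s) found"
                         % (issue, len(newest_first)))
    return newest_first[issue - 1]


# Hypothetical archive state in January: the latest volume holds one issue.
volumes = [["2011-01"], ["2010-%02d" % month for month in range(1, 13)]]

if __name__ == "__main__":
    print(pick_issue(volumes, 1))  # 2011-01 (newest, possibly incomplete)
    print(pick_issue(volumes, 2))  # 2010-12 (last complete issue)
    latest_only = volumes[0]
    print(latest_only[len(latest_only) - 2])  # 2011-01: index -1 wraps around silently
```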

04-10-2011, 05:45 AM   #12
fritzifratz
Junior Member
Posts: 8
Karma: 10
Join Date: Apr 2011
Device: PRS-650B
Hi Consti,

thanks a lot for checking. In parallel to this thread, I also posted my question in another thread of this forum. Steffen already helped me there and the issue is resolved: the problem was a setting of my desktop firewall :-( Sorry for bugging you with this.

See: http://www.mobileread.com/forums/sho...d.php?t=114128

Thanks for your work on putting this recipe together and your quick reply, Consti. Have a great Sunday!

04-11-2011, 05:58 AM   #13
siebert
Developer
Posts: 137
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 3 WiFi / Nexus 4 (Android)
Quote:
Originally Posted by Consti
Maybe the problem was that there wasn't a previous issue available: the current issue is only partially available, so by default we select the previous issue, but if that is not available (e.g., in January) it might break.

I already fixed this error in the official calibre brand eins recipe; see commit 7415: http://bazaar.launchpad.net/~kovid/c.../revision/7415

Ciao,
Steffen

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.