Old 11-03-2011, 08:08 AM   #1
achims
Member
achims began at the beginning.
 
Posts: 24
Karma: 12
Join Date: Oct 2011
Device: Xperia Active, Iconia A500, Galaxy I5500
New Recipe for ZEIT Premium download ALL

I have created a new recipe for ZEIT Premium (subscription only).

It downloads all E-Books the page has to offer:

- The main newspaper Die Zeit in all offered formats (epub, mobi, pdf, and a zip with all audiobooks of the newspaper). All formats are imported into calibre db as one logical book entry.

- Zeit Magazin (pdf) imported in its own new book entry.

The formats to download can be switched on or off with the flags at the top of the recipe.

I think this is the first recipe to download PDFs and other ready-made files, so it might be interesting for other recipe developers, too.
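
For other recipe developers, the core trick (fetch a finished EPUB, unpack it, and hand back the path to its OPF) can be tried outside calibre. This is only a minimal sketch in plain Python 3, assuming the OPF sits at the archive root as content.opf, which is what the recipe expects as well. extract_epub is a made-up helper name, not a calibre function, and the tiny in-memory archive only stands in for the real download.

```python
import io
import os
import tempfile
import zipfile


def extract_epub(epub_bytes, output_dir):
    """Write EPUB bytes to a temp file, unpack it, and return the OPF path."""
    with tempfile.NamedTemporaryFile(suffix=".epub", delete=False) as tmp:
        tmp.write(epub_bytes)
        epub_path = tmp.name
    try:
        # An EPUB is a zip archive, so zipfile can unpack it directly.
        with zipfile.ZipFile(epub_path, "r") as archive:
            archive.extractall(output_dir)
    finally:
        os.remove(epub_path)
    # Assumption: the OPF is named content.opf and lives at the archive root.
    return os.path.join(output_dir, "content.opf")


# Stand-in for the downloaded EPUB: a zip with a single dummy OPF file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as fake_epub:
    fake_epub.writestr("content.opf", "<package/>")

target_dir = tempfile.mkdtemp()
index = extract_epub(buffer.getvalue(), target_dir)
print(index)
```

In the recipe, the returned path plays the role of the index that build_index() hands back to calibre.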

Code:
import re, zipfile, os
from calibre.ptempfile import PersistentTemporaryDirectory
from calibre.ptempfile import PersistentTemporaryFile
from urlparse import urlparse

GET_MOBI=False
GET_PDF=True
GET_AUDIO=True
GET_MAGAZIN=True

class ZeitPremiumAllFormats(BasicNewsRecipe):
    title          = u'Zeit Premium All Formats'
    description    = u'Lädt alle angebotenen E-Book Formate der aktuellen Woche aus dem Zeit Premium Bereich (kostenpflichtiges Abo). Dies beinhaltet für Die Zeit die Formate epub, mobi, pdf und alle Audiofiles als zip. Sie werden in der Calibre Datenbank als ein einziges Buch eingetragen. Des weiteren das Zeit Magazin als pdf als eigenständiges Buch. Aus technischen Gründen wird ein dritter Bucheintrag erstellt, der Die Zeit in einer abgewandelten epub Version erhält. Dieser Eintrag kann getrost gelöscht werden. Alle Formate ausser epub können ein- oder ausgeschaltet werden. Anmerkung: Während der Umstellung auf eine neue Ausgabe (Mittwoch abends) werden nicht alle Formate gleichzeitig erneuert. Im Calibre Eintrag können dann die verschiedenen Formate zu verschiedenen Ausgaben gehören! ___Getestet unter Unix___ - unter anderen Betriebssystemen funktioniert dieses recipe möglicherweise nicht.'
    __author__ = 'Achim Schumacher'
    language = 'de'
    needs_subscription = True
    conversion_options = {
        'no_default_epub_cover' : True,
    }

    #
    # Login process required:
    # Override BasicNewsRecipe.get_browser()
    #
    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        # new login process
        domain = "https://premium.zeit.de"
        response = br.open(domain)
        # Get rid of nested form
        response.set_data(re.sub('<div><form action=.*', '', response.get_data() ))
        br.set_response(response)
        br.select_form(nr=2)
        br.form['name']=self.username
        br.form['pass']=self.password
        br.submit()
        return br


    # Do not fetch news and convert them to E-Books.
    # Instead, download the epub directly from the site.
    # For this, override BasicNewsRecipe.build_index()
    #
    def build_index(self):
        browser = self.get_browser()

        # find the links
        epublink = browser.find_link(text_regex=re.compile('.*Ausgabe als Datei im ePub-Format.*'))
        mobilink = browser.find_link(text_regex=re.compile('.*Ausgabe als Datei im Mobi-Format.*'))
        pdflink = browser.find_link(text_regex=re.compile('.*Download der gesamten Ausgabe als PDF Datei.*'))
        audiolink = browser.find_link(text_regex=re.compile('.*Alle Audios der aktuellen ZEIT.*'))
        edition = (urlparse(pdflink.url)[2]).replace('/system/files/epaper/DZ/pdf/DZ_ePaper_','').replace('.pdf','')
        zm_url = urlparse(pdflink.base_url)[0]+'://'+urlparse(pdflink.base_url)[1]+''+(urlparse(pdflink.url)[2]).replace('DZ/pdf/DZ_ePaper','ZM/pdf/ZM_ePaper')
        # TODO: Test for other books that are only published once in a while
        #       (e.g., Die Zeit Beilage)

        print "Found epub-link: %s" % epublink.url
        print "Found Mobi-link: %s" % mobilink.url
        print "Found pdf-link: %s" % pdflink.url
        print "Found audio-link: %s" % audiolink.url
        print "Found ZM-link: %s" % zm_url
        print "This edition is: %s" % edition

        # The following part is from a recipe by Starson17
        #
        # It modifies build_index, which is the method that gets the 
        # masthead image and cover, parses the feed for articles, retrieves
        # the articles, removes tags from articles, etc. All of those steps 
        # ultimately produce a local directory structure that looks like an 
        # unzipped EPUB. 
        #
        # This part grabs the link to one EPUB, saves the EPUB locally,
        # extracts it, and passes the result back into the recipe system
        # as though all the other steps had been completed normally.
        #
        # This has to be done even if one does not want to use this
        # calibre-modified epub; otherwise, the recipe fails with an error.
        # This is why a second Die Zeit entry shows up in the
        # calibre db.
        self.report_progress(0,_('downloading epub'))
        response = browser.follow_link(epublink)
        # We need two different directories for Die Zeit and Zeit Magazin
        DZdir = PersistentTemporaryDirectory()
        ZMdir = PersistentTemporaryDirectory()
        epub_file = PersistentTemporaryFile(suffix='.epub',dir=DZdir)
        epub_file.write(response.read())
        epub_file.close()
        zfile = zipfile.ZipFile(epub_file.name, 'r')
        self.report_progress(0.1,_('extracting epub'))
        zfile.extractall(self.output_dir)
        zfile.close()
        index = os.path.join(self.output_dir, 'content.opf')
        self.report_progress(0.2,_('epub downloaded and extracted'))

        #
        # Now, download the remaining files
        #
        print "output_dir is: %s" % self.output_dir
        print "DZdir is: %s" % DZdir
        print "ZMdir is: %s" % ZMdir

        if (GET_MOBI):
           self.report_progress(0.3,_('downloading mobi'))
           mobi_file = PersistentTemporaryFile(suffix='.mobi',dir=DZdir)
           browser.back()
           response = browser.follow_link(mobilink)
           mobi_file.write(response.read())
           mobi_file.close()

        if (GET_PDF):
           self.report_progress(0.4,_('downloading pdf'))
           pdf_file = PersistentTemporaryFile(suffix='.pdf',dir=DZdir)
           browser.back()
           response = browser.follow_link(pdflink)
           pdf_file.write(response.read())
           pdf_file.close()

        if (GET_AUDIO):
           self.report_progress(0.5,_('downloading audio'))
           audio_file = PersistentTemporaryFile(suffix='.mp3.zip',dir=DZdir)
           browser.back()
           response = browser.follow_link(audiolink)
           audio_file.write(response.read())
           audio_file.close()

        # Get all Die Zeit formats into Calibre's database
        self.report_progress(0.6,_('Adding Die Zeit to Calibre db'))
        cmd = "calibredb add -1 " + DZdir
        os.system(cmd)

        # Zeit Magazin has to be handled differently.
        # First, it has to be downloaded into its own directory, since it
        # is a different book than Die Zeit.
        # Second, we know its url rather than its link.
        # Third, there is no metadata present, so we need to give it
        # a proper name so that calibre will set Author and Title at import.
        # Unfortunately, the present solution includes a random part in the
        # name, which the user has to fix manually after the db import.
        if (GET_MAGAZIN):
           self.report_progress(0.7,_('downloading ZM'))
           ZM_file = PersistentTemporaryFile(suffix='  Zeit Magazin '+edition+' - Zeitverlag Gerd Bucerius GmbH und Co. KG.pdf',dir=ZMdir)
           response = browser.open(zm_url)
           ZM_file.write(response.read())
           ZM_file.close()
           # Get Zeit Magazin into Calibre's database
           self.report_progress(0.8,_('Adding Zeit Magazin to Calibre db'))
           cmd = "calibredb add -1 " + ZMdir
           os.system(cmd)

        return index
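
The way the recipe derives the edition string and the Zeit Magazin URL from the PDF link can be checked in isolation. The following is only a sketch in Python 3 (the recipe itself uses the Python 2 urlparse module), and the example URL is a guess at the naming scheme based on the path fragments in the recipe, not a link taken from the site. derive_edition_and_magazin_url is a hypothetical helper name.

```python
from urllib.parse import urlparse

# Path fragment the recipe strips to get the edition, e.g. "45_11".
DZ_PREFIX = "/system/files/epaper/DZ/pdf/DZ_ePaper_"


def derive_edition_and_magazin_url(pdf_url):
    """Return (edition, zeit_magazin_url) for an absolute Die Zeit PDF URL."""
    parts = urlparse(pdf_url)
    edition = parts.path.replace(DZ_PREFIX, "").replace(".pdf", "")
    # Zeit Magazin is assumed to live next to Die Zeit under ZM/ instead of DZ/.
    zm_path = parts.path.replace("DZ/pdf/DZ_ePaper", "ZM/pdf/ZM_ePaper")
    zm_url = "%s://%s%s" % (parts.scheme, parts.netloc, zm_path)
    return edition, zm_url


# Hypothetical PDF link for illustration only.
edition, zm_url = derive_edition_and_magazin_url(
    "https://premium.zeit.de/system/files/epaper/DZ/pdf/DZ_ePaper_45_11.pdf"
)
print(edition)
print(zm_url)
```

Since the Magazin URL is guessed rather than found as a link, it may not exist for every edition.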
Old 11-03-2011, 10:25 AM   #2
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by achims View Post
I think this is the first recipe to download pdf's etc, so it might be interesting for other recipe developers, too.
It looks interesting and useful to me.
Why don't you post it in the sticky code section?
Old 11-03-2011, 05:44 PM   #3
achims
Thanks, good idea. I've posted it there now.
Old 11-04-2011, 08:52 AM   #4
Starson17
Quote:
Originally Posted by achims View Post
Thanks, good idea. I've posted it there now.
Thanks. That makes it much easier to find.
Old 11-05-2011, 10:42 AM   #5
tobias2
Member
tobias2 began at the beginning.
 
Posts: 18
Karma: 36
Join Date: Feb 2011
Device: Kindle
Bug report

Hi achims,

First of all, thanks for fixing the old Zeit recipe. I will continue to update and change that one elsewhere (the long thread with all the updates: https://www.mobileread.com/forums/showthread.php?t=90005), since I like the changes that "my" filtering introduces. However, I also really like your new script that downloads the other files. It works fine for me for the Magazin, but when I try to download the PDF of the actual paper, I only get the MOBI file, even though I disabled everything but the PDF:

GET_MOBI=False
GET_PDF=True
GET_AUDIO=False
GET_MAGAZIN=False

Any suggestions?

Cheers,

Tobias
Old 11-05-2011, 02:15 PM   #6
achims
Hi Tobias,

Glad to hear that you like my new recipe.

Your problem with the PDF download puzzles me, though, especially that it downloads the MOBI although GET_MOBI is set to False.
Perhaps it has to do with the import into the db. Have you tried deleting the entries for the current ZEIT edition (with all formats) from the calibre db and then rerunning the recipe? If there is already an entry in any format, calibredb add might refuse to add new formats, and it won't create a new entry either.

Cheers
Achim
Old 11-05-2011, 06:01 PM   #7
achims
New version - includes Zeit Beilage

Hi,

this week's Zeit edition includes an extra supplement, the Beilage, which only appears every once in a while. I have updated the recipe to download the Beilage whenever the Zeit Premium web page offers it.

I have also tweaked the metadata of the Magazin a bit (the same applies to the Beilage). Due to a bug in 'calibredb add', which does not process its title and author options, the recipe uses the file name as a fallback. Unfortunately, this adds a random string to the title. I have now moved it to the end of the title, where it should be less distracting. Note that if you have your calibre's ebook import option set to
Code:
(?P<title>.+) - (?P<author>[^_]+)
, then the Author tag will be set accordingly.
I also updated the recipe so that, once this bug is fixed, the random string issue will go away without further changes to the recipe.
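
To see what the file-name pattern quoted above does with a name like the ones the recipe generates, here is a small stand-alone sketch in Python 3. It assumes that underscores in the name are turned into spaces before the pattern is applied (as far as I know calibre does this on import, but treat it as an assumption). metadata_from_name is a made-up helper, and "abc123" stands in for the random part of the temp-file name.

```python
import os
import re

# The import pattern quoted in the post.
FILENAME_PATTERN = re.compile(r"(?P<title>.+) - (?P<author>[^_]+)")


def metadata_from_name(filename):
    """Return (title, author) parsed from a file name, or (None, None)."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    # Assumption: underscores are replaced by spaces before matching.
    stem = stem.replace("_", " ")
    match = FILENAME_PATTERN.search(stem)
    if match is None:
        return None, None
    return match.group("title"), match.group("author")


# Hypothetical file name in the style produced by the recipe.
title, author = metadata_from_name(
    "Zeit_Magazin_45_11___abc123_-_Zeitverlag Gerd Bucerius GmbH und Co. KG.pdf"
)
print(repr(title))
print(repr(author))
```

The random part ends up at the end of the title, which is the leftover that has to be cleaned up by hand.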

As a response to Tobias, I have changed the 'calibredb add' system call to include the 'duplicates' option. This way, calling the recipe several times (e.g. with different format switches) will create a new book entry for each call. I hope this resolves his problem.
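
As a side note on the system call: gluing the directory onto a command string breaks as soon as the path contains spaces. A possible alternative, shown here only as a sketch in Python 3, is to pass the arguments as a list to subprocess. The flags -d (duplicates) and -1 (one book per directory) are the ones the recipe already uses; build_add_command and add_directory are hypothetical helper names, and the directory is just an example.

```python
import shlex
import subprocess


def build_add_command(directory, allow_duplicates=True):
    """Build the calibredb argument list for importing one directory."""
    cmd = ["calibredb", "add"]
    if allow_duplicates:
        cmd.append("-d")  # add the book even if it already exists
    cmd.append("-1")      # treat the directory as one book with many formats
    cmd.append(directory)
    return cmd


def add_directory(directory, allow_duplicates=True):
    """Run calibredb and return its exit code (requires calibre installed)."""
    return subprocess.call(build_add_command(directory, allow_duplicates))


# Only build and display the command here; nothing is executed.
cmd = build_add_command("/tmp/DZ dir with spaces")
print(" ".join(shlex.quote(part) for part in cmd))
```

Because no shell is involved, the directory is handed to calibredb as a single argument no matter what characters it contains.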

Spoiler:
Code:
import re, zipfile, os
from calibre.ptempfile import PersistentTemporaryDirectory
from calibre.ptempfile import PersistentTemporaryFile
from urlparse import urlparse

GET_MOBI=False
GET_PDF=False
GET_AUDIO=False
GET_MAGAZIN=False
GET_BEILAGE=False

class ZeitPremiumAllFormats(BasicNewsRecipe):
    title          = u'Zeit Premium All Formats'
    description    = u'Lädt alle angebotenen E-Book Formate der aktuellen Woche aus dem Zeit Premium Bereich (kostenpflichtiges Abo): Die Zeit als epub, mobi, pdf und alle Audiofiles als zip. Sie werden in der Calibre Datenbank als ein einziges Buch eingetragen. Das Zeit Magazin und ggfls. die Beilage als pdf als je eigenständiges Buch. Aus technischen Gründen wird ein doppelter Bucheintrag der Zeit erstellt, der ein epub in einer abgewandelten Version erhält. Dieser Eintrag kann gelöscht werden. Alle Formate ausser epub können ein- oder ausgeschaltet werden. Anmerkung: Während der Umstellung auf eine neue Ausgabe (Mittwoch abends) werden nicht alle Formate gleichzeitig erneuert. Im Calibre Eintrag können dann die verschiedenen Formate zu verschiedenen Ausgaben gehören! Sollte schon ein Eintrag der Zeit der aktuellen Woche existieren, wird nicht aktualisiert, also vorher löschen! ___Getestet unter Unix___ - unter anderen Betriebssystemen funktioniert dieses recipe möglicherweise nicht.'
    __author__ = 'Achim Schumacher'
    language = 'de'
    needs_subscription = True
    conversion_options = {
        'no_default_epub_cover' : True,
    }

    #
    # Login process required:
    # Override BasicNewsRecipe.get_browser()
    #
    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        # new login process
        domain = "https://premium.zeit.de"
        response = br.open(domain)
        # Get rid of nested form
        response.set_data(re.sub('<div><form action=.*', '', response.get_data() ))
        br.set_response(response)
        br.select_form(nr=2)
        br.form['name']=self.username
        br.form['pass']=self.password
        br.submit()
        return br


    # Do not fetch news and convert them to E-Books.
    # Instead, download the epub directly from the site.
    # For this, override BasicNewsRecipe.build_index()
    #
    def build_index(self):
        browser = self.get_browser()

        # find the links
        epublink = browser.find_link(text_regex=re.compile('.*Ausgabe als Datei im ePub-Format.*'))
        mobilink = browser.find_link(text_regex=re.compile('.*Ausgabe als Datei im Mobi-Format.*'))
        pdflink = browser.find_link(text_regex=re.compile('.*Download der gesamten Ausgabe als PDF Datei.*'))
        audiolink = browser.find_link(text_regex=re.compile('.*Alle Audios der aktuellen ZEIT.*'))
        edition = (urlparse(pdflink.url)[2]).replace('/system/files/epaper/DZ/pdf/DZ_ePaper_','').replace('.pdf','')
        zm_url = urlparse(pdflink.base_url)[0]+'://'+urlparse(pdflink.base_url)[1]+''+(urlparse(pdflink.url)[2]).replace('DZ/pdf/DZ_ePaper','ZM/pdf/ZM_ePaper')
        bl_url = urlparse(pdflink.base_url)[0]+'://'+urlparse(pdflink.base_url)[1]+''+(urlparse(pdflink.url)[2]).replace('DZ/pdf/DZ_ePaper','BL/pdf/BL_ePaper')
        print "Found epub-link: %s" % epublink.url
        print "Found Mobi-link: %s" % mobilink.url
        print "Found pdf-link: %s" % pdflink.url
        print "Found audio-link: %s" % audiolink.url
        print "Will try ZM-link: %s" % zm_url
        print "Will try BL-link: %s" % bl_url
        print "This edition is: %s" % edition

        # The following part is from a recipe by Starson17
        #
        # It modifies build_index, which is the method that gets the 
        # masthead image and cover, parses the feed for articles, retrieves
        # the articles, removes tags from articles, etc. All of those steps 
        # ultimately produce a local directory structure that looks like an 
        # unzipped EPUB. 
        #
        # This part grabs the link to one EPUB, saves the EPUB locally,
        # extracts it, and passes the result back into the recipe system
        # as though all the other steps had been completed normally.
        #
        # This has to be done even if one does not want to use this
        # calibre-modified epub; otherwise, the recipe fails with an error.
        # This is why a second Die Zeit entry shows up in the
        # calibre db.
        self.report_progress(0,_('downloading epub'))
        response = browser.follow_link(epublink)
        # We need separate directories for Die Zeit, Zeit Magazin and Zeit Beilage
        DZdir = PersistentTemporaryDirectory()
        ZMdir = PersistentTemporaryDirectory()
        BLdir = PersistentTemporaryDirectory()
        epub_file = PersistentTemporaryFile(suffix='.epub',dir=DZdir)
        epub_file.write(response.read())
        epub_file.close()
        zfile = zipfile.ZipFile(epub_file.name, 'r')
        self.report_progress(0.1,_('extracting epub'))
        zfile.extractall(self.output_dir)
        zfile.close()
        index = os.path.join(self.output_dir, 'content.opf')
        self.report_progress(0.2,_('epub downloaded and extracted'))

        #
        # Now, download the remaining files
        #
        print "output_dir is: %s" % self.output_dir
        print "DZdir is: %s" % DZdir
        print "ZMdir is: %s" % ZMdir
        print "BLdir is: %s" % BLdir

        if (GET_MOBI):
           self.report_progress(0.3,_('downloading mobi'))
           mobi_file = PersistentTemporaryFile(suffix='.mobi',dir=DZdir)
           browser.back()
           response = browser.follow_link(mobilink)
           mobi_file.write(response.read())
           mobi_file.close()

        if (GET_PDF):
           self.report_progress(0.4,_('downloading pdf'))
           pdf_file = PersistentTemporaryFile(suffix='.pdf',dir=DZdir)
           browser.back()
           response = browser.follow_link(pdflink)
           pdf_file.write(response.read())
           pdf_file.close()

        if (GET_AUDIO):
           self.report_progress(0.5,_('downloading audio'))
           audio_file = PersistentTemporaryFile(suffix='.mp3.zip',dir=DZdir)
           browser.back()
           response = browser.follow_link(audiolink)
           audio_file.write(response.read())
           audio_file.close()

        # Get all Die Zeit formats into Calibre's database
        self.report_progress(0.6,_('Adding Die Zeit to Calibre db'))
        cmd = "calibredb add -d -1 " + DZdir
        os.system(cmd)

        # Zeit Magazin has to be handled differently.
        # First, it has to be downloaded into its own directory, since it
        # is a different book than Die Zeit.
        # Second, we know its url rather than its link.
        # Third, there is no metadata present, so we need to give it
        # a proper name so that calibre will set Author and Title at import.
        # Unfortunately, the present solution includes a random part in the
        # name, which the user has to fix manually after the db import.
        if (GET_MAGAZIN):
           self.report_progress(0.7,_('downloading ZM'))
           author = "Zeitverlag Gerd Bucerius GmbH und Co. KG"
           title="Zeit Magazin "+edition
           ZM_file = PersistentTemporaryFile(prefix='Zeit_Magazin_'+edition+'___',suffix='_-_'+author+'.pdf',dir=ZMdir)
           try:
              response = browser.open(zm_url)
              ZM_file.write(response.read())
              ZM_file.close()
              # Get Zeit Magazin into Calibre's database
              self.report_progress(0.8,_('Adding Zeit Magazin to Calibre db'))
              cmd = "calibredb add -a \""+author+"\" -t \""+title+"\" " + ZMdir
              print cmd
              os.system(cmd)
           except Exception:
              self.report_progress(0.8,_('No Zeit Magazin found...'))

        # Zeit Beilage is technically the same as Zeit Magazin, but it is
        # not included in every edition. So, the use of try: is 
        # obligatory here.
        if (GET_BEILAGE):
           self.report_progress(0.9,_('downloading BL'))
           author = "Zeitverlag Gerd Bucerius GmbH und Co. KG"
           title="Zeit Beilage "+edition
           BL_file = PersistentTemporaryFile(prefix='Zeit_Beilage_'+edition+'___',suffix='_-_'+author+'.pdf',dir=BLdir)
           try:
              response = browser.open(bl_url)
              BL_file.write(response.read())
              BL_file.close()
              # Get Zeit Beilage into Calibre's database
              self.report_progress(0.9,_('Adding Zeit Beilage to Calibre db'))
              cmd = "calibredb add -a \""+author+"\" -t \""+title+"\" " + BLdir
              print cmd
              os.system(cmd)
           except Exception:
              self.report_progress(0.9,_('No Zeit Beilage found...'))

        return index


Have fun
Achim
Old 11-08-2011, 09:56 AM   #8
achims
New version - set Metadata to your liking, no system calls

Hi all,

I have an updated version of the ZEIT recipe. These are the changes:

- No more system calls to 'calibredb add'. Instead, (modified) internal calibre functions are used. These mods were needed for the next point:

- Set metadata to your liking. You can set authors and tags. Zeit Magazin and Beilage now have the correct title and author.

Have fun
Achim
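
The idea behind the metadata change (take what the file itself provides, then overwrite only the fields you have actually set) can be shown without calibre. This is just a sketch in Python 3; the Meta class is a simplified stand-in for calibre's metadata object, and overlay is a made-up name for what the recipe calls copy_metadata.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Meta:
    """Simplified stand-in for a calibre metadata object."""
    title: Optional[str] = None
    authors: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    languages: List[str] = field(default_factory=list)


def overlay(new, old):
    """Copy only the populated fields of `new` onto `old` and return `old`."""
    for name in ("title", "authors", "tags", "languages"):
        value = getattr(new, name)
        if value:  # None and empty lists leave the old value untouched
            setattr(old, name, value)
    return old


# Metadata as it might be read from a PDF without proper tags.
from_file = Meta(title="ZM_ePaper_45_11", authors=["Unknown"])
# What the user configured at the top of the recipe.
wanted = Meta(authors=["Zeitverlag Gerd Bucerius GmbH und Co. KG"],
              tags=["Die Zeit"])

merged = overlay(wanted, from_file)
print(merged)
```

Fields left empty in the configuration keep whatever the file provided, so a missing setting never wipes existing data.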

Spoiler:
Code:
import sys, re, zipfile, os
from calibre.ptempfile import PersistentTemporaryDirectory
from calibre.ptempfile import PersistentTemporaryFile
from urlparse import urlparse
from calibre.ebooks.metadata import MetaInformation, string_to_authors
from calibre.library.cli import do_add_empty, send_message, write_dirtied, do_add
from calibre.utils.config import prefs
from calibre.library.database2 import LibraryDatabase2
from calibre.constants import preferred_encoding  # used by do_add() below



GET_MOBI=False
GET_PDF=False
GET_AUDIO=False
GET_MAGAZIN=True
GET_BEILAGE=True
authors = 'Zeitverlag Gerd Bucerius GmbH und Co. KG'
tags = ['Die Zeit']
languages = ['de']

class ZeitPremiumAllFormats(BasicNewsRecipe):
    title          = u'Zeit Premium All Formats'
    description    = u'Lädt alle angebotenen E-Book Formate der aktuellen Woche aus dem Zeit Premium Bereich (kostenpflichtiges Abo): Die Zeit als epub, mobi, pdf und alle Audiofiles als zip. Sie werden in der Calibre Datenbank als ein einziges Buch eingetragen. Das Zeit Magazin und ggfls. die Beilage als pdf als je eigenständiges Buch. Aus technischen Gründen wird ein doppelter Bucheintrag der Zeit erstellt, der ein epub in einer abgewandelten Version erhält. Dieser Eintrag kann gelöscht werden. Alle Formate ausser epub können ein- oder ausgeschaltet werden. Anmerkung: Während der Umstellung auf eine neue Ausgabe (Mittwoch abends) werden nicht alle Formate gleichzeitig erneuert. Im Calibre Eintrag können dann die verschiedenen Formate zu verschiedenen Ausgaben gehören! Bei mehrfachem Aufruf werden Duplikate der Bucheinträge erstellt.'
    __author__ = 'Achim Schumacher'
    language = 'de'
    needs_subscription = True
    conversion_options = {
        'no_default_epub_cover' : True,
    }

    #
    # Login process required:
    # Override BasicNewsRecipe.get_browser()
    #
    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        # new login process
        domain = "https://premium.zeit.de"
        response = br.open(domain)
        # Get rid of nested form
        response.set_data(re.sub('<div><form action=.*', '', response.get_data() ))
        br.set_response(response)
        br.select_form(nr=2)
        br.form['name']=self.username
        br.form['pass']=self.password
        br.submit()
        return br


    # Copies only those parts of the new metadata to the old metadata
    # which have actual data.
    def copy_metadata(self, new, old):
        mi = old
        if new.title:
            mi.title = new.title
        if new.authors:
            mi.authors = new.authors
        if new.isbn:
            mi.isbn = new.isbn
        if new.tags:
            mi.tags = new.tags
        if new.languages:
            mi.languages = new.languages
        return mi

    # Modified copy of calibre's import_book_directory, needed
    # because the original does not offer options to set metadata.
    # This version adds a new parameter mi2:
    # every mi2 field that has data is copied to the ebook's metadata.
    def import_book_directory(self, db, dirpath, mi2, callback=None):
        from calibre.ebooks.metadata.meta import metadata_from_formats
        dirpath = os.path.abspath(dirpath)
        formats = list(db.find_books_in_directory(dirpath, True))
        # Guard against an empty directory before taking the first book.
        if not formats:
            return
        formats = formats[0]
        if not formats:
            return
        mi = metadata_from_formats(formats)
        mi = self.copy_metadata(mi2, mi)
        if mi.title is None:
            return
        if db.has_book(mi):
            return [(mi, formats)]
        db.import_book(mi, formats)
        if callable(callback):
            callback(mi.title)


    # Modified copy of calibre.library.cli.do_add, needed
    # because the original does not offer options to set metadata.
    # This version adds a new parameter mi2:
    # every mi2 field that has data is copied to the ebook's metadata.
    # In this version: recurse=False, one_book_per_directory=True
    def do_add(self, db, paths, mi2, add_duplicates):
        from calibre.ebooks.metadata.meta import get_metadata
        orig = sys.stdout
        #sys.stdout = NULL
        try:
            files, dirs = [], []
            for path in paths:
                path = os.path.abspath(path)
                if os.path.isdir(path):
                    dirs.append(path)
                else:
                    if os.path.exists(path):
                        files.append(path)
                    else:
                        print path, 'not found'

            formats, metadata = [], []
            for book in files:
                format = os.path.splitext(book)[1]
                format = format[1:] if format else None
                if not format:
                    continue
                stream = open(book, 'rb')
                mi = get_metadata(stream, stream_type=format, use_libprs_metadata=True)
                if not mi.title:
                    mi.title = os.path.splitext(os.path.basename(book))[0]
                if not mi.authors:
                    mi.authors = [_('Unknown')]
                mi = self.copy_metadata(mi2, mi)
                formats.append(format)
                metadata.append(mi)

            file_duplicates = []
            if files:
                file_duplicates = db.add_books(files, formats, metadata,
                                               add_duplicates=add_duplicates)
                if file_duplicates:
                    file_duplicates = file_duplicates[0]
    

            dir_dups = []
            for dir in dirs:
                # recurse=False and one_book_per_directory=True are fixed in
                # this version, so each directory is imported as a single book.
                dups = self.import_book_directory(db, dir, mi2)
                if not dups:
                    dups = []
                dir_dups.extend(dups)

            sys.stdout = sys.__stdout__

            if add_duplicates:
                for mi, formats in dir_dups:
                    mi = self.copy_metadata(mi2, mi)
                    db.import_book(mi, formats)
            else:
                if dir_dups or file_duplicates:
                    print >>sys.stderr, _('The following books were not added as '
                                          'they already exist in the database '
                                          '(see --duplicates option):')
                for mi, formats in dir_dups:
                    title = mi.title
                    if isinstance(title, unicode):
                        title = title.encode(preferred_encoding)
                    print >>sys.stderr, '\t', title + ':'
                    for path in formats:
                        print >>sys.stderr, '\t\t ', path
                if file_duplicates:
                    for path, mi in zip(file_duplicates[0], file_duplicates[2]):
                        title = mi.title
                        if isinstance(title, unicode):
                            title = title.encode(preferred_encoding)
                        print >>sys.stderr, '\t', title+':'
                        print >>sys.stderr, '\t\t ', path

            write_dirtied(db)
            send_message()
        finally:
            sys.stdout = orig



    # Do not fetch news and convert them to E-Books.
    # Instead, download the epub directly from the site.
    # For this, override BasicNewsRecipe.build_index()
    #
    def build_index(self):
        browser = self.get_browser()
        # Get the path to the db
        dbpath = prefs['library_path']
        # Get access to the database
        dbpath = os.path.abspath(dbpath)
        db = LibraryDatabase2(dbpath)


        # find the links
        epublink = browser.find_link(text_regex=re.compile('.*Ausgabe als Datei im ePub-Format.*'))
        mobilink = browser.find_link(text_regex=re.compile('.*Ausgabe als Datei im Mobi-Format.*'))
        pdflink = browser.find_link(text_regex=re.compile('.*Download der gesamten Ausgabe als PDF Datei.*'))
        audiolink = browser.find_link(text_regex=re.compile('.*Alle Audios der aktuellen ZEIT.*'))
        #edition = (urlparse(pdflink.url)[2]).replace('/system/files/epaper/DZ/pdf/DZ_ePaper_','').replace('.pdf','')
        edition_ = re.split('_', (urlparse(pdflink.url)[2]).replace('/system/files/epaper/DZ/pdf/DZ_ePaper_','').replace('.pdf','') )
        edition = '20' + edition_[1] + ' - ' + edition_[0]
        zm_url = urlparse(pdflink.base_url)[0]+'://'+urlparse(pdflink.base_url)[1]+''+(urlparse(pdflink.url)[2]).replace('DZ/pdf/DZ_ePaper','ZM/pdf/ZM_ePaper')
        bl_url = urlparse(pdflink.base_url)[0]+'://'+urlparse(pdflink.base_url)[1]+''+(urlparse(pdflink.url)[2]).replace('DZ/pdf/DZ_ePaper','BL/pdf/BL_ePaper')
        print "Found epub-link: %s" % epublink.url
        print "Found Mobi-link: %s" % mobilink.url
        print "Found pdf-link: %s" % pdflink.url
        print "Found audio-link: %s" % audiolink.url
        print "Will try ZM-link: %s" % zm_url
        print "Will try BL-link: %s" % bl_url
        print "This edition is: %s" % edition

        # The following part is from a recipe by Starson17
        #
        # It modifies build_index, which is the method that gets the 
        # masthead image and cover, parses the feed for articles, retrieves
        # the articles, removes tags from articles, etc. All of those steps 
        # ultimately produce a local directory structure that looks like an 
        # unzipped EPUB. 
        #
        # This part grabs the link to one EPUB, saves the EPUB locally,
        # extracts it, and passes the result back into the recipe system
        # as though all the other steps had been completed normally.
        #
        # This has to be done, even if one does not want to use this
        # calibre-modified epub. Otherwise, the recipe runs into an error.
        # This is the reason why there shows up a second Die Zeit entry
        # in calibre db.
        self.report_progress(0,_('downloading epub'))
        response = browser.follow_link(epublink)
        # We need separate directories for Die Zeit, Zeit Magazin and Zeit Beilage
        DZdir = PersistentTemporaryDirectory(prefix='DZ_')
        ZMdir = PersistentTemporaryDirectory(prefix='ZM_')
        BLdir = PersistentTemporaryDirectory(prefix='BL_')
        epub_file = PersistentTemporaryFile(suffix='.epub',dir=DZdir)
        epub_file.write(response.read())
        epub_file.close()
        zfile = zipfile.ZipFile(epub_file.name, 'r')
        self.report_progress(0.1,_('extracting epub'))
        zfile.extractall(self.output_dir)
        zfile.close()
        index = os.path.join(self.output_dir, 'content.opf')
        self.report_progress(0.2,_('epub downloaded and extracted'))

        #
        # Now, download the remaining files
        #
        print "output_dir is: %s" % self.output_dir
        print "DZdir is: %s" % DZdir
        print "ZMdir is: %s" % ZMdir
        print "BLdir is: %s" % BLdir

        if (GET_MOBI):
           self.report_progress(0.3,_('downloading mobi'))
           mobi_file = PersistentTemporaryFile(suffix='.mobi',dir=DZdir)
           browser.back()
           response = browser.follow_link(mobilink)
           mobi_file.write(response.read())
           mobi_file.close()

        if (GET_PDF):
           self.report_progress(0.4,_('downloading pdf'))
           pdf_file = PersistentTemporaryFile(suffix='.pdf',dir=DZdir)
           browser.back()
           response = browser.follow_link(pdflink)
           pdf_file.write(response.read())
           pdf_file.close()

        if (GET_AUDIO):
           self.report_progress(0.5,_('downloading audio'))
           audio_file = PersistentTemporaryFile(suffix='.mp3.zip',dir=DZdir)
           browser.back()
           response = browser.follow_link(audiolink)
           audio_file.write(response.read())
           audio_file.close()

        # Get all Die Zeit formats into Calibre's database
        self.report_progress(0.6,_('Adding Die Zeit to Calibre db'))
        mi = MetaInformation(None)
        title="Die ZEIT "+edition
        mi.title = title
        mi.authors = string_to_authors(authors)
        mi.tags = tags
        mi.languages = languages
        self.do_add(db, [DZdir], mi, True)
        

        # Zeit Magazin has to be handled differently.
        # First, it has to be downloaded into its own directory, since it
        # is a different book than Die Zeit.
        # Second, we know its url rather than its link.
        # Third, there is no Metadata present in the file itself.
        if (GET_MAGAZIN):
           self.report_progress(0.7,_('downloading ZM'))
           title="ZEIT Magazin "+edition
           ZM_file = PersistentTemporaryFile(suffix='.pdf',dir=ZMdir)
           try:
              response = browser.open(zm_url)
              ZM_file.write(response.read())
              ZM_file.close()
              # Get Zeit Magazin into Calibre's database
              self.report_progress(0.8,_('Adding Zeit Magazin to Calibre db'))
              mi.title = title
              self.do_add(db, [ZMdir], mi, True)

           except Exception:
              self.report_progress(0.8,_('No Zeit Magazin found...'))

        # Zeit Beilage is technically the same as Zeit Magazin, but it is
        # not included in every edition. So, the use of try: is 
        # obligatory here.
        if (GET_BEILAGE):
           self.report_progress(0.9,_('downloading BL'))
           title="ZEIT Beilage "+edition
           BL_file = PersistentTemporaryFile(suffix='.pdf',dir=BLdir)
           try:
              response = browser.open(bl_url)
              BL_file.write(response.read())
              BL_file.close()
              # Get Zeit Beilage into Calibre's database
              self.report_progress(0.9,_('Adding Zeit Beilage to Calibre db'))
              mi.title = title
              self.do_add(db, [BLdir], mi, True)
           except Exception:
              self.report_progress(0.9,_('No Zeit Beilage found...'))

        return index
Old 11-19-2011, 06:01 AM   #9
tobias2
Hi achims,

I just got around to trying your new version, and the epub is still downloaded even though I set it to disabled (False); consequently, a mobi version is also created from the epub. The PDF is now downloaded as well. That it was not downloaded last time may indeed have been due to my already having a copy of the same edition as mobi or epub in Calibre.

Cheers,

Tobias
Old 11-19-2011, 07:52 AM   #10
achims
Hi Tobias,

There is no option to disable epub in the recipe.

Downloading the epub is an integral part of the recipe, since the epub is used to "cheat" the recipe mechanism. This mechanism takes an unzipped epub and then performs whatever conversions the user has specified; in your case it may even create a mobi file. There is no way to tell calibre not to build an entry with this epub (and mobi) file.

In addition, the recipe downloads the pdf version of the Zeit, the Magazin and the Beilage, and creates new database entries for the three logical books. This is the part where formats can be enabled or disabled. I did not include an option to leave out the epub, since it is downloaded anyway, but I could indeed add one.

If everything is enabled, the recipe will therefore create 4 new book entries when called from within the calibre GUI. The Zeit will have two entries because of the cheating described above; you will have to delete the duplicate entry by hand.

When called from the command line, it will create 3 new book entries and a new epub file. This epub file is the one that results from cheating the recipe mechanism.
Code:
ebook-convert Zeit\ Premium\ All\ Formats.recipe Zeit.epub --password=PWD --username=USER
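The "cheating" step can be tried outside calibre. The following is only a minimal sketch in Python 3 (the recipe itself is Python 2): it unzips an archive into an output directory and returns the path to `content.opf`, which is what the recipe's `build_index()` hands back to calibre. The helper names `extract_epub` and `make_demo_epub` and the tiny stand-in archive are assumptions made for illustration, and the sketch assumes, as the recipe does, that `content.opf` sits at the root of the archive.

```python
import os
import tempfile
import zipfile


def extract_epub(epub_path, output_dir):
    """Unzip an EPUB into output_dir and return the expected OPF path."""
    with zipfile.ZipFile(epub_path, "r") as zf:
        zf.extractall(output_dir)
    # Assumption taken from the recipe: the OPF file is named content.opf
    # and lives at the top level of the archive.
    return os.path.join(output_dir, "content.opf")


def make_demo_epub(path):
    """Write a tiny stand-in archive (hypothetical; a real EPUB has more members)."""
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("content.opf", "<package/>")
        zf.writestr("index.html", "<html/>")


workdir = tempfile.mkdtemp(prefix="DZ_")
epub_path = os.path.join(workdir, "demo.epub")
make_demo_epub(epub_path)

output_dir = tempfile.mkdtemp(prefix="out_")
index = extract_epub(epub_path, output_dir)
print(index)
```

calibre then treats the extracted directory like the result of a normal news download, which is why the extra epub entry appears.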
I hope this clarifies things a bit.

Cheers
Achim
Old 12-04-2011, 09:38 AM   #11
achims
Version update

Hi all,

Due to a change in the ZEIT's web layout, the recipe had to be adjusted.
This is the new version:
Code:
import sys, re, zipfile, os
from calibre.ptempfile import PersistentTemporaryDirectory
from calibre.ptempfile import PersistentTemporaryFile
from urlparse import urlparse
from calibre.ebooks.metadata import MetaInformation, string_to_authors
from calibre.library.cli import do_add_empty, send_message, write_dirtied, do_add
from calibre.utils.config import prefs
from calibre.library.database2 import LibraryDatabase2
from calibre.constants import preferred_encoding



GET_MOBI=False
GET_PDF=True
GET_AUDIO=True
GET_MAGAZIN=True
GET_BEILAGE=True
authors = 'Zeitverlag Gerd Bucerius GmbH und Co. KG'
tags = ['Die Zeit']
languages = ['de']

class ZeitPremiumAllFormats(BasicNewsRecipe):
    title          = u'Zeit Premium All Formats'
    # English gist of the German description below: downloads every e-book
    # format of the current week's issue from the Zeit premium area (paid
    # subscription); all formats except epub can be switched on or off.
    description    = u'Lädt alle angebotenen E-Book Formate der aktuellen Woche aus dem Zeit Premium Bereich (kostenpflichtiges Abo): Die Zeit als epub, mobi, pdf und alle Audiofiles als zip. Sie werden in der Calibre Datenbank als ein einziges Buch eingetragen. Das Zeit Magazin und ggfls. die Beilage als pdf als je eigenständiges Buch. Aus technischen Gründen wird ein doppelter Bucheintrag der Zeit erstellt, der ein epub in einer abgewandelten Version erhält. Dieser Eintrag kann gelöscht werden. Alle Formate ausser epub können ein- oder ausgeschaltet werden. Anmerkung: Während der Umstellung auf eine neue Ausgabe (Mittwoch abends) werden nicht alle Formate gleichzeitig erneuert. Im Calibre Eintrag können dann die verschiedenen Formate zu verschiedenen Ausgaben gehören! Bei mehrfachem Aufruf werden Duplikate der Bucheinträge erstellt.'
    __author__ = 'Achim Schumacher'
    language = 'de'
    needs_subscription = True
    conversion_options = {
        'no_default_epub_cover' : True,
    }

    #
    # Login process required:
    # Override BasicNewsRecipe.get_browser()
    #
    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        # new login process
        domain = "https://premium.zeit.de"
        response = br.open(domain)
        # Get rid of nested form
        response.set_data(re.sub('<div><form action=.*', '', response.get_data() ))
        br.set_response(response)
        br.select_form(nr=2)
        br.form['name']=self.username
        br.form['pass']=self.password
        br.submit()
        return br


    # Copies only those parts of the new metadata to the old metadata
    # which have actual data.
    def copy_metadata(self, new, old):
        mi = old
        if new.title:
            mi.title = new.title
        if new.authors:
            mi.authors = new.authors
        if new.isbn:
            mi.isbn = new.isbn
        if new.tags:
            mi.tags = new.tags
        if new.languages:
            mi.languages = new.languages
        return mi

    # Override calibre.library.import_book_directory
    # because it does not offer options to set metadata.
    # This version adds a new option mi: 
    # for all mi-fields which have data, the data is copied to ebook's metadata
    def import_book_directory(self, db, dirpath, mi2, callback=None):
        from calibre.ebooks.metadata.meta import metadata_from_formats
        dirpath = os.path.abspath(dirpath)
        formats = db.find_books_in_directory(dirpath, True)
        formats = list(formats)[0]
        if not formats:
            return
        mi = metadata_from_formats(formats)
        mi = self.copy_metadata(mi2, mi)
        if mi.title is None:
            return
        if db.has_book(mi):
            return [(mi, formats)]
        db.import_book(mi, formats)
        if callable(callback):
            callback(mi.title)


    # Override calibre.library.do_add,
    # because it does not offer options to set metadata.
    # This version adds a new option mi: 
    # for all mi-fields which have data, the data is copied to ebook's metadata
    # In this version: recurse=False, one_book_per_directory=True
    def do_add(self, db, paths, mi2, add_duplicates):
        from calibre.ebooks.metadata.meta import get_metadata
        orig = sys.stdout
        #sys.stdout = NULL
        try:
            files, dirs = [], []
            for path in paths:
                path = os.path.abspath(path)
                if os.path.isdir(path):
                    dirs.append(path)
                else:
                    if os.path.exists(path):
                        files.append(path)
                    else:
                        print path, 'not found'

            formats, metadata = [], []
            for book in files:
                format = os.path.splitext(book)[1]
                format = format[1:] if format else None
                if not format:
                    continue
                stream = open(book, 'rb')
                mi = get_metadata(stream, stream_type=format, use_libprs_metadata=True)
                if not mi.title:
                    mi.title = os.path.splitext(os.path.basename(book))[0]
                if not mi.authors:
                    mi.authors = [_('Unknown')]
                mi = self.copy_metadata(mi2, mi)
                formats.append(format)
                metadata.append(mi)

            file_duplicates = []
            if files:
                file_duplicates = db.add_books(files, formats, metadata,
                                               add_duplicates=add_duplicates)
                if file_duplicates:
                    file_duplicates = file_duplicates[0]
    

            dir_dups = []
            for dir in dirs:
                # recurse=False and one_book_per_directory=True are fixed in
                # this version, so each directory is imported as a single book.
                dups = self.import_book_directory(db, dir, mi2)
                if not dups:
                    dups = []
                dir_dups.extend(dups)

            sys.stdout = sys.__stdout__

            if add_duplicates:
                for mi, formats in dir_dups:
                    mi = self.copy_metadata(mi2, mi)
                    db.import_book(mi, formats)
            else:
                if dir_dups or file_duplicates:
                    print >>sys.stderr, _('The following books were not added as '
                                          'they already exist in the database '
                                          '(see --duplicates option):')
                for mi, formats in dir_dups:
                    title = mi.title
                    if isinstance(title, unicode):
                        title = title.encode(preferred_encoding)
                    print >>sys.stderr, '\t', title + ':'
                    for path in formats:
                        print >>sys.stderr, '\t\t ', path
                if file_duplicates:
                    for path, mi in zip(file_duplicates[0], file_duplicates[2]):
                        title = mi.title
                        if isinstance(title, unicode):
                            title = title.encode(preferred_encoding)
                        print >>sys.stderr, '\t', title+':'
                        print >>sys.stderr, '\t\t ', path

            write_dirtied(db)
            send_message()
        finally:
            sys.stdout = orig



    # Do not fetch news and convert them to E-Books.
    # Instead, download the epub directly from the site.
    # For this, override BasicNewsRecipe.build_index()
    #
    def build_index(self):
        browser = self.get_browser()
        # Get the path to the db
        dbpath = prefs['library_path']
        # Get access to the database
        dbpath = os.path.abspath(dbpath)
        db = LibraryDatabase2(dbpath)


        # find the links
        epublink = browser.find_link(text_regex=re.compile('.*als Datei im ePub-Format.*'))
        mobilink = browser.find_link(text_regex=re.compile('.*im Mobi-Format.*'))
        pdflink = browser.find_link(text_regex=re.compile('.*Download der gesamten Ausgabe als PDF Datei.*'))
        browser.open("https://premium.zeit.de/abo/zeit-audio")
        audiolink = browser.find_link(text_regex=re.compile('.*Alle Audios der aktuellen ZEIT.*'))
        browser.back()
        #edition = (urlparse(pdflink.url)[2]).replace('/system/files/epaper/DZ/pdf/DZ_ePaper_','').replace('.pdf','')
        edition_ = re.split('_', (urlparse(pdflink.url)[2]).replace('/system/files/epaper/DZ/pdf/DZ_ePaper_','').replace('.pdf','') )
        edition = '20' + edition_[1] + ' - ' + edition_[0]
        au_url = urlparse(pdflink.base_url)[0]+'://'+urlparse(pdflink.base_url)[1]+''+(urlparse(pdflink.url)[2]).replace('epaper/DZ/pdf/DZ_ePaper','ZM/pdf/ZM_ePaper')
        zm_url = urlparse(pdflink.base_url)[0]+'://'+urlparse(pdflink.base_url)[1]+''+(urlparse(pdflink.url)[2]).replace('DZ/pdf/DZ_ePaper','ZM/pdf/ZM_ePaper')
        bl_url = urlparse(pdflink.base_url)[0]+'://'+urlparse(pdflink.base_url)[1]+''+(urlparse(pdflink.url)[2]).replace('DZ/pdf/DZ_ePaper','BL/pdf/BL_ePaper')
        print "Found epub-link: %s" % epublink.url
        print "Found Mobi-link: %s" % mobilink.url
        print "Found pdf-link: %s" % pdflink.url
        print "Found audio-link: %s" % audiolink.url
        print "Will try ZM-link: %s" % zm_url
        print "Will try BL-link: %s" % bl_url
        print "This edition is: %s" % edition

        # The following part is from a recipe by Starson17
        #
        # It modifies build_index, which is the method that gets the 
        # masthead image and cover, parses the feed for articles, retrieves
        # the articles, removes tags from articles, etc. All of those steps 
        # ultimately produce a local directory structure that looks like an 
        # unzipped EPUB. 
        #
        # This part grabs the link to one EPUB, saves the EPUB locally,
        # extracts it, and passes the result back into the recipe system
        # as though all the other steps had been completed normally.
        #
        # This has to be done, even if one does not want to use this
        # calibre-modified epub. Otherwise, the recipe runs into an error.
        # This is the reason why there shows up a second Die Zeit entry
        # in calibre db.
        self.report_progress(0,_('downloading epub'))
        response = browser.follow_link(epublink)
        # We need separate directories for Die Zeit, Zeit Magazin and Zeit Beilage
        DZdir = PersistentTemporaryDirectory(prefix='DZ_')
        ZMdir = PersistentTemporaryDirectory(prefix='ZM_')
        BLdir = PersistentTemporaryDirectory(prefix='BL_')
        epub_file = PersistentTemporaryFile(suffix='.epub',dir=DZdir)
        epub_file.write(response.read())
        epub_file.close()
        zfile = zipfile.ZipFile(epub_file.name, 'r')
        self.report_progress(0.1,_('extracting epub'))
        zfile.extractall(self.output_dir)
        zfile.close()
        index = os.path.join(self.output_dir, 'content.opf')
        self.report_progress(0.2,_('epub downloaded and extracted'))

        #
        # Now, download the remaining files
        #
        print "output_dir is: %s" % self.output_dir
        print "DZdir is: %s" % DZdir
        print "ZMdir is: %s" % ZMdir
        print "BLdir is: %s" % BLdir

        if (GET_MOBI):
           self.report_progress(0.3,_('downloading mobi'))
           mobi_file = PersistentTemporaryFile(suffix='.mobi',dir=DZdir)
           browser.back()
           response = browser.follow_link(mobilink)
           mobi_file.write(response.read())
           mobi_file.close()

        if (GET_PDF):
           self.report_progress(0.4,_('downloading pdf'))
           pdf_file = PersistentTemporaryFile(suffix='.pdf',dir=DZdir)
           browser.back()
           response = browser.follow_link(pdflink)
           pdf_file.write(response.read())
           pdf_file.close()

        if (GET_AUDIO):
           self.report_progress(0.5,_('downloading audio'))
           audio_file = PersistentTemporaryFile(suffix='.mp3.zip',dir=DZdir)
           browser.back()
           response = browser.follow_link(audiolink)
           audio_file.write(response.read())
           audio_file.close()

        # Get all Die Zeit formats into Calibre's database
        self.report_progress(0.6,_('Adding Die Zeit to Calibre db'))
        mi = MetaInformation(None)
        title="Die ZEIT "+edition
        mi.title = title
        mi.authors = string_to_authors(authors)
        mi.tags = tags
        mi.languages = languages
        self.do_add(db, [DZdir], mi, True)
        

        # Zeit Magazin has to be handled differently.
        # First, it has to be downloaded into its own directory, since it
        # is a different book than Die Zeit.
        # Second, we know its url rather than its link.
        # Third, there is no Metadata present in the file itself.
        if (GET_MAGAZIN):
           self.report_progress(0.7,_('downloading ZM'))
           title="ZEIT Magazin "+edition
           ZM_file = PersistentTemporaryFile(suffix='.pdf',dir=ZMdir)
           try:
              response = browser.open(zm_url)
              ZM_file.write(response.read())
              ZM_file.close()
              # Get Zeit Magazin into Calibre's database
              self.report_progress(0.8,_('Adding Zeit Magazin to Calibre db'))
              mi.title = title
              self.do_add(db, [ZMdir], mi, True)
           except Exception:
              self.report_progress(0.8,_('No Zeit Magazin found...'))

        # Zeit Beilage is technically the same as Zeit Magazin, but it is
        # not included in every edition. So, the use of try: is 
        # obligatory here.
        if (GET_BEILAGE):
           self.report_progress(0.9,_('downloading BL'))
           title="ZEIT Beilage "+edition
           BL_file = PersistentTemporaryFile(suffix='.pdf',dir=BLdir)
           try:
              response = browser.open(bl_url)
              BL_file.write(response.read())
              BL_file.close()
              # Get Zeit Beilage into Calibre's database
              self.report_progress(0.9,_('Adding Zeit Beilage to Calibre db'))
              mi.title = title
              self.do_add(db, [BLdir], mi, True)
           except Exception:
              self.report_progress(0.9,_('No Zeit Beilage found...'))

        return index

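The way `build_index()` derives the edition label and the Magazin and Beilage URLs from the PDF link can be checked in isolation. This is only a sketch in Python 3 (the recipe is Python 2 and uses the old `urlparse` module); the function names and the example URLs are made up for illustration, and only the path pattern `/system/files/epaper/DZ/pdf/DZ_ePaper_` is taken from the recipe.

```python
from urllib.parse import urlparse

# Path prefix that the recipe strips from the PDF link.
DZ_PREFIX = "/system/files/epaper/DZ/pdf/DZ_ePaper_"


def edition_label(pdf_url):
    """Turn .../DZ_ePaper_<issue>_<yy>.pdf into '20<yy> - <issue>'."""
    path = urlparse(pdf_url).path
    parts = path.replace(DZ_PREFIX, "").replace(".pdf", "").split("_")
    return "20" + parts[1] + " - " + parts[0]


def sibling_url(pdf_url, base_url, code):
    """Build the URL of another supplement (e.g. 'ZM' or 'BL') of the same issue."""
    base = urlparse(base_url)
    path = urlparse(pdf_url).path.replace(
        "DZ/pdf/DZ_ePaper", "%s/pdf/%s_ePaper" % (code, code))
    return base.scheme + "://" + base.netloc + path


# Hypothetical example values, not taken from the live site.
base_url = "https://premium.zeit.de/"
pdf_url = "https://premium.zeit.de/system/files/epaper/DZ/pdf/DZ_ePaper_49_11.pdf"

print(edition_label(pdf_url))             # 2011 - 49
print(sibling_url(pdf_url, base_url, "ZM"))
print(sibling_url(pdf_url, base_url, "BL"))
```

Since the Magazin and Beilage URLs are guessed rather than read from the page, the download can fail for an issue without that supplement, which is why the recipe wraps both downloads in `try`.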

Have fun
Achim
Old 12-13-2011, 08:24 PM   #12
rogerben
Thanks for the exhaustive recipe! But is there a way to have calibre send the original mobi to the Kindle instead of the converted epub? I find the TOC of the converted file hideous: you need to press lots of buttons to access it, and it is not nicely split into two columns as news should be. BTW, this seems to be a general issue when converting any TOC, or is it just my configuration that's wrong?
Old 12-14-2011, 02:14 AM   #13
Divingduck
Have you changed "GET_MOBI=False" to "GET_MOBI=True" in the recipe?
Old 12-15-2011, 07:34 AM   #14
achims
Hi rogerben,

Apart from setting GET_MOBI=True in my recipe, as hinted by Divingduck, you might need to change the general calibre preferences. Go to Preferences --> Behavior --> Preferred output format (if you run calibre in German it will be Einstellungen --> Verhalten --> Bevorzugtes Ausgabeformat) and set it to mobi.

Also, it might be confusing that my recipe produces two new book entries for the Zeit. One contains all the formats specified in the recipe's header, in the original version as issued by the publisher, and in your case it should contain a mobi file. The other book entry contains only an altered epub version; you should simply delete this entry.
Old 12-16-2011, 04:15 AM   #15
rogerben
Hi,

Thanks for your support!

I configured the recipe as follows:
GET_MOBI=True
GET_PDF=False
GET_AUDIO=False
GET_MAGAZIN=True
GET_BEILAGE=True

I have also set calibre to prefer MOBI output. Nevertheless, it still seems to send the modified epub.

Kind regards,
Ben