#1 |
Member
Posts: 24
Karma: 12
Join Date: Oct 2011
Device: Xperia Active, Iconia A500, Galaxy I5500
|
New Recipe for ZEIT Premium download ALL
I have created a new recipe for ZEIT Premium (subscription only).
It downloads all the e-books the page has to offer:
- The main newspaper, Die Zeit, in all offered formats (epub, mobi, pdf, and a zip with all audio files of the newspaper). All formats are imported into the calibre db as one logical book entry.
- Zeit Magazin (pdf), imported as its own new book entry.

The user can easily switch the different download formats on or off in the recipe's header. I think this is the first recipe to download PDFs etc., so it might be interesting for other recipe developers, too. Code:
    import re, zipfile, os
    from calibre.ptempfile import PersistentTemporaryDirectory
    from calibre.ptempfile import PersistentTemporaryFile
    from calibre.web.feeds.news import BasicNewsRecipe
    from urlparse import urlparse

    # Switch the optional downloads on or off here (the epub is always fetched).
    GET_MOBI=False
    GET_PDF=True
    GET_AUDIO=True
    GET_MAGAZIN=True

    class ZeitPremiumAllFormats(BasicNewsRecipe):
        title = u'Zeit Premium All Formats'
        # English summary of the German description below: downloads all e-book
        # formats offered for the current week from the Zeit Premium area (paid
        # subscription). For Die Zeit these are epub, mobi, pdf and all audio
        # files as a zip, stored in the calibre database as a single book. The
        # Zeit Magazin is added as a pdf in its own book entry. For technical
        # reasons a third entry with a modified epub of Die Zeit is created; it
        # can safely be deleted. All formats except epub can be switched on or
        # off. Note: while the site switches to a new edition (Wednesday
        # evening), not all formats are renewed at the same time, so the formats
        # in one calibre entry may belong to different editions. Tested under
        # Unix; the recipe may not work under other operating systems.
        description = u'Lädt alle angebotenen E-Book Formate der aktuellen Woche aus dem Zeit Premium Bereich (kostenpflichtiges Abo). Dies beinhaltet für Die Zeit die Formate epub, mobi, pdf und alle Audiofiles als zip. Sie werden in der Calibre Datenbank als ein einziges Buch eingetragen. Des weiteren das Zeit Magazin als pdf als eigenständiges Buch. Aus technischen Gründen wird ein dritter Bucheintrag erstellt, der Die Zeit in einer abgewandelten epub Version erhält. Dieser Eintrag kann getrost gelöscht werden. Alle Formate ausser epub können ein- oder ausgeschaltet werden. Anmerkung: Während der Umstellung auf eine neue Ausgabe (Mittwoch abends) werden nicht alle Formate gleichzeitig erneuert. Im Calibre Eintrag können dann die verschiedenen Formate zu verschiedenen Ausgaben gehören! ___Getestet unter Unix___ - unter anderen Betriebssystemen funktioniert dieses recipe möglicherweise nicht.'
        __author__ = 'Achim Schumacher'
        language = 'de'
        needs_subscription = True
        conversion_options = {
            'no_default_epub_cover' : True,
        }

        #
        # Login process required:
        # Override BasicNewsRecipe.get_browser()
        #
        def get_browser(self):
            br = BasicNewsRecipe.get_browser()
            # new login process
            domain = "https://premium.zeit.de"
            response = br.open(domain)
            # Get rid of nested form
            response.set_data(re.sub('<div><form action=.*', '', response.get_data() ))
            br.set_response(response)
            br.select_form(nr=2)
            br.form['name']=self.username
            br.form['pass']=self.password
            br.submit()
            return br

        # Do not fetch news and convert them to E-Books.
        # Instead, download the epub directly from the site.
        # For this, override BasicNewsRecipe.build_index()
        #
        def build_index(self):
            browser = self.get_browser()

            # find the links
            epublink = browser.find_link(text_regex=re.compile('.*Ausgabe als Datei im ePub-Format.*'))
            mobilink = browser.find_link(text_regex=re.compile('.*Ausgabe als Datei im Mobi-Format.*'))
            pdflink = browser.find_link(text_regex=re.compile('.*Download der gesamten Ausgabe als PDF Datei.*'))
            audiolink = browser.find_link(text_regex=re.compile('.*Alle Audios der aktuellen ZEIT.*'))

            # Derive the edition id and the Zeit Magazin URL from the pdf link
            edition = (urlparse(pdflink.url)[2]).replace('/system/files/epaper/DZ/pdf/DZ_ePaper_','').replace('.pdf','')
            zm_url = urlparse(pdflink.base_url)[0]+'://'+urlparse(pdflink.base_url)[1]+''+(urlparse(pdflink.url)[2]).replace('DZ/pdf/DZ_ePaper','ZM/pdf/ZM_ePaper')

            # TODO: Test for other books that are only published once in a while
            # (e.g., Die Zeit Beilage)

            print "Found epub-link: %s" % epublink.url
            print "Found Mobi-link: %s" % mobilink.url
            print "Found pdf-link: %s" % pdflink.url
            print "Found audio-link: %s" % audiolink.url
            print "Found ZM-link: %s" % zm_url
            print "This edition is: %s" % edition

            # The following part is from a recipe by Starsom17
            #
            # It modifies build_index, which is the method that gets the
            # masthead image and cover, parses the feed for articles, retrieves
            # the articles, removes tags from articles, etc. All of those steps
            # ultimately produce a local directory structure that looks like an
            # unzipped EPUB.
            #
            # This part grabs the link to one EPUB, saves the EPUB locally,
            # extracts it, and passes the result back into the recipe system
            # as though all the other steps had been completed normally.
            #
            # This has to be done, even if one does not want to use this
            # calibre-modified epub. Otherwise, the recipe runs into an error.
            # This is the reason why a second Die Zeit entry shows up
            # in the calibre db.

            self.report_progress(0,_('downloading epub'))
            response = browser.follow_link(epublink)

            # We need two different directories for Die Zeit and Zeit Magazin
            DZdir = PersistentTemporaryDirectory()
            ZMdir = PersistentTemporaryDirectory()

            epub_file = PersistentTemporaryFile(suffix='.epub',dir=DZdir)
            epub_file.write(response.read())
            epub_file.close()

            zfile = zipfile.ZipFile(epub_file.name, 'r')
            self.report_progress(0.1,_('extracting epub'))
            zfile.extractall(self.output_dir)
            # The original closed epub_file a second time here; closing the
            # zip archive is what was meant.
            zfile.close()

            index = os.path.join(self.output_dir, 'content.opf')
            self.report_progress(0.2,_('epub downloaded and extracted'))

            #
            # Now, download the remaining files
            #
            print "output_dir is: %s" % self.output_dir
            print "DZdir is: %s" % DZdir
            print "ZMdir is: %s" % ZMdir

            if (GET_MOBI):
                self.report_progress(0.3,_('downloading mobi'))
                mobi_file = PersistentTemporaryFile(suffix='.mobi',dir=DZdir)
                browser.back()
                response = browser.follow_link(mobilink)
                mobi_file.write(response.read())
                mobi_file.close()

            if (GET_PDF):
                self.report_progress(0.4,_('downloading pdf'))
                pdf_file = PersistentTemporaryFile(suffix='.pdf',dir=DZdir)
                browser.back()
                response = browser.follow_link(pdflink)
                pdf_file.write(response.read())
                pdf_file.close()

            if (GET_AUDIO):
                self.report_progress(0.5,_('downloading audio'))
                audio_file = PersistentTemporaryFile(suffix='.mp3.zip',dir=DZdir)
                browser.back()
                response = browser.follow_link(audiolink)
                audio_file.write(response.read())
                audio_file.close()

            # Get all Die Zeit formats into Calibre's database
            self.report_progress(0.6,_('Adding Die Zeit to Calibre db'))
            cmd = "calibredb add -1 " + DZdir
            os.system(cmd)

            # Zeit Magazin has to be handled differently.
            # First, it has to be downloaded into its own directory, since it
            # is a different book than Die Zeit.
            # Second, we know its url rather than its link.
            # Third, there is no Metadata present, so we need to give it
            # a proper name so that calibre will set Author and Title at import.
            # Unfortunately, the present solution includes a random part in the
            # name which after db import has to be manually resolved by the user.
            if (GET_MAGAZIN):
                self.report_progress(0.7,_('downloading ZM'))
                ZM_file = PersistentTemporaryFile(suffix=' Zeit Magazin '+edition+' - Zeitverlag Gerd Bucerius GmbH und Co. KG.pdf',dir=ZMdir)
                response = browser.open(zm_url)
                ZM_file.write(response.read())
                ZM_file.close()

                # Get Zeit Magazin into Calibre's database
                self.report_progress(0.8,_('Adding Zeit Magazin to Calibre db'))
                cmd = "calibredb add -1 " + ZMdir
                os.system(cmd)

            return index
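For anyone adapting the recipe, the least obvious step is how the edition id and the Zeit Magazin URL are derived from the Die Zeit PDF link. Here is a minimal standalone sketch of that string handling, written for Python 3 (`urllib.parse` rather than the recipe's Python 2 `urlparse`). The function name and the example URL are made up for illustration, it assumes an absolute PDF URL, and the path layout is simply the one the recipe relies on, so it may no longer match the site.

```python
from urllib.parse import urlparse

# Path prefix taken from the recipe; treat it as an assumption about the site.
PDF_PREFIX = '/system/files/epaper/DZ/pdf/DZ_ePaper_'


def derive_edition_and_magazin_url(pdf_url):
    """Return (edition, Zeit Magazin URL) derived from the Die Zeit PDF URL."""
    parts = urlparse(pdf_url)
    # Strip the fixed prefix and the extension to get the edition id.
    edition = parts.path.replace(PDF_PREFIX, '').replace('.pdf', '')
    # The Magazin PDF is assumed to live at the same path with DZ swapped for ZM.
    zm_path = parts.path.replace('DZ/pdf/DZ_ePaper', 'ZM/pdf/ZM_ePaper')
    zm_url = '%s://%s%s' % (parts.scheme, parts.netloc, zm_path)
    return edition, zm_url


if __name__ == '__main__':
    # Hypothetical URL, for illustration only.
    example = ('https://premium.zeit.de/system/files/epaper/'
               'DZ/pdf/DZ_ePaper_45_11.pdf')
    print(derive_edition_and_magazin_url(example))
```

Keeping this in a small function makes it easy to check against a real link before running the whole recipe.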
#2 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
|
#3 |
Member
Posts: 24
Karma: 12
Join Date: Oct 2011
Device: Xperia Active, Iconia A500, Galaxy I5500
|
Thanks, good idea. I've posted it there now.
|
#4 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
|
#5 |
Member
Posts: 18
Karma: 36
Join Date: Feb 2011
Device: Kindle
|
Bug report
Hi achims,
First of all, thanks for fixing the old Zeit recipe. I will continue to update and change that one elsewhere (the long thread with all the updates: https://www.mobileread.com/forums/showthread.php?t=90005), since I like the changes that "my" filtering introduces. However, I also really like your new script that downloads the other files. It works fine for me for the Magazin, but when I try to download the PDF of the actual paper, I only get the MOBI file, even though I disabled everything but the PDF:

    GET_MOBI=False
    GET_PDF=True
    GET_AUDIO=False
    GET_MAGAZIN=False

Any suggestions?

Cheers,
Tobias
#6 |
Member
Posts: 24
Karma: 12
Join Date: Oct 2011
Device: Xperia Active, Iconia A500, Galaxy I5500
|
Hi Tobias,
glad to hear that you like my new recipe. Your problem with the PDF download puzzles me, though, especially that it downloads the mobi although GET_MOBI is set to False. Perhaps it has to do with the import into the db. Have you tried removing the entries for the current ZEIT edition (with all formats) from the calibre db and then rerunning the recipe? If there is already an entry in any format, calibredb add might refuse to add new formats, and it won't create a new entry either.

Cheers
Achim
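If an existing entry really is what blocks the import (that is only a guess at this point), one way to rule it out without deleting anything is to call `calibredb add` with its `--duplicates` option. A small sketch, assuming `calibredb` is on the PATH; the helper names and the directory argument are made up for illustration, and it builds an argument list instead of concatenating a shell string as the recipe does:

```python
import subprocess


def calibredb_add_command(directory, allow_duplicates=True):
    """Build the argument list for importing one directory as a single book."""
    # --one-book-per-directory is the long form of the recipe's -1 switch.
    cmd = ['calibredb', 'add', '--one-book-per-directory']
    if allow_duplicates:
        # Add the book even if calibre thinks it already exists.
        cmd.append('--duplicates')
    cmd.append(directory)
    return cmd


def add_directory(directory, allow_duplicates=True):
    """Run calibredb; raises CalledProcessError on a non-zero exit status."""
    subprocess.check_call(calibredb_add_command(directory, allow_duplicates))
```

Passing a list to `subprocess` also avoids quoting problems if the temporary directory path ever contains spaces.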
#7 |
Member
Posts: 24
Karma: 12
Join Date: Oct 2011
Device: Xperia Active, Iconia A500, Galaxy I5500
|
New version - includes Zeit Beilage
Hi,
this week's Zeit edition includes an extra giveaway, the Beilage, which only appears every once in a while. I have updated the recipe to include the Beilage if it is offered by the Zeit Premium web page.

I have also tweaked the metadata of the Magazin (the same applies to the Beilage). Due to a bug in 'calibredb add', which does not process its title and author options, the recipe uses the file name as a fallback workaround. This unfortunately means a random string is added to the title. I have now moved it to the end of the title, where it should be less disturbing. Note that if you have your calibre's ebook import option set to Code:

    (?P<title>.+) - (?P<author>[^_]+)

I also updated the recipe so that, once this bug is resolved, the random string issue will go away without further changes to the recipe.

As a response to Tobias, I have changed the 'calibredb add' system call to include the 'duplicates' option. This way, calling the recipe several times (e.g. with different format switches) will create a new book entry for each call. I hope this resolves his problem. Spoiler:

Have fun
Achim
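To see what the import pattern quoted in this post captures from a file name, here is a small sketch using Python's `re` module. The file name stem is a made-up example in the shape the first recipe used for the Magazin, and applying the pattern to the name without its extension is an assumption about how calibre uses it.

```python
import re

# Pattern quoted in the post above.
FILENAME_PATTERN = re.compile(r'(?P<title>.+) - (?P<author>[^_]+)')


def title_and_author(stem):
    """Return (title, author) parsed from a file name stem, or None."""
    match = FILENAME_PATTERN.match(stem)
    if match is None:
        return None
    return match.group('title'), match.group('author')


if __name__ == '__main__':
    # Hypothetical stem, for illustration only.
    print(title_and_author(
        'Zeit Magazin 45_11 - Zeitverlag Gerd Bucerius GmbH und Co. KG'))
```

Because the title group is greedy, everything before the last " - " separator ends up in the title, which is why any random part of the file name lands there rather than in the author.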
#8 |
Member
Posts: 24
Karma: 12
Join Date: Oct 2011
Device: Xperia Active, Iconia A500, Galaxy I5500
|
New version - set Metadata to your liking, no system calls
Hi all,
I have an updated version of the ZEIT recipe. These are the changes:
- No more system calls to 'calibredb add'. Instead, (modified) internal calibre functions are used. These modifications were needed for the next point.
- Set the metadata to your liking. You can set authors and tags. Zeit Magazin and Beilage now have the correct title and author.

Have fun
Achim

Spoiler:
|
#9 |
Member
Posts: 18
Karma: 36
Join Date: Feb 2011
Device: Kindle
|
Hi achims,
I just got around to trying your new version, and the epub is still downloaded even though it is disabled (False); consequently, even a mobi version is created based on the epub. The PDF is now downloaded as well, so the earlier failure may indeed have been caused by my already having a copy of the same edition as mobi or epub in calibre.

Cheers,
Tobias
#10 |
Member
Posts: 24
Karma: 12
Join Date: Oct 2011
Device: Xperia Active, Iconia A500, Galaxy I5500
|
Hi Tobias,
there is no option to disable the epub in the recipe. Downloading the epub is an integral part of the recipe, since the epub is used to "cheat" the recipe mechanism. This mechanism takes an unzipped epub and then performs all the conversions the user may have specified; in your case it may even create a mobi file. There is no way to tell calibre not to build an entry with this epub (and mobi) file.

In addition, the recipe downloads the pdf version of the Zeit, the Magazin and the Beilage, and creates new database entries for the three logical books. This is the part where one can enable or disable formats. I did not include an option to leave out the epub, since it is downloaded anyway, but I could indeed add this option.

If everything is enabled, the recipe, when called from within the calibre GUI, will therefore create 4 new book entries. The Zeit will have two entries because of the cheat described above; you will have to delete the duplicate entry by hand. When called from the command line, it will create 3 new book entries and a new epub file. This epub file is the one that results from cheating the recipe mechanism. Code:

    ebook-convert Zeit\ Premium\ All\ Formats.recipe Zeit.epub --password=PWD --username=USER

Cheers
Achim
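The "cheat" boils down to unpacking the downloaded epub (which is an ordinary zip archive) into the recipe's output directory and handing back the path of its OPF file. A standalone Python 3 sketch of just that step is below; the function name is made up, and it assumes, as the recipe does, that `content.opf` sits at the root of the archive.

```python
import os
import zipfile


def extract_epub(epub_path, output_dir):
    """Unpack an EPUB and return the path of its content.opf.

    Assumes content.opf is at the archive root, as in the recipe.
    """
    # The context manager closes the archive even if extraction fails.
    with zipfile.ZipFile(epub_path, 'r') as archive:
        archive.extractall(output_dir)
    index = os.path.join(output_dir, 'content.opf')
    if not os.path.exists(index):
        raise IOError('content.opf not found in %s' % output_dir)
    return index
```

In the recipe, the returned path is what `build_index()` gives back to calibre, which then treats the extracted directory as if it had built it itself.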
#11 |
Member
Posts: 24
Karma: 12
Join Date: Oct 2011
Device: Xperia Active, Iconia A500, Galaxy I5500
|
Version update
Hi all,
due to a change in the ZEIT's web layout, the recipe had to be adjusted. This is the new version: Spoiler:

Have fun
Achim
#12 |
Junior Member
Posts: 8
Karma: 10
Join Date: Dec 2011
Device: Kindle
|
Thanks for the exhaustive recipe! But is there a way to have calibre send the original mobi to the Kindle instead of the converted epub? I find the TOC of the converted file hideous: you need to press lots of buttons to access it, and it is not nicely split into two columns as news should be. BTW, this seems to be a general issue when converting any TOC, or is it just my configuration that's wrong?
|
#13 |
Wizard
Posts: 1,166
Karma: 1410083
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
|
Have you changed "GET_MOBI=False" to "GET_MOBI=True" in the recipe?
|
#14 |
Member
Posts: 24
Karma: 12
Join Date: Oct 2011
Device: Xperia Active, Iconia A500, Galaxy I5500
|
Hi rogerben,
apart from setting GET_MOBI=True in my recipe, as hinted by Divinduck, you might need to change the general calibre preferences. Go to Preferences --> Behavior --> Preferred output format (if you run calibre in German, it will be Einstellungen --> Verhalten --> Bevorzugtes Ausgabeformat) and set it to mobi.

Also, it might be confusing that my recipe produces two new book entries for the Zeit. One contains all the formats specified in the recipe's header, in the original version as published by the publisher, and in your case it should contain a mobi file. The other book entry contains only an altered epub version; you should simply delete this book entry.
#15 |
Junior Member
Posts: 8
Karma: 10
Join Date: Dec 2011
Device: Kindle
|
Hi,
thanks for your support! I configured it as follows:

    GET_MOBI=True
    GET_PDF=False
    GET_AUDIO=False
    GET_MAGAZIN=True
    GET_BEILAGE=True

I have also set calibre to prefer MOBI output. Nevertheless, it seems to send the modified epub.

Kind regards,
Ben
| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| Recipe for Zeit Abo EPUB download | siebert | Recipes | 24 | 03-17-2013 09:29 AM |
| DIE ZEIT Premium recipe doesn't work anymore | Moik | Recipes | 1 | 07-16-2011 01:46 PM |
| Zeit-Online recipe does not work with Sony reader | lesbett | Recipes | 5 | 07-13-2011 11:47 AM |
| 902 Freeze von Zeit zu Zeit | knorst | PocketBook | 7 | 03-21-2011 05:16 PM |
| PB360 display ist von zeit zu zeit streifig | klaetsch | PocketBook | 2 | 01-10-2011 05:24 AM |