Old 11-03-2011, 05:34 PM   #16
achims
Download e-books in any format from a website

This builds on "Recipe to download an EPUB from feed" by Starsom17.
You can use it to download all e-books offered by a news website, in whichever formats you like (EPUB, PDF, MOBI, ...).

To see how it works, first take a look at Starsom17's post. His trick is needed to cheat the recipe process, so that it has an EPUB to work on.

In addition, this recipe looks for links to the other e-book formats, downloads them to a common temporary directory, and then runs the system call "calibredb add -1 dir", so that all formats are added to the calibre database as one single logical book.

If there are several logical books to download, you'll need to create a directory and make a system call for each one (or drop the -1 option if there is only one format per book).
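Here is a rough sketch of the several-books case: one directory per logical book, and one "calibredb add -1" command per directory. The helper names (make_book_dirs, add_commands) and the issue titles are made up for illustration. I use plain tempfile so it runs outside calibre; inside a recipe you would probably use PersistentTemporaryDirectory instead. Nothing is actually executed here, the commands are only built.

```python
import os
import tempfile

def make_book_dirs(books, parent=None):
    """Create one directory per logical book and return {title: path}.

    `books` maps a (hypothetical) book title to the list of format
    suffixes you plan to download for it, e.g. ['.epub', '.pdf'].
    """
    parent = parent or tempfile.mkdtemp(prefix='ebooks_')
    dirs = {}
    for number, title in enumerate(sorted(books)):
        path = os.path.join(parent, 'book_%03d' % number)
        os.mkdir(path)
        dirs[title] = path
    return dirs

def add_commands(dirs):
    """Return one 'calibredb add -1 <dir>' command string per book directory."""
    return ['calibredb add -1 "%s"' % dirs[title] for title in sorted(dirs)]

# Hypothetical example: two issues, each offered in two formats
books = {'Issue 41': ['.epub', '.pdf'], 'Issue 42': ['.epub', '.mobi']}
book_dirs = make_book_dirs(books)
commands = add_commands(book_dirs)
```

You would download each book's files into its own directory and then pass each command to os.system(), just like the single call in the recipe.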

Note: I have tested this on Linux, where it works fine. On other operating systems the system call may need tweaking.
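One possible tweak for other systems (untested by me there, so take it as an assumption): hand the command to subprocess as an argument list instead of a shell string. That way no shell quoting is involved, which is where Linux and Windows tend to differ. The function names and the example path are made up; if calibredb is not on your PATH, pass its full path as `executable`.

```python
import subprocess

def calibredb_add_argv(directory, executable='calibredb'):
    """Build the argument list for 'calibredb add -1 <directory>'."""
    return [executable, 'add', '-1', directory]

def add_directory(directory, executable='calibredb'):
    """Run calibredb and return its exit code (0 means success)."""
    return subprocess.call(calibredb_add_argv(directory, executable))

# A path with spaces needs no extra quoting in list form
argv = calibredb_add_argv('/tmp/my books/issue 42')
```

In the recipe, add_directory(dir) would take the place of the os.system() call.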

Code:
import re, zipfile, os
from calibre.ptempfile import PersistentTemporaryDirectory
from calibre.ptempfile import PersistentTemporaryFile
from calibre.web.feeds.news import BasicNewsRecipe

# Extra formats to fetch in addition to the EPUB
GET_MOBI = False
GET_PDF = True

class DownloadAllFormats(BasicNewsRecipe):

    def build_index(self):
        browser = self.get_browser()

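        # Find the download links (adjust the regexes to your site!).
        # find_link() searches the page currently loaded in the browser,
        # so open the page that lists the downloads before this point.
        epublink = browser.find_link(text_regex=re.compile('.*Download ePub.*'))
        # Only look up the optional formats that are enabled: find_link()
        # raises an error when no matching link exists.
        mobilink = browser.find_link(text_regex=re.compile('.*Download Mobi.*')) if GET_MOBI else None
        pdflink = browser.find_link(text_regex=re.compile('.*Download PDF.*')) if GET_PDF else None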

        # Cheat calibre's recipe method, as in post from Starsom17
        self.report_progress(0,_('downloading epub'))
        response = browser.follow_link(epublink)
        dir = PersistentTemporaryDirectory()
        epub_file = PersistentTemporaryFile(suffix='.epub',dir=dir)
        epub_file.write(response.read())
        epub_file.close()
        zfile = zipfile.ZipFile(epub_file.name, 'r')
        self.report_progress(0.1,_('extracting epub'))
        zfile.extractall(self.output_dir)
        zfile.close()
        index = os.path.join(self.output_dir, 'content.opf')
        self.report_progress(0.2,_('epub downloaded and extracted'))


        #
        # Now, download the remaining files
        #
        if GET_MOBI:
            self.report_progress(0.3,_('downloading mobi'))
            mobi_file = PersistentTemporaryFile(suffix='.mobi',dir=dir)
            browser.back()
            response = browser.follow_link(mobilink)
            mobi_file.write(response.read())
            mobi_file.close()

        if GET_PDF:
            self.report_progress(0.4,_('downloading pdf'))
            pdf_file = PersistentTemporaryFile(suffix='.pdf',dir=dir)
            browser.back()
            response = browser.follow_link(pdflink)
            pdf_file.write(response.read())
            pdf_file.close()

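        # Get all formats into calibre's database as one single book entry.
        # -1 (--one-book-per-directory) treats every file in dir as a
        # format of the same book.
        self.report_progress(0.6,_('Adding files to Calibre db'))
        cmd = 'calibredb add -1 "%s"' % dir  # quoted in case the path contains spaces
        os.system(cmd)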

        return index
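
The recipe above assumes the OPF file is called content.opf and sits at the top level of the EPUB. That holds for the site I used, but an EPUB records the real location in META-INF/container.xml, so here is a small sketch for looking it up. The function name find_opf_path and the demo archive are made up for illustration, and I have not checked whether the rest of the recipe process copes with an OPF outside the top-level directory.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = '{urn:oasis:names:tc:opendocument:xmlns:container}'

def find_opf_path(epub):
    """Return the OPF path recorded in META-INF/container.xml."""
    with zipfile.ZipFile(epub, 'r') as zf:
        root = ET.fromstring(zf.read('META-INF/container.xml'))
    rootfile = root.find('.//%srootfile' % CONTAINER_NS)
    if rootfile is None:
        raise ValueError('no <rootfile> entry in container.xml')
    return rootfile.get('full-path')

# Tiny in-memory stand-in for a downloaded EPUB (demo data only)
CONTAINER_XML = (
    '<?xml version="1.0"?>\n'
    '<container version="1.0" '
    'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">\n'
    '  <rootfiles>\n'
    '    <rootfile full-path="OEBPS/content.opf" '
    'media-type="application/oebps-package+xml"/>\n'
    '  </rootfiles>\n'
    '</container>\n'
)
demo = io.BytesIO()
with zipfile.ZipFile(demo, 'w') as zf:
    zf.writestr('META-INF/container.xml', CONTAINER_XML)
demo.seek(0)
opf_path = find_opf_path(demo)
```

In the recipe you would call find_opf_path(epub_file.name) and join the result onto self.output_dir instead of hard-coding 'content.opf'.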