MobileRead Forums > E-Book Software > Calibre > Recipes
11-26-2012, 07:28 PM   #1
tzwenn
New recipe: calibre adds it twice

Hi,
I wrote a recipe to download news from jungewelt.de.
junge Welt already provides the current newspaper as epub and pdf, so I decided to reuse achims' code from http://www.mobileread.com/forums/sho...1&postcount=16

This is what I have:

Code:
#!/usr/bin/env  python
# -*- coding: utf-8 -*-

__license__   = 'GPL v3'
__copyright__ = '2012, Sven Dziadek sven . dziadek at gmx . de'
__docformat__ = 'restructuredtext de'

GET_MOBI=False
GET_PDF=True

'''
https://www.jungewelt.de/abo/onlineabo.php
'''
import os, re, sys, zipfile
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile, PersistentTemporaryDirectory


class TazDigiabo(BasicNewsRecipe):

    title = u'junge Welt Onlineabo'
    description = u'Das ePub Onlineabo der jungen Welt'
    language = 'de'
    lang = 'de-DE'

    __author__ = 'Sven Dziadek'
    needs_subscription = True

    conversion_options = {
        'no_default_epub_cover' : True
    }

    def build_index(self):
        browser = self.get_browser()
        

        # new login process
        # must be done here so that browser is already at a website
        response = browser.open('https://www.jungewelt.de/loginFailed.php')
        browser.select_form(nr=1)
        browser.form['username'] = self.username
        browser.form['password'] = self.password
        browser.submit()
        # now find the correct file, we will still use the ePub file
        epublink = browser.find_link(text_regex=re.compile('.*Downloads*'))
        response = browser.follow_link(epublink)
        epublink = browser.find_link(text_regex=re.compile('.*ePub-Datei*'))
        response = browser.follow_link(epublink)


        # Cheat calibre's recipe method, as in post from Starsom17
        self.report_progress(0,_('downloading epub'))

        dir = PersistentTemporaryDirectory()
        epub_file = PersistentTemporaryFile(suffix='.epub',dir=dir)
        epub_file.write(response.read())
        epub_file.close()
        self.report_progress(0.1,_('extracting epub'))
        # unpack the ePub into the recipe's output dir; calibre then builds
        # a book from the content.opf returned below
        zfile = zipfile.ZipFile(epub_file.name, 'r')
        zfile.extractall(self.output_dir)
        zfile.close()
        index = os.path.join(self.output_dir, 'content.opf')
        self.report_progress(0.2,_('epub downloaded and extracted'))


        #
        # Now, download the remaining files
        #
        if GET_MOBI:
            self.report_progress(0.3,_('downloading mobi'))
            mobi_file = PersistentTemporaryFile(suffix='.mobi',dir=dir)
            browser.back()
            # mobilink was never defined; this link text is an assumption,
            # check the actual label on the download page before enabling GET_MOBI
            mobilink = browser.find_link(text_regex=re.compile('.*[Mm]obi.*'))
            response = browser.follow_link(mobilink)
            mobi_file.write(response.read())
            mobi_file.close()

        if GET_PDF:
            self.report_progress(0.4,_('downloading pdf'))
            pdf_file = PersistentTemporaryFile(suffix='.pdf',dir=dir)
            # go back to the download page and follow the PDF link
            browser.back()
            pdflink = browser.find_link(text_regex=re.compile('.*PDF-Datei*'))
            response = browser.follow_link(pdflink)
            pdf_file.write(response.read())
            pdf_file.close()
           
        # Get all formats into Calibre's database as one single book entry
        self.report_progress(0.6,_('Adding files to Calibre db'))
        # quote the path in case the temporary directory contains spaces
        cmd = 'calibredb add -1 "%s"' % dir
        os.system(cmd)
        #sys.exit(0)

        return index
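The build_index() above assumes the publisher's ePub keeps its package file at the archive root as content.opf. The EPUB container format instead records the package location in META-INF/container.xml, so a more robust lookup can read it from there. A minimal stdlib-only sketch, assuming a well-formed container.xml; the helper name find_opf_path is hypothetical and not part of calibre:

```python
import os
import xml.etree.ElementTree as ET

# Namespace of META-INF/container.xml as defined by the EPUB container (OCF) spec
CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"


def find_opf_path(extracted_dir):
    """Return the path of the package (.opf) file of an unzipped ePub.

    Hypothetical helper: reads META-INF/container.xml and returns the
    full-path of the first <rootfile>. If there is no container file it
    falls back to content.opf at the root (the recipe's original assumption).
    """
    container = os.path.join(extracted_dir, "META-INF", "container.xml")
    if not os.path.exists(container):
        return os.path.join(extracted_dir, "content.opf")
    root = ET.parse(container).getroot()
    rootfile = root.find(".//" + CONTAINER_NS + "rootfile")
    if rootfile is None or rootfile.get("full-path") is None:
        raise ValueError("container.xml lists no usable rootfile")
    # full-path is relative to the container root and always uses '/'
    rel = rootfile.get("full-path")
    return os.path.join(extracted_dir, *rel.split("/"))
```

In the recipe this would replace the hard-coded join, e.g. `index = find_opf_path(self.output_dir)`.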


As achims suggested, I add the two files to the database myself, so that they appear in calibre as one book with two formats (calibredb add -1 does that).
In addition, calibre reassembles the unzipped epub into a book of its own.
But since I have already added the epub, I don't need calibre to add it again.
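The single-entry trick relies on `calibredb add -1` (the short form of `--one-book-per-directory`), which treats all files in one directory as formats of the same book. A minimal sketch of staging the downloaded formats in one temporary directory and building the command as an argument list instead of a shell string, so paths with spaces need no quoting. The helper names and the "junge-welt" base name are hypothetical; actually running the command (e.g. with `subprocess.call(cmd)`) is left out because it needs calibre installed.

```python
import os
import tempfile


def stage_formats(payloads, basename="junge-welt"):
    """Write each (extension, data) pair into one fresh temporary directory.

    Hypothetical helper: one directory per issue, so that
    `calibredb add -1` sees exactly one book with several formats.
    """
    tdir = tempfile.mkdtemp(prefix="jw_")
    for ext, data in payloads:
        with open(os.path.join(tdir, basename + ext), "wb") as f:
            f.write(data)
    return tdir


def calibredb_add_cmd(book_dir):
    """Return the argv list that adds book_dir as a single book entry."""
    # -1 is the short form of calibredb add's --one-book-per-directory
    return ["calibredb", "add", "-1", book_dir]


tdir = stage_formats([(".epub", b"epub bytes"), (".pdf", b"pdf bytes")])
cmd = calibredb_add_cmd(tdir)
# subprocess.call(cmd) would run it; omitted here since it needs calibre
```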

So, in short:
At the moment, when I use the recipe, I get two books in calibre: one with the reassembled epub, and another with the original epub and the pdf.

Can I change that somehow?

Apart from that, it is ready to use.

Thanks
11-26-2012, 09:41 PM   #2
kovidgoyal
creator of calibre
The only way to prevent that is to abort the recipe: raise an exception at the end of the build_index() method.
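A minimal illustration of why this works, assuming only that calibre runs build_index() before its own conversion step: an exception raised at the end of build_index() propagates out, so nothing after the call runs. run_recipe() below is a hypothetical stand-in for that pipeline, not calibre's real code, and RecipeAborted is a hypothetical exception class whose message at least tells the user what happened.

```python
class RecipeAborted(Exception):
    """Hypothetical exception signalling a deliberate stop after calibredb add."""


def build_index(steps):
    steps.append("downloaded and added formats via calibredb")
    # Abort here so that no second book is assembled from the index
    raise RecipeAborted("Formats already added with calibredb add -1; "
                        "this is not an error.")


def run_recipe():
    """Stand-in for the caller: converts only if build_index() returns."""
    steps = []
    try:
        index = build_index(steps)
        steps.append("converted " + index)  # never reached
    except RecipeAborted as exc:
        steps.append("aborted: " + str(exc))
    return steps
```

Note that calibre itself still reports such a run as a failed job, as the next post shows.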
11-27-2012, 03:53 AM   #3
tzwenn
OK, then it looks like this:
Code:
#!/usr/bin/env  python
# -*- coding: utf-8 -*-

__license__   = 'GPL v3'
__copyright__ = '2012, Sven Dziadek sven . dziadek at gmx . de'
__docformat__ = 'restructuredtext de'

GET_MOBI=False
GET_PDF=True

'''
https://www.jungewelt.de/abo/onlineabo.php
'''
import os, re, zipfile
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile, PersistentTemporaryDirectory


class jungeWeltOnlineAbo(BasicNewsRecipe):

    title = u'junge Welt Onlineabo'
    description = u'Das ePub Onlineabo der jungen Welt'
    language = 'de'
    lang = 'de-DE'

    __author__ = 'Sven Dziadek'
    needs_subscription = True

    def build_index(self):
        browser = self.get_browser()
        
        # new login process
        # must be done here so that browser is already at a website
        response = browser.open('https://www.jungewelt.de/loginFailed.php')
        browser.select_form(nr=1)
        browser.form['username'] = self.username
        browser.form['password'] = self.password
        browser.submit()
        # now find the correct file, we will still use the ePub file
        epublink = browser.find_link(text_regex=re.compile('.*Downloads*'))
        response = browser.follow_link(epublink)
        epublink = browser.find_link(text_regex=re.compile('.*ePub-Datei*'))
        response = browser.follow_link(epublink)

        # Cheat calibre's recipe method, as in post from Starsom17
        self.report_progress(0,_('downloading epub'))

        dir = PersistentTemporaryDirectory()
        epub_file = PersistentTemporaryFile(suffix='.epub',dir=dir)
        epub_file.write(response.read())
        epub_file.close()
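        self.report_progress(0.1,_('extracting epub'))
        # unpack the ePub into the recipe's output dir; calibre then builds
        # a book from the content.opf returned below
        zfile = zipfile.ZipFile(epub_file.name, 'r')
        zfile.extractall(self.output_dir)
        zfile.close()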
        index = os.path.join(self.output_dir, 'content.opf')
        self.report_progress(0.2,_('epub downloaded and extracted'))

        #
        # Now, download the remaining files
        #
        if GET_MOBI:
            self.report_progress(0.3,_('downloading mobi'))
            mobi_file = PersistentTemporaryFile(suffix='.mobi',dir=dir)
            browser.back()
            # mobilink was never defined; this link text is an assumption,
            # check the actual label on the download page before enabling GET_MOBI
            mobilink = browser.find_link(text_regex=re.compile('.*[Mm]obi.*'))
            response = browser.follow_link(mobilink)
            mobi_file.write(response.read())
            mobi_file.close()

        if GET_PDF:
            self.report_progress(0.4,_('downloading pdf'))
            pdf_file = PersistentTemporaryFile(suffix='.pdf',dir=dir)
            # go back to the download page and follow the PDF link
            browser.back()
            pdflink = browser.find_link(text_regex=re.compile('.*PDF-Datei*'))
            response = browser.follow_link(pdflink)
            pdf_file.write(response.read())
            pdf_file.close()
           
        # Get all formats into Calibre's database as one single book entry
        self.report_progress(0.6,_('Adding files to Calibre db'))
        # quote the path in case the temporary directory contains spaces
        cmd = 'calibredb add -1 "%s"' % dir
        os.system(cmd)
        raise Exception('There is no exception! Everything works fine.')

        return index


But that is really ugly. Now the download ends with a big FAILED popup, which no user will understand, and to read my exception message you have to scroll all the way down.

Is there no better solution?
11-28-2012, 05:59 PM   #4
tzwenn
Instead of raising an exception, I decided to call sys.exit(). With that, the recipe no longer fails but exits normally.
Code:
#!/usr/bin/env  python
# -*- coding: utf-8 -*-

__license__   = 'GPL v3'
__copyright__ = '2012, Sven Dziadek sven . dziadek at gmx . de'
__docformat__ = 'restructuredtext de'

GET_MOBI=False
GET_PDF=True

'''
https://www.jungewelt.de/abo/onlineabo.php
'''
import os, re, sys, zipfile
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile, PersistentTemporaryDirectory


class jungeWeltOnlineAbo(BasicNewsRecipe):

    title = u'junge Welt Onlineabo'
    description = u'Das ePub Onlineabo der jungen Welt'
    language = 'de'
    lang = 'de-DE'

    __author__ = 'Sven Dziadek'
    needs_subscription = True

    conversion_options = {
        'no_default_epub_cover' : True
    }


    # based on achims' code on http://www.mobileread.com/forums/showpost.php?p=1816751&postcount=16
    def build_index(self):
        browser = self.get_browser()
        

        # new login process
        # must be done here so that browser is already at the right website
        response = browser.open('https://www.jungewelt.de/loginFailed.php')
        browser.select_form(nr=1)
        browser.form['username'] = self.username
        browser.form['password'] = self.password
        browser.submit()
        # now find the correct file, we will still use the ePub file
        epublink = browser.find_link(text_regex=re.compile('.*Downloads*'))
        response = browser.follow_link(epublink)
        epublink = browser.find_link(text_regex=re.compile('.*ePub-Datei*'))
        response = browser.follow_link(epublink)


        # Cheat calibre's recipe method, as in post from Starsom17
        self.report_progress(0,_('downloading epub'))

        dir = PersistentTemporaryDirectory()
        epub_file = PersistentTemporaryFile(suffix='.epub',dir=dir)
        epub_file.write(response.read())
        epub_file.close()
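        self.report_progress(0.1,_('extracting epub'))
        # unpack the ePub into the recipe's output dir; calibre then builds
        # a book from the content.opf returned below
        zfile = zipfile.ZipFile(epub_file.name, 'r')
        zfile.extractall(self.output_dir)
        zfile.close()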
        index = os.path.join(self.output_dir, 'content.opf')
        self.report_progress(0.2,_('epub downloaded and extracted'))


        #
        # Now, download the remaining files
        #
        if GET_MOBI:
            self.report_progress(0.3,_('downloading mobi'))
            mobi_file = PersistentTemporaryFile(suffix='.mobi',dir=dir)
            browser.back()
            # mobilink was never defined; this link text is an assumption,
            # check the actual label on the download page before enabling GET_MOBI
            mobilink = browser.find_link(text_regex=re.compile('.*[Mm]obi.*'))
            response = browser.follow_link(mobilink)
            mobi_file.write(response.read())
            mobi_file.close()

        if GET_PDF:
            self.report_progress(0.4,_('downloading pdf'))
            pdf_file = PersistentTemporaryFile(suffix='.pdf',dir=dir)
            # go back to the download page and follow the PDF link
            browser.back()
            pdflink = browser.find_link(text_regex=re.compile('.*PDF-Datei*'))
            response = browser.follow_link(pdflink)
            pdf_file.write(response.read())
            pdf_file.close()
           
        # Get all formats into Calibre's database as one single book entry
        self.report_progress(0.6,_('Adding files to Calibre db'))
        # quote the path in case the temporary directory contains spaces
        cmd = 'calibredb add -1 "%s"' % dir
        os.system(cmd)
        sys.exit(0)

        return index


Unfortunately, something (or someone) now adds another news e-book named "XXXXXX recipe out", where the Xs are different letters each time. The epub in it is invalid because it is empty (0 bytes).

Is that maybe easier to fix?
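Some Python background that may help explain the different outcome (an illustration of the language semantics, not an analysis of calibre's job code): sys.exit() raises SystemExit, which derives from BaseException rather than Exception, so an error handler written as `except Exception` catches a plain `raise Exception(...)` but not sys.exit(). The sketch assumes a hypothetical job wrapper of that shape.

```python
import sys


def job(stop_with_exit):
    """Hypothetical recipe body that stops early in one of two ways."""
    if stop_with_exit:
        sys.exit(0)  # raises SystemExit(0)
    raise Exception("There is no exception! Everything works fine.")


def wrapper(stop_with_exit):
    """Hypothetical job wrapper that only handles ordinary exceptions."""
    try:
        job(stop_with_exit)
    except Exception as exc:
        return "FAILED: %s" % exc
    return "finished"


print(issubclass(SystemExit, Exception))      # False
print(issubclass(SystemExit, BaseException))  # True
print(wrapper(False))  # FAILED: There is no exception! Everything works fine.
```

Calling wrapper(True) does not return at all: SystemExit escapes the handler and skips any caller code that would normally run after the job, apart from finally blocks.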
All times are GMT -4.