12-29-2019, 09:46 AM   #1
Argel
Opinionated [but right]
Argel is no ebook tyro.
Posts: 276
Karma: 1412
Join Date: Apr 2008
Location: UK
Device: Cybook Gen3, PRS 505, Kindle International, HTC Desire
Updated London Review of Books (subscriber)

OK, here is my amateur reworking of Kovid's latest LRB script.

Changes are:
  • Successfully retrieves specified archive copies. It requires the manual entry of the Volume and Edition number [in 2-digit format] of the desired issue into the script. Getting back-issues was the main object in making the changes.
  • Volume and edition are included in the title for filing purposes.
  • High resolution cover retrieved for archived editions, not the low-res thumbnail from the archive edition front page.
  • Annoying address for letters removed from the end of every article.
  • Missing author information link re-added to end of articles.
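
As a rough illustration of the two-digit requirement in the first point, here is a minimal sketch of how the archive address could be built from plain numbers. The helper name build_archive_url is hypothetical and not part of the recipe; the URL pattern is the one the script below uses.

```python
BASE = 'https://www.lrb.co.uk/the-paper'


def build_archive_url(volume, edition):
    """Return the archive URL for one issue, zero-padding both numbers to two digits."""
    # Hypothetical helper: the recipe itself concatenates two hand-typed strings.
    return '{}/v{:02d}/n{:02d}'.format(BASE, int(volume), int(edition))


print(build_archive_url(41, 22))   # https://www.lrb.co.uk/the-paper/v41/n22
print(build_archive_url('8', 22))  # https://www.lrb.co.uk/the-paper/v08/n22
```

Padding in code rather than by hand would avoid typing '8' where the site expects '08'.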

I've had the temerity to add my name to the authors, purely because if anything goes pear-shaped it will undoubtedly be something I've changed and you'll know who to blame.

Desirable changes might include reformatting the article titles in sans but that's a mystery to me.

No warranty as to suitability is offered!

Argel

Code:
#!/usr/bin/env python2
# vim:fileencoding=utf-8
# License: GPLv3 Copyright: 2019, Kovid Goyal <kovid at kovidgoyal.net>
from calibre.web.feeds.news import BasicNewsRecipe


# Insert the volume and edition number of the desired issue here,
# as two-digit strings (e.g. '08', not '8')
volume_number = '41'
edition_number = '22'
archive_url = 'https://www.lrb.co.uk/the-paper/v' + volume_number + '/n' + edition_number

def classes(classes):
    q = frozenset(classes.split(' '))
    return dict(attrs={
        'class': lambda x: x and frozenset(x.split()).intersection(q)})


def absolutize(href):
    if href.startswith('/'):
        href = 'https://www.lrb.co.uk' + href
    return href


class LondonReviewOfBooksPayed(BasicNewsRecipe):
    title = 'London Review of Books, Volume ' + volume_number + ', Number ' + edition_number
    __author__ = 'Kovid Goyal, David Lawrence'
    description = 'Literary review publishing essay-length book reviews and topical articles on politics, literature, history, philosophy, science and the arts by leading writers and thinkers'  # noqa
    category = 'news, literature, UK'
    publisher = 'LRB Ltd.'
    language = 'en_GB'
    no_stylesheets = True
    delay = 1
    encoding = 'utf-8'
    INDEX = 'https://www.lrb.co.uk'
    publication_type = 'magazine'
    needs_subscription = True
    requires_version = (3, 0, 0)

    keep_only_tags = [
        classes('article-header--title paperArticle-reviewsHeader article-content article-letters-inner contributor-pane'),
    ]
 
    remove_tags = [
        classes('social-button article-mask lrb-readmorelink article-send-letter article-share'),
    ]
 
    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        if self.username and self.password:
            br.open('https://www.lrb.co.uk/login')
            br.select_form(id='login_form')
            br['_username'] = self.username
            br['_password'] = self.password
            raw = br.submit().read()
            if b'>My Account<' not in raw:
                raise ValueError('Failed to log in, check username and password')
        return br

    def parse_index(self):
        # Fetch the issue page once; it holds both the cover and the contents grid
        soup = self.index_to_soup(archive_url)
        container = soup.find(attrs={'class': 'lrb-content-container'})
        img = container.find('img')
        # The second-to-last token of data-srcset is the URL of the last
        # (assumed largest) image candidate
        self.cover_url = img['data-srcset'].split()[-2]
        h3 = container.find('h3')
        self.timefmt = ' [{}]'.format(self.tag_to_string(h3))
        grid = soup.find(attrs={'class': 'toc-grid-items'})
        articles = []
        for a in grid.findAll(**classes('toc-item')):
            url = absolutize(a['href'])
            h3 = a.find('h3')
            h4 = a.find('h4')
            title = '{}: {}'.format(self.tag_to_string(h3), self.tag_to_string(h4))
            self.log(title, url)
            articles.append({'title': title, 'url': url})

        return [('Articles', articles)]
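
For anyone puzzled by the classes() helper in the script above, here is a small standalone sketch of what it does, with no calibre or BeautifulSoup needed. It assumes the class filter is called with the tag's class attribute as a string (or None when the tag has no class); the sample class strings are made up.

```python
def classes(classes):
    # Same helper as in the recipe: build a filter matching any listed class
    q = frozenset(classes.split(' '))
    return dict(attrs={
        'class': lambda x: x and frozenset(x.split()).intersection(q)})


# Pull out the filter function that the soup would call for each tag
matcher = classes('toc-item article-share')['attrs']['class']

print(bool(matcher('toc-item featured')))  # True: shares 'toc-item'
print(bool(matcher('sidebar')))            # False: no class in common
print(bool(matcher(None)))                 # False: tag without a class
```

So keep_only_tags and remove_tags match a tag as soon as it carries any one of the space-separated class names.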

Last edited by Argel; 12-29-2019 at 11:35 AM.
12-31-2019, 03:50 AM   #2
nano5
Enthusiast
nano5 began at the beginning.
 
Posts: 30
Karma: 10
Join Date: May 2019
Device: Kindle
It works, thanks! With a few hours of patience, I have nearly finished fetching the last decade of issues manually. I wonder whether any part of the process, other than the recipe fetch itself, could be automated.
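
One possible way to automate the manual step, offered only as a sketch: rewrite the two number lines of the recipe for each issue and hand the result to calibre's command-line converter. The helper names, file names and USER/PASS placeholders are hypothetical. As far as I know ebook-convert accepts a .recipe file as input together with --username and --password, but check `ebook-convert --help` on your install before relying on it.

```python
import re


def set_issue(recipe_source, volume, edition):
    """Return the recipe text with both issue numbers replaced by two-digit strings."""
    out = re.sub(r"(?m)^volume_number = '[^']*'",
                 "volume_number = '{:02d}'".format(volume), recipe_source)
    out = re.sub(r"(?m)^edition_number = '[^']*'",
                 "edition_number = '{:02d}'".format(edition), out)
    return out


def convert_command(recipe_path, output_path, username, password):
    """Build (but do not run) an ebook-convert command line for one issue."""
    return ['ebook-convert', recipe_path, output_path,
            '--username', username, '--password', password]


# Stand-in for the full recipe text; only the two number lines matter here
template = "volume_number = '41'\nedition_number = '22'\n"
for edition in (1, 2):
    source = set_issue(template, 40, edition)
    name = 'lrb_v40_n{:02d}'.format(edition)
    # In real use: write `source` to name + '.recipe', then run the command
    # with subprocess.check_call().
    print(convert_command(name + '.recipe', name + '.epub', 'USER', 'PASS'))
```

The loop only prints the commands, so it can be tried without a subscription or a calibre install.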
12-31-2019, 05:21 AM   #3
nano5
A few notes for reference.

(1) Two editions have a fetch error (missing cover art): V28N21, V08N22;
(2) Double editions: V06 (14/15, 22/23), V03-05 (22/23);

(The cover art alone already reveals the passage of time over the last four decades; the paper will reach the 1000-edition mark in the next two years.)
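
If the two fetch errors come from issue pages whose cover image has no data-srcset attribute (a guess, I have not verified it), a more forgiving cover lookup might look like the sketch below. best_cover_url is a hypothetical helper, the example.com addresses are made up, and it assumes the srcset lists its candidates from smallest to largest, as the recipe's own split()[-2] does.

```python
def best_cover_url(srcset):
    """Return the last URL listed in a srcset string, or None if there is none."""
    if not srcset:
        return None
    # Each comma-separated candidate is 'URL [descriptor]'; keep only the URL
    candidates = [part.split()[0] for part in srcset.split(',') if part.strip()]
    return candidates[-1] if candidates else None


print(best_cover_url('https://example.com/c-400.jpg 400w, https://example.com/c-800.jpg 800w'))
print(best_cover_url(None))

# Possible use inside parse_index (untested against the live site):
#     img = container.find('img')
#     url = best_cover_url(img.get('data-srcset') if img is not None else None)
#     if url:
#         self.cover_url = url
```

With that guard an issue without cover art would simply be built without a cover instead of aborting the fetch.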

Last edited by nano5; 01-01-2020 at 03:50 AM.
01-17-2020, 04:06 PM   #4
franklekens
Addict
franklekens ought to be getting tired of karma fortunes by now.
Posts: 276
Karma: 659674
Join Date: Sep 2009
Device: Kobo Forma, Kobo Aura ONE, Kindle Oasis 2, Kindle Keyboard
Thanks.
But just so I understand, since I don't find the news recipe interface very easy to navigate: is this one that has to be added through "add or edit a custom news source"?

And then it enables you to download specific issues, one at a time, if you tweak the parameters?

And the prefab recipe available through "schedule news download" is still broken? Or has that been fixed for the new website as well?
01-17-2020, 04:27 PM   #5
franklekens
Sorry -- to answer my own question: the prefab recipe for the latest issue seems to work again as well.
01-30-2020, 06:33 AM   #6
bobbysteel
Big Poppa
bobbysteel began at the beginning.
 
Posts: 103
Karma: 10
Join Date: Jul 2010
Device: Nook
Nice work!



All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.