Updated London Review of Books (subscriber)

Argel · 12-29-2019, 09:46 AM

OK, here is my amateur reworking of Kovid's latest LRB script.

Changes are:

Successfully retrieves specified archive copies. It requires the manual entry of the Volume and Edition number [in 2-digit format] of the desired issue into the script. Getting back-issues was the main object in making the changes.
Volume and edition are included in the title for filing purposes.
High resolution cover retrieved for archived editions, not the low-res thumbnail from the archive edition front page.
Annoying address for letters removed from the end of every article.
Missing author information link re-added to end of articles.
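
A note on the 2-digit format in the first item: the script joins the two strings exactly as typed, so '8' instead of '08' would presumably give a different URL. A minimal sketch of a padding helper, in case anyone wants one. `build_archive_url` and `BASE_URL` are made-up names, not part of the recipe or of calibre, and the URL pattern is simply the one the script uses.

```python
BASE_URL = 'https://www.lrb.co.uk/the-paper'  # pattern taken from the recipe script


def build_archive_url(volume, edition):
    """Return the archive URL for an issue, zero-padding both numbers to 2 digits.

    Hypothetical helper: accepts ints or digit strings, e.g. (8, 22) or ('08', '22').
    """
    volume = int(volume)
    edition = int(edition)
    if volume < 1 or edition < 1:
        raise ValueError('volume and edition must be positive')
    return '{}/v{:02d}/n{:02d}'.format(BASE_URL, volume, edition)


print(build_archive_url(41, 22))
print(build_archive_url('8', '22'))
```

Whether every back issue really lives at a URL of this shape is not something the helper can check; it only guards against a missing leading zero.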

I've had the temerity to add my name to the authors, purely because if anything goes pear-shaped it will undoubtedly be something I've changed and you'll know who to blame.

Desirable changes might include setting the article titles in a sans-serif font, but how to do that is a mystery to me.

No warranty as to suitability is offered!

Argel

Code:

#!/usr/bin/env python2
# vim:fileencoding=utf-8
# License: GPLv3 Copyright: 2019, Kovid Goyal <kovid at kovidgoyal.net>
from calibre.web.feeds.news import BasicNewsRecipe


# Insert the volume and edition number of the desired issue here,
# each as a 2-digit string (for example '08', not '8')
volume_number = '41'
edition_number = '22'
archive_url = 'https://www.lrb.co.uk/the-paper/v' + volume_number + '/n' + edition_number

def classes(classes):
    q = frozenset(classes.split(' '))
    return dict(attrs={
        'class': lambda x: x and frozenset(x.split()).intersection(q)})


def absolutize(href):
    if href.startswith('/'):
        href = 'https://www.lrb.co.uk' + href
    return href


class LondonReviewOfBooksPayed(BasicNewsRecipe):
    title = 'London Review of Books, Volume ' + volume_number + ', Number ' + edition_number
    __author__ = 'Kovid Goyal, David Lawrence'
    description = 'Literary review publishing essay-length book reviews and topical articles on politics, literature, history, philosophy, science and the arts by leading writers and thinkers'  # noqa
    category = 'news, literature, UK'
    publisher = 'LRB Ltd.'
    language = 'en_GB'
    no_stylesheets = True
    delay = 1
    encoding = 'utf-8'
    INDEX = 'https://www.lrb.co.uk'
    publication_type = 'magazine'
    needs_subscription = True
    requires_version = (3, 0, 0)

    keep_only_tags = [
        classes('article-header--title paperArticle-reviewsHeader article-content article-letters-inner contributor-pane'),
    ]

    remove_tags = [
        classes('social-button article-mask lrb-readmorelink article-send-letter article-share'),
    ]
 
    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        if self.username and self.password:
            br.open('https://www.lrb.co.uk/login')
            br.select_form(id='login_form')
            br['_username'] = self.username
            br['_password'] = self.password
            raw = br.submit().read()
            if b'>My Account<' not in raw:
                raise ValueError('Failed to login, check username and password')
        return br

    def parse_index(self):
        # Fetch the contents page of the selected archive issue
        soup = self.index_to_soup(archive_url)
        container = soup.find(attrs={'class': 'lrb-content-container'})
        # Cover: take the URL of the last candidate listed in data-srcset
        # (the token just before the final size descriptor)
        img = container.find('img')
        self.cover_url = img['data-srcset'].split()[-2]
        h3 = container.find('h3')
        self.timefmt = ' [{}]'.format(self.tag_to_string(h3))
        # The table of contents is on the same page, so it is not fetched again
        grid = soup.find(attrs={'class': 'toc-grid-items'})
        articles = []
        for a in grid.findAll(**classes('toc-item')):
            url = absolutize(a['href'])
            h3 = a.find('h3')
            h4 = a.find('h4')
            title = '{}: {}'.format(self.tag_to_string(h3), self.tag_to_string(h4))
            self.log(title, url)
            articles.append({'title': title, 'url': url})

        return [('Articles', articles)]
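
A note on the cover line in parse_index: `img['data-srcset'].split()[-2]` relies on the attribute being a list of "URL descriptor" pairs, with the wanted image listed last. The sketch below shows the same idea as plain Python, plus a variant that picks the widest candidate whatever the order. Both function names and the example.com URLs are invented for illustration; the real attribute value on the LRB site may look different.

```python
def last_srcset_url(srcset):
    """Return the URL of the last candidate, like split()[-2] in the recipe."""
    tokens = srcset.split()
    if len(tokens) < 2:
        raise ValueError('expected at least one "url descriptor" pair')
    return tokens[-2]


def widest_srcset_url(srcset):
    """Return the URL with the largest width descriptor (e.g. '1600w').

    Assumes candidates are comma-separated and that URLs contain no commas.
    """
    best_url, best_width = None, -1
    for candidate in srcset.split(','):
        parts = candidate.split()
        if len(parts) == 2 and parts[1].endswith('w') and parts[1][:-1].isdigit():
            width = int(parts[1][:-1])
            if width > best_width:
                best_url, best_width = parts[0], width
    if best_url is None:
        raise ValueError('no width descriptors found')
    return best_url


# Made-up attribute value, only to show the shape of the data
example = ('https://example.com/cover-400.jpg 400w, '
           'https://example.com/cover-1600.jpg 1600w')
print(last_srcset_url(example))
print(widest_srcset_url(example))
```

The second variant would only matter if the site ever listed the candidates in a different order; the recipe as posted does not need it.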

nano5 · 12-31-2019, 03:50 AM

It works, thanks! After a few hours of patient manual work, I have almost finished fetching the last decade. I wonder whether any part of the process, other than the recipe fetch itself, could be automated.
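
One part that could perhaps be automated is typing in the two numbers for each issue. The sketch below is plain Python, independent of calibre: it rewrites the volume_number and edition_number assignments in the recipe text for a list of issues. `set_issue` and `recipes_for_issues` are hypothetical names, the file-name scheme is invented, and the sketch assumes the two assignments look exactly as they do in Argel's script.

```python
import re


def set_issue(recipe_source, volume, edition):
    """Return recipe_source with both numbers replaced by 2-digit strings."""
    values = {
        'volume_number': '{:02d}'.format(int(volume)),
        'edition_number': '{:02d}'.format(int(edition)),
    }
    for name, value in values.items():
        # Match a top-level assignment such as: volume_number = '41'
        pattern = r"^{}\s*=\s*'[^']*'".format(name)
        recipe_source, count = re.subn(
            pattern, "{} = '{}'".format(name, value), recipe_source, flags=re.M)
        if count != 1:
            raise ValueError('expected exactly one assignment to ' + name)
    return recipe_source


def recipes_for_issues(recipe_source, issues):
    """Map an invented file name per issue to the edited recipe text."""
    result = {}
    for volume, edition in issues:
        name = 'lrb_v{:02d}_n{:02d}.recipe'.format(int(volume), int(edition))
        result[name] = set_issue(recipe_source, volume, edition)
    return result


# Two-line stand-in for the full recipe text
template = "volume_number = '41'\nedition_number = '22'\n"
for name, text in sorted(recipes_for_issues(template, [(41, 1), (8, 22)]).items()):
    print(name)
    print(text)
```

Writing each text out to its file and fetching it is left out here; this only covers the editing step.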

nano5 · 12-31-2019, 05:21 AM

A few notes for reference.

(1) Two editions have a fetch error: V28N21, V08N22 (missing cover art);
(2) Double editions: V06 (14/15, 22/23), V03-05 (22/23);

(The cover art alone already shows the passage of time over the last four decades; the paper will reach the 1000-edition mark in the next two years.)

franklekens · 01-17-2020, 04:06 PM

Thanks.
But just so I understand, because I don't find the news recipe interface very easy to navigate: is this one that has to be added through "add or edit a custom news source"?

And then it enables you to download specific issues, one at a time, if you tweak the parameters?

And the prefab recipe available through "schedule news download" is still broken? Or has that been fixed for the new website as well?

franklekens · 01-17-2020, 04:27 PM

Sorry -- to answer my own question: the prefab recipe for the latest issue seems to work again as well.

bobbysteel · 01-30-2020, 06:33 AM

Nice work!

Sent from my iPhone using Tapatalk

praimon · 04-15-2020, 09:18 PM

A significant problem with both this recipe and the prefab one is that they omit all the images: photos, drawings, diagrams, etc. This may not matter for many articles, but it definitely matters for those having to do with the arts, which is a substantial number.

The recipes themselves do what they're supposed to do, so it's the code that extracts the page elements that is failing. Previously this wasn't an issue.

Regards,
praimon

franklekens · 05-18-2020, 11:49 AM

Quote:

Originally Posted by praimon

A significant problem with both this recipe and the prefab one is that they omit all the images: photos, drawings, diagrams, etc. This may not matter for many articles, but it definitely matters for those having to do with the arts, which is a substantial number.

The recipes themselves do what they're supposed to do, so it's the code that extracts the page elements that is failing. Previously this wasn't an issue.

Regards,
praimon

This seems to have been fixed at least in the prefab recipe, as far as I can see. The latest volume I downloaded did have images.

franklekens · 05-18-2020, 11:56 AM

I have another question about this recipe. It seems to work fine. But in order to download a back issue now, what I do is:
1) note down the volume & number
2) go to "add or edit a custom news source"
3) select this recipe and click "edit"
4) fill in the volume and number I had noted down
5) click save
6) click close
7) go to "scheduled news downloads"
8) go to "custom" and select this custom recipe
9) select "download now"
10) click OK or "cancel" to leave this menu.

It works, but there are so many steps. Having to go to one menu to edit the recipe and then to another to download it feels a bit cumbersome, especially for a single issue.
Or am I overlooking something and is there a faster and easier way?
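
One possible shortcut, offered as a sketch rather than a tested workflow: calibre's command-line tool `ebook-convert` accepts a `.recipe` file as input, which might avoid the two menus. The code below only builds the argument list and wraps the call. It assumes the edited recipe has been saved to a file (`lrb_archive.recipe` is a made-up name) and that `ebook-convert` is on the PATH. `convert_command` and `download_issue` are hypothetical helper names.

```python
import subprocess


def convert_command(recipe_path, output_path, username=None, password=None):
    """Build the ebook-convert argument list for a recipe file."""
    args = ['ebook-convert', recipe_path, output_path]
    if username and password:
        # Login options for recipes that set needs_subscription
        args += ['--username', username, '--password', password]
    return args


def download_issue(recipe_path, output_path, username=None, password=None):
    """Run ebook-convert; raises CalledProcessError if it exits with an error."""
    subprocess.check_call(
        convert_command(recipe_path, output_path, username, password))


# Show the command without running it
print(' '.join(convert_command('lrb_archive.recipe', 'lrb_v41_n22.epub')))
```

The volume and number still have to be edited in the recipe file first, so this replaces steps 5 to 10 rather than the whole list.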
