#1
Opinionated [but right]
Posts: 281
Karma: 1412
Join Date: Apr 2008
Location: UK
Device: Cybook Gen3, PRS 505, Kindle Int, Oasis, Paperwhite, Scribe
Updated London Review of Books (subscriber)
OK, here is my amateur reworking of Kovid's latest LRB script.
Changes are:
I've had the temerity to add my name to the authors, purely because if anything goes pear-shaped it will undoubtedly be something I've changed, and you'll know who to blame. Desirable changes might include setting the article titles in a sans-serif font, but how to do that is a mystery to me. No warranty as to suitability is offered!
Argel
Code:
```python
#!/usr/bin/env python2
# vim:fileencoding=utf-8
# License: GPLv3 Copyright: 2019, Kovid Goyal <kovid at kovidgoyal.net>

from calibre.web.feeds.news import BasicNewsRecipe

# Insert correct volume and edition number here
volume_number = '41'
edition_number = '22'
archive_url = 'https://www.lrb.co.uk/the-paper/v' + volume_number + '/n' + edition_number


def classes(classes):
    # Match any element whose class attribute shares a token with `classes`
    q = frozenset(classes.split(' '))
    return dict(attrs={
        'class': lambda x: x and frozenset(x.split()).intersection(q)})


def absolutize(href):
    # Turn site-relative links into full URLs
    if href.startswith('/'):
        href = 'https://www.lrb.co.uk' + href
    return href


class LondonReviewOfBooksPayed(BasicNewsRecipe):

    title = 'London Review of Books, Volume ' + volume_number + ', Number ' + edition_number
    __author__ = 'Kovid Goyal, David Lawrence'
    description = 'Literary review publishing essay-length book reviews and topical articles on politics, literature, history, philosophy, science and the arts by leading writers and thinkers'  # noqa
    category = 'news, literature, UK'
    publisher = 'LRB Ltd.'
    language = 'en_GB'
    no_stylesheets = True
    delay = 1
    encoding = 'utf-8'
    INDEX = 'https://www.lrb.co.uk'
    publication_type = 'magazine'
    needs_subscription = True
    requires_version = (3, 0, 0)

    keep_only_tags = [
        classes('article-header--title paperArticle-reviewsHeader article-content article-letters-inner contributor-pane'),
    ]
    remove_tags = [
        classes('social-button article-mask lrb-readmorelink article-send-letter article-share'),
    ]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        if self.username and self.password:
            br.open('https://www.lrb.co.uk/login')
            br.select_form(id='login_form')
            br['_username'] = self.username
            br['_password'] = self.password
            raw = br.submit().read()
            if b'>My Account<' not in raw:
                raise ValueError('Failed to login, check username and password')
        return br

    def parse_index(self):
        # The issue page supplies both the cover and the table of contents,
        # so it only needs to be fetched once
        soup = self.index_to_soup(archive_url)
        container = soup.find(attrs={'class': 'lrb-content-container'})
        img = container.find('img')
        # data-srcset holds "URL descriptor" pairs; the second-to-last
        # whitespace-separated token is the last (largest) URL
        self.cover_url = img['data-srcset'].split()[-2]
        h3 = container.find('h3')
        self.timefmt = ' [{}]'.format(self.tag_to_string(h3))
        grid = soup.find(attrs={'class': 'toc-grid-items'})
        articles = []
        for a in grid.findAll(**classes('toc-item')):
            url = absolutize(a['href'])
            h3 = a.find('h3')
            h4 = a.find('h4')
            title = '{}: {}'.format(self.tag_to_string(h3), self.tag_to_string(h4))
            self.log(title, url)
            articles.append({'title': title, 'url': url})
        return [('Articles', articles)]
```
Last edited by Argel; 12-29-2019 at 10:35 AM.
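On the wish to set article titles in a sans-serif face: calibre recipes accept an `extra_css` class attribute whose rules are applied to each downloaded article. A minimal sketch, assuming the titles keep the `article-header--title` and `paperArticle-reviewsHeader` classes already listed in `keep_only_tags`; the `h1`, `h2` and `h3` selectors are an unverified guess at the heading tags the site uses:

```python
# Paste inside the LondonReviewOfBooksPayed class body, e.g. after remove_tags.
# The class names come from keep_only_tags; the h1/h2/h3 selectors are an
# unverified guess at the tags the LRB site uses for titles.
extra_css = '''
    .article-header--title, .paperArticle-reviewsHeader,
    h1, h2, h3 { font-family: sans-serif; }
'''
```

If the titles still come out in serif, inspecting one downloaded article's HTML would show which tag or class actually carries them.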
#2
Zealot
Posts: 131
Karma: 2136220
Join Date: May 2019
Device: Kindle
It works, thanks! With a few hours of patience I have fetched the last decade's issues manually, just as the decade is about to end. I'm wondering whether any part of the process could be automated, apart from the recipe fetch itself.
#3
Zealot
Posts: 131
Karma: 2136220
Join Date: May 2019
Device: Kindle
A few notes for reference:
(1) Two issues give a fetch error (missing cover art): V28N21 and V08N22.
(2) Double issues: V06 (14/15 and 22/23) and V03-V05 (22/23).
(The cover art of each issue alone shows the passage of time over the last four decades; the LRB will reach the 1,000-issue mark in the next two years.)
Last edited by nano5; 01-01-2020 at 02:50 AM.
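To fetch one of the issues listed above, its code has to be split into the recipe's `volume_number` and `edition_number` strings. A small helper sketch (the names `parse_issue_code` and `archive_url` are hypothetical), assuming the site's URLs use two-digit, zero-padded segments such as `v08/n22`; double issues like 14/15 are deliberately rejected, since how the site addresses them is not known here:

```python
import re

# Codes in the form used above, e.g. 'V28N21' (case-insensitive).
ISSUE_CODE = re.compile(r'^[Vv](\d{1,2})[Nn](\d{1,2})$')


def parse_issue_code(code):
    """Split a code such as 'V28N21' into zero-padded ('28', '21') strings,
    the form the recipe expects for volume_number / edition_number.
    Zero padding is an assumption about the site's URL scheme."""
    m = ISSUE_CODE.match(code.strip())
    if m is None:
        raise ValueError('Not a single-issue code: %r' % code)
    volume, number = (int(g) for g in m.groups())
    return '%02d' % volume, '%02d' % number


def archive_url(volume_number, edition_number):
    # Same pattern the recipe uses to build its archive_url.
    return ('https://www.lrb.co.uk/the-paper/v' + volume_number +
            '/n' + edition_number)


if __name__ == '__main__':
    for code in ['V28N21', 'V08N22']:
        print(code, archive_url(*parse_issue_code(code)))
```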
#4
Addict
Posts: 398
Karma: 3421956
Join Date: Sep 2009
Device: various Kobos, Onyx Note2, Pocketbook 360, Kindle Keyboard
Thanks.
But for my understanding, because I don't find the news recipe interface very easy to navigate: this is one that has to be added through "add or edit a custom news source"? And then it enables you to download specific issues, one at a time, if you tweak the parameters? And the prefab recipe available through "schedule news download" is still broken? Or has that been fixed for the new website as well? |
#5
Addict
Posts: 398
Karma: 3421956
Join Date: Sep 2009
Device: various Kobos, Onyx Note2, Pocketbook 360, Kindle Keyboard
Sorry -- to answer my own question: the prefab recipe for the latest issue seems to work again as well.
#6
Big Poppa
Posts: 110
Karma: 10
Join Date: Jul 2010
Device: Nook
Nice work!
#7
Member
Posts: 13
Karma: 10
Join Date: Oct 2013
Device: none
A significant problem with both this recipe and the prefab one is that they omit all the images: photos, drawings, diagrams, etc. This may not matter for many articles, but it definitely matters for those having to do with the arts, which is a substantial number.
The recipes otherwise do what they're supposed to do, so it must be the code that extracts the page elements that is failing. Previously this wasn't an issue.
Regards,
praimon
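One plausible cause, not confirmed here: the recipe itself reads the cover from a `data-srcset` attribute, which suggests the site lazy-loads its images, so article `<img>` tags may carry their real URL in `data-src` or `data-srcset` rather than `src`. A workaround sketch under that assumption, using the hypothetical helpers `lazy_image_url` and `fix_lazy_images`, called from the recipe's `preprocess_html` hook. It can only help if the images sit inside the containers matched by `keep_only_tags`:

```python
def lazy_image_url(attrs):
    """Pick a usable URL from an <img> tag's attributes (anything with .get).

    Assumed preference order: last candidate in data-srcset, then data-src,
    then whatever src already holds.
    """
    srcset = attrs.get('data-srcset')
    if srcset:
        # srcset is a comma-separated list of "URL descriptor" candidates;
        # the last candidate is usually the largest image.
        last = srcset.split(',')[-1].split()
        if last:
            return last[0]
    return attrs.get('data-src') or attrs.get('src')


def fix_lazy_images(soup):
    """Copy the chosen URL into src so calibre downloads the image."""
    for img in soup.findAll('img'):
        url = lazy_image_url(img)
        if url:
            img['src'] = url
    return soup


# Inside the recipe class this would be wired up as:
#
#     def preprocess_html(self, soup):
#         return fix_lazy_images(soup)
```

If the images live in a container that `keep_only_tags` does not list, that container's class would also need adding there.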
#8
Addict
Posts: 398
Karma: 3421956
Join Date: Sep 2009
Device: various Kobos, Onyx Note2, Pocketbook 360, Kindle Keyboard
Quote:
#9
Addict
Posts: 398
Karma: 3421956
Join Date: Sep 2009
Device: various Kobos, Onyx Note2, Pocketbook 360, Kindle Keyboard
I have another question about this recipe. It seems to work fine. But in order to download a back issue now, what I do is:
1) Note down the volume and number.
2) Go to "add or edit a custom news source".
3) Select this recipe and click "edit".
4) Fill in the volume and number I had noted down.
5) Click "save".
6) Click "close".
7) Go to "scheduled news downloads".
8) Go to "custom" and select this custom recipe.
9) Select "download now".
10) Click "OK" or "cancel" to leave this menu.
It works, but there are so many steps. Having to go to one menu to edit the recipe and then to another one to download it feels a bit cumbersome, especially for a single issue. Or am I overlooking something, and is there a faster and easier way?
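There may be no one-click way in the GUI, but calibre's command-line `ebook-convert` accepts a `.recipe` file as input, with `--username` and `--password` for subscription recipes. A sketch under these assumptions: the custom recipe's source has been saved to a file (the names `lrb.recipe` and `lrb_issue.py` are hypothetical), its `volume_number` and `edition_number` assignments start at the beginning of a line as in post #1, and EPUB output is wanted. Note that a password typed on the command line may end up in shell history:

```python
import re
import subprocess
import sys


def set_issue(recipe_source, volume_number, edition_number):
    """Return the recipe text with its two issue assignments replaced."""
    out, n_vol = re.subn(r"^volume_number = '[^']*'",
                         "volume_number = '%s'" % volume_number,
                         recipe_source, flags=re.M)
    out, n_ed = re.subn(r"^edition_number = '[^']*'",
                        "edition_number = '%s'" % edition_number,
                        out, flags=re.M)
    if n_vol != 1 or n_ed != 1:
        raise ValueError('volume_number/edition_number lines not found')
    return out


def convert_command(recipe_path, output_path, username, password):
    """Build an ebook-convert call that runs the recipe and writes an e-book;
    the output format follows from the output file's extension."""
    return ['ebook-convert', recipe_path, output_path,
            '--username', username, '--password', password]


def main(template_path, volume_number, edition_number, username, password):
    with open(template_path, encoding='utf-8') as f:
        source = f.read()
    # Write a per-issue copy so the template itself is never modified.
    recipe_path = 'lrb_v%s_n%s.recipe' % (volume_number, edition_number)
    with open(recipe_path, 'w', encoding='utf-8') as f:
        f.write(set_issue(source, volume_number, edition_number))
    output_path = 'LRB_v%s_n%s.epub' % (volume_number, edition_number)
    subprocess.check_call(
        convert_command(recipe_path, output_path, username, password))


if __name__ == '__main__':
    if len(sys.argv) == 6:
        main(*sys.argv[1:])
    else:
        print('usage: lrb_issue.py lrb.recipe VOLUME NUMBER USERNAME PASSWORD')
```

For example, `python3 lrb_issue.py lrb.recipe 28 21 USER PASS` should leave `LRB_v28_n21.epub` in the current directory, ready to add to the library; looping `main` over a list of issues would cover a whole run of back issues.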
Thread | Thread Starter | Forum | Replies | Last Post |
London Review of Books - Back Issues? | ztwig | Recipes | 3 | 12-29-2019 09:07 AM |
London Review of Books subscriber recipe - new error | danceswithcats | Recipes | 2 | 12-12-2019 02:41 PM |
London Review of Books recipe updated | rainrdx | Recipes | 1 | 12-25-2012 06:11 PM |
London Review of Books - fixed cover URL | Frescard | Recipes | 0 | 11-05-2012 07:54 PM |
London Review of Books Blog | JFS-NMF | Recipes | 0 | 01-12-2011 02:20 PM |