Old 12-29-2019, 08:46 AM   #1
Argel
Opinionated [but right]
Argel is no ebook tyro.
Posts: 281
Karma: 1412
Join Date: Apr 2008
Location: UK
Device: Cybook Gen3, PRS 505, Kindle Int, Oasis, Paperwhite, Scribe
Updated London Review of Books (subscriber)

OK, here is my amateur reworking of Kovid's latest LRB script.

Changes are:
  • Successfully retrieves specified archive copies. It requires the manual entry of the Volume and Edition number [in 2-digit format] of the desired issue into the script. Getting back-issues was the main object in making the changes.
  • Volume and edition are included in the title for filing purposes.
  • High resolution cover retrieved for archived editions, not the low-res thumbnail from the archive edition front page.
  • Annoying address for letters removed from the end of every article.
  • Missing author information link re-added to end of articles.

I've had the temerity to add my name to the authors, purely because if anything goes pear-shaped it will undoubtedly be something I've changed and you'll know who to blame.

Desirable changes might include reformatting the article titles in sans but that's a mystery to me.

No warranty as to suitability is offered!

Argel

Code:
#!/usr/bin/env python2
# vim:fileencoding=utf-8
# License: GPLv3 Copyright: 2019, Kovid Goyal <kovid at kovidgoyal.net>
from calibre.web.feeds.news import BasicNewsRecipe


# Set the volume and issue number of the wanted issue here, two digits each (e.g. '08')
volume_number = '41'
edition_number = '22'
archive_url = 'https://www.lrb.co.uk/the-paper/v' + volume_number + '/n' + edition_number

def classes(classes):
    q = frozenset(classes.split(' '))
    return dict(attrs={
        'class': lambda x: x and frozenset(x.split()).intersection(q)})


def absolutize(href):
    if href.startswith('/'):
        href = 'https://www.lrb.co.uk' + href
    return href


class LondonReviewOfBooksPayed(BasicNewsRecipe):
    title = 'London Review of Books, Volume ' + volume_number + ', Number ' + edition_number
    __author__ = 'Kovid Goyal, David Lawrence'
    description = 'Literary review publishing essay-length book reviews and topical articles on politics, literature, history, philosophy, science and the arts by leading writers and thinkers'  # noqa
    category = 'news, literature, UK'
    publisher = 'LRB Ltd.'
    language = 'en_GB'
    no_stylesheets = True
    delay = 1
    encoding = 'utf-8'
    INDEX = 'https://www.lrb.co.uk'
    publication_type = 'magazine'
    needs_subscription = True
    requires_version = (3, 0, 0)

    keep_only_tags = [
        classes('article-header--title paperArticle-reviewsHeader article-content article-letters-inner contributor-pane'),
    ]
 
    remove_tags = [
        classes('social-button article-mask lrb-readmorelink article-send-letter article-share'),
    ]
 
    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        if self.username and self.password:
            br.open('https://www.lrb.co.uk/login')
            br.select_form(id='login_form')
            br['_username'] = self.username
            br['_password'] = self.password
            raw = br.submit().read()
            if b'>My Account<' not in raw:
                raise ValueError('Failed to log in, check username and password')
        return br

    def parse_index(self):
        # The issue page carries the cover, the issue date and the table of contents
        soup = self.index_to_soup(archive_url)
        container = soup.find(attrs={'class': 'lrb-content-container'})
        img = container.find('img')
        # data-srcset lists "url descriptor" pairs; take the URL of the last pair
        self.cover_url = img['data-srcset'].split()[-2]
        h3 = container.find('h3')
        self.timefmt = ' [{}]'.format(self.tag_to_string(h3))
        grid = soup.find(attrs={'class': 'toc-grid-items'})
        articles = []
        for a in grid.findAll(**classes('toc-item')):
            url = absolutize(a['href'])
            h3 = a.find('h3')
            h4 = a.find('h4')
            title = '{}: {}'.format(self.tag_to_string(h3), self.tag_to_string(h4))
            self.log(title, url)
            articles.append({'title': title, 'url': url})

        return [('Articles', articles)]
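
The two-digit requirement mentioned in the change list can be handled in code rather than by hand. This is only a minimal sketch, not part of the posted recipe: `build_archive_url` and `BASE_URL` are hypothetical names, and the URL pattern is simply the one `archive_url` uses above.

```python
BASE_URL = 'https://www.lrb.co.uk/the-paper'  # same prefix as archive_url in the recipe


def build_archive_url(volume, number):
    """Return the archive URL for an issue, zero-padding both parts to two digits."""
    volume, number = int(volume), int(number)
    if volume < 1 or number < 1:
        raise ValueError('volume and number must be positive')
    return '{}/v{:02d}/n{:02d}'.format(BASE_URL, volume, number)


print(build_archive_url(41, 22))
print(build_archive_url('8', 22))
```

With this, entering `8` or `'08'` gives the same URL, so a missing leading zero no longer matters.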

Last edited by Argel; 12-29-2019 at 10:35 AM.
Old 12-31-2019, 02:50 AM   #2
nano5
Zealot
nano5 ought to be getting tired of karma fortunes by now.
Posts: 131
Karma: 2136220
Join Date: May 2019
Device: Kindle
It works, thanks! After a few hours of patient manual work, I have nearly finished fetching the last decade of issues. I wonder whether any part of the process, other than the recipe fetch itself, could be automated.
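
On the automation question, one possible approach is to generate a copy of the recipe per issue and hand each copy to calibre's `ebook-convert` command-line tool, which accepts a recipe file as input along with `--username` and `--password`. The following is a sketch under assumptions: it expects the recipe text to contain the `volume_number = '..'` and `edition_number = '..'` lines exactly as posted in #1, and the function names and `lrb_vNN_nNN` file names are made up for illustration.

```python
import re


def recipe_for_issue(recipe_source, volume, number):
    """Return the recipe text with volume_number/edition_number set for one issue."""
    settings = (('volume_number', volume), ('edition_number', number))
    for name, value in settings:
        pattern = r"(?m)^{} = '\d+'".format(name)
        replacement = "{} = '{:02d}'".format(name, int(value))
        recipe_source, count = re.subn(pattern, replacement, recipe_source)
        if count != 1:
            raise ValueError('expected exactly one {} line'.format(name))
    return recipe_source


def convert_command(recipe_path, output_path, username, password):
    """Build the ebook-convert call that downloads one issue to a file."""
    return ['ebook-convert', recipe_path, output_path,
            '--username', username, '--password', password]


def plan_batch(recipe_source, volume, numbers, username, password):
    """Yield (recipe filename, recipe text, command) for each issue number."""
    for number in numbers:
        stem = 'lrb_v{:02d}_n{:02d}'.format(int(volume), int(number))
        yield (stem + '.recipe',
               recipe_for_issue(recipe_source, volume, number),
               convert_command(stem + '.recipe', stem + '.epub',
                               username, password))
```

To use it, write each recipe text to its file name and pass the command list to `subprocess.call`, ideally with a pause between issues. Double issues are not covered, since the thread does not say what their URLs look like.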
Old 12-31-2019, 04:21 AM   #3
nano5
Zealot
nano5 ought to be getting tired of karma fortunes by now.
Posts: 131
Karma: 2136220
Join Date: May 2019
Device: Kindle
A few notices for reference.

(1) Two editions have a fetch error: V28N21, V08N22, missing cover art;
(2) Double editions: V06 (14/15, 22/23), V03-05 (22/23);

(The cover art alone already reveals the passage of time over the last four decades. The run will reach the 1000-edition mark in the next two years.)
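
The thread does not confirm why those two issues fail, but the recipe's `img['data-srcset'].split()[-2]` line will raise an error on any page where the cover image or its `data-srcset` attribute is absent, so that is one plausible cause. Below is a hedged sketch of a more forgiving lookup. `cover_from_srcset` is a hypothetical helper, not part of calibre.

```python
def cover_from_srcset(srcset):
    """Return the URL of the last "url descriptor" pair in srcset, or None.

    Mirrors the recipe's split()[-2] but returns None instead of raising
    when the attribute is missing, empty, or has no descriptor.
    """
    if not srcset:
        return None
    tokens = srcset.split()
    if len(tokens) < 2:
        return None
    return tokens[-2]
```

Inside `parse_index` it could be called as `cover_from_srcset(img.get('data-srcset') if img is not None else None)`, assigning `self.cover_url` only when the result is not `None`, so an issue without cover art is still fetched.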

Last edited by nano5; 01-01-2020 at 02:50 AM.
Old 01-17-2020, 03:06 PM   #4
franklekens
Addict
franklekens ought to be getting tired of karma fortunes by now.
Posts: 398
Karma: 3421956
Join Date: Sep 2009
Device: various Kobo's, Onyx Note2, Pocketbook 360, Kindle Keyboard
Thanks.
But for my understanding, because I don't find the news recipe interface very easy to navigate: this is one that has to be added through "add or edit a custom news source"?

And then it enables you to download specific issues, one at a time, if you tweak the parameters?

And the prefab recipe available through "schedule news download" is still broken? Or has that been fixed for the new website as well?
Old 01-17-2020, 03:27 PM   #5
franklekens
Addict
franklekens ought to be getting tired of karma fortunes by now.
Posts: 398
Karma: 3421956
Join Date: Sep 2009
Device: various Kobo's, Onyx Note2, Pocketbook 360, Kindle Keyboard
Sorry -- to answer my own question: the prefab recipe for the latest issue seems to work again as well.
Old 01-30-2020, 05:33 AM   #6
bobbysteel
Big Poppa
bobbysteel began at the beginning.
 
Posts: 110
Karma: 10
Join Date: Jul 2010
Device: Nook
Nice work!
Old 04-15-2020, 08:18 PM   #7
praimon
Member
praimon began at the beginning.
 
Posts: 13
Karma: 10
Join Date: Oct 2013
Device: none
A significant problem with both this recipe and the prefab one is that they omit all the images: photos, drawings, diagrams, etc. This may not matter for many articles, but it definitely matters for those having to do with the arts, which is a substantial number.

The recipes themselves do what they're supposed to do, so it's the code that extracts the page elements that is failing. Previously this wasn't an issue.

Regards,
praimon
Old 05-18-2020, 10:49 AM   #8
franklekens
Addict
franklekens ought to be getting tired of karma fortunes by now.
Posts: 398
Karma: 3421956
Join Date: Sep 2009
Device: various Kobo's, Onyx Note2, Pocketbook 360, Kindle Keyboard
Quote:
Originally Posted by praimon
A significant problem with both this recipe and the prefab one is that they omit all the images: photos, drawings, diagrams, etc. This may not matter for many articles, but it definitely matters for those having to do with the arts, which is a substantial number.

The recipes themselves do what they're supposed to do, so it's the code that extracts the page elements that is failing. Previously this wasn't an issue.

Regards,
praimon
This seems to have been fixed at least in the prefab recipe, as far as I can see. The latest volume I downloaded did have images.
Old 05-18-2020, 10:56 AM   #9
franklekens
Addict
franklekens ought to be getting tired of karma fortunes by now.
Posts: 398
Karma: 3421956
Join Date: Sep 2009
Device: various Kobo's, Onyx Note2, Pocketbook 360, Kindle Keyboard
I have another question about this recipe. It seems to work fine. But in order to download a back issue now, what I do is:
1) note down the volume & number
2) go to "add or edit a custom news source"
3) select this recipe and click "edit"
4) fill in the volume and number I had noted down
5) click save
6) click close
7) go to "scheduled news downloads"
8) go to "custom" and select this custom recipe
9) select "download now"
10) click OK or "cancel" to leave this menu.

It works, but there are so many steps. Having to go to one menu to edit the recipe and then to another to download it feels cumbersome, especially when downloading a single issue.
Or am I overlooking something and is there a faster and easier way?
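
One way to cut down the steps above, offered as an untested sketch rather than a confirmed calibre feature, is to let the recipe read the issue from environment variables so the file never needs editing. The variable names `LRB_VOLUME` and `LRB_NUMBER` and the helper `issue_from_env` are invented here, and this assumes the recipe code can see the environment of the process that launches it.

```python
import os


def issue_from_env(environ=os.environ, default_volume='41', default_number='22'):
    """Return (volume, number) as two-digit strings, taken from the
    hypothetical LRB_VOLUME / LRB_NUMBER variables when they are set."""
    volume = environ.get('LRB_VOLUME', default_volume)
    number = environ.get('LRB_NUMBER', default_number)
    return volume.zfill(2), number.zfill(2)


# These two names replace the hand-edited assignments at the top of the recipe
volume_number, edition_number = issue_from_env()
```

From a terminal, a single issue could then be fetched in one line, for example `LRB_VOLUME=28 LRB_NUMBER=21 ebook-convert lrb.recipe lrb.epub --username ... --password ...`, where `lrb.recipe` is whatever file the custom recipe was saved to.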