MobileRead Forums > E-Book Software > Calibre > Recipes

11-18-2011, 09:09 PM   #1
PatStapleton
Junior Member
Posts: 8
Karma: 10
Join Date: Nov 2011
Location: Australia
Device: Kindle 4
Recipe for Medical Journal of Australia

I wasn't sure where to post custom recipes, so I'm posting here.

I wrote a recipe for the Medical Journal of Australia; hopefully it is useful to somebody.

Note that I wrote it to use my institution's login, so you will probably need to adjust it for your own use. The parts to adjust are the "#LOGIN" section and the "#fix url to pickup institution login by appending" line, both of which point at my institution's proxy (ipacez.nd.edu.au).
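
For anyone adapting this, the URL rewrite is the part most likely to differ between institutions. Here is a minimal sketch of that step on its own, assuming your library proxy works by appending a suffix to the host name the way mine does. The function name `via_proxy`, its `proxy_suffix` parameter and the example path are made up for illustration and are not part of calibre.

```python
SITE = 'http://www.mja.com.au'


def via_proxy(url, proxy_suffix=''):
    """Return url with proxy_suffix appended to the MJA host name.

    proxy_suffix is a hypothetical setting: pass '' for direct access,
    or something like '.ipacez.nd.edu.au' for a suffix-style proxy.
    """
    if not proxy_suffix or not url.startswith(SITE):
        return url
    rest = url[len(SITE):]
    # Only rewrite when SITE is the complete host, so an already
    # proxied URL is not rewritten a second time
    if rest == '' or rest[0] in '/?#':
        return SITE + proxy_suffix + rest
    return url


# The path below is invented, only to show the effect
print(via_proxy('http://www.mja.com.au/some/article', '.ipacez.nd.edu.au'))
print(via_proxy('http://www.mja.com.au/some/article'))
```

With an empty suffix the URL comes back unchanged, so the same recipe could in principle serve readers with and without a proxy.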

Code:
__license__   = 'GPL v3'
__copyright__ = '2011, Pat Stapleton <pat.stapleton at gmail.com>'

from calibre.web.feeds import Feed
from calibre.web.feeds.recipes import BasicNewsRecipe
import re

class MJA(BasicNewsRecipe):
    title          = u'MJA'
    description = u'The Medical Journal of Australia'
    category = u'medical, science, health, Australia'
    __author__            = 'Pat Stapleton'
    oldest_article = 14
    max_articles_per_feed = 100
    auto_cleanup = True
    needs_subscription = True
    language              = 'en_AU'
    remove_empty_feeds    = True
    publication_type      = 'journal'
    publisher            = u'Australian Medical Association'
    #masthead_url = 'http://www.mja.com.au/MJAnav.gif'
    
    feeds          = [(u'MJA', u'http://feeds.feedburner.com/TheMedicalJournalOfAustralia?format=xml')]
    
    #LOGIN
    def get_browser(self):
        # Newer calibre releases define get_browser as an instance method, so pass self
        br = BasicNewsRecipe.get_browser(self)
        br.open('http://ipacez.nd.edu.au/login?url=http://www.mja.com.au')
        br.select_form(nr=0)
        br['user'] = self.username
        br['pass'] = self.password
        br.submit()
        return br
    
    def parse_feeds(self):
        # Do the "official" parse_feeds first
        myFeeds = BasicNewsRecipe.parse_feeds(self)

        # Loop through all articles and compile a list of sections
        sectionedArticles = {}
        for curfeed in myFeeds:
            for curarticle in curfeed.articles:
                # MJA article titles have the format '[SECTION] Article Title', so grab the section
                section_article = re.split(r'\]', curarticle.title, maxsplit=1)
                if len(section_article) == 2:
                    sectionTitle = section_article[0].lstrip('[')
                    articleTitle = section_article[1].lstrip()
                else:
                    # No '[SECTION]' prefix: keep the title as-is under the recipe title
                    sectionTitle = self.title
                    articleTitle = curarticle.title
                if sectionTitle not in sectionedArticles:
                    sectionedArticles[sectionTitle] = []
                
                #cleanup article's title (remove ugly section prefix)
                curarticle.title = articleTitle
                
                #fix url to pickup institution login by appending
                curarticle.url = curarticle.url.replace('http://www.mja.com.au', 'http://www.mja.com.au.ipacez.nd.edu.au')
                
                sectionedArticles[sectionTitle].append(curarticle)
        
        #Create our nice list of sectioned feeds to return
        retFeeds = []
        for section in sectionedArticles:
            newSection = Feed()
            newSection.title = section
            newSection.description = self.description
            newSection.articles = sectionedArticles[section]
            newSection.image_url = None
            retFeeds.append(newSection)
        
        return retFeeds
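
If you want to see what the section splitting does without running calibre, here is a rough standalone sketch of the same idea. It assumes titles of the form '[SECTION] Article Title' as described in the comments above. The helper names `split_section` and `group_by_section`, the 'MJA' fallback section and the sample titles are all made up for illustration.

```python
from collections import OrderedDict


def split_section(title, default_section='MJA'):
    """Split '[SECTION] Article Title' into (section, article_title)."""
    if title.startswith('['):
        section, sep, rest = title[1:].partition(']')
        if sep:
            return section.strip(), rest.strip()
    # No recognisable prefix: leave the title alone
    return default_section, title.strip()


def group_by_section(titles):
    """Group article titles by section, keeping first-seen section order."""
    grouped = OrderedDict()
    for title in titles:
        section, article = split_section(title)
        grouped.setdefault(section, []).append(article)
    return grouped


# Invented titles, only to show the shape of the data
sample = ['[Editorial] A first title', '[Research] A second title',
          '[Editorial] A third title', 'A title with no prefix']
for section, articles in group_by_section(sample).items():
    print(section, articles)
```

Unlike the plain dict in the recipe, the OrderedDict keeps the sections in the order they first appear in the feed.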


Enjoy!

-Pat
11-18-2011, 10:07 PM   #2
kovidgoyal
creator of calibre
Posts: 25,356
Karma: 4961459
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
This is the right place.
11-18-2011, 10:17 PM   #3
PatStapleton
Thanks, Kovid.

I just started using Calibre yesterday, and I must say this is a great program! Thanks very much for writing such a useful piece of software!


-Pat
11-18-2011, 10:28 PM   #4
kovidgoyal
You're welcome
03-24-2012, 07:43 AM   #5
PatStapleton
Updated version

Hi everybody, the site has changed since my first post, so I've rewritten the recipe to work again. It now reads the contents page of the current issue directly from the website instead of using the RSS feed.

As before, it is written to use my institution's login, so for your own use you should modify or remove the #LOGIN section and change the line:

"url = 'http://www.mja.com.au.ipacez.nd.edu.au' + section.a.get('href')"

to:

"url = 'http://www.mja.com.au' + section.a.get('href')"
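
If you would rather not edit that line by hand each time, one option is to keep both bases in one place and pick between them with a flag. This is only a sketch of the idea, not something the recipe does: `DIRECT_BASE`, `PROXY_BASE`, `USE_PROXY` and `article_url` are hypothetical names, and the example href is invented.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2

DIRECT_BASE = 'http://www.mja.com.au'
PROXY_BASE = 'http://www.mja.com.au.ipacez.nd.edu.au'
USE_PROXY = False  # hypothetical switch: True to go through the proxy host


def article_url(href, use_proxy=USE_PROXY):
    """Build a full article URL from an href found on the contents page."""
    base = PROXY_BASE if use_proxy else DIRECT_BASE
    # urljoin copes with hrefs that do or do not start with '/'
    return urljoin(base + '/', href)


print(article_url('/some/article'))
print(article_url('/some/article', use_proxy=True))
```

In the recipe, the line quoted above would then become something like `url = article_url(section.a.get('href'))`.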

Code:
__license__   = 'GPL v3'
__copyright__ = '2012, Pat Stapleton <pat.stapleton at gmail.com>'

from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class MJA(BasicNewsRecipe):
    title          = u'MJA'
    description = u'The Medical Journal of Australia'
    category = u'medical, science, health, Australia'
    __author__            = 'Pat Stapleton'
    oldest_article = 14
    max_articles_per_feed = 1000
    auto_cleanup = True
    needs_subscription = True
    language              = 'en_AU'
    remove_empty_feeds    = True
    publication_type      = 'journal'
    publisher            = u'Australian Medical Association'
    #masthead_url = 'http://www.mja.com.au/MJAnav.gif'
    issue = None
    
    #LOGIN
    def get_browser(self):
        # Newer calibre releases define get_browser as an instance method, so pass self
        br = BasicNewsRecipe.get_browser(self)
        br.open('http://ipacez.nd.edu.au/login?url=http://www.mja.com.au')
        br.select_form(nr=0)
        br['user'] = self.username
        br['pass'] = self.password
        br.submit()
        return br
    
    #Get cover image
    def get_cover_url(self):
        retUrl = ''
        rawc = self.index_to_soup('http://www.mja.com.au',True)
        soup = BeautifulSoup(rawc)
        for itimg in soup.findAll('img',src=True):
            if '/cover' in itimg['src']:
                retUrl = 'http://www.mja.com.au/' + itimg['src']
        
        return retUrl

    def parse_index(self):
        # Get the link to the contents page of the current issue
        rawc = self.index_to_soup('http://www.mja.com.au', True)
        soup = BeautifulSoup(rawc)
        linkObject = soup.find('div', 'homepage-current-issue-inner')
        link = 'http://www.mja.com.au' + linkObject.contents[0]['href']

        feeds = []
        rawc = self.index_to_soup(link, True)
        soup = BeautifulSoup(rawc)

        sectionCount = 0
        sectionTitles = soup.findAll('h3')
        sectionTitle = self.tag_to_string(sectionTitles[0])
        articles = []
        for section in soup.findAll(attrs={'class': lambda x: x and 'views-row' in x}):
            # class may be a string (old BeautifulSoup) or a list (newer versions)
            classField = section.get('class') or ''
            classes = classField if isinstance(classField, list) else classField.split()
            # Compare whole class names so that 'views-row-10' etc. do not match
            if 'views-row-1' in classes:  # this is the first row of a new section
                if articles:
                    feeds.append((sectionTitle, articles))
                    articles = []
                if sectionCount < len(sectionTitles):
                    sectionTitle = self.tag_to_string(sectionTitles[sectionCount])
                    sectionCount = sectionCount + 1
                else:
                    break
            if section.p:
                artTitle = self.tag_to_string(section.p)
            else:
                artTitle = self.tag_to_string(section.a)
            url = 'http://www.mja.com.au.ipacez.nd.edu.au' + section.a.get('href')
            article = {'title': artTitle, 'url': url, 'date': '', 'description': '', 'content': ''}
            articles.append(article)

        # Flush the articles of the last section
        if articles:
            feeds.append((sectionTitle, articles))

        return feeds
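
The grouping logic in parse_index is easier to follow (and to test) without the HTML around it. Below is a rough standalone sketch of it, assuming each row on the contents page carries a class string and that a row with the class 'views-row-1' opens the next section. The function `group_rows` and the sample data are invented for illustration. The real recipe works on BeautifulSoup tags rather than tuples.

```python
def group_rows(section_titles, rows):
    """Pair section titles with rows, starting a new section at each first row.

    rows is a list of (css_class, article_title) tuples in page order.
    """
    feeds = []
    articles = []
    current = section_titles[0] if section_titles else None
    remaining = iter(section_titles)
    for css_class, title in rows:
        # Compare whole class names so 'views-row-10' does not match
        if 'views-row-1' in css_class.split():
            if articles:
                feeds.append((current, articles))
                articles = []
            current = next(remaining, None)
            if current is None:
                break  # more sections than headings: stop here
        articles.append(title)
    if articles:
        feeds.append((current, articles))
    return feeds


# Invented section names and titles
rows = [('views-row views-row-1', 'A'), ('views-row views-row-2', 'B'),
        ('views-row views-row-1', 'C'), ('views-row views-row-10', 'D')]
print(group_rows(['Editorials', 'Research'], rows))
```

The result has the same shape parse_index returns: a list of (section title, list of articles) pairs, with plain strings standing in for the article dicts.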


Enjoy!

-Pat

All times are GMT -4.