Old 12-28-2010, 01:39 PM   #4
brennydoogles
Junior Member
Device: Kindle 3
Quote:
Originally Posted by kiwidude
I'm happy to give some help if you need it, either send me a PM or post on here for others to also assist.
I would actually love working through it in a thread here if possible. I learn things far more quickly when I have guidance with my first project than if I don't. So here's what I've done so far:

I have written the following functions, which take a URL (from allrecipes.com) and return a concise HTML document containing the recipe:
*Note: This script depends on BeautifulSoup.
Code:
import urllib2, sys, string
from BeautifulSoup import BeautifulSoup

def buildHTMLOutput(title, author, ingredientList, directionList):
	# Build a minimal HTML page: heading, ingredient list, numbered directions.
	titleAuthor = "%s by %s" % (string.capwords(title), string.capwords(author))
	header = "<html><head><title>%s</title></head><body><h2>%s</h2>" % (titleAuthor, titleAuthor)
	ingredients = ""
	for ingredient in ingredientList:
		if ingredient != "":
			ingredients += "<li>%s</li>" % (ingredient)
	directions = ""
	for direction in directionList:
		if direction != "":
			directions += "<li>%s</li>" % (direction)
	# Headings go outside the lists: an <h2> is not valid directly inside <ul> or <ol>.
	ingredientSection = "<h2>Ingredients:</h2><ul>%s</ul>" % (ingredients)
	directionSection = "<h2>Directions:</h2><ol>%s</ol>" % (directions)
	footer = "</body></html>"
	return "%s\n%s\n%s\n%s" % (header, ingredientSection, directionSection, footer)

def parseRecipeFromLink(url):
	try:
		ingredientList = []
		directionList = []
		req = urllib2.Request(url)
		response = urllib2.urlopen(req)
		pageText = response.read()
		# pageText is already a single string, so it can be parsed directly.
		htmlParser = BeautifulSoup(pageText)

		ingredientDiv = BeautifulSoup(htmlParser.find('div', "ingredients").prettify())
		ingredients = ingredientDiv.findAll('li')
		for ingredient in ingredients:
			for i in ingredient.findAll(text=True):
				ingredientList.append(i.replace('\n', '').strip())

		directionDiv = BeautifulSoup(htmlParser.find('div', "directions").prettify())	
		directions = directionDiv.findAll('li')
		for step in directions:
			for i in step.findAll(text=True):
				directionList.append(i.replace('\n', '').strip())

		authorDiv = BeautifulSoup(htmlParser.find('div', "author-name").prettify())
		spans = authorDiv.findAll('span')
		for span in spans:
			author =  span.findAll(text=True)[0].replace('\n', '').strip()

		titleArea = BeautifulSoup(htmlParser.find('h1', {"id" : "itemTitle"}).prettify())
		title = titleArea.findAll(text=True)[1].replace('\n', '').strip()
		output = buildHTMLOutput(title, author, ingredientList, directionList)
		soup = BeautifulSoup(output)
		return soup.prettify()
	except Exception, detail: 
		print "Error: ", detail

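if __name__ == "__main__":
	# Exit explicitly on bad input instead of testing a "quit" flag, which was
	# only defined on the error paths and otherwise fell back to the builtin quit.
	if len(sys.argv) == 1:
		print "No URL was specified"
		sys.exit(1)
	url = sys.argv[1]
	if "http://allrecipes.com/" not in url:
		print "The URL must be from allrecipes.com"
		sys.exit(1)
	html = parseRecipeFromLink(url)
	# parseRecipeFromLink returns None when it hits an error, so skip the output then.
	if html is not None:
		print html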
So basically, I just need to write the plugin to:
  1. Accept a URL as input from the user
  2. Pass the URL to my script and retrieve the HTML output (possibly into a temp file?)
  3. Convert the HTML into mobi output
  4. Add the recipe to the Library
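
As a rough sketch of how steps 2 to 4 could fit together outside a plugin: the code below writes the HTML to a temp file, then shells out to calibre's `ebook-convert` and `calibredb add` command-line tools. This is only an assumption about one workable route; a real plugin would more likely call calibre's internal conversion and database code instead. The function names (`write_temp_html`, `build_commands`, `html_to_library`) are made up for illustration.

```python
import os
import subprocess
import tempfile


def write_temp_html(html):
    """Write the recipe HTML to a temporary .html file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".html")
    with os.fdopen(fd, "w") as handle:
        handle.write(html)
    return path


def build_commands(html_path, mobi_path):
    """Return the two command lines: convert to MOBI, then add to the library."""
    convert_cmd = ["ebook-convert", html_path, mobi_path]
    add_cmd = ["calibredb", "add", mobi_path]
    return convert_cmd, add_cmd


def html_to_library(html, run=subprocess.check_call):
    """Steps 2-4: temp file, conversion, library import.

    `run` is injectable so the command lines can be inspected without
    calibre installed; by default each command is actually executed.
    """
    html_path = write_temp_html(html)
    mobi_path = os.path.splitext(html_path)[0] + ".mobi"
    try:
        for cmd in build_commands(html_path, mobi_path):
            run(cmd)
    finally:
        # The HTML is only an intermediate file, so clean it up either way.
        os.remove(html_path)
    return mobi_path
```

With the script above, this would be called as `html_to_library(parseRecipeFromLink(url))`, assuming calibre's command-line tools are on the PATH.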

So first things first: have I missed any crucial steps, or is there anything I need to consider before I proceed?