Quote:
Originally Posted by kiwidude
I'm happy to give some help if you need it, either send me a PM or post on here for others to also assist.
I would actually love to work through it in a thread here if possible. I learn things far more quickly when I have guidance with my first project than when I don't. So here's what I've done so far:
I have written the following functions, which take a URL (from allrecipes.com) and return a concise HTML document containing the recipe:
*Note: This script depends on BeautifulSoup.
Code:
import urllib2, sys, string
from BeautifulSoup import BeautifulSoup
def buildHTMLOutput(title, author, ingredientList, directionList):
    # Build a minimal HTML page: title/author heading, ingredients, directions.
    titleAuthor = "%s by %s" % (string.capwords(title), string.capwords(author))
    header = "<html><head><title>%s</title></head><body><h2>%s</h2>" % (titleAuthor, titleAuthor)
    ingredients = ""
    for ingredient in ingredientList:
        if ingredient != "":
            ingredients += "<li>%s</li>" % (ingredient)
    directions = ""
    for direction in directionList:
        if direction != "":
            directions += "<li>%s</li>" % (direction)
    # Keep the headings outside the lists: <h2> is not valid inside <ul>/<ol>.
    ingredientSection = "<h2>Ingredients:</h2><ul>%s</ul>" % (ingredients)
    directionSection = "<h2>Directions:</h2><ol>%s</ol>" % (directions)
    footer = "</body></html>"
    output = "%s\n%s\n%s\n%s" % (header, ingredientSection, directionSection, footer)
    return output
def parseRecipeFromLink(url):
    # Fetch a recipe page and return it as a stripped-down HTML document.
    # Returns None (after printing the error) if anything goes wrong.
    try:
        ingredientList = []
        directionList = []
        author = ""  # default in case no author span is found
        req = urllib2.Request(url)
        response = urllib2.urlopen(req)
        pageText = response.read()
        htmlParser = BeautifulSoup(pageText)
        # Collect the text of every <li> in the ingredients block.
        ingredientDiv = BeautifulSoup(htmlParser.find('div', "ingredients").prettify())
        ingredients = ingredientDiv.findAll('li')
        for ingredient in ingredients:
            for i in ingredient.findAll(text=True):
                ingredientList.append(i.replace('\n', '').strip())
        # Same again for the directions block.
        directionDiv = BeautifulSoup(htmlParser.find('div', "directions").prettify())
        directions = directionDiv.findAll('li')
        for step in directions:
            for i in step.findAll(text=True):
                directionList.append(i.replace('\n', '').strip())
        # The author name is the first text node of a <span> in the author div.
        authorDiv = BeautifulSoup(htmlParser.find('div', "author-name").prettify())
        spans = authorDiv.findAll('span')
        for span in spans:
            author = span.findAll(text=True)[0].replace('\n', '').strip()
        titleArea = BeautifulSoup(htmlParser.find('h1', {"id" : "itemTitle"}).prettify())
        title = titleArea.findAll(text=True)[1].replace('\n', '').strip()
        output = buildHTMLOutput(title, author, ingredientList, directionList)
        soup = BeautifulSoup(output)
        return soup.prettify()
    except Exception as detail:
        print "Error: ", detail
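if __name__ == "__main__":
    # Exit early on bad input instead of tracking a "quit" flag, which was
    # never initialised on the success path (it only worked by accident,
    # because "quit" is also a built-in name).
    if len(sys.argv) == 1:
        print "No URL was specified"
        sys.exit(1)
    if "http://allrecipes.com/" not in sys.argv[1]:
        print "The URL must be from allrecipes.com"
        sys.exit(1)
    html = parseRecipeFromLink(sys.argv[1])
    if html is None:
        sys.exit(1)
    print html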
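One thing I may tighten up: the check in the __main__ block only looks for the substring "http://allrecipes.com/" anywhere in the argument, so it accepts a URL on another site that merely contains that text, and it rejects "www." or "https" links. Here is a rough sketch of a stricter check that looks at the parsed host name instead. The helper name isAllrecipesUrl is just something I made up for illustration, and I am assuming that accepting subdomains and both http and https is what I want:

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def isAllrecipesUrl(url):
    # Hypothetical helper: accept only http(s) URLs whose host is
    # allrecipes.com or one of its subdomains (e.g. www.allrecipes.com).
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if parts.scheme not in ("http", "https"):
        return False
    return host == "allrecipes.com" or host.endswith(".allrecipes.com")
```

With this in place, the __main__ block could call isAllrecipesUrl(sys.argv[1]) instead of the substring test.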
So basically, I just need to write the plugin to:
- Accept a URL as input from the user
- Pass the URL to my script and retrieve the HTML output (possibly into a temp file?)
- Convert the HTML into mobi output
- Add the recipe to the Library
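To make the middle steps concrete, here is a rough sketch of how they might look outside of a plugin. It assumes calibre's command-line tools ebook-convert and calibredb are installed and on the PATH. The function names are hypothetical, and a real plugin would presumably go through calibre's own internal APIs rather than shelling out, which I have not looked into yet:

```python
import os
import tempfile


def writeHtmlToTempFile(html):
    # Hypothetical helper: write the recipe HTML to a temporary .html file
    # and return its path. The caller is responsible for deleting it.
    if not isinstance(html, bytes):
        html = html.encode("utf-8")
    fd, path = tempfile.mkstemp(suffix=".html")
    with os.fdopen(fd, "wb") as f:
        f.write(html)
    return path


def buildConvertCommand(htmlPath):
    # ebook-convert chooses the output format from the output file extension.
    mobiPath = os.path.splitext(htmlPath)[0] + ".mobi"
    return ["ebook-convert", htmlPath, mobiPath], mobiPath


def buildAddCommand(mobiPath):
    # calibredb add <file> adds the book to the current calibre library.
    return ["calibredb", "add", mobiPath]
```

Each command list could then be run with subprocess.check_call(cmd), first the convert command and then the add command.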
So first things first: have I missed any crucial steps, or is there anything I need to consider before I proceed?