Quote:
Originally Posted by marbs
I think this one should be easy, but the documentation on following a JavaScript link only covers forms... Any thoughts?
First thought: it's far from easy. This is advanced recipe writing, and there are several approaches. Which one is easiest depends on the specifics of your site, so exploring the page source to see where the data can actually be obtained is crucial. For example:
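When you explore the source, a "JavaScript link" is often just an `href="javascript:..."` whose quoted arguments contain the real URL. The sketch below uses only Python's standard library to list such links and pull out their quoted strings so you can see where the data lives. The function and class names are hypothetical helpers, not part of calibre, and the approach assumes the URL appears as a literal quoted string in the `href`.

```python
import re
from html.parser import HTMLParser

# Any single- or double-quoted string inside the JavaScript code.
QUOTED_RE = re.compile(r"""["']([^"']+)["']""")


class JsLinkFinder(HTMLParser):
    """Collect <a href="javascript:..."> links and the quoted strings inside them."""

    def __init__(self):
        super().__init__()
        self.js_links = []  # list of (javascript code, [quoted strings])

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = (dict(attrs).get("href") or "").strip()
        if href.lower().startswith("javascript:"):
            code = href[len("javascript:"):]
            self.js_links.append((code, QUOTED_RE.findall(code)))


def find_js_links(html_text):
    """Return every javascript: link on the page with its quoted arguments."""
    finder = JsLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.js_links
```

Ordinary links are ignored, so on a typical page the result is a short list of candidates worth inspecting by hand.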
Case 1: I wanted the pictures from a JavaScript slideshow for the Olympics. It turned out that the JavaScript code contained a buried URL that was neither displayed nor clickable, and IIRC the data at that URL contained multiple links to each picture in different sizes. I believe I scraped page 1 to get the URL of data page 2 (don't forget to turn scripts on, since most recipes strip them), then converted that page to a soup, scraped out the links I needed and assembled the page.
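The Case 1 steps can be sketched as plain string processing: find the buried data URL in a `<script>` block, fetch it, keep one size of picture link, and assemble a page. This is a minimal sketch under assumptions: the regex patterns, the `size_hint` filter and all function names are hypothetical and must be adapted to the real page source. In an actual calibre recipe you would keep scripts with `remove_javascript = False` and fetch pages with `self.index_to_soup(url)`; here fetching is passed in as a function so the parsing runs on its own.

```python
import re
from html import escape

# Hypothetical patterns: adjust them after reading the real page source.
SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script>", re.S | re.I)
DATA_URL_RE = re.compile(r"""["'](https?://[^"']+\.(?:xml|json|js))["']""")
IMG_RE = re.compile(r"""https?://[^"'\s<>]+\.(?:jpg|jpeg|png|gif)""", re.I)


def find_buried_url(page1_html):
    """Return the first data URL found inside a <script> block, or None."""
    for script_body in SCRIPT_RE.findall(page1_html):
        match = DATA_URL_RE.search(script_body)
        if match:
            return match.group(1)
    return None


def extract_pic_links(data_text, size_hint="large"):
    """Collect image URLs from the data page, keeping only one size (no duplicates)."""
    links = []
    for url in IMG_RE.findall(data_text):
        if size_hint in url and url not in links:
            links.append(url)
    return links


def assemble_page(title, links):
    """Build a simple HTML page that shows each picture in order."""
    imgs = "\n".join('<p><img src="%s"/></p>' % escape(u) for u in links)
    return "<html><head><title>%s</title></head><body>\n%s\n</body></html>" % (
        escape(title), imgs)


def scrape_slideshow(page1_html, fetch, size_hint="large"):
    """Page 1 -> buried data URL -> fetched data -> assembled picture page."""
    data_url = find_buried_url(page1_html)
    if data_url is None:
        return None
    return assemble_page("Slideshow", extract_pic_links(fetch(data_url), size_hint))
```

Passing `fetch` in keeps the network step separate; in a recipe it would be whatever fetches the data page as text.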
Case 2: You can do something similar to what recipes do for login, where you supply login data for a form and then submit it. Calibre uses Mechanize for that type of work. Your recipe can set up an internal browser and then tell it to click on any link on a page. If that's the only way to find the data you want, go this route. I'm not sure how much support the Mechanize browser has for the advanced features found in today's browsers. Sometimes you have to figure out how to tell the site that your browser lacks those features (via the User-Agent header) and hope the site will send you the data you want without too much fuss.
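In a recipe you would normally override `get_browser()` and use Mechanize's own `addheaders` list and `follow_link(text_regex=...)`. Note that Mechanize does not execute JavaScript, so this only helps when the link target is present in the HTML. The sketch below imitates those two steps with the standard library so it can run anywhere: it finds a link by its text and builds a request carrying a simple User-Agent. The `PLAIN_UA` string and the helper names are hypothetical, and whether a given site serves simpler pages to such a User-Agent is an assumption you have to test.

```python
import re
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical "plain browser" User-Agent; some sites send simpler HTML to it.
PLAIN_UA = "Mozilla/5.0 (compatible; simple-fetcher/1.0)"


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a href=...> on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_link(html_text, text_regex):
    """Return the href of the first link whose text matches text_regex, or None."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    pattern = re.compile(text_regex)
    for href, text in collector.links:
        if pattern.search(text):
            return href
    return None


def follow_link(base_url, html_text, text_regex, user_agent=PLAIN_UA):
    """Build a request for the matching link, resolved against base_url."""
    href = find_link(html_text, text_regex)
    if href is None:
        return None
    return urllib.request.Request(urljoin(base_url, href),
                                  headers={"User-Agent": user_agent})
```

The returned request can then be opened with `urllib.request.urlopen(req)` and the response parsed like any other page.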
Mechanize is powerful, but a bit hard to get a handle on.
http://wwwsearch.sourceforge.net/mechanize/doc.html
Good luck.