09-27-2010, 12:27 PM   #2
Starson17
Wizard
Quote:
Originally posted by marbs:
I think this one should be easy, but the documentation on following a JavaScript link is only relevant for a form... any thoughts?
First thought: it's far from easy. It's advanced recipe writing. There are several approaches, and the easiest one depends on the specifics of your site. Exploring the source of your page to see where the data can be obtained is crucial. For example:

Case 1: I wanted the pics from a JavaScript slideshow for the Olympics. It turned out that the JavaScript code included a buried URL that was neither displayed nor clickable, and IIRC the page at that URL contained multiple links to the pics in different sizes. I believe I scraped page 1 to get the URL of data page 2 (don't forget to turn on scripts so they aren't stripped, as they are in most recipes), then converted that page to a soup, scraped out the links I needed, and assembled the page.
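
A rough sketch of that two-step scrape, in case it helps. It is not the recipe I actually used. The variable name dataUrl, the example URLs, and the assumption that the data page is HTML-like with plain links are all made up for illustration, so check your own page source. It uses only the Python standard library so it can be tried outside calibre; inside a recipe you would fetch the pages with the recipe's own tools (index_to_soup, for instance) and, as far as I recall, set remove_javascript = False so the script text survives.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical pattern: assumes the script stores the data URL in a
# variable called "dataUrl". Change it to match your page source.
DATA_URL_RE = re.compile(r'dataUrl\s*=\s*["\']([^"\']+)["\']')

IMAGE_SUFFIXES = ('.jpg', '.jpeg', '.png', '.gif')


class LinkCollector(HTMLParser):
    """Collects href/src values from <a> and <img> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in ('a', 'img'):
            for name, value in attrs:
                if name in ('href', 'src') and value:
                    self.links.append(value)


def find_data_url(page_html, base_url):
    """Step 1: dig the buried URL out of the raw page, scripts included."""
    match = DATA_URL_RE.search(page_html)
    if match is None:
        return None
    return urljoin(base_url, match.group(1))


def extract_image_links(data_html, base_url):
    """Step 2: pull absolute picture links out of the data page."""
    parser = LinkCollector()
    parser.feed(data_html)
    return [urljoin(base_url, link) for link in parser.links
            if link.lower().endswith(IMAGE_SUFFIXES)]
```

You would call find_data_url on the source of page 1, fetch whatever URL it returns, and pass that second page to extract_image_links. If the data page turns out to be XML or JSON rather than HTML, the second step needs a different parser.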

Case 2: You can do something similar to what recipes do at login, where you supply login data for a form and then submit the form. Calibre uses Mechanize for that type of work. You can have your recipe set up an internal browser, then tell it to follow any link on a page. If that's the only way to find the data you want, then you go this route. I'm not sure how much support the Mechanize browser has for the advanced features found in browsers today. Sometimes you have to figure out how to tell the site that your browser doesn't have advanced features (via the User-Agent header) and hope the site will send you the data you want without too much fuss.
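
Here is a minimal sketch of the Mechanize route, under some assumptions: the User-Agent string is just an example of an old, plain browser, and the helper names make_browser and follow_link_by_url are mine, not part of Mechanize or calibre. In a real recipe you would normally start from the browser that the recipe's get_browser() hands you rather than building one from scratch.

```python
try:
    import mechanize  # third-party library; calibre ships with it
except ImportError:   # keeps the helpers below importable without it
    mechanize = None

# Assumption: an old, plain User-Agent string, picked only as an example.
PLAIN_USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'


def make_browser(user_agent=PLAIN_USER_AGENT):
    """Return a Mechanize browser that claims to be a simple client."""
    if mechanize is None:
        raise RuntimeError('mechanize is not installed')
    br = mechanize.Browser()
    # Replace the default headers so the site sees our User-Agent.
    br.addheaders = [('User-Agent', user_agent)]
    return br


def follow_link_by_url(br, url_pattern):
    """Follow the first link on the current page whose URL matches
    url_pattern (a regular expression); return None if nothing matches."""
    for link in br.links(url_regex=url_pattern):
        return br.follow_link(link)
    return None
```

Typical use would be br = make_browser(), then br.open(...) on the starting page, then follow_link_by_url(br, r'slideshow') with whatever pattern fits your site. Keep in mind that Mechanize does not run JavaScript, so this only works for links with a real href; for a javascript: link you still have to work out the underlying URL yourself, as in Case 1.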

Mechanize is powerful, but a bit hard to get a handle on.
http://wwwsearch.sourceforge.net/mechanize/doc.html

Good luck.