09-27-2010, 11:11 AM | #1 |
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
|
following a javascript link and table editing
I think this one should be easy, but the documentation on following a JavaScript link is only relevant for a form.
I will explain what I am trying to do with the English site so people here can understand what I am talking about, but I will change it to Hebrew once it works. On this page: http://www.tase.co.il/TASEEng/Market...=5&IndexID=168 I want to press "additional columns", i.e. this: Spoiler:
As you can see, this link has none of the attributes that http://bugs.calibre-ebook.com/wiki/recipeGuide_advanced talks about. My Google search did not get me any closer. The page that opens in the popup is mainly a table. I want that table to be the recipe. If I could remove the calibre feed index, that would also be good. The problem I foresee (I haven't gotten that far yet) is that the table will be too wide for the output. But first I want to focus on clicking the JavaScript link and downloading the table to a file. I think I can do the cleanup myself. This is as far as I got. Spoiler:
This gives me 225 pages of HTML code from http://www.tase.co.il/TASEEng/Market...=5&IndexID=168. Any thoughts? Last edited by marbs; 09-27-2010 at 11:14 AM. |
09-27-2010, 12:27 PM | #2 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Case 1: I wanted slideshow pics from a javascript slideshow for the Olympics. It turned out that the javascript code included a non-displayed, non-clickable buried URL and IIRC, that URL had data that contained multiple links to the pics in different sizes. I believe I scraped page 1 to get the URL to data page 2 (don't forget to turn on scripts so they aren't stripped as in most recipes), then converted that page to a soup, scraped out the links I needed and assembled the page.
Case 2: You can do something similar to the page on login for recipes where you supply login data for a form, then submit the form. Calibre uses Mechanize for that type of work. You can have your recipe set up an internal browser, then tell it to click on any links on a page. If that's the only way to find the data you want, then you go this route.
I'm not sure of how much support the Mechanize browser has for various advanced features found in Browsers today. Sometimes you have to figure out how to tell the site your browser doesn't have advanced features (UserAgent header) and hope the site will send you the data you want without too much fuss. Mechanize is powerful, but a bit hard to get a handle on. http://wwwsearch.sourceforge.net/mechanize/doc.html
Good luck. |
|
09-27-2010, 01:40 PM | #3 |
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
|
So I read the link you gave me.
1. I am happy it is not easy. I have been trying for a few days now.
2. I learned a few things from it, but there is nothing there about JavaScript. Am I missing something?
3. Could you upload the cases you were talking about? Reading them might help me understand what I need to do. |
09-27-2010, 01:45 PM | #4 |
creator of calibre
Posts: 43,863
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
There's no way to follow a javascript link directly. Instead, what you have to do is grab the request the javascript sends using Tamper Data in Firefox and duplicate that in calibre using mechanize.Request.
Alternatively, use regexps to parse the javascript and figure out the request URL from that. |
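The regexp route mentioned above can be sketched roughly as follows. This is only an illustration: the actual TASE link was not preserved in the thread, so `SAMPLE_HTML`, the `openWin` function name, the example.com URLs, and the helper `extract_js_target` are all made up for the example, and a real page will need its own pattern.

```python
import html
import re
from urllib.parse import urljoin

# Hypothetical markup standing in for the lost "additional columns" link.
SAMPLE_HTML = """
<a href="javascript:openWin('/Popup/Table.aspx?tab=5&amp;id=168', 800, 600)">
  Additional Columns
</a>
"""

# Capture the first single-quoted argument of the JavaScript call in the href.
JS_LINK = re.compile(r"""href="javascript:\w+\(\s*'([^']+)'""")


def extract_js_target(page_html, base_url):
    """Return the absolute URL the JavaScript link would open, or None."""
    match = JS_LINK.search(page_html)
    if match is None:
        return None
    # Undo HTML escaping (&amp; -> &) and resolve against the page URL.
    return urljoin(base_url, html.unescape(match.group(1)))


if __name__ == "__main__":
    print(extract_js_target(SAMPLE_HTML, "http://example.com/market/index.aspx"))
```

Once the target URL is known, it can be fetched like any ordinary link, so no JavaScript needs to run.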
09-27-2010, 02:06 PM | #5 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
|
||
09-27-2010, 02:26 PM | #6 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
http://www.tase.co.il/TASEEng/Manage...s+TA+Composite It also did a GET and passed some cookies. You should be able to replicate what it does with Mechanize, without javascript to pull the data you want. Edit: I see Kovid popped in to say basically the same thing. Last edited by Starson17; 09-27-2010 at 02:29 PM. |
|
09-27-2010, 03:36 PM | #7 |
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
|
Hey Kovid,
thanks for the guest lecture in my master class (I am the student, if anyone missed that).
Let me see if I understand, and excuse me if my lingo is not right. I am just thinking out loud. I am trying to get to here. The only problem is that I need to show up with something in my hand. The usual way to get that something is to stop here and get it. Now, am I trying to fake it? I think I read something about a header somewhere else, something about tricking the site into thinking there is an actual link (I think I used it in the recipe that I posted at the top of this thread). I'll go over it again and get back to you guys. Thanks. PS: AFAICT stands for "As Far As I Can Tell". It is not some Firefox add-on. Last edited by marbs; 09-27-2010 at 03:40 PM. |
09-27-2010, 03:49 PM | #8 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
I find it's often the case that the site expects you to go to page 1, click a JavaScript link or send a form, etc., but you can bypass all of that and just go to the final link to get what you want. Quote:
Bottom line - keep studying the behavior of the site. If it turns out you need special cookies, or referer headers, it will show up in your careful tests. It's possible to get those with Mechanize, if needed. |
||
09-27-2010, 04:06 PM | #9 |
creator of calibre
Posts: 43,863
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
HTTP is a stateless protocol. What that means is that any URL of the type http:// will always work the same, no matter in what sequence you visit URLs.
However, since sites like to have sessions and keep track of what users are doing, they send what are called cookies to the user's browser. The user's browser stores these cookies and sends them back to the site on demand. Some links in a site will not work without the appropriate cookie. mechanize handles cookies transparently. If you think you need to visit URLs in sequence, do so in the calibre recipe and the cookies will work seamlessly. |
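The visit-in-sequence behaviour described above can be tried out locally. The sketch below is not calibre or mechanize code: it uses Python's standard library cookie jar as a stand-in for mechanize's transparent cookie handling, and the toy `DemoSite` server, its `/first` and `/second` paths, and the cookie value are invented for the demonstration.

```python
import http.cookiejar
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoSite(BaseHTTPRequestHandler):
    """Toy site: /first hands out a session cookie, every other path demands it."""

    def do_GET(self):
        if self.path == "/first":
            self.send_response(200)
            self.send_header("Set-Cookie", "session=abc123; Path=/")
            self.end_headers()
            self.wfile.write(b"landing page")
        elif "session=abc123" in self.headers.get("Cookie", ""):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"table page")
        else:
            self.send_response(403)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_status(opener, url):
    """Open url and return the HTTP status code, including error statuses."""
    try:
        with opener.open(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        err.close()
        return err.code


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), DemoSite)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    try:
        # One opener with one cookie jar plays the role of the recipe's browser.
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}),  # talk to the local server directly
            urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()),
        )
        without_cookie = fetch_status(opener, base + "/second")
        fetch_status(opener, base + "/first")  # the visit that sets the cookie
        with_cookie = fetch_status(opener, base + "/second")
        return without_cookie, with_cookie
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

The second page is refused on the first attempt and served once the first page has been visited with the same opener, which is the pattern a recipe relies on when it opens two URLs in a row with the same browser object.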
09-27-2010, 04:44 PM | #10 |
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
|
I didn't see that.
So if I do br.open(the first site) and then
br.open(the second site), that should work, as far as I can tell. What is happening is that just visiting the first site is enough to let me in to the second. I think I know how to do that. Now I have some Python work to do (but I will have the same type of trouble with my final recipe). This is what I am thinking of doing: Spoiler:
I need to dig a bit here. Last edited by marbs; 09-27-2010 at 05:02 PM. |
09-27-2010, 05:57 PM | #11 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
As long as we're covering all the gory details, note that although cookies are handled transparently, the Referer header is not (that's the correct spelling for the referrer header). You have to deal with that manually, if it's needed. (In this case, it seems to not be important.) You can also handle cookies manually, should you need to (I never have) and sometimes you may need to add other headers that are not added by default (Accept headers are sometimes needed to satisfy the Bad Behavior blog plugin). Finally, the ignore robots.txt is turned on in Calibre by default when it uses Mechanize. There's no substitute for a careful analysis of how the site responds and what it needs to give you what you're looking for. It looks like you're on the right road! |
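Setting the Referer and Accept headers by hand, as described above, might look something like this. The sketch uses the standard library's `urllib.request.Request`; mechanize's Request is understood to follow the same interface, but treat that as an assumption and check the mechanize docs. The helper name `build_request`, the URLs, and the header values are placeholders.

```python
import urllib.request


def build_request(url, referer):
    """Build a GET request carrying headers that are not added automatically."""
    request = urllib.request.Request(url)
    # Note the single-r spelling: that is how the HTTP header is defined.
    request.add_header("Referer", referer)
    # Some sites refuse clients that send no Accept header at all.
    request.add_header("Accept", "text/html,application/xhtml+xml")
    return request


if __name__ == "__main__":
    req = build_request("http://example.com/popup", "http://example.com/index")
    print(req.header_items())
```

The request object is then passed to the browser's open call in place of a plain URL string.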
|
09-28-2010, 05:21 AM | #12 |
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
|
I am almost there.
I have been working on this for a few hours. I have my table, and I am very happy with it. It fits into the page in some magical way.
I gave it a fake feed to parse, and just had it return the address that I wanted. I am not sure why it works, but it does. Now I want to remove the "feeds" menu that calibre creates (page 2 in any other recipe) and the section menu (page 3 in any other recipe). Is there a way to do that? Spoiler:
So I got a little greedy. Is there an easy way to break the table in half? I can think of three things that might work (I just don't know how to do them):
1. Remove some less relevant columns.
2. Cut every row in half, giving:
1st row, right half
1st row, left half
2nd row, right half
and so on.
3. Cut the whole table in half and add the rightmost column to the second half too:
1st row, right half
2nd row, right half
. . .
top right cell + 1st row, left half
2nd-from-the-top right cell + 2nd row, left half
. . .
Possible? Last edited by marbs; 09-28-2010 at 06:29 AM. Reason: i got greedy |
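The third option above (cut the whole table in half and repeat the key column in the second half) can be sketched on plain lists of cells. This is not from the thread: it assumes the table has already been scraped into one list per row, and `split_table`, `key_index`, and the sample rows are hypothetical. For a right-to-left table the key column would be whichever index holds the rightmost column.

```python
def split_table(rows, key_index=0):
    """Split a wide table into two narrower ones, repeating the key cell in both."""
    first_half, second_half = [], []
    for row in rows:
        key = row[key_index]
        rest = row[:key_index] + row[key_index + 1:]
        middle = (len(rest) + 1) // 2  # first half gets the extra cell if odd
        first_half.append([key] + rest[:middle])
        second_half.append([key] + rest[middle:])
    return first_half, second_half


if __name__ == "__main__":
    # Made-up sample data, not real TASE figures.
    rows = [
        ["Name", "Price", "Change", "Volume", "Cap"],
        ["AAA", 1, 2, 3, 4],
    ]
    for table in split_table(rows):
        print(table)
```

Each returned table can then be written out as its own HTML table, one after the other, so every row still starts with the cell that identifies it.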
09-28-2010, 11:10 AM | #13 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
|
||
09-28-2010, 11:28 AM | #14 |
creator of calibre
Posts: 43,863
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You can replace the feed menu by a blank page using extra_css, but if you actually want it to not be created at all, you will have to reimplement various functions in BasicNewsRecipe.
|
09-28-2010, 04:04 PM | #15 |
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
|
I tried playing around with some table code.
I added this: Spoiler:
and lost the whole table. I think I will leave it at that for now. Kovid, did you mean adding extra_css = '' to the code? In any case, I am very happy with what I have achieved. I really appreciate the point (or push) in the right direction. I am getting a lot more out of the advice than I would from just answers. |
|