09-27-2010, 11:11 AM
marbs
Following a JavaScript link and table editing

I think this one should be easy, but the documentation on following a JavaScript link only covers forms.

I will explain what I am trying to do using the English site so people here can understand what I am talking about, but I will switch it to the Hebrew site once it works.

On this page:
http://www.tase.co.il/TASEEng/Market...=5&IndexID=168
I want to click "Additional Columns", i.e. this link:
Code:
<td valign="baseline" nowrap="" align="right">
                                    
                                    &nbsp;&nbsp;&nbsp;<a onclick="javascript:customWindowOpen('/TASEEng/Management/GeneralPages/PopUpGrid.htm?tbl=0&amp;Columns=en-US_AddColColumns&amp;Titles=en-US_AddColTitles&amp;ds=en-US_ds&amp;enumTblType=SharesByIndex&amp;sess=en-US_&amp;gridName=Market+Data+-+Shares+TA+Composite', '_blank', ' resizable=yes,scrollbars=yes', 800, 500);return false;" href="javascript:void(0) ">Additional Columns</a>&nbsp;<img border="0" src="/TASE/Images/English/Grid/expand_arrow.gif">&nbsp;&nbsp;</td>

As you can see, this link has none of the attributes that http://bugs.calibre-ebook.com/wiki/recipeGuide_advanced talks about, and my Google search did not get me any closer.
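Since mechanize does not execute JavaScript, there is no real link to "click": the target URL only exists as a string inside the `onclick` handler. A minimal sketch, assuming the markup looks like the snippet above and that the first argument of `customWindowOpen(...)` is always the popup URL, pulls that URL out with the standard library so it can be opened directly. `POPUP_RE`, `PopupLinkFinder` and `extract_popup_url` are hypothetical helpers, not part of calibre or mechanize.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: the popup URL is the first quoted argument of customWindowOpen('...', ...)
POPUP_RE = re.compile(r"customWindowOpen\('([^']+)'")


class PopupLinkFinder(HTMLParser):
    """Finds the <a> whose text equals link_text and keeps the URL from its onclick."""

    def __init__(self, link_text):
        super().__init__()  # convert_charrefs=True, so &amp; in attributes becomes &
        self.link_text = link_text
        self._pending = None  # popup URL of the <a> currently being read
        self._text = []
        self.popup_url = None

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            onclick = dict(attrs).get('onclick') or ''
            match = POPUP_RE.search(onclick)
            self._pending = match.group(1) if match else None
            self._text = []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._pending is not None:
            # Keep only the first link whose visible text matches.
            if self.popup_url is None and ''.join(self._text).strip() == self.link_text:
                self.popup_url = self._pending
            self._pending = None


def extract_popup_url(page_html, base_url, link_text='Additional Columns'):
    """Return the absolute popup URL, or None if no matching link is found."""
    finder = PopupLinkFinder(link_text)
    finder.feed(page_html)
    finder.close()
    if finder.popup_url is None:
        return None
    return urljoin(base_url, finder.popup_url)
```

The helper expects the page as a decoded string; the URL it returns could then be handed to `br.open()` instead of building a `mechanize.Link` by hand.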

The page that opens in the popup is mainly a table, and I want that table to be the recipe's content. If I could also remove the calibre feed index, that would be good.
The problem I foresee (I haven't gotten that far yet) is that the table will be too wide for the output. But first I want to focus on clicking the JavaScript link and downloading the table to a file; I think I can do the cleanup myself.
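For the "too wide" problem, one option (an assumption about what reads well on a small screen, not something calibre does automatically) is to split the grid into several narrower tables, each repeating the first column (the share name). A minimal standard-library sketch, assuming the popup grid is a single non-nested table whose cells are closed with `</td>`/`</th>`; `TableRows`, `split_table` and `to_html` are hypothetical helpers.

```python
from html import escape
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collects the whitespace-normalised text of every cell of every <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self.rows.append([])
        elif tag in ('td', 'th') and self.rows:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            # str.split() also treats &nbsp; (U+00A0) as whitespace.
            self.rows[-1].append(' '.join(''.join(self._cell).split()))
            self._cell = None


def split_table(rows, cols_per_table, key_cols=1):
    """Yield narrower tables: the first key_cols columns are repeated in each,
    the remaining columns are cut into groups of cols_per_table."""
    if not rows:
        return
    width = max(len(r) for r in rows)
    for start in range(key_cols, width, cols_per_table):
        stop = start + cols_per_table
        yield [r[:key_cols] + r[start:stop] for r in rows]


def to_html(tables):
    """Render each narrow table as a plain HTML <table>."""
    parts = []
    for table in tables:
        parts.append('<table border="1">')
        for row in table:
            cells = ''.join('<td>%s</td>' % escape(c) for c in row)
            parts.append('<tr>%s</tr>' % cells)
        parts.append('</table>')
    return '\n'.join(parts)
```

In a recipe, the rows could come from the HTML downloaded in `get_obfuscated_article`, with the output of `to_html()` written to the temporary file instead of the raw page.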

This is as far as I got:
Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile
import mechanize

class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title          = u'TA stock table'
    oldest_article = 1
    __author__            = 'marbs'
    max_articles_per_feed = 100
    #no_stylesheets = True
    #extra_css = ' body{font-family: Arial,Helvetica,sans-serif } '
    cover_url      = 'http://money-talks.co.il/wp-content/uploads/2008/02/glasses_on_newspaper.jpg'

    # calibre treats each entry in feeds as an RSS/Atom feed URL; this one is an HTML page.
    feeds          = [(u'Breaking News', u'http://tase.co.il/TASEEng/MarketData/Indices/MarketCap/IndexMainDataMarket.htm?Action=6&addTab=0&IndexId=001')]

    temp_files = []
    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        br = self.get_browser()
        br.open(url)
        # mechanize cannot run the onclick JavaScript, so the popup URL is
        # copied by hand from the customWindowOpen(...) call.
        print_url = 'http://tase.co.il/TASEEng/Management/GeneralPages/PopUpGrid.htm?tbl=0&Columns=en-US_AddColColumns&Titles=en-US_AddColTitles&ds=en-US_ds&enumTblType=SharesByIndex&sess=en-US_&gridName=Market+Data+-+Shares+General'
        # With base_url='' the Link resolves to print_url itself.
        response = br.follow_link(mechanize.Link(base_url = '', url = print_url, text = 'Additional Columns', tag = '', attrs = []))

        html = response.read()

        # Save the page to a file; calibre reads the article from the returned path.
        self.temp_files.append(PersistentTemporaryFile('_fa.html'))
        self.temp_files[-1].write(html)
        self.temp_files[-1].close()

        return self.temp_files[-1].name


This gives me 225 pages of HTML code from http://www.tase.co.il/TASEEng/Market...=5&IndexID=168. Any thoughts?

Last edited by marbs; 09-27-2010 at 11:14 AM.