#1
Junior Member
Posts: 1
Karma: 10
Join Date: Jan 2013
Device: Kindle
Recipe request for Calibre
First of all, folks at Calibre, thank you so much for this awesome software.

Second, I want to request a recipe for ProQuest. ProQuest hosts tons of online sources, some of which are very expensive otherwise, e.g. the Wall Street Journal. Is it possible to obtain the WSJ from ProQuest using Calibre? I have written some code which does that (sort of), but it uses Selenium (partly because I do not understand the authentication mechanism all that well). Here is my code:

    import time
    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys

    def getnext():
        # Go to the next results page and tick its "select all" checkbox
        driver.find_element_by_link_text("Next page").click()
        driver.find_element_by_id("mlcbAll").click()
        return

    url='http://search.proquest.com.proxy.xx.edu/publication/10482'
    driver = webdriver.Firefox()
    driver.get(url)

    # Log in through the proxy: user ID first, then password
    driver.find_element_by_id("UserIDinput").clear()
    driver.find_element_by_id("UserIDinput").send_keys("xxx")
    driver.find_element_by_css_selector("input[type=\"submit\"]").click()
    driver.find_element_by_id("passwordInput").clear()
    driver.find_element_by_id("passwordInput").send_keys('xxxxxx')
    driver.find_element_by_css_selector("input[type=\"submit\"]").click()

    # Open the latest issue and select every article on every page
    driver.find_element_by_link_text("View most recent issue").click()
    driver.find_element_by_id("mlcbAll").click()
    errorcode=0
    while errorcode ==0:
        try:
            getnext()
        except:
            # No "Next page" link left, so this was the last page
            errorcode=1

    # Export the selected articles as HTML
    driver.find_element_by_id("saveExportLink").click()
    ##Introduce wait here
    time.sleep(100)
    el = driver.find_element_by_name("exportMode")
    for option in el.find_elements_by_tag_name('option'):
        if option.text == 'HTML':
            option.click()
    driver.find_element_by_id("submitButton").click()
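One possible way to avoid the fixed `time.sleep(100)` pause in the script above is to poll until the export dialog is actually there. The following is only a sketch: `wait_until` is a hypothetical helper name, not part of Selenium or Calibre, and the timeout and poll values are arbitrary assumptions. Selenium also ships its own `WebDriverWait` class for the same purpose, which may be the better choice in a real script.

```python
import time

def wait_until(condition, timeout=30.0, poll=0.5):
    """Call condition() until it returns something truthy, or give up.

    Returns the truthy value, or raises TimeoutError once `timeout`
    seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)
```

With a live browser session, the sleep before looking up `exportMode` could then be written along the lines of `wait_until(lambda: driver.find_elements("name", "exportMode"))`, assuming a Selenium version whose `find_elements` accepts a locator strategy and a value.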
#2
Junior Member
Posts: 1
Karma: 10
Join Date: Jan 2019
Device: kindle paperwhite
Thanks for sharing your script. I updated it somewhat to work with the current version of the site. I have not looked into a Calibre extension, but it's a good idea if it is possible.
Code:
    #!/usr/bin/python
    import time
    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys

    def getnext():
        # Go to the next results page and tick its "select all" checkbox
        driver.find_element_by_link_text("Next page").click()
        driver.find_element_by_id("mlcbAll").click()
        return

    url='https://search.proquest.com/publication.pubfull:searchmostrecentissue?t:ac=publications_10482'
    # https://search.proquest.com/publication/54814/
    driver = webdriver.Firefox()
    # The page is requested twice, with a short pause in between
    driver.get(url)
    time.sleep(3)
    driver.get(url)
    ##Introduce wait here
    time.sleep(5)
    print("hello")

    # Show 100 items per page
    el = driver.find_element_by_name("itemsPerPage")
    for option in el.find_elements_by_tag_name('option'):
        if option.text == '100':
            option.click()

    # Select every article on every page
    driver.find_element_by_id("mlcbAll").click()
    errorcode=0
    while errorcode ==0:
        try:
            getnext()
        except:
            # No "Next page" link left, so this was the last page
            errorcode=1

    # Open the export menu and submit the export form
    driver.find_element_by_id("tsMore").click()
    driver.find_element_by_id("saveExportLink_1").click()
    driver.find_elements_by_xpath("//*[contains(@id, 'submitButton')]")[0].click()
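Both scripts walk the result pages by clicking "Next page" until the click raises an exception. A slightly more explicit variant of that loop might look like the sketch below. It assumes a Selenium 4 style driver, where `find_element(by, value)` and `find_elements(by, value)` take the locator strategy as a string and `find_elements` returns an empty list when nothing matches. The function name `select_all_pages` and the `max_pages` safety cap are made up for illustration; the element ID `mlcbAll` and the link text come from the scripts above and may no longer match the site.

```python
# Locator strategies as plain strings; these are the values of
# selenium's By.ID and By.LINK_TEXT constants.
BY_ID = "id"
BY_LINK_TEXT = "link text"

def select_all_pages(driver, max_pages=500):
    """Tick the select-all checkbox on every results page.

    Returns the number of pages visited. `max_pages` is an assumed
    safety cap so that a site change cannot cause an endless loop.
    """
    pages = 0
    while pages < max_pages:
        # Select every article on the current page
        driver.find_element(BY_ID, "mlcbAll").click()
        pages += 1
        # An empty list means there is no "Next page" link: last page
        next_links = driver.find_elements(BY_LINK_TEXT, "Next page")
        if not next_links:
            break
        next_links[0].click()
    return pages
```

Checking for the link with `find_elements` avoids the bare `except:`, which would also hide unrelated errors such as a closed browser window. Page-load timing is not handled here, so a real run would probably still need a wait after each click.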
Tags |
proquest, receipe, request, wsj |