01-30-2011, 10:03 AM | #1 |
Enthusiast
Posts: 27
Karma: 10
Join Date: Jan 2011
Device: none
|
Wall Street Journal, WSJ, Free version, recipe improvement for full text of all articles
I see that the stories locked for subscribers on the WSJ site are not complete when downloaded. Googling around, I discovered this article on how to retrieve the full story from Google without a subscription. I looked at the recipe for the WSJ, and it looks like it might be feasible to modify it to retrieve the text of the stories from Google. The change I would make to the recipe to get this working is:
1) For most stories, nothing changes in the recipe.
2) When a "key" is included with the story, the recipe places the headline into Google and retrieves the full story from the link returned.
Would anyone with a programming background be able to modify the recipe to retrieve all of the WSJ article content using the Google link? |
01-31-2011, 02:12 PM | #2 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
|
|
|
01-31-2011, 06:33 PM | #3 |
Enthusiast
Posts: 27
Karma: 10
Join Date: Jan 2011
Device: none
|
Yes it does.
Today, 1/31, if I go to the online WSJ Marketplace section, the first story is "Alpha Reaches Deal to Buy Massey". This article shows the key and ends with "Alpha, of Abingdon, Va., had ..." If I just cut and paste the headline "Alpha Reaches Deal to Buy Massey" into Google, the first link returned is http://online.wsj.com/article/SB1000...449102880.html, which returns the full story for me. |
01-31-2011, 06:39 PM | #4 |
Enthusiast
Posts: 27
Karma: 10
Join Date: Jan 2011
Device: none
|
Wow, very strange. If you click the link I pasted from Google into my post, it returns only a small portion of the story; however, if you paste the headline into Google and click the link in the results, it works. I do not understand... hmmm. I am guessing it has something to do with what you said: "I lock down referrer and UserAgent headers, cookies, scripts and advertising"
|
01-31-2011, 10:09 PM | #5 |
onlinenewsreader.net
Posts: 324
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
|
WSJ allows access to subscriber-only content if the referer is Google News. There is a Firefox extension called RefControl that allows you to set whatever referer you want for a particular website, so if you set it to Google News for wsj.com then you will have access to all of its subscriber-only content. Similarly, if you ensure that calibre's requests to wsj.com have Google News as the referer, then you'll get all of the articles.
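
For readers unfamiliar with the mechanism: the referer is just an HTTP request header, so a client can set it to any value. Below is a minimal sketch using only the Python standard library. The URLs are hypothetical placeholders, the exact referer value a site checks is an assumption, and nothing is sent over the network here; whether any particular site honours the header is not something this example can show.

```python
from urllib.request import Request

# Hypothetical placeholder values, not taken from the thread.
ARTICLE_URL = "http://example.com/article/12345.html"
REFERER = "http://news.google.com/"  # assumed value for "Google News"


def build_request(url, referer):
    """Return a Request object carrying the given Referer header.

    This only constructs the request; it does not open a connection.
    """
    return Request(url, headers={"Referer": referer})


req = build_request(ARTICLE_URL, REFERER)
# Inspect the header that would be sent with the request.
print(req.get_header("Referer"))
```

Passing `req` to `urllib.request.urlopen` would send the request with that header attached.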
|
|
02-01-2011, 01:00 PM | #6 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
|
|
02-01-2011, 05:27 PM | #7 | |
Enthusiast
Posts: 27
Karma: 10
Join Date: Jan 2011
Device: none
|
Quote:
If I follow the logic of the responses, I now doubt that my proposed method of changing the script is the most efficient solution. It now sounds like changing the script to emulate the type of request sent from Google should work, but you imply this is what you tried and it did not work? |
|
02-01-2011, 09:11 PM | #8 | |
onlinenewsreader.net
Posts: 324
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
|
Quote:
|
|
02-01-2011, 09:13 PM | #9 |
onlinenewsreader.net
Posts: 324
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
|
|
02-01-2011, 09:41 PM | #10 | |
Enthusiast
Posts: 27
Karma: 10
Join Date: Jan 2011
Device: none
|
Quote:
How would I implement this? Where is it set? Looking at this script, I do not see it: http://bazaar.launchpad.net/~kovid/c...sj_free.recipe |
|
02-01-2011, 09:58 PM | #11 | |
onlinenewsreader.net
Posts: 324
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
|
Quote:
|
|
02-01-2011, 10:18 PM | #12 | |
Enthusiast
Posts: 27
Karma: 10
Join Date: Jan 2011
Device: none
|
Quote:
|
|
02-01-2011, 10:20 PM | #13 |
onlinenewsreader.net
Posts: 324
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
|
Well, it works with Firefox, so I can't see why it wouldn't work with calibre.
|
02-02-2011, 07:52 AM | #14 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
As to how to do it in a recipe - Mechanize will forge the referer header. I've done it for other recipes. If you want to look at how it's done, I recall that I did it in the Google Reader authentication recipe, in one of the comic recipes (probably gocomics) and in one of the skeptic recipes I wrote. (Various headers - not always referer.) |
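
A rough sketch of the idea, for anyone who wants to see the shape of it before digging through those recipes. It uses the standard library's `urllib.request` opener rather than Mechanize itself, so it runs without extra packages; as far as I know, a Mechanize browser exposes a similar `addheaders` list, but treat that as an assumption and check the recipes mentioned above. The helper name `make_opener` and the referer value are hypothetical.

```python
from urllib.request import build_opener


def make_opener(referer):
    """Build an opener that sends the given Referer on every request.

    Default headers (such as User-agent) are kept; any existing
    Referer entry is replaced rather than duplicated.
    """
    opener = build_opener()
    # addheaders is a list of (name, value) tuples applied to each request.
    opener.addheaders = [
        (name, value)
        for name, value in opener.addheaders
        if name.lower() != "referer"
    ]
    opener.addheaders.append(("Referer", referer))
    return opener


# Assumed referer value; the thread only says "google news".
opener = make_opener("http://news.google.com/")
print(opener.addheaders)
```

Setting the header on the opener, rather than per request, means every page fetched through it carries the same referer, which is closer to how a recipe's shared browser object would behave.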
|
02-06-2011, 05:53 PM | #15 |
Junior Member
Posts: 4
Karma: 10
Join Date: Dec 2010
Device: Kindle 3
|
This is a really good idea. Forging the referer gets past the paywalls at economist.com, ft.com, the-american-interest.com, and I'm sure many other publishers of interest to calibre users in addition to wsj.com. Wouldn't it be ideal to add referer forging as a feature that can be turned on for any recipe, with the default set to "on" for those sites where it is known to be helpful?
|