Go Back   MobileRead Forums > E-Book Software > Calibre > Recipes

Old 01-30-2011, 10:03 AM   #1
winterescape
Enthusiast
Posts: 27
Karma: 10
Join Date: Jan 2011
Device: none
Wall Street Journal, WSJ, Free version, recipe improvement for full text of all articles

I see the stories that are locked for subscription on the wsj site and understand that those stories are not complete when downloaded.

Googling around I discovered this article on how to retrieve the full story from google without a subscription.
I looked at the recipe for the wsj and this looks like it might be feasible to modify it to retrieve the text of the stories from google.

So the changes I would make to the recipe to get this working are:
1) For most stories, nothing changes in the recipe.
2) When a "key" is included with the story, the recipe submits the headline to Google and retrieves the full story from the link returned.

Would anyone with a programming background be able to modify the recipe to retrieve all of the WSJ article content using the Google link?
Old 01-31-2011, 02:12 PM   #2
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by winterescape View Post
I see the stories that are locked for subscription on the wsj site and understand that those stories are not complete when downloaded.

Googling around I discovered this article on how to retrieve the full story from google without a subscription.
Your linked article is dated Jun. 1, 2009. Have you tried this trick? It doesn't work for me, although that could be because I lock down referrer and UserAgent headers, cookies, scripts and advertising. If it works for you, then it might be possible to do this. Try it and report back.
Old 01-31-2011, 06:33 PM   #3
winterescape
Yes it does.

Today, 1/31, if I go to the online WSJ Marketplace section, the first story is "Alpha Reaches Deal to Buy Massey". This article shows the key and ends with "Alpha, of Abingdon, Va., had ..." If I just cut and paste the headline "Alpha Reaches Deal to Buy Massey" into Google, the first link returned is this:

http://online.wsj.com/article/SB1000...449102880.html

which returns the full story for me.
Old 01-31-2011, 06:39 PM   #4
winterescape
Wow, very strange. If you click the link I pasted from Google into my post, it returns only a small portion of the story; however, if you paste the headline into Google and click the link in the results, it works. I do not understand... Hmm, I am guessing it has something to do with what you said: "I lock down referrer and UserAgent headers, cookies, scripts and advertising".
Old 01-31-2011, 10:09 PM   #5
nickredding
onlinenewsreader.net
Posts: 320
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
WSJ allows access to subscriber-only content if the referer is Google News. There is a Firefox extension called RefControl that lets you set whatever referer you want for a particular website, so if you set it to Google News for wsj.com, then you will have access to all of its subscriber-only content. Similarly, if you ensure that calibre's requests to wsj.com have Google News as the referer, then you'll get all of the articles.
Old 02-01-2011, 01:00 PM   #6
Starson17
Quote:
Originally Posted by nickredding View Post
WSJ allows access to subscriber-only content if the referer is Google News. There is a Firefox extension called RefControl that lets you set whatever referer you want for a particular website, so if you set it to Google News for wsj.com, then you will have access to all of its subscriber-only content. Similarly, if you ensure that calibre's requests to wsj.com have Google News as the referer, then you'll get all of the articles.
I tested with the referer set to google.com, and it still didn't work. Nor would it work from Google's search results. I'm confident it's because of other restrictions I have set, but I didn't want to track down what the problem was unless I was sure that this old trick still worked.
Old 02-01-2011, 05:27 PM   #7
winterescape
Quote:
Originally Posted by Starson17 View Post
I tested with the referer set to google.com, and it still didn't work. Nor would it work from Google's search results. I'm confident it's because of other restrictions I have set, but I didn't want to track down what the problem was unless I was sure that this old trick still worked.
Starson,
If I follow the logic of the responses I now doubt that my proposed method of changing the script is the most efficient solution.

It now sounds like changing the script to emulate the type of request sent from Google should work but you imply this is what you tried and it did not work?
Old 02-01-2011, 09:11 PM   #8
nickredding
Quote:
Originally Posted by Starson17 View Post
I tested with the referer set to google.com, and it still didn't work. Nor would it work from Google's search results. I'm confident it's because of other restrictions I have set, but I didn't want to track down what the problem was unless I was sure that this old trick still worked.
Set referer to http://news.google.com/news/search?q=wsj and it works unless you have done something to your environment that overrides referer.
Old 02-01-2011, 09:13 PM   #9
nickredding
Quote:
Originally Posted by winterescape View Post
It now sounds like changing the script to emulate the type of request sent from Google should work but you imply this is what you tried and it did not work?
Emulating the request sent from google means setting the referer.
Old 02-01-2011, 09:41 PM   #10
winterescape
Quote:
Originally Posted by nickredding View Post
Set referer to http://news.google.com/news/search?q=wsj and it works unless you have done something to your environment that overrides referer.
nickredding, O.K., great, thanks.
How would I implement this? Where is it set? Looking at this script, I do not see it...
http://bazaar.launchpad.net/~kovid/c...sj_free.recipe
Old 02-01-2011, 09:58 PM   #11
nickredding
Quote:
Originally Posted by winterescape View Post
nickredding, O.K., great, thanks.
How would I implement this? Where is it set? Looking at this script, I do not see it...
http://bazaar.launchpad.net/~kovid/c...sj_free.recipe
Can't help you there--I haven't been deep enough into the calibre code
Old 02-01-2011, 10:18 PM   #12
winterescape
Quote:
Originally Posted by nickredding View Post
if you ensure that Calibre's requests to wsj.com have google news as a referer then you'll get all of the articles.
Fair enough. So, to quote where you started last night: if we can get calibre to specifically use http://news.google.com/news/search?q=wsj as the referer, it should work.
Old 02-01-2011, 10:20 PM   #13
nickredding
Well, it works with Firefox, so I can't see why it wouldn't work with calibre.
Old 02-02-2011, 07:52 AM   #14
Starson17
Quote:
Originally Posted by nickredding View Post
Set referer to http://news.google.com/news/search?q=wsj and it works unless you have done something to your environment that overrides referer.
I'm forging most referers on my system. I don't like tracking. I wasn't implying it won't work, just that it won't work on my system until I allow it, and I didn't want to go to that effort, just for testing. Since it works for you, I don't need to test it further.

As to how to do it in a recipe - Mechanize will forge the referer header. I've done it for other recipes. If you want to look at how it's done, I recall that I did it in the Google Reader authentication recipe, in one of the comic recipes (probably gocomics) and in one of the skeptic recipes I wrote. (Various headers - not always referer.)
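
For readers who have not set a request header before, here is a minimal sketch of the general idea in plain Python. It is not taken from any of the recipes mentioned above, and it uses the standard library's urllib rather than the Mechanize browser that calibre recipes use, so treat it only as an illustration of attaching a Referer header to a request. The helper name build_request and the example.com URL are made up for this example; the referer value is the one quoted earlier in the thread. Nothing is fetched over the network.

```python
import urllib.request

# Referer value quoted earlier in this thread.
REFERER = "http://news.google.com/news/search?q=wsj"


def build_request(url, referer=REFERER):
    """Return a Request for `url` that carries a custom Referer header.

    Hypothetical helper for illustration only; it builds the request
    object but does not send it.
    """
    req = urllib.request.Request(url)
    req.add_header("Referer", referer)
    return req


# Placeholder URL, not a real article.
req = build_request("http://example.com/article.html")
print(req.get_header("Referer"))
```

In a recipe, the same header would presumably be set on the browser object the recipe uses for its downloads instead of on a single request, but that part is an assumption and has not been tested here.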
Old 02-06-2011, 05:53 PM   #15
bcaulf
Junior Member
Posts: 4
Karma: 10
Join Date: Dec 2010
Device: Kindle 3
This is a really good idea. Forging referer gets past the paywalls at economist.com, ft.com, the-american-interest.com, and I'm sure many other publishers of interest to calibre users in addition to wsj.com. Wouldn't it be ideal to add referer forging as a feature that can be turned on for any recipe, with the default set to "on" for those sites where it is known to be helpful?
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.