11-24-2016, 12:20 PM | #1 |
Zealot
Posts: 131
Karma: 150390
Join Date: Nov 2011
Location: Pacific NorthWest
Device: Kindle Fire
New/Updated: WSJ Wall Street Journal recipe
Attached is a modified Wall Street Journal recipe. It retrieves the article text without requiring a WSJ account. (That probably means it replaces wsj_free.recipe on GitHub.)
11-24-2016, 09:37 PM | #2 |
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Bypassing the paywall by pretending to be Googlebot is, IMO, both fragile and ethically dubious.
So I think I will not update the builtin recipe with those changes; you are, of course, welcome to keep using your customized recipes.
11-25-2016, 12:22 AM | #3 |
Zealot
Posts: 131
Karma: 150390
Join Date: Nov 2011
Location: Pacific NorthWest
Device: Kindle Fire
It's your product, so it's your call, but I don't agree with either statement. It's not fragile: it would take them a while to negotiate and implement a new mechanism together with Google. And I don't consider it an ethical conundrum either, given that it's all documented and merely involves setting a User-Agent header.
But, as I said, it's your product and your call.
Last edited by TechnoCat; 11-25-2016 at 12:23 AM. Reason: Missed a critical word
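A minimal sketch of the mechanism being debated, assuming (the thread does not show WSJ's actual logic) that the paywall exempts any request whose `User-Agent` header contains the token `Googlebot`. `CRAWLER_TOKEN`, `looks_like_crawler`, `build_request`, and the example URL are hypothetical; only `urllib.request.Request` is the real standard-library API. It shows what "merely setting a User-Agent" amounts to: the client chooses the header freely, so a server check that trusts it accepts anyone who claims to be the crawler.

```python
import urllib.request

# Hypothetical: the substring a naive paywall might look for. The thread does
# not say what WSJ actually checks.
CRAWLER_TOKEN = "googlebot"


def looks_like_crawler(user_agent: str) -> bool:
    """Naive server-side check that trusts whatever the client claims."""
    return CRAWLER_TOKEN in user_agent.lower()


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Client side: User-Agent is an ordinary header the client sets freely."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    # No network access happens here; the request object is only constructed.
    req = build_request("https://example.com/article",
                        "Mozilla/5.0 (compatible; Googlebot/2.1)")
    # urllib stores header names via str.capitalize(), hence "User-agent".
    ua = req.get_header("User-agent")
    print(ua, looks_like_crawler(ua))
```

Nothing in such a request proves who sent it, so the only way a site can stop trusting it is to change the check itself, which is exactly the point the two posters disagree about.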
11-25-2016, 02:18 AM | #4 |
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
The ethical dubiousness comes from the fact that if everyone starts spoofing search engine crawlers to bypass paywalls, then the maintainers of those paywalls will have to close that loophole, which means their content is no longer generally indexable, which is a net loss for everyone.
And it is fragile because replacing it with a secure protocol is trivial, for example TLS client-certificate verification. All Google would have to do is turn it on in its crawler and publish its public certificate. And this in turn would make it much harder for a Google competitor ever to emerge, since every paywall would then have to actively add a new public certificate to its configuration to allow access to a new crawler.
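A minimal sketch of the TLS client-certificate check described above, using Python's `ssl` module as a stand-in for whatever web server a paywall would actually run. It assumes the crawler operator publishes a PEM certificate that the site loads as a trust anchor; `make_crawler_context`, `peer_common_name`, `is_allowed_crawler`, and the host name `crawler.example.org` are hypothetical. With `CERT_REQUIRED`, the handshake itself fails unless the client presents a certificate that chains to a loaded trust anchor, so a spoofed header is no longer enough.

```python
import ssl
from typing import Optional


def make_crawler_context(server_cert: Optional[str] = None,
                         server_key: Optional[str] = None,
                         crawler_ca_pem: Optional[str] = None) -> ssl.SSLContext:
    """Server-side TLS context that demands a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    if server_cert is not None:
        # The site's own certificate and key (paths are deployment-specific).
        ctx.load_cert_chain(certfile=server_cert, keyfile=server_key)
    # Reject any handshake in which the client presents no valid certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if crawler_ca_pem is not None:
        # Trust anchor: the certificate the crawler operator published.
        # Each additional crawler means loading another one of these.
        ctx.load_verify_locations(cadata=crawler_ca_pem)
    return ctx


def peer_common_name(cert: dict) -> Optional[str]:
    """Extract the subject commonName from SSLSocket.getpeercert() output."""
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


def is_allowed_crawler(cert: dict, allowed_names: set) -> bool:
    """After a verified handshake, optionally pin the crawler's identity too."""
    name = peer_common_name(cert)
    return name is not None and name in allowed_names
```

In a real server, `cert` would come from `conn.getpeercert()` on the socket returned by `ctx.wrap_socket(sock, server_side=True)`. That call returns an empty dict when no validated certificate is available, which `is_allowed_crawler` treats as "not a crawler". The `crawler_ca_pem` parameter is also where the competitor problem shows up: every new crawler needs its certificate added to every site's configuration.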
11-25-2016, 10:17 AM | #5 | |
Guru
Posts: 735
Karma: 35936
Join Date: Apr 2011
Location: Shrewsbury, MA
Device: Lenovo Android Tablet
Quote:
Tags: wsj
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post |
Updated Wall Street Journal recipe creates huge file | BobbyVan | Recipes | 10 | 10-06-2015 09:01 AM |
Req - Wall Street Journal (WSJ) USA Premium Recipe | CrazyWorld | Recipes | 2 | 07-05-2015 10:11 AM |
Wall Street Journal recipe broken? | nisew | Recipes | 2 | 09-28-2011 05:08 PM |
Wall Street Journal recipe - How good is it? | SpiderMatt | Recipes | 3 | 08-28-2011 10:24 PM |
Wall Street Journal, WSJ, Free version, recipe improvement for full text of all ar | winterescape | Recipes | 16 | 02-07-2011 01:51 PM |