09-20-2015, 10:10 AM | #1 |
Member
Posts: 22
Karma: 10
Join Date: Nov 2014
Device: none
|
"The Atlantic" recipe broken by web site change
I don't have an archive of the source of the web page used to begin scraping
Code:
INDEX = 'http://www.theatlantic.com/magazine/toc/0/'
Broken original
Code:
current_section, current_articles = None, []
Code:
current_section, current_articles = 'Main', []
The long-term solution is probably to build a section list from the navigation bar and recurse through it, but that is substantially more programming than a simple patch. Blindly following this approach would result in a mix of articles ranging from current to months old, which would also have to be handled intelligently.
Code:
<div id="nav-channel-bar">
  <ul id="nav-channels" data-omni-click="r'nav',@href,l.pathname">
    <li class="nav-channel politics">
      <a class="channel-link" href="/politics/" data-omni-click="inherit">Politics</a>
      <ul class="channel-dropdown" data-omni-click="r'sub-nav`politics',$li,@href,l.pathname">
        <li class="dropdown-label">Top Stories</li>
        <li class="dropdown-item">
          <a data-omni-click="inherit" href="http://www.theatlantic.com/politics/archive/2015/04/the-death-penalty-becomes-unusual/390867/">The Death Penalty Becomes Rare</a>
        </li>
        <li class="dropdown-item">
          <a data-omni-click="inherit" href="http://www.theatlantic.com/politics/archive/2015/04/the-liberty-to-feed-the-poor/390987/">The Liberty to Feed the Poor</a>
        </li>
        <li class="dropdown-item">
          <a data-omni-click="inherit" href="http://www.theatlantic.com/politics/archive/2015/04/senate-confirms-loretta-lynch/391056/">Loretta Lynch, America's Next Attorney General</a>
        </li>
        <li class="dropdown-item">
          <a data-omni-click="inherit" href="http://www.theatlantic.com/politics/archive/2015/09/do-neanderthals-have-souls/406246/">Did Neanderthals Have Souls?</a>
        </li>
        <li class="dropdown-item">
          <a data-omni-click="inherit" href="http://www.theatlantic.com/politics/archive/2015/09/fear-and-clothing/405919/">Dressing for Success in Washington, D.C.</a>
        </li>
        <li class="dropdown-item">
          <a data-omni-click="inherit" href="http://www.theatlantic.com/politics/archive/2015/09/how-the-constitution-was-indeed-pro-slavery/406288/">How the Constitution Was Indeed Pro-Slavery</a>
        </li>
      </ul>
    </li> |
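For anyone who wants to try the navigation-bar approach from post #1, here is a rough sketch of what it might look like. It is not the recipe's actual code: `NavSectionParser`, `parse_nav` and `recent_only` are made-up names, it uses only the Python standard library rather than calibre's own helpers (a real recipe would normally fetch the page with `index_to_soup`), and it assumes the markup keeps the `channel-link` class and the `/archive/YYYY/MM/` URL pattern seen in the snippet above. The date filter is one possible way to deal with the mix of current and months-old links.

```python
import re
from html.parser import HTMLParser


class NavSectionParser(HTMLParser):
    """Collect {section: [(title, url), ...]} from the nav-channel markup."""

    def __init__(self):
        super().__init__()
        self.sections = {}      # section name -> list of (title, url)
        self._current = None    # section currently being filled
        self._mode = None       # 'section' or 'article' while inside an <a>
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        attrs = dict(attrs)
        self._href = attrs.get('href')
        self._text = []
        classes = (attrs.get('class') or '').split()
        # Assumption: section links carry class="channel-link";
        # every other link under a section is treated as an article.
        self._mode = 'section' if 'channel-link' in classes else 'article'

    def handle_data(self, data):
        if self._mode:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != 'a' or not self._mode:
            return
        title = ''.join(self._text).strip()
        if self._mode == 'section':
            self._current = title
            self.sections[title] = []
        elif self._current and self._href:
            self.sections[self._current].append((title, self._href))
        self._mode = None


# Matches the /archive/YYYY/MM/ part of the article URLs shown above.
ARCHIVE_DATE = re.compile(r'/archive/(\d{4})/(\d{2})/')


def recent_only(articles, year, month):
    """Keep only articles whose URL is dated in the given year and month."""
    kept = []
    for title, url in articles:
        m = ARCHIVE_DATE.search(url)
        if m and (int(m.group(1)), int(m.group(2))) == (year, month):
            kept.append((title, url))
    return kept


def parse_nav(html):
    """Parse nav-bar HTML and return the section -> articles mapping."""
    parser = NavSectionParser()
    parser.feed(html)
    return parser.sections
```

Calling `parse_nav()` on the navigation HTML and then `recent_only(sections['Politics'], 2015, 9)` would drop the April links and keep the September ones. Whether filtering by URL date is the right policy for the recipe is a separate question.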
09-20-2015, 11:28 AM | #2 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
|
05-03-2016, 08:47 AM | #3 |
Connoisseur
Posts: 65
Karma: 10
Join Date: Mar 2015
Device: KPW, Ipad 2, Note 5
|
Hello!
Is anyone using the Atlantic recipe? It seems to be broken again. When I download it, only the table of contents shows up, with no article content. |
05-04-2016, 06:25 AM | #4 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
|
|