I don't have an archive of the source of the web page used to begin scraping
Code:
INDEX = 'http://www.theatlantic.com/magazine/toc/0/'
but I infer from the recipe source that the page used to be split into sections, each marked by an 'h2' tag that the recipe used to build its section list. The sections seem to be gone, and as a result the recipe fails because it finds no articles to retrieve. I confirmed this with an ugly patch that simply initializes the section label to a default:
Broken original
Code:
current_section, current_articles = None, []
Ugly patch
Code:
current_section, current_articles = 'Main', []
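To illustrate why the default matters, here is a rough sketch of the kind of grouping loop I am inferring. It is not the recipe's actual code: the function name group_by_section, the (kind, value) item format, and the flush condition are all assumptions made for the example.

```python
def group_by_section(items, default_section=None):
    """Group articles under the most recent section label.

    `items` is a hypothetical flattened view of the TOC page: a sequence of
    ('h2', section_title) and ('article', article_dict) tuples.
    """
    feeds = []
    current_section, current_articles = default_section, []
    for kind, value in items:
        if kind == 'h2':
            # A new section starts: flush the previous one if it is usable.
            if current_section and current_articles:
                feeds.append((current_section, current_articles))
            current_section, current_articles = value, []
        elif kind == 'article':
            current_articles.append(value)
    # Flush the last section. With no label at all, nothing is emitted.
    if current_section and current_articles:
        feeds.append((current_section, current_articles))
    return feeds


# A page with articles but no 'h2' headings, as the TOC appears to be now.
page = [('article', {'title': 'A'}), ('article', {'title': 'B'})]
broken = group_by_section(page, None)     # no label, so no feeds
patched = group_by_section(page, 'Main')  # everything lands under 'Main'
```

With a default label in place, the loop has somewhere to put the articles even when no 'h2' ever appears.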
While this does successfully retrieve all of the articles listed on the "magazine" TOC page, that is far fewer than the old script could retrieve. Worse, the target of the "magazine" link looks fairly static, although every article carries a current timestamp when it is retrieved because the pages are generated dynamically.
The long-term solution is probably to build a section list from the navigation bar and recurse through it, but that is substantially more programming than a simple patch. Blindly following this approach would result in a mix of articles ranging from current to months old, and that would have to be handled intelligently as well.
Code:
<div id="nav-channel-bar">
<ul id="nav-channels" data-omni-click="r'nav',@href,l.pathname">
<li class="nav-channel politics">
<a class="channel-link" href="/politics/" data-omni-click="inherit">Politics</a>
<ul class="channel-dropdown" data-omni-click="r'sub-nav`politics',$li,@href,l.pathname">
<li class="dropdown-label">Top Stories</li>
<li class="dropdown-item">
<a data-omni-click="inherit" href="http://www.theatlantic.com/politics/archive/2015/04/the-death-penalty-becomes-unusual/390867/">The Death Penalty Becomes Rare</a>
</li>
<li class="dropdown-item">
<a data-omni-click="inherit" href="http://www.theatlantic.com/politics/archive/2015/04/the-liberty-to-feed-the-poor/390987/">The Liberty to Feed the Poor</a>
</li>
<li class="dropdown-item">
<a data-omni-click="inherit" href="http://www.theatlantic.com/politics/archive/2015/04/senate-confirms-loretta-lynch/391056/">Loretta Lynch, America's Next Attorney General</a>
</li>
<li class="dropdown-item">
<a data-omni-click="inherit" href="http://www.theatlantic.com/politics/archive/2015/09/do-neanderthals-have-souls/406246/">Did Neanderthals Have Souls?</a>
</li>
<li class="dropdown-item">
<a data-omni-click="inherit" href="http://www.theatlantic.com/politics/archive/2015/09/fear-and-clothing/405919/">Dressing for Success in Washington, D.C.</a>
</li>
<li class="dropdown-item">
<a data-omni-click="inherit" href="http://www.theatlantic.com/politics/archive/2015/09/how-the-constitution-was-indeed-pro-slavery/406288/">How the Constitution Was Indeed Pro-Slavery</a>
</li>
</ul>
</li>
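As a starting point for anyone who picks this up, the following is a minimal sketch of pulling sections and article links out of markup like the excerpt above, then dropping stale articles by the year and month embedded in the URL. It uses only the Python standard library rather than whatever parser the recipe uses. The names NavChannelParser, published_month and recent_only are made up for the example, and it assumes that the class names ('channel-link', 'dropdown-item') and the /archive/YYYY/MM/ URL pattern hold across the whole bar, which I have only seen in this one excerpt.

```python
import re
from html.parser import HTMLParser


class NavChannelParser(HTMLParser):
    """Collect {section name: [(title, url), ...]} from the nav bar."""

    def __init__(self):
        super().__init__()
        self.sections = {}
        self._section = None       # current section name
        self._in_channel = False   # inside <a class="channel-link">
        self._in_item = False      # inside <li class="dropdown-item">
        self._href = None          # href of the article link being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if tag == 'li' and 'dropdown-item' in classes:
            self._in_item = True
        elif tag == 'a' and 'channel-link' in classes:
            self._in_channel, self._text = True, []
        elif tag == 'a' and self._in_item and attrs.get('href'):
            self._href, self._text = attrs['href'], []

    def handle_data(self, data):
        if self._in_channel or self._href:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'li':
            self._in_item = False
        elif tag == 'a':
            text = ''.join(self._text).strip()
            if self._in_channel:
                self._section = text
                self.sections.setdefault(text, [])
                self._in_channel = False
            elif self._href and self._section is not None:
                self.sections[self._section].append((text, self._href))
                self._href = None


ARCHIVE_DATE = re.compile(r'/archive/(\d{4})/(\d{2})/')


def published_month(url):
    """Return (year, month) parsed from the URL, or None if absent."""
    m = ARCHIVE_DATE.search(url)
    return (int(m.group(1)), int(m.group(2))) if m else None


def recent_only(sections, cutoff):
    """Keep articles whose URL month is at or after cutoff=(year, month)."""
    result = {}
    for name, articles in sections.items():
        kept = [(t, u) for t, u in articles
                if (published_month(u) or (0, 0)) >= cutoff]
        if kept:
            result[name] = kept
    return result
```

Feeding the fetched page source to NavChannelParser().feed() and then calling recent_only(parser.sections, (2015, 9)) would keep only the September links from the excerpt. Articles whose URLs carry no date are dropped in this sketch, which may or may not be what a real recipe wants.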
I may attempt this if I find some spare time over the next month or so, but if anyone else is more motivated to dive into this than I am, feel free -- but please comment on this thread so people are not duplicating efforts.