Old 09-20-2015, 10:10 AM   #1
mikebw
"The Atlantic" recipe broken by web site change

I don't have an archive of the source of the web page used to begin scraping

Code:
INDEX = 'http://www.theatlantic.com/magazine/toc/0/'
but I infer from the recipe source that the page was previously split into sections, each marked by an 'h2' tag that the recipe used to build its section list. The sections seem to be gone, and as a result the recipe fails because no articles are found to retrieve. I confirmed this with an ugly patch that simply initializes the section label to a default:

Broken original
Code:
        current_section, current_articles = None, []
Ugly patch
Code:
        current_section, current_articles = 'Main', []
While this does successfully retrieve all of the articles listed on the "magazine" TOC page, that is far fewer than the old script could retrieve. Worse, the target of the "magazine" link looks fairly static, although the articles are always stamped with the current time when they are retrieved because the pages are generated dynamically.
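To illustrate why the default matters, here is a minimal sketch of the kind of grouping loop I am inferring. It is not the actual recipe code: the function name `group_by_section`, the tuple-based input, and the example URLs are all made up for illustration.

```python
def group_by_section(items, default_section=None):
    """Group parsed TOC items into (section, articles) pairs.

    Hypothetical reconstruction of the grouping logic, not the real recipe.
    `items` is a list of ('h2', text) or ('a', title, url) tuples.
    """
    feeds = []
    current_section, current_articles = default_section, []
    for kind, *payload in items:
        if kind == 'h2':
            # A heading closes the previous section and opens a new one.
            if current_section and current_articles:
                feeds.append((current_section, current_articles))
            current_section, current_articles = payload[0], []
        elif kind == 'a' and current_section:
            # Links are only kept once a section label exists.
            current_articles.append({'title': payload[0], 'url': payload[1]})
    if current_section and current_articles:
        feeds.append((current_section, current_articles))
    return feeds


# A TOC page with links but no 'h2' headings (placeholder URLs).
toc = [('a', 'Article One', 'http://example.com/1'),
       ('a', 'Article Two', 'http://example.com/2')]

without_default = group_by_section(toc)        # section label stays None
with_default = group_by_section(toc, 'Main')   # patched default label
print(len(without_default), len(with_default))
```

With no headings on the page, the `None` default drops every link, while the 'Main' default collects them all into a single section.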

The long-term solution is probably to build a section list from the navigation bar and recurse through it, but that is substantially more programming than a simple patch. Blindly following this approach would result in a mix of articles ranging from current to months old (the excerpt below has links dated both 2015/04 and 2015/09), and that would have to be intelligently handled as well.

Code:
    <div id="nav-channel-bar">
        <ul id="nav-channels" data-omni-click="r'nav',@href,l.pathname">
            <li class="nav-channel politics">
                <a class="channel-link" href="/politics/" data-omni-click="inherit">Politics</a>
                <ul class="channel-dropdown" data-omni-click="r'sub-nav`politics',$li,@href,l.pathname">
                    <li class="dropdown-label">Top Stories</li>
                    <li class="dropdown-item">
                        <a data-omni-click="inherit" href="http://www.theatlantic.com/politics/archive/2015/04/the-death-penalty-becomes-unusual/390867/">The Death Penalty Becomes Rare</a>
                    </li>
                    <li class="dropdown-item">
                        <a data-omni-click="inherit" href="http://www.theatlantic.com/politics/archive/2015/04/the-liberty-to-feed-the-poor/390987/">The Liberty to Feed the Poor</a>
                    </li>
                    <li class="dropdown-item">
                        <a data-omni-click="inherit" href="http://www.theatlantic.com/politics/archive/2015/04/senate-confirms-loretta-lynch/391056/">Loretta Lynch, America's Next Attorney General</a>
                    </li>
                    <li class="dropdown-item">
                        <a data-omni-click="inherit" href="http://www.theatlantic.com/politics/archive/2015/09/do-neanderthals-have-souls/406246/">Did Neanderthals Have Souls?</a>
                    </li>
                    <li class="dropdown-item">
                        <a data-omni-click="inherit" href="http://www.theatlantic.com/politics/archive/2015/09/fear-and-clothing/405919/">Dressing for Success in Washington, D.C.</a>
                    </li>
                    <li class="dropdown-item">
                        <a data-omni-click="inherit" href="http://www.theatlantic.com/politics/archive/2015/09/how-the-constitution-was-indeed-pro-slavery/406288/">How the Constitution Was Indeed Pro-Slavery</a>
                    </li>
                </ul>
            </li>
I may attempt this if I find some spare time over the next month or so, but if anyone else is more motivated to dive into this than I am, feel free -- but please comment on this thread so people are not duplicating efforts.
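As a starting point for anyone who picks this up, here is a rough sketch of the navigation-bar approach using only the Python standard library. It is not tested against the live site and does not use the calibre recipe API. The class name `NavChannelParser`, the helper `url_month`, and the trimmed `SAMPLE` markup are my own illustrative choices; the CSS class names come from the excerpt above. Keeping only the newest month is just one possible way to handle the stale links.

```python
import re
from html.parser import HTMLParser


class NavChannelParser(HTMLParser):
    """Collect [section name, [(title, url), ...]] from nav-channel markup."""

    def __init__(self):
        super().__init__()
        self.sections = []     # one [name, articles] entry per channel
        self._capture = None   # 'channel' or 'item' while inside a wanted <a>
        self._href = None
        self._in_item = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if tag == 'li' and 'dropdown-item' in classes:
            self._in_item = True
        elif tag == 'a':
            if 'channel-link' in classes:
                self._capture = 'channel'
            elif self._in_item and self.sections:
                self._capture = 'item'
            self._href = attrs.get('href')
            self._text = []

    def handle_data(self, data):
        if self._capture:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._capture:
            text = ''.join(self._text).strip()
            if self._capture == 'channel':
                self.sections.append([text, []])
            else:
                self.sections[-1][1].append((text, self._href))
            self._capture = None
        elif tag == 'li':
            self._in_item = False


def url_month(url):
    """Return (year, month) from an /archive/YYYY/MM/ URL, or None."""
    m = re.search(r'/archive/(\d{4})/(\d{2})/', url)
    return (int(m.group(1)), int(m.group(2))) if m else None


# Trimmed copy of the markup quoted above, used only as test input.
SAMPLE = '''
<ul id="nav-channels">
  <li class="nav-channel politics">
    <a class="channel-link" href="/politics/">Politics</a>
    <ul class="channel-dropdown">
      <li class="dropdown-label">Top Stories</li>
      <li class="dropdown-item">
        <a href="http://www.theatlantic.com/politics/archive/2015/04/the-liberty-to-feed-the-poor/390987/">The Liberty to Feed the Poor</a>
      </li>
      <li class="dropdown-item">
        <a href="http://www.theatlantic.com/politics/archive/2015/09/fear-and-clothing/405919/">Dressing for Success in Washington, D.C.</a>
      </li>
    </ul>
  </li>
</ul>
'''

parser = NavChannelParser()
parser.feed(SAMPLE)
name, articles = parser.sections[0]

# Keep only links from the newest month seen in this section.
months = [url_month(u) for _, u in articles if url_month(u)]
newest = max(months)
recent = [(t, u) for t, u in articles if url_month(u) == newest]
print(name, recent)
```

A real recipe would still have to fetch each channel page and decide how old is too old, so treat the month filter as a placeholder for whatever policy makes sense.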
Old 09-20-2015, 11:28 AM   #2
kovidgoyal
https://github.com/kovidgoyal/calibr...7280ac05b8aeb2
Old 05-03-2016, 08:47 AM   #3
mendesitba
Hello!
Is anyone using the Atlantic recipe? It seems to be broken again. When I download it here, only the table of contents shows up, without any article content.
Old 05-04-2016, 06:25 AM   #4
kovidgoyal
https://github.com/kovidgoyal/calibr...e8ba1146ae6118