07-16-2014, 01:58 PM
hashken
Member
Posts: 18
Karma: 10
Join Date: Mar 2014
Device: Kindle Paperwhite 1st Gen
Content missing in the final step of book creation

I'm trying to improve the built-in Economic Times (India) recipe, but I'm running into a problem.

The recipe finds articles through the RSS feed and downloads the mobile print version of each one. In these pages, the main article text is inside a <div class="storycontent"> tag.

The heading, article summary, etc. all appear correctly in the final ebook, but the main content inside that tag is missing.

I checked ./debug/processed/feed_0/article_0/index.html, and the tag is there along with its content. So the download step seems fine, and the content appears to be lost later, in calibre's conversion step.
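
One difference stands out in the processed file below: the heading and summary, which do make it into the ebook, sit inside the same <config> wrapper but inherit the XHTML namespace declared on <html>, whereas the storycontent div is declared with xmlns="", which under the XML namespace rules puts it and everything inside it in no namespace at all. My guess, which I have not confirmed in calibre's source, is that the conversion pipeline ignores elements outside the XHTML namespace. This short standalone script (standard library only, no calibre needed; find_storycontent and report are names I made up) reports which namespace each storycontent element ends up in:

```python
import sys
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def find_storycontent(xhtml):
    """Return (namespace, text_length) for each element whose class list
    contains 'storycontent'. ElementTree spells a namespaced tag as
    '{uri}local', so an element declared with xmlns="" comes back as a bare
    'div' and its namespace is reported here as ''."""
    root = ET.fromstring(xhtml)
    found = []
    for elem in root.iter():
        if "storycontent" in elem.get("class", "").split():
            tag = elem.tag
            ns = tag[1:].partition("}")[0] if tag.startswith("{") else ""
            text = "".join(elem.itertext()).strip()
            found.append((ns, len(text)))
    return found


def report(path):
    # Read as bytes so the file's own XML declaration picks the encoding.
    with open(path, "rb") as f:
        for ns, length in find_storycontent(f.read()):
            where = "XHTML" if ns == XHTML_NS else (ns or 'none (xmlns="")')
            print(f"storycontent namespace: {where}, {length} characters of text")


# Only runs when a path is given, e.g.:
#   python check_ns.py debug/processed/feed_0/article_0/index.html
if __name__ == "__main__" and len(sys.argv) > 1:
    report(sys.argv[1])
```

The script file name is arbitrary. If it reports no namespace for the storycontent div, that supports the namespace theory.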

A link to a sample article - http://m.economictimes.com/PDAET/art...w/38499011.cms

My recipe code
Code:
__license__   = 'GPL v3'
__copyright__ = '2008-2010, Darko Miletic <darko.miletic at gmail.com>'
'''
economictimes.indiatimes.com
'''


from calibre.web.feeds.news import BasicNewsRecipe

class TheEconomicTimes(BasicNewsRecipe):
    title                  = 'The Economic Times India'
    __author__             = 'Darko Miletic'
    description            = 'Financial news from India'
    publisher              = 'economictimes.indiatimes.com'
    category               = 'news, finances, politics, India'
    oldest_article         = 2
    max_articles_per_feed  = 100
    no_stylesheets         = True
    use_embedded_content   = False
    simultaneous_downloads = 1
    encoding               = 'utf-8'
    language               = 'en_IN'
    publication_type       = 'newspaper'
    masthead_url           = 'http://economictimes.indiatimes.com/photo/2676871.cms'
    extra_css              = """
                                 body{font-family: Arial,Helvetica,sans-serif}
                             """
    conversion_options     = {'comment'          : description, 
                              'tags'             : category,
                              'publisher'        : publisher,
                              'language'         : language
                             }
    #remove_tags_before     = dict(name='h1')
    #remove_tags_after      = dict(name='div', attrs={'class':'spacebw'})
    feeds                  = [(u'All articles', u'http://economictimes.indiatimes.com/rssfeedsdefault.cms')]


    #Uses the mobile print version. For web print version use 'http://economictimes.indiatimes.com/articleshow/<article_id>?prtpage=1'
    def print_version(self, url):
        rest, sep, article_id = url.rpartition('/articleshow/')
        return 'http://m.economictimes.com/PDAET/articleshow/' + article_id

    def get_article_url(self, article):
        rurl = article.get('guid', None)
        # Skip entries without a guid, and the "quickie" list/short-article pages
        if not rurl or '/quickieslist/' in rurl or '/quickiearticleshow/' in rurl:
            return None
        return rurl

    def preprocess_html(self, soup):
        #for item in soup.findAll(style=True):
            #del item['style']
        return soup

    def postprocess_html(self, soup, first_fetch):
        return self.adeify_images(soup)
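
The URL handling in print_version and get_article_url can be checked outside calibre. This is a standalone copy of the same logic as plain functions, with a plain dict standing in for the feed entry calibre passes in (only its .get() method is used). One addition of mine, not in the recipe: print_version returns the URL unchanged when it has no /articleshow/ segment, instead of building a broken mobile URL. The example URL is the canonical link from the debug output below.

```python
def print_version(url):
    # Keep everything after the last '/articleshow/' (e.g. '38499011.cms')
    # and append it to the mobile PDAET path, as the recipe does.
    rest, sep, article_id = url.rpartition('/articleshow/')
    if not sep:
        # Addition (not in the recipe): leave URLs without '/articleshow/' alone.
        return url
    return 'http://m.economictimes.com/PDAET/articleshow/' + article_id


def get_article_url(article):
    # 'article' stands in for the feed entry; only its .get() method is used.
    rurl = article.get('guid', None)
    # Skip entries with no guid, and the "quickie" list/short-article pages.
    if not rurl or '/quickieslist/' in rurl or '/quickiearticleshow/' in rurl:
        return None
    return rurl


if __name__ == '__main__':
    canonical = ('http://economictimes.indiatimes.com/news/international/business/'
                 'first-rate-hike-likely-early-2015-says-dallas-fed-president-richard-fisher/'
                 'articleshow/38499011.cms')
    print(print_version(get_article_url({'guid': canonical})))
    # prints http://m.economictimes.com/PDAET/articleshow/38499011.cms
```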

Content of ./debug/processed/feed_0/article_0/index.html
Code:
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Last-Modified" content="16 Jul, 2237hrs IST"/>
    <title>First rate hike 'likely' early 2015, says Dallas Fed President Richard Fisher - The Economic Times on Mobile</title>
    <meta name="description" content="The Federal Reserve's policy-setting panel is 'likely' to start raising rates in early 2015, if not sooner, a top Fed official said on Wednesday."/>
    <meta name="keywords" content="US Federal reserve,US central bank,University of Southern California,Richard Fisher,Rate hike,President"/>
    <link xmlns="" rel="shortcut icon" href="http://m.economictimes.com/icons/etfavicon.ico"/>
    <meta name="viewport" content="width=device-width; initial-scale=1.0; maximum-scale=1.0; user-scalable=0;"/>
    <meta name="apple-mobile-web-app-capable" content="yes"/>
    <meta name="HandheldFriendly" content="true"/>
    <meta name="MobileOptimized" content="width"/>
    <config xmlns="http://www.w3.org/1999/xhtml" key="2147477890"/>
    <config/>
    <config xmlns="http://www.w3.org/1999/xhtml" datetimeformat="yyyy"/>
    <config datetimeformat="yyyy">
<link rel="canonical" href="http://economictimes.indiatimes.com/news/international/business/first-rate-hike-likely-early-2015-says-dallas-fed-president-richard-fisher/articleshow/38499011.cms"/>
</config>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
  <link href="../../stylesheet.css" rel="stylesheet" type="text/css"/>
<link href="../../page_styles.css" rel="stylesheet" type="text/css"/>
</head>
  <body class="calibre"><div class="calibrenavbar">| <a href="../article_1/index.html">Next</a> | <a href="../index.html#article_0">Section Menu</a> | <a href="../../index.html#feed_0">Main Menu</a> | <hr class="calibre6"/>
</div><div class="calibre5"><a href="/rssfeeds/26519199.cms"><div class="calibre5"><img alt="ET MOBILE RSS" class="calibre2" src="images/img1.jpg"/><br class="calibre5"/></div></a><span>16 Jul, 2237hrs IST</span><a href="http://economictimes.indiatimes.com/">Full Site</a></div><div class="calibre5"><a href="/"><div class="calibre5"><img alt="ET MOBILE" src="images/img2.png" class="calibre2"/><br class="calibre5"/></div></a></div><div class="calibre5"><table width="98%" border="0" cellspacing="0" cellpadding="0" class="calibre7"><tr class="calibre8"><td class="bold" width="10%" valign="top">Sensex</td><td width="30%" class="bold">25549.72</td><td width="30%" class="bold"><span>**321.07**<div class="calibre5"><img alt="Sensex Decrease" title="Sensex Decrease" src="images/img3.png" class="calibre2"/><br class="calibre5"/></div></span></td><td width="30%" class="bold"><span>**1.27%
										**<div class="calibre5"><img alt="Sensex Decrease" title="Sensex Decrease" src="images/img3.png" class="calibre2"/><br class="calibre5"/></div></span></td></tr><tr class="calibre8"><td class="bold" width="10%" valign="top">Nifty</td><td width="30%" class="bold">7624.40</td><td width="30%" class="bold"><span>**97.75**<div class="calibre5"><img alt="Sensex Decrease" title="Sensex Decrease" src="images/img3.png" class="calibre2"/><br class="calibre5"/></div></span></td><td width="30%" class="bold"><span>**1.30%
										**<div class="calibre5"><img alt="Sensex Decrease" title="Sensex Decrease" src="images/img3.png" class="calibre2"/><br class="calibre5"/></div></span></td></tr></table><form action="/stockquotes.cms" method="get" name="stockfrm" class="calibre5"><div class="calibre5"><input onclick="quote_blank();" value="Get Quote" size="20" name="ticker" type="text"/>**<input name="B1" value="Go" type="submit"/><a title="Mobile Apps" href="/mobileapps.cms"><div class="calibre5"><img alt="Mobile Apps" src="images/img4.png" class="calibre2"/><br class="calibre5"/></div></a></div></form></div><hr class="calibre6"/><a href="/">Home</a> | <a href="/budget2014.cms">Budget 2014</a> | <a href="/market/1977021501.cms?exchange=n&amp;exchangeid=50">Markets</a> | <a href="/industry/13352306.cms">Industry</a> | <a href="/articlelist/32897620.cms">ET Panache</a> | <a href="/summary.cms?idx=1">Portfolio</a> | <a href="/allsections.cms">All Sections</a> | <a href="http://epaper.timesofindia.com/index.asp">mPaper</a><hr xmlns="http://www.w3.org/1999/xhtml" class="calibre6"/><div xmlns="" style="width:100%;text-align:center;"></div><div xmlns="http://www.w3.org/1999/xhtml" class="calibre5"><div xmlns="http://www.w3.org/1999/xhtml" class="calibre5"><div class="calibre5"><img alt="" hspace="5" src="images/img6.png" class="calibre2"/><br class="calibre5"/></div><a href="/mail/38499011.cms">E-mail this</a></div><h2 xmlns="http://www.w3.org/1999/xhtml" class="calibre9">BUSINESS</h2></div><div class="calibre5"><config showseo="1" showslide="1" showrelatedarticle="1" datetimeformat="d mmm, yyyy, hhnn  'hrs IST'"><h1 class="calibre10">First rate hike 'likely' early 2015, says Dallas Fed President Richard Fisher</h1><div class="calibre5"><artdate>16 Jul, 2014, 2232  hrs IST</artdate>,*<artag>Reuters</artag></div><div class="calibre5"><div class="calibre5"><a href="/PDAET/quickiearticleshow/38499028.cms"><div class="calibre5"><img alt="" class="calibre2" src="images/img7.jpg"/><br 
class="calibre5"/></div></a></div><div class="calibre5">The Federal Reserve's policy-setting panel is 'likely' to start raising rates in early 2015, if not sooner, a top Fed official said on Wednesday.</div></div><div xmlns="" class="storycontent"> LOS ANGELES: The Federal Reserve's policy-setting panel is 'likely' to start raising rates in early 2015, if not sooner, a top Fed official said on Wednesday. <br/><br/> The prediction from Dallas Fed President Richard Fisher went beyond his prepared remarks to the University of Southern California, in which he said the Fed "may well" raise rates in early 2015. Futures traders currently expect a first rate rise in mid-2015. <br/><br/> The rate rises will likely come in "gradual increments," he said. <br/><br/> Fisher is a voting member of the US central bank's policy-setting committee this year. <meta content="cms.next" name="cmsei"/></div></config></div><br class="calibre5"/><div xmlns="" class="spacebw"><div id="ad36070" name="ad36070" align="center"></div></div><br xmlns="http://www.w3.org/1999/xhtml" class="calibre5"/><div id="mob_add" class="calibre5"></div><hr xmlns=""/><a href="/">Home</a> | <a href="/budget2014.cms">Budget 2014</a> | <a href="/market/1977021501.cms?exchange=n&amp;exchangeid=50">Markets</a> | <a href="/industry/13352306.cms">Industry</a> | <a href="/articlelist/32897620.cms">ET Panache</a> | <a href="/summary.cms?idx=1">Portfolio</a> | <a href="/allsections.cms">All Sections</a> | <a href="http://epaper.timesofindia.com/index.asp">mPaper</a><br class="calibre5"/>To Download ET Apps, pls <a href="http://m.economictimes.com/mobileapps.cms">click here<div class="calibre5"><img alt="ET MOBILE" src="images/img9.png" class="calibre2"/><br class="calibre5"/></div></a><hr class="calibre6"/>Other Mobile Sites: <a href="http://m.timesofindia.com/">TOI MOBILE</a>, <a href="http://m.indiatimes.com">Indiatimes</a>,
		<a title="Follo" href="http://m.follo.co.in">follo</a>,
		<a title="GreetZap" href="http://m.greetzap.in">GreetZap</a>,
		<a title="Alive" href="http://aliveapp.in">Alive</a><br class="calibre5"/><a title="TimesJobs Mobile" href="http://m.timesjobs.com?src=etm">Job Search</a> | <a title="MagicBricks Mobile" href="http://m.magicbricks.com?source=etm">Property Search</a> | <a title="Ads2Book Mobile" href="http://m.ads2book.com?src=etm">Post Print Ad</a><hr class="calibre6"/><div class="calibre5">Copyright  ©*2014*Bennett Coleman &amp; Co. All rights reserved.<br class="calibre5"/>Powered by Indiatimes. <a href="http://m.economictimes.com/termsofuse.cms" class="calibre11">Terms of Use and Grievance Redressal Policy</a><span class="calibre12"> |</span><a href="/privacypolicy.cms" class="calibre13">Privacy Policy</a></div><config xmlns="http://www.w3.org/1999/xhtml" gaaccountid="MO-12812017-2"><div class="calibre5"><img src="images/img10.png" class="calibre2"/><br class="calibre5"/></div><p class="hidden"><div class="calibre5"><img id="hiddenImg" alt="*" class="calibre2"/><br class="calibre5"/></div></p></config><div class="calibrenavbar">
<hr class="calibre6"/>
<p class="calibre14">This article was downloaded by <strong class="calibre15">calibre</strong> from <a href="http://economictimes.indiatimes.com/news/international/business/first-rate-hike-likely-early-2015-says-dallas-fed-president-richard-fisher/articleshow/38499011.cms">http://economictimes.indiatimes.com/news/international/business/first-rate-hike-likely-early-2015-says-dallas-fed-president-richard-fisher/articleshow/38499011.cms</a></p>
<br class="calibre5"/><br class="calibre5"/> | <a href="../index.html#article_0">Section Menu</a> | <a href="../../index.html#feed_0">Main Menu</a> | </div></body></html>

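If the namespace theory holds, one possible workaround is to remove the empty xmlns declarations before calibre parses the page. BasicNewsRecipe has a preprocess_raw_html(self, raw_html, url) hook that receives each downloaded HTML page as a string. The helper below (strip_empty_xmlns is a name I made up) does this with a regular expression. Two caveats: I have only seen xmlns="" in calibre's processed copy, so I am assuming the raw mobile page carries the same attribute; and a regex is a blunt tool for HTML, although it is adequate for removing this one exact attribute.

```python
import re

# Matches an empty default-namespace declaration, xmlns="" or xmlns='',
# together with the whitespace before it, but not xmlns="http://...".
EMPTY_XMLNS = re.compile(r"""\s+xmlns\s*=\s*(?:""|'')""")


def strip_empty_xmlns(raw_html):
    """Drop empty xmlns declarations so the affected elements inherit the
    document's default (XHTML) namespace instead of having none."""
    return EMPTY_XMLNS.sub('', raw_html)


# Hypothetical hook in the recipe class (not tested outside calibre):
#
#     def preprocess_raw_html(self, raw_html, url):
#         return strip_empty_xmlns(raw_html)
```

If this works, the storycontent div in ./debug/processed/feed_0/article_0/index.html should no longer carry xmlns="".
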
Last edited by hashken; 07-16-2014 at 02:03 PM.