Old 07-16-2014, 01:58 PM   #1
hashken
Member
hashken began at the beginning.
 
Posts: 18
Karma: 10
Join Date: Mar 2014
Device: Kindle Paperwhite 1st Gen
Content missing in the final step of book creation

I'm trying to enhance the built-in Economic Times of India recipe but am running into a problem.

The recipe pulls in the mobile print version of the articles using the RSS feeds. In these articles the main content is located in a <div class="storycontent"> tag.

The heading, article summary, etc. appear properly in the final ebook, but the main content inside the above-mentioned tag is missing.

I checked the ./debug/processed/feed_0/article_0/index.html file, and the tag along with its content is present there. So it seems the content is being dropped in the conversion step.

A link to a sample article - http://m.economictimes.com/PDAET/art...w/38499011.cms

My recipe code
Code:
__license__   = 'GPL v3'
__copyright__ = '2008-2010, Darko Miletic <darko.miletic at gmail.com>'
'''
economictimes.indiatimes.com
'''


from calibre.web.feeds.news import BasicNewsRecipe

class TheEconomicTimes(BasicNewsRecipe):
    title                  = 'The Economic Times India'
    __author__             = 'Darko Miletic'
    description            = 'Financial news from India'
    publisher              = 'economictimes.indiatimes.com'
    category               = 'news, finances, politics, India'
    oldest_article         = 2
    max_articles_per_feed  = 100
    no_stylesheets         = True
    use_embedded_content   = False
    simultaneous_downloads = 1
    encoding               = 'utf-8'
    language               = 'en_IN'
    publication_type       = 'newspaper'
    masthead_url           = 'http://economictimes.indiatimes.com/photo/2676871.cms'
    extra_css              = """
                                 body{font-family: Arial,Helvetica,sans-serif}
                             """
    conversion_options     = {'comment'          : description, 
                              'tags'             : category,
                              'publisher'        : publisher,
                              'language'         : language
                             }
    #remove_tags_before     = dict(name='h1')
    #remove_tags_after      = dict(name='div', attrs={'class':'spacebw'})
    feeds                  = [(u'All articles', u'http://economictimes.indiatimes.com/rssfeedsdefault.cms')]


    #Uses the mobile print version. For web print version use 'http://economictimes.indiatimes.com/articleshow/<article_id>?prtpage=1'
    def print_version(self, url):
        rest, sep, article_id = url.rpartition('/articleshow/')
        return 'http://m.economictimes.com/PDAET/articleshow/' + article_id

    def get_article_url(self, article):
        rurl = article.get('guid',  None)
        # Skip entries that have no guid, as well as the "quickies" pages
        if not rurl or '/quickieslist/' in rurl or '/quickiearticleshow/' in rurl:
            return None
        return rurl

    def preprocess_html(self, soup):
        #for item in soup.findAll(style=True):
            #del item['style']
        return soup

    def postprocess_html(self, soup, first_fetch):
        return self.adeify_images(soup)

Content of ./debug/processed/feed_0/article_0/index.html
Code:
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Last-Modified" content="16 Jul, 2237hrs IST"/>
    <title>First rate hike 'likely' early 2015, says Dallas Fed President Richard Fisher - The Economic Times on Mobile</title>
    <meta name="description" content="The Federal Reserve's policy-setting panel is 'likely' to start raising rates in early 2015, if not sooner, a top Fed official said on Wednesday."/>
    <meta name="keywords" content="US Federal reserve,US central bank,University of Southern California,Richard Fisher,Rate hike,President"/>
    <link xmlns="" rel="shortcut icon" href="http://m.economictimes.com/icons/etfavicon.ico"/>
    <meta name="viewport" content="width=device-width; initial-scale=1.0; maximum-scale=1.0; user-scalable=0;"/>
    <meta name="apple-mobile-web-app-capable" content="yes"/>
    <meta name="HandheldFriendly" content="true"/>
    <meta name="MobileOptimized" content="width"/>
    <config xmlns="http://www.w3.org/1999/xhtml" key="2147477890"/>
    <config/>
    <config xmlns="http://www.w3.org/1999/xhtml" datetimeformat="yyyy"/>
    <config datetimeformat="yyyy">
<link rel="canonical" href="http://economictimes.indiatimes.com/news/international/business/first-rate-hike-likely-early-2015-says-dallas-fed-president-richard-fisher/articleshow/38499011.cms"/>
</config>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
  <link href="../../stylesheet.css" rel="stylesheet" type="text/css"/>
<link href="../../page_styles.css" rel="stylesheet" type="text/css"/>
</head>
  <body class="calibre"><div class="calibrenavbar">| <a href="../article_1/index.html">Next</a> | <a href="../index.html#article_0">Section Menu</a> | <a href="../../index.html#feed_0">Main Menu</a> | <hr class="calibre6"/>
</div><div class="calibre5"><a href="/rssfeeds/26519199.cms"><div class="calibre5"><img alt="ET MOBILE RSS" class="calibre2" src="images/img1.jpg"/><br class="calibre5"/></div></a><span>16 Jul, 2237hrs IST</span><a href="http://economictimes.indiatimes.com/">Full Site</a></div><div class="calibre5"><a href="/"><div class="calibre5"><img alt="ET MOBILE" src="images/img2.png" class="calibre2"/><br class="calibre5"/></div></a></div><div class="calibre5"><table width="98%" border="0" cellspacing="0" cellpadding="0" class="calibre7"><tr class="calibre8"><td class="bold" width="10%" valign="top">Sensex</td><td width="30%" class="bold">25549.72</td><td width="30%" class="bold"><span>**321.07**<div class="calibre5"><img alt="Sensex Decrease" title="Sensex Decrease" src="images/img3.png" class="calibre2"/><br class="calibre5"/></div></span></td><td width="30%" class="bold"><span>**1.27%
										**<div class="calibre5"><img alt="Sensex Decrease" title="Sensex Decrease" src="images/img3.png" class="calibre2"/><br class="calibre5"/></div></span></td></tr><tr class="calibre8"><td class="bold" width="10%" valign="top">Nifty</td><td width="30%" class="bold">7624.40</td><td width="30%" class="bold"><span>**97.75**<div class="calibre5"><img alt="Sensex Decrease" title="Sensex Decrease" src="images/img3.png" class="calibre2"/><br class="calibre5"/></div></span></td><td width="30%" class="bold"><span>**1.30%
										**<div class="calibre5"><img alt="Sensex Decrease" title="Sensex Decrease" src="images/img3.png" class="calibre2"/><br class="calibre5"/></div></span></td></tr></table><form action="/stockquotes.cms" method="get" name="stockfrm" class="calibre5"><div class="calibre5"><input onclick="quote_blank();" value="Get Quote" size="20" name="ticker" type="text"/>**<input name="B1" value="Go" type="submit"/><a title="Mobile Apps" href="/mobileapps.cms"><div class="calibre5"><img alt="Mobile Apps" src="images/img4.png" class="calibre2"/><br class="calibre5"/></div></a></div></form></div><hr class="calibre6"/><a href="/">Home</a> | <a href="/budget2014.cms">Budget 2014</a> | <a href="/market/1977021501.cms?exchange=n&amp;exchangeid=50">Markets</a> | <a href="/industry/13352306.cms">Industry</a> | <a href="/articlelist/32897620.cms">ET Panache</a> | <a href="/summary.cms?idx=1">Portfolio</a> | <a href="/allsections.cms">All Sections</a> | <a href="http://epaper.timesofindia.com/index.asp">mPaper</a><hr xmlns="http://www.w3.org/1999/xhtml" class="calibre6"/><div xmlns="" style="width:100%;text-align:center;"></div><div xmlns="http://www.w3.org/1999/xhtml" class="calibre5"><div xmlns="http://www.w3.org/1999/xhtml" class="calibre5"><div class="calibre5"><img alt="" hspace="5" src="images/img6.png" class="calibre2"/><br class="calibre5"/></div><a href="/mail/38499011.cms">E-mail this</a></div><h2 xmlns="http://www.w3.org/1999/xhtml" class="calibre9">BUSINESS</h2></div><div class="calibre5"><config showseo="1" showslide="1" showrelatedarticle="1" datetimeformat="d mmm, yyyy, hhnn  'hrs IST'"><h1 class="calibre10">First rate hike 'likely' early 2015, says Dallas Fed President Richard Fisher</h1><div class="calibre5"><artdate>16 Jul, 2014, 2232  hrs IST</artdate>,*<artag>Reuters</artag></div><div class="calibre5"><div class="calibre5"><a href="/PDAET/quickiearticleshow/38499028.cms"><div class="calibre5"><img alt="" class="calibre2" src="images/img7.jpg"/><br 
class="calibre5"/></div></a></div><div class="calibre5">The Federal Reserve's policy-setting panel is 'likely' to start raising rates in early 2015, if not sooner, a top Fed official said on Wednesday.</div></div><div xmlns="" class="storycontent"> LOS ANGELES: The Federal Reserve's policy-setting panel is 'likely' to start raising rates in early 2015, if not sooner, a top Fed official said on Wednesday. <br/><br/> The prediction from Dallas Fed President Richard Fisher went beyond his prepared remarks to the University of Southern California, in which he said the Fed "may well" raise rates in early 2015. Futures traders currently expect a first rate rise in mid-2015. <br/><br/> The rate rises will likely come in "gradual increments," he said. <br/><br/> Fisher is a voting member of the US central bank's policy-setting committee this year. <meta content="cms.next" name="cmsei"/></div></config></div><br class="calibre5"/><div xmlns="" class="spacebw"><div id="ad36070" name="ad36070" align="center"></div></div><br xmlns="http://www.w3.org/1999/xhtml" class="calibre5"/><div id="mob_add" class="calibre5"></div><hr xmlns=""/><a href="/">Home</a> | <a href="/budget2014.cms">Budget 2014</a> | <a href="/market/1977021501.cms?exchange=n&amp;exchangeid=50">Markets</a> | <a href="/industry/13352306.cms">Industry</a> | <a href="/articlelist/32897620.cms">ET Panache</a> | <a href="/summary.cms?idx=1">Portfolio</a> | <a href="/allsections.cms">All Sections</a> | <a href="http://epaper.timesofindia.com/index.asp">mPaper</a><br class="calibre5"/>To Download ET Apps, pls <a href="http://m.economictimes.com/mobileapps.cms">click here<div class="calibre5"><img alt="ET MOBILE" src="images/img9.png" class="calibre2"/><br class="calibre5"/></div></a><hr class="calibre6"/>Other Mobile Sites: <a href="http://m.timesofindia.com/">TOI MOBILE</a>, <a href="http://m.indiatimes.com">Indiatimes</a>,
		<a title="Follo" href="http://m.follo.co.in">follo</a>,
		<a title="GreetZap" href="http://m.greetzap.in">GreetZap</a>,
		<a title="Alive" href="http://aliveapp.in">Alive</a><br class="calibre5"/><a title="TimesJobs Mobile" href="http://m.timesjobs.com?src=etm">Job Search</a> | <a title="MagicBricks Mobile" href="http://m.magicbricks.com?source=etm">Property Search</a> | <a title="Ads2Book Mobile" href="http://m.ads2book.com?src=etm">Post Print Ad</a><hr class="calibre6"/><div class="calibre5">Copyright  ©*2014*Bennett Coleman &amp; Co. All rights reserved.<br class="calibre5"/>Powered by Indiatimes. <a href="http://m.economictimes.com/termsofuse.cms" class="calibre11">Terms of Use and Grievance Redressal Policy</a><span class="calibre12"> |</span><a href="/privacypolicy.cms" class="calibre13">Privacy Policy</a></div><config xmlns="http://www.w3.org/1999/xhtml" gaaccountid="MO-12812017-2"><div class="calibre5"><img src="images/img10.png" class="calibre2"/><br class="calibre5"/></div><p class="hidden"><div class="calibre5"><img id="hiddenImg" alt="*" class="calibre2"/><br class="calibre5"/></div></p></config><div class="calibrenavbar">
<hr class="calibre6"/>
<p class="calibre14">This article was downloaded by <strong class="calibre15">calibre</strong> from <a href="http://economictimes.indiatimes.com/news/international/business/first-rate-hike-likely-early-2015-says-dallas-fed-president-richard-fisher/articleshow/38499011.cms">http://economictimes.indiatimes.com/news/international/business/first-rate-hike-likely-early-2015-says-dallas-fed-president-richard-fisher/articleshow/38499011.cms</a></p>
<br class="calibre5"/><br class="calibre5"/> | <a href="../index.html#article_0">Section Menu</a> | <a href="../../index.html#feed_0">Main Menu</a> | </div></body></html>

Last edited by hashken; 07-16-2014 at 02:03 PM.
Old 07-17-2014, 02:16 AM   #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 32,880
Karma: 10034422
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I ran your recipe and I don't see that. Here is the processed HTML for one article:

Code:
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Last-Modified" content="17 Jul, 1107hrs IST"/>
    <title>Economy report after one moth of Modi government: Growth looks up, inflation cools - The Economic Times on Mobile</title>
    <meta name="description" content="A series of good data numbers have come out in recent days that suggest the economy is picking up from decade-low growth rates in the past two years."/>
    <meta name="keywords" content="Wholesale price index,united states,Ukraine,State Bank Of India,settlement option,Rohini Malkani,productivity,net worth,Narendra Modi,Modi Government,markets,Insurability,Inflation,ICRA,HSBC,Gold,gdp,economy,Department of Commerce,current account,consumer price index,Citigroup,Bank of India"/>
    <link xmlns="" rel="shortcut icon" href="http://m.economictimes.com/icons/etfavicon.ico"/>
    <meta name="viewport" content="width=device-width; initial-scale=1.0; maximum-scale=1.0; user-scalable=0;"/>
    <meta name="apple-mobile-web-app-capable" content="yes"/>
    <meta name="HandheldFriendly" content="true"/>
    <meta name="MobileOptimized" content="width"/>
    <config xmlns="http://www.w3.org/1999/xhtml" key="2147477890"/>
    <config/>
    <config xmlns="http://www.w3.org/1999/xhtml" datetimeformat="yyyy"/>
    <config datetimeformat="yyyy">
<link rel="canonical" href="http://economictimes.indiatimes.com/news/economy/indicators/economy-report-after-one-moth-of-modi-government-growth-looks-up-inflation-cools/articleshow/38510439.cms"/>
</config>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
  <link href="../../stylesheet.css" rel="stylesheet" type="text/css"/>
<link href="../../page_styles.css" rel="stylesheet" type="text/css"/>
</head>
  <body class="calibre"><div class="calibrenavbar">| <a href="../article_1/index.html">Next</a> | <a href="../index.html#article_0">Section Menu</a> | <a href="../../index.html#feed_0">Main Menu</a> | <hr class="calibre6"/>
</div><div class="calibre5"><a href="/rssfeeds/344531568.cms"><div class="calibre5"><img alt="ET MOBILE RSS" class="calibre2" src="images/img1.jpg"/><br class="calibre5"/></div></a><span>17 Jul, 1107hrs IST</span><a href="http://economictimes.indiatimes.com/">Full Site</a></div><div class="calibre5"><a href="/"><div class="calibre5"><img alt="ET MOBILE" src="images/img2.png" class="calibre2"/><br class="calibre5"/></div></a></div><div class="calibre5"><table width="98%" border="0" cellspacing="0" cellpadding="0" class="calibre7"><tr class="calibre8"><td class="bold" width="10%" valign="top">Sensex</td><td width="30%" class="bold">25592.03</td><td width="30%" class="bold"><span>**42.31**<div class="calibre5"><img alt="Sensex Decrease" title="Sensex Decrease" src="images/img3.png" class="calibre2"/><br class="calibre5"/></div></span></td><td width="30%" class="bold"><span>**0.17%
										**<div class="calibre5"><img alt="Sensex Decrease" title="Sensex Decrease" src="images/img3.png" class="calibre2"/><br class="calibre5"/></div></span></td></tr><tr class="calibre8"><td class="bold" width="10%" valign="top">Nifty</td><td width="30%" class="bold">7638.30</td><td width="30%" class="bold"><span>**13.90**<div class="calibre5"><img alt="Sensex Decrease" title="Sensex Decrease" src="images/img3.png" class="calibre2"/><br class="calibre5"/></div></span></td><td width="30%" class="bold"><span>**0.18%
										**<div class="calibre5"><img alt="Sensex Decrease" title="Sensex Decrease" src="images/img3.png" class="calibre2"/><br class="calibre5"/></div></span></td></tr></table><form action="/stockquotes.cms" method="get" name="stockfrm" class="calibre5"><div class="calibre5"><input onclick="quote_blank();" value="Get Quote" size="20" name="ticker" type="text"/>**<input name="B1" value="Go" type="submit"/><a title="Mobile Apps" href="/mobileapps.cms"><div class="calibre5"><img alt="Mobile Apps" src="images/img4.png" class="calibre2"/><br class="calibre5"/></div></a></div></form></div><hr class="calibre6"/><a href="/">Home</a> | <a href="/budget2014.cms">Budget 2014</a> | <a href="/market/1977021501.cms?exchange=n&amp;exchangeid=50">Markets</a> | <a href="/industry/13352306.cms">Industry</a> | <a href="/articlelist/32897620.cms">ET Panache</a> | <a href="/summary.cms?idx=1">Portfolio</a> | <a href="/allsections.cms">All Sections</a> | <a href="http://epaper.timesofindia.com/index.asp">mPaper</a><hr xmlns="http://www.w3.org/1999/xhtml" class="calibre6"/><div xmlns="" style="width:100%;text-align:center;"></div><div xmlns="http://www.w3.org/1999/xhtml" class="calibre5"><div xmlns="http://www.w3.org/1999/xhtml" class="calibre5"><div class="calibre5"><img alt="" hspace="5" src="images/img6.png" class="calibre2"/><br class="calibre5"/></div><a href="/mail/38510439.cms">E-mail this</a></div><h2 xmlns="http://www.w3.org/1999/xhtml" class="calibre9">INDICATORS</h2></div><div class="calibre5"><config showseo="1" showslide="1" showrelatedarticle="1" datetimeformat="d mmm, yyyy, hhnn  'hrs IST'"><h1 class="calibre10">Economy report after one moth of Modi government: Growth looks up, inflation cools</h1><div class="calibre5"><artdate>17 Jul, 2014, 0722  hrs IST</artdate>,*<artag>ET Bureau</artag></div><div class="calibre5"><div class="calibre5"><a href="/PDAET/quickiearticleshow/38510726.cms"><div class="calibre5"><img alt="" class="calibre2" src="images/img7.jpg"/><br 
class="calibre5"/></div></a></div><div class="calibre5">The trade deficit was $11.78 billion in June, the highest in a year, but only marginally more than $11.28 billion in May.</div></div><div xmlns="" class="storycontent"><p> NEW DELHI: The first full month under the Narendra Modi government's watch turned out to be a good one for the economy with macro indicators looking up and inflation lower despite lingering monsoon doubts, suggesting that growth could have finally bottomed out.<br/> <br/> Exports rose 10.2% in June from a year ago, the government said on Wednesday, marking yet another positive development following a series of good numbers in recent days that suggest the economy is picking up from decade-low growth rates in the past two years.<br/> <br/> Industrial production rose to a 19-month high of 4.7% in May while car sales rose at their fastest pace in 10 months in June, clearly indicating that the consumer was more confident of the new government shaping recovery.<br/> <br/> Services activity rose to a 17-month high in June on the strength of robust order flow, according to the HSBC Purchasing Managers' Index, indicating rising optimism in the sector that has a share of more than 60% in the economy.<br/> <br/> Imports rose for the first time in a year, at around 8.3%, confirming some sort of recovery in the domestic economy even after discounting for higher gold imports, which rose nearly 65% in June after the Reserve Bank of India eased rules by allowing more entities to import gold.<br/> <br/> India's other big concern, retail inflation, dropped to 7.31% in June, the lowest since the government started reporting consumer price index inflation in January 2012, although the monsoon fears loom large.<br/> <br/> And to top it all, the trade deficit was $11.78 billion in June, the highest in a year, but only marginally more than $11.28 billion in May, according to data released on Wednesday by the commerce department.<br/> <div><img 
src="images/img8.jpg" class="gwt-Image"/><br/></div><br/> <br/> <br/> Markets cheered the development, with the Sensex rising 1.27% to 25,549.72 points. "The export data is very encouraging, especially the fact that it is led by robust performance of engineering goods, indicating a productivity revival. Given that non-oil, non-gold imports have shown an uptick, industrial production for June will also be quite strong," said Soumya Kanti Ghosh, chief economic advisor, State Bank of India.<br/> <br/> "One can say looking at car sales, manufacturing and exports data that the economy may well have finally bottomed out." That will bode well for the Modi government, which has pledged to turn the economy around while bringing prices under control. The economy could begin the first quarter of the current year at near-5% growth, up from 4.6% in the January-March quarter.<br/> <br/> The decline in global commodity prices will also act as a booster although Iraq and Ukraine are geopolitical sore spots with the potential to reverse the trend. 
Meanwhile, the June-September monsoon has been patchy although rains have picked up in the past two days.</p> </div><strong class="calibre11">Page 1 of 2 </strong><span></span><a href="/news/economy/indicators/economy-report-after-one-moth-of-modi-government-growth-looks-up-inflation-cools/articleshow/msid-38510439,curpg-2.cms">Next</a></config></div><br class="calibre5"/><div xmlns="" class="spacebw"><div id="ad36070" name="ad36070" align="center"></div></div><br xmlns="http://www.w3.org/1999/xhtml" class="calibre5"/><div id="mob_add" class="calibre5"></div><hr xmlns=""/><a href="/">Home</a> | <a href="/budget2014.cms">Budget 2014</a> | <a href="/market/1977021501.cms?exchange=n&amp;exchangeid=50">Markets</a> | <a href="/industry/13352306.cms">Industry</a> | <a href="/articlelist/32897620.cms">ET Panache</a> | <a href="/summary.cms?idx=1">Portfolio</a> | <a href="/allsections.cms">All Sections</a> | <a href="http://epaper.timesofindia.com/index.asp">mPaper</a><br class="calibre5"/>To Download ET Apps, pls <a href="http://m.economictimes.com/mobileapps.cms">click here<div class="calibre5"><img alt="ET MOBILE" src="images/img10.png" class="calibre2"/><br class="calibre5"/></div></a><hr class="calibre6"/>Other Mobile Sites: <a href="http://m.timesofindia.com/">TOI MOBILE</a>, <a href="http://m.indiatimes.com">Indiatimes</a>,
		<a title="Follo" href="http://m.follo.co.in">follo</a>,
		<a title="GreetZap" href="http://m.greetzap.in">GreetZap</a>,
		<a title="Alive" href="http://aliveapp.in">Alive</a><br class="calibre5"/><a title="TimesJobs Mobile" href="http://m.timesjobs.com?src=etm">Job Search</a> | <a title="MagicBricks Mobile" href="http://m.magicbricks.com?source=etm">Property Search</a> | <a title="Ads2Book Mobile" href="http://m.ads2book.com?src=etm">Post Print Ad</a><hr class="calibre6"/><div class="calibre5">Copyright  ©*2014*Bennett Coleman &amp; Co. All rights reserved.<br class="calibre5"/>Powered by Indiatimes. <a href="http://m.economictimes.com/termsofuse.cms" class="calibre12">Terms of Use and Grievance Redressal Policy</a><span class="calibre13"> |</span><a href="/privacypolicy.cms" class="calibre14">Privacy Policy</a></div><config xmlns="http://www.w3.org/1999/xhtml" gaaccountid="MO-12812017-2"><div class="calibre5"><img src="images/img11.png" class="calibre2"/><br class="calibre5"/></div><p class="hidden"><div class="calibre5"><img id="hiddenImg" alt="*" class="calibre2"/><br class="calibre5"/></div></p></config><div class="calibrenavbar">
<hr class="calibre6"/>
<p class="calibre15">This article was downloaded by <strong class="calibre11">calibre</strong> from <a href="http://economictimes.indiatimes.com/news/economy/indicators/economy-report-after-one-moth-of-modi-government-growth-looks-up-inflation-cools/articleshow/38510439.cms">http://economictimes.indiatimes.com/news/economy/indicators/economy-report-after-one-moth-of-modi-government-growth-looks-up-inflation-cools/articleshow/38510439.cms</a></p>
<br class="calibre5"/><br class="calibre5"/> | <a href="../index.html#article_0">Section Menu</a> | <a href="../../index.html#feed_0">Main Menu</a> | </div></body></html>
Old 07-17-2014, 03:49 AM   #3
hashken
Hi Kovid,

Your output also has the <div xmlns="" class="storycontent"> tag. It is in the longest line of your output.

This tag and its contents form the main portion of the article, and they are not appearing in the final .mobi file.
Old 07-17-2014, 03:51 AM   #4
kovidgoyal
I'm confused: are you saying the content is missing from the processed HTML or from the final book?
Old 07-17-2014, 03:53 AM   #5
kovidgoyal
In any case just add

remove_attributes = ['xmlns']

to the recipe to take care of it.
Old 07-17-2014, 03:54 AM   #6
hashken
The content is present in the processed HTML. As you can see in my original post, it is present in the index.html in the processed folder.

The content is only missing in the final book.
Old 07-17-2014, 07:57 AM   #7
kovidgoyal
See my previous post
Old 07-17-2014, 08:33 AM   #8
hashken
Surprisingly, removing the "xmlns" attribute made everything work fine.

Is this a known bug, or is it the expected behaviour? If it is expected, why?
Old 07-17-2014, 08:47 AM   #9
kovidgoyal
It is expected behavior. When a tag in an XHTML document declares its namespace to be something other than the XHTML namespace, which is what xmlns="" does, that tag is no longer part of the HTML document and the converter ignores it.
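
The namespace effect described above can be reproduced outside calibre. The following is only a minimal sketch using Python's standard xml.etree.ElementTree, not calibre's own converter code; the DOC string is a simplified stand-in for the processed article HTML, and xhtml_div_classes is a hypothetical helper name. It lists the <div> elements that are in the XHTML namespace, first with xmlns="" present and then with the attribute stripped from the markup, which is the effect the remove_attributes option is meant to have.

```python
import xml.etree.ElementTree as ET

XHTML_NS = 'http://www.w3.org/1999/xhtml'

# Simplified stand-in for the processed article HTML (an assumption,
# not the real page): one normal div and one div carrying xmlns="".
DOC = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<div class="summary">Summary</div>'
    '<div xmlns="" class="storycontent">Main text</div>'
    '</body></html>'
)


def xhtml_div_classes(markup):
    """Return the class of every <div> that is in the XHTML namespace."""
    root = ET.fromstring(markup)
    return [el.get('class') for el in root.iter('{%s}div' % XHTML_NS)]


# With xmlns="" the storycontent div is in no namespace, so a consumer
# that only looks at XHTML elements never sees it.
with_xmlns = xhtml_div_classes(DOC)

# Once the attribute is gone, the div inherits the XHTML namespace
# from the <html> element again.
without_xmlns = xhtml_div_classes(DOC.replace(' xmlns=""', ''))

print(with_xmlns)     # ['summary']
print(without_xmlns)  # ['summary', 'storycontent']
```

This is why the content shows up in the processed index.html (the markup is still there as text) but not in the converted book.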
Old 07-17-2014, 08:48 AM   #10
hashken
Oh, I didn't know that. Thanks for the prompt replies. Keep up the good work.
Old 01-31-2016, 05:45 PM   #11
Sambit
Junior Member
Sambit began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Jan 2016
Device: Kindle
Modified the code to fix Economic Times Downloaded content

Recently, Economic Times changed the guid tags to a text message, which broke the recipe again. I think I have fixed it. Also, pointing to the mobile site was not working very well, so I pointed the recipe back to the old print version URL. Code attached.

Code:
__license__   = 'GPL v3'
__copyright__ = '2008-2014, Karthik <hashkendistro@gmail.com>, Darko Miletic <darko.miletic at gmail.com>'
'''
economictimes.indiatimes.com
'''


from calibre.web.feeds.news import BasicNewsRecipe

class TheEconomicTimes(BasicNewsRecipe):
    title                  = 'The Economic Times India'
    __author__             = 'Karthik K, Darko Miletic'
    description            = 'Financial news from India'
    publisher              = 'economictimes.indiatimes.com'
    category               = 'news, finances, politics, India'
    oldest_article         = 1
    max_articles_per_feed  = 100
    no_stylesheets         = True
    #use_embedded_content   = False
    simultaneous_downloads = 1
    encoding               = 'utf-8'
    language               = 'en_IN'
    publication_type       = 'newspaper'
    masthead_url           = 'http://economictimes.indiatimes.com/photo/2676871.cms'
    extra_css              = """
                                 body{font-family: Arial,Helvetica,sans-serif}
                                 .foto_mg{font-size: 60%; 
                                          font-weight: 700;}
                                 h1{font-size: 150%;}
                                 artdate{font-size: 60%}
                                 artag{font-size: 60%}
                                 div.storycontent{padding-top: 10px}
                             """
    conversion_options     = {'comment'          : description, 
                              'tags'             : category,
                              'publisher'        : publisher,
                              'language'         : language
                             }
    remove_tags_before     = dict(name='article')
    remove_tags_after      = [dict(name='article')]
    remove_tags            = [dict(name='div', attrs={'class':'cmtLinks'}),
                              dict(name='div', attrs={'class':'raltedTopics'}),
                              dict(name='div', attrs={'class':'editorsPick'}),
                              dict(name='div', attrs={'class':'articleImg etSpecial'}),
                              dict(name='div', attrs={'class':'articleImg artAd'}),
                              dict(name='div', attrs={'class':'appPromotion'}) 
                             ]
    remove_attributes      = ['xmlns']
    feeds                  = [(u'Top Stories', u'http://economictimes.indiatimes.com/rssfeedstopstories.cms'),
                              (u'News', u'http://economictimes.indiatimes.com/News/rssfeeds/1715249553.cms'),
                              (u'Market', u'http://economictimes.indiatimes.com/Markets/markets/rssfeeds/1977021501.cms'),
                              (u'Personal Finance', u'http://economictimes.indiatimes.com/rssfeeds/837555174.cms'),
                              (u'Infotech', u'http://economictimes.indiatimes.com/Infotech/rssfeeds/13357270.cms'),
                              (u'Job', u'http://economictimes.indiatimes.com/Infotech/rssfeeds/107115.cms'),
                              (u'Opinion', u'http://economictimes.indiatimes.com/opinion/opinionshome/rssfeeds/897228639.cms'),
                              (u'Features', u'http://economictimes.indiatimes.com/Features/etfeatures/rssfeeds/1466318837.cms'),
                              (u'Environment', u'http://economictimes.indiatimes.com/rssfeeds/2647163.cms'),
                              (u'NRI', u'http://economictimes.indiatimes.com/rssfeeds/7771250.cms')
                            ]



    # Uses the web print version ('?prtpage=1'). The mobile print version URL is kept below, commented out.
    def print_version(self, url):
        rest, sep, article_id = url.rpartition('/articleshow/')
        # return 'http://m.economictimes.com/PDAET/articleshow/' + article_id
        return 'http://economictimes.indiatimes.com/articleshow/' + article_id + '?prtpage=1'

    def get_article_url(self, article):
        rurl = article.get('link', None)
        # Skip entries without a link, and the "quickies" list/article pages
        if not rurl or '/quickieslist/' in rurl or '/quickiearticleshow/' in rurl:
            return None
        return rurl

    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        return soup

    def postprocess_html(self, soup, first_fetch):
        return self.adeify_images(soup)

Last edited by PeterT; 01-31-2016 at 06:51 PM. Reason: Code was unreadable; changed to code tags to preserve spacing
Old 04-30-2016, 04:04 PM   #12
Sambit
Junior Member
Sambit began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Jan 2016
Device: Kindle
Economic Times Recipe Broken Again and fixed now.

Economic Times changed its page format again, which broke the recipe.

Recipe below:

Code:
__license__   = 'GPL v3'
__copyright__ = '2008-2014, Karthik <hashkendistro@gmail.com>, Darko Miletic <darko.miletic at gmail.com>'
'''
economictimes.indiatimes.com
'''


from calibre.web.feeds.news import BasicNewsRecipe

class TheEconomicTimes(BasicNewsRecipe):
    title                  = 'The Economic Times India'
    __author__             = 'Karthik <hashkendistro@gmail.com>, Darko Miletic <darko.miletic at gmail.com>'
    description            = 'Financial news from India'
    publisher              = 'economictimes.indiatimes.com'
    category               = 'news, finances, politics, India'
    oldest_article         = 1
    max_articles_per_feed  = 100
    no_stylesheets         = True
    use_embedded_content   = False
    simultaneous_downloads = 1
    encoding               = 'utf-8'
    language               = 'en_IN'
    publication_type       = 'newspaper'
    masthead_url           = 'http://economictimes.indiatimes.com/photo/2676871.cms'
    extra_css              = """
                                 body{font-family: Arial,Helvetica,sans-serif}
                                 .foto_mg{font-size: 60%;
                                          font-weight: 700;}
                                 h1{font-size: 150%;}
                                 artdate{font-size: 60%}
                                 artag{font-size: 60%}
                                 div.storycontent{padding-top: 10px}
                             """
    conversion_options     = {'comment'          : description,
                              'tags'             : category,
                              'publisher'        : publisher,
                              'language'         : language
                             }
    remove_tags_before     = dict(name='article')
    remove_tags_after      = [dict(name='article')]
    keep_only_tags         = [dict(name='h1', attrs={'class':'title'}),
                              dict(name='div', attrs={'class':'bylineFull'}),
                              dict(name='div', attrs={'class':'articleImg'}),
                              dict(name='div', attrs={'class':'artText'})
                             ]
    remove_tags            = [dict(name='div', attrs={'class':'cmtLinks'}),
                              dict(name='div', attrs={'class':'raltedTopics'}),
                              dict(name='div', attrs={'class':'editorsPick'}),
                              dict(name='div', attrs={'class':'articleImg etSpecial'}),
                              dict(name='div', attrs={'class':'articleImg artAd'}),
                              dict(name='div', attrs={'class':'appPromotion'})
                             ]

    remove_attributes      = ['xmlns']
    feeds                  = [(u'Top Stories', u'http://economictimes.indiatimes.com/rssfeedstopstories.cms'),
                              (u'News', u'http://economictimes.indiatimes.com/News/rssfeeds/1715249553.cms'),
                              (u'Market', u'http://economictimes.indiatimes.com/Markets/markets/rssfeeds/1977021501.cms'),
                              (u'Personal Finance', u'http://economictimes.indiatimes.com/rssfeeds/837555174.cms'),
                              (u'Infotech', u'http://economictimes.indiatimes.com/Infotech/rssfeeds/13357270.cms'),
                              (u'Job', u'http://economictimes.indiatimes.com/Infotech/rssfeeds/107115.cms'),
                              (u'Opinion', u'http://economictimes.indiatimes.com/opinion/opinionshome/rssfeeds/897228639.cms'),
                              (u'Features', u'http://economictimes.indiatimes.com/Features/etfeatures/rssfeeds/1466318837.cms'),
                              (u'Environment', u'http://economictimes.indiatimes.com/rssfeeds/2647163.cms'),
                              (u'NRI', u'http://economictimes.indiatimes.com/rssfeeds/7771250.cms')
                            ]

    # Uses the web print version ('?prtpage=1'). The mobile print version URL is kept below, commented out.
    def print_version(self, url):
        rest, sep, article_id = url.rpartition('/articleshow/')
        # return 'http://m.economictimes.com/PDAET/articleshow/' + article_id
        return 'http://economictimes.indiatimes.com/articleshow/' + article_id + '?prtpage=1'

    def get_article_url(self, article):
        rurl = article.get('link', None)
        # Skip entries without a link, and the "quickies" list/article pages
        if not rurl or '/quickieslist/' in rurl or '/quickiearticleshow/' in rurl:
            return None
        return rurl

    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        return soup

    def postprocess_html(self, soup, first_fetch):
        return self.adeify_images(soup)
Tags
convert html to epub, recipe

All times are GMT -4. The time now is 11:47 AM.

