Old 11-10-2018, 11:34 PM   #1
hiabcwelcome
Junior Member
hiabcwelcome began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Sep 2018
Device: Kindle PW
Recipe for BloombergQuint

I'm creating a recipe for Bloomberg Quint because I couldn't find an existing one.


I'm facing two issues:
1. The site has 6 sections on its home page at https://www.bloombergquint.com/: markets, business, politics, global-economics, technology, and pursuits. Each section has 3 stories. But when I run my recipe using
Code:
ebook-convert Bloomberg_Quint.recipe .epub --test -vv --debug-pipeline debug
I only see 4 stories in the output file, and only from the first 2 sections. I'm not sure whether I'm missing something in my recipe.

2. Some articles on the site have charts and tables, which are embedded as HTML inside HTML. I don't see these charts and tables in the output file. Is there a way I can process them? For example, see the article at https://www.bloombergquint.com/mutua...igh-in-october
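
A quick way to check what "HTML inside HTML" means for a given article is to list the embedded frames in the saved page. This is only a diagnostic sketch: it assumes the charts are embedded as iframe elements, which has not been confirmed for this site, and the names IframeCollector and iframe_sources are made up for illustration. It runs as a standalone Python 3 script and is not part of the recipe.

```python
from html.parser import HTMLParser


class IframeCollector(HTMLParser):
    """Collects the src attribute of every <iframe> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == 'iframe':
            src = dict(attrs).get('src')
            if src:
                self.sources.append(src)


def iframe_sources(html):
    """Return the iframe src values found in an HTML string, in document order."""
    parser = IframeCollector()
    parser.feed(html)
    return parser.sources


if __name__ == '__main__':
    # Hypothetical markup; a real test would read the page saved by --debug-pipeline.
    sample = '<div><p>Intro</p><iframe src="https://example.com/chart"></iframe></div>'
    print(iframe_sources(sample))
```

If the list comes back empty, the charts are probably drawn by scripts rather than embedded as separate pages, and a different approach would be needed.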

Code:
from __future__ import with_statement
__license__ = 'GPL 3'
__copyright__ = '2018, yetanothernerdk'

from calibre.web.feeds.news import BasicNewsRecipe


class BloombergQuint(BasicNewsRecipe):
    title = u'Bloomberg Quint'
    language = 'en_IN'
    encoding = 'utf8'
    oldest_article = 1
    __author__ = 'yetanothernerdk'
    max_articles_per_feed = 30
    no_stylesheets = True
    remove_attributes = ['style']

    ignore_duplicate_articles = {'title', 'url'}
    keep_only_tags = [
        dict(id=lambda x: x and x.startswith('card-')),
    ]

    remove_tags = [dict(name='div', attrs={'class': 'story-element story-element-text story-element-text-also-read'})]

    def preprocess_html(self, soup):
        for img in soup.findAll('img', attrs={'data-src-template': True}):
            img['src'] = img['data-src-template'].replace('BINARY/thumbnail', 'alternates/FREE_660')
        return soup

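    def articles_from_soup(self, soup):
        # Collect the headline links (<h3><a href=...>) inside one section block.
        articles = []
        if soup is None:
            # The section container was not found on the index page.
            return articles
        for heading in soup.findAll(['h3']):
            link = heading.find(['a'])
            if link is None:
                # Skip headings that do not wrap a link.
                continue
            title = self.tag_to_string(link)
            url = link.get('href', False)
            if not url or not title:
                continue
            self.log('News:', title, url)
            articles.append({
                'title': title,
                'url': 'https://www.bloombergquint.com'+url,
                'description': '',
                'date': ''})
        return articles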

    def parse_index(self):
        # Build one feed per home-page section from that section's container <div>.
        soup = self.index_to_soup('https://www.bloombergquint.com/')
        section_id = "stack__with__articles stack__with__articles--qsection-"
        sections = ['markets', 'business', 'politics', 'global-economics', 'technology', 'pursuits']
        feeds = []
        for section in sections:
            self.log('Section:', section.capitalize())
            class_id = section_id + section
            nav_div = soup.find("div", {"class": class_id})
            if nav_div is None:
                # The page layout may have changed; skip instead of crashing.
                self.log('No container found for section:', section)
                continue
            articles = self.articles_from_soup(nav_div)
            if articles:
                feeds.append((section.capitalize(), articles))
        for section in feeds:
            self.log('Section:', section)
        return feeds
For help, I'm following the guide from here and the API documentation from here.
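
One fragile spot in the recipe above is that article URLs are built by string concatenation, which only works while every href on the index page is site-relative. A minimal standalone sketch of a more tolerant alternative using the standard library's urljoin follows; BASE_URL and absolute_url are illustrative names that do not appear in the recipe.

```python
from urllib.parse import urljoin

# Assumed site root, taken from the index URL used in the recipe.
BASE_URL = 'https://www.bloombergquint.com/'


def absolute_url(href, base=BASE_URL):
    """Resolve an href against the site root.

    Site-relative links get the base prepended; links that are already
    absolute are returned unchanged instead of being double-prefixed.
    """
    return urljoin(base, href)


if __name__ == '__main__':
    # Hypothetical hrefs, for illustration only.
    print(absolute_url('/markets/some-story'))
    print(absolute_url('https://example.com/elsewhere'))
```

On the Python 2 builds of calibre, the same function is imported from the urlparse module instead.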


Thank You.

Last edited by hiabcwelcome; 11-10-2018 at 11:40 PM.
Old 11-11-2018, 01:35 AM   #2
kovidgoyal
creator of calibre
Posts: 43,842
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
When you use --test, calibre downloads at most 4 articles. Don't use it if you want the full set downloaded.
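
That cap matches the symptom in the first post: 6 sections of 3 stories each, but only 4 stories from the first 2 sections. A rough sketch of the effect, assuming test mode keeps the first two feeds and the first two articles of each; the limits and the helper name apply_test_limits are illustrative and this is not calibre's actual code.

```python
def apply_test_limits(feeds, max_feeds=2, max_articles=2):
    """Trim a parse_index-style list of (section_title, articles) tuples.

    Keeps the first max_feeds sections and, within each of them,
    the first max_articles articles.
    """
    return [(title, articles[:max_articles]) for title, articles in feeds[:max_feeds]]


if __name__ == '__main__':
    sections = ['markets', 'business', 'politics',
                'global-economics', 'technology', 'pursuits']
    # Placeholder article dicts: 3 stories per section, as on the home page.
    feeds = [(name.capitalize(),
              [{'title': '%s story %d' % (name, i)} for i in range(1, 4)])
             for name in sections]

    trimmed = apply_test_limits(feeds)
    print([title for title, _ in trimmed])
    print(sum(len(articles) for _, articles in trimmed))
```

Running the same ebook-convert command without --test removes the cap, so all sections returned by parse_index are downloaded.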
Old 11-13-2018, 08:56 AM   #3
hiabcwelcome
I was missing that.
Thank You Kovid.