I'm creating a Bloomberg Quint recipe, because I couldn't find an existing one.
I'm facing two issues:
1. The site has six sections on its homepage at
https://www.bloombergquint.com/: markets, business, politics, global-economics, technology, pursuits. Each section has 3 stories. But when I run my recipe using
Code:
ebook-convert Bloomberg_Quint.recipe .epub --test -vv --debug-pipeline debug
I only see 4 stories in the output file, and only from the first 2 sections. Not sure if I'm missing something in my recipe.
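(For reference, the ebook-convert documentation says --test deliberately fetches only a couple of articles from the first couple of feeds, which matches the 2x2 output I'm seeing, so part of this may be the test mode itself. A full run would look like:)

```shell
# Same command without --test, so all sections and stories are fetched:
ebook-convert Bloomberg_Quint.recipe .epub -vv --debug-pipeline debug
```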
2. Some articles on the site have charts and tables, embedded as HTML inside HTML. These charts and tables are not showing up in the output file. Is there a way I can process them? For example, the article at
https://www.bloombergquint.com/mutua...igh-in-october
Code:
from __future__ import with_statement

__license__ = 'GPL 3'
__copyright__ = '2018, yetanothernerdk'

from calibre.web.feeds.news import BasicNewsRecipe


class BloombergQuint(BasicNewsRecipe):
    title = u'Bloomberg Quint'
    language = 'en_IN'
    encoding = 'utf8'
    oldest_article = 1
    __author__ = 'yetanothernerdk'
    max_articles_per_feed = 30
    no_stylesheets = True
    remove_attributes = ['style']
    ignore_duplicate_articles = {'title', 'url'}

    keep_only_tags = [
        dict(id=lambda x: x and x.startswith('card-')),
    ]
    remove_tags = [
        dict(name='div', attrs={'class': 'story-element story-element-text story-element-text-also-read'}),
    ]

    def preprocess_html(self, soup):
        for img in soup.findAll('img', attrs={'data-src-template': True}):
            img['src'] = img['data-src-template'].replace('BINARY/thumbnail', 'alternates/FREE_660')
        return soup

    def articles_from_soup(self, soup):
        articles = []
        for article in soup.findAll(['h3']):
            article = article.find(['a'])
            title = self.tag_to_string(article)
            url = article.get('href', False)
            if not url or not title:
                continue
            self.log('News:', article)
            articles.append({
                'title': title,
                'url': 'https://www.bloombergquint.com' + url,
                'description': '',
                'date': ''})
        return articles

    def parse_index(self):
        soup = self.index_to_soup('https://www.bloombergquint.com/')
        section_id = 'stack__with__articles stack__with__articles--qsection-'
        sections = ['markets', 'business', 'politics', 'global-economics', 'technology', 'pursuits']
        feeds = []
        for section in sections:
            self.log('Section:', section.capitalize())
            class_id = section_id + section
            nav_div = soup.find('div', {'class': class_id})
            articles = self.articles_from_soup(nav_div)
            if articles:
                feeds.append((section.capitalize(), articles))
        for section in feeds:
            self.log('Section:', section)
        return feeds
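For the second issue, my guess (an assumption -- I haven't checked every article) is that the charts and tables are embedded via iframes, which calibre drops by default. A minimal sketch of one workaround, replacing each iframe with a plain link to its src so at least a pointer survives; in a real recipe this would go in preprocess_raw_html, and one could go further and fetch the iframe HTML with index_to_soup and inline it. The function name flatten_iframes and the sample URL are hypothetical:

```python
import re


def flatten_iframes(html):
    """Replace each <iframe> (as used for embedded charts/tables) with a
    paragraph linking to its src attribute. A sketch only: a regex over
    HTML is fragile, and a recipe would instead fetch and inline the
    iframe content inside preprocess_raw_html."""
    def repl(match):
        src = match.group(1)  # the iframe's src URL
        return '<p><a href="%s">Embedded chart: %s</a></p>' % (src, src)
    return re.sub(r'<iframe[^>]*\bsrc="([^"]+)"[^>]*>\s*</iframe>',
                  repl, html, flags=re.I)


# Hypothetical sample markup, standing in for an embedded chart:
sample = '<div><iframe src="https://example.com/chart.html"></iframe></div>'
print(flatten_iframes(sample))
```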
For help, I'm following the guide from
here and the API documentation from
here.
Thank you.