#1
Junior Member
Posts: 4
Karma: 10
Join Date: Sep 2018
Device: Kindle PW
Recipe for BloombergQuint
I'm creating a Bloomberg Quint recipe because I couldn't find an existing one.
I'm facing two issues:
1. The site has 6 sections on its home page at https://www.bloombergquint.com/: markets, business, politics, global-economics, technology, pursuits. Each section has 3 stories. But when I run my recipe using the command below, I don't get the full set.
Code:
ebook-convert Bloomberg_Quint.recipe .epub --test -vv --debug-pipeline debug
2. Some of the articles on that website have charts and tables. These are embedded as HTML inside HTML. I'm not seeing these charts and tables in the output file. Is there a way I can process them? E.g. an article at https://www.bloombergquint.com/mutua...igh-in-october
Code:
from __future__ import with_statement
__license__ = 'GPL 3'
__copyright__ = '2018, yetanothernerdk'
from calibre.web.feeds.news import BasicNewsRecipe


class BloombergQuint(BasicNewsRecipe):
    title = u'Bloomberg Quint'
    language = 'en_IN'
    encoding = 'utf8'
    oldest_article = 1
    __author__ = 'yetanothernerdk'
    max_articles_per_feed = 30
    no_stylesheets = True
    remove_attributes = ['style']
    ignore_duplicate_articles = {'title', 'url'}
    keep_only_tags = [
        dict(id=lambda x: x and x.startswith('card-')),
    ]
    remove_tags = [dict(name='div', attrs={'class': 'story-element story-element-text story-element-text-also-read'})]

    def preprocess_html(self, soup):
        # Build each image's src from its data-src-template attribute
        for img in soup.findAll('img', attrs={'data-src-template': True}):
            img['src'] = img['data-src-template'].replace('BINARY/thumbnail', 'alternates/FREE_660')
        return soup

    def articles_from_soup(self, soup):
        articles = []
        for heading in soup.findAll(['h3']):
            article = heading.find(['a'])
            if article is None:
                # Heading without a link: nothing to download
                continue
            title = self.tag_to_string(article)
            url = article.get('href', False)
            if not url or not title:
                continue
            self.log('News:', article)
            articles.append({
                'title': title,
                'url': 'https://www.bloombergquint.com'+url,
                'description': '',
                'date': ''})
        return articles

    def parse_index(self):
        soup = self.index_to_soup('https://www.bloombergquint.com/')
        section_id = "stack__with__articles stack__with__articles--qsection-"
        sections = ['markets', 'business', 'politics', 'global-economics', 'technology', 'pursuits']
        feeds = []
        for section in sections:
            self.log('Section:', section.capitalize())
            class_id = section_id + section
            nav_div = soup.find("div", {"class": class_id})
            if nav_div is None:
                # Section block missing from the page: skip it instead of crashing
                self.log('Section not found:', section)
                continue
            articles = self.articles_from_soup(nav_div)
            if articles:
                feeds.append((section.capitalize(), articles))
        for section in feeds:
            self.log('Section:', section)
        return feeds
Thank You.
Last edited by hiabcwelcome; 11-11-2018 at 12:40 AM.
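The headline extraction in articles_from_soup can be tried outside calibre. Below is a minimal sketch using only the Python standard library instead of calibre's soup. The class name HeadlineLinkParser, the SAMPLE markup, and the story paths in it are made up for illustration and are not taken from the real site.

```python
from html.parser import HTMLParser

BASE_URL = 'https://www.bloombergquint.com'


class HeadlineLinkParser(HTMLParser):
    """Collect title/url pairs for every <a> found inside an <h3>."""

    def __init__(self):
        super().__init__()
        self.articles = []
        self._in_h3 = False
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == 'h3':
            self._in_h3 = True
        elif tag == 'a' and self._in_h3:
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            title = ''.join(self._text).strip()
            # Same rule as the recipe: skip entries with no url or no title
            if title and self._href:
                self.articles.append({'title': title, 'url': BASE_URL + self._href})
            self._href = None
        elif tag == 'h3':
            self._in_h3 = False


# Hypothetical markup shaped like one section block of the home page
SAMPLE = '''
<div class="stack__with__articles stack__with__articles--qsection-markets">
  <h3><a href="/markets/story-one">Story one</a></h3>
  <h3><a href="/markets/story-two">Story <b>two</b></a></h3>
  <h3>No link here</h3>
  <h3><a href="">Empty link</a></h3>
</div>
'''

parser = HeadlineLinkParser()
parser.feed(SAMPLE)
articles = parser.articles
for a in articles:
    print(a['title'], '->', a['url'])
```

The heading without a link and the link with an empty href are both skipped, which is the behaviour the recipe needs so that a stray h3 does not abort the whole download.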
#2
creator of calibre
Posts: 45,611
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
When you use --test, calibre will download at most 4 articles (at most 2 feeds, with 2 articles each). Don't use it if you want the full set downloaded.
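To make the numbers concrete, here is a rough sketch of that limit applied to a parse_index()-style result. It assumes the behaviour described in the ebook-convert documentation for --test (max_articles_per_feed forced to 2, at most 2 feeds). The helper apply_test_limits and the placeholder story names are hypothetical and are not calibre code.

```python
def apply_test_limits(feeds, max_feeds=2, max_articles=2):
    """Trim a list of (section title, articles) pairs to the --test limits."""
    return [(title, articles[:max_articles]) for title, articles in feeds[:max_feeds]]


sections = ['markets', 'business', 'politics', 'global-economics', 'technology', 'pursuits']
# Placeholder articles: 3 per section, matching the home page layout described above
feeds = [(s.capitalize(), ['%s story %d' % (s, i) for i in range(1, 4)]) for s in sections]

trimmed = apply_test_limits(feeds)
full_count = sum(len(articles) for _, articles in feeds)
test_count = sum(len(articles) for _, articles in trimmed)
print(full_count, test_count)  # 18 4
```

So with 6 sections of 3 stories each, a run without --test should fetch 18 articles, while a --test run stops at 4 from the first 2 sections.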
#3
Junior Member
Posts: 4
Karma: 10
Join Date: Sep 2018
Device: Kindle PW
I was missing that.
Thank you, Kovid.
Tags: calibre, news, recipe