Old 03-21-2017, 12:49 AM   #1
duhduhduh
Junior Member
duhduhduh began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Mar 2017
Device: KINDLE PAPERWHITE
Compress Financial Times Recipe

I am currently using the FT (International) printed edition recipe, and the file it fetches is often larger than 50 MB.

1. I have added the following settings to the recipe, but the file size remains the same.
Code:
    useHighResImages = False
    compress_news_images = True
    compress_news_images_auto_size = 5
    scale_news_images_to_device = True
2. I would also like to know how to remove certain sections to reduce the file size.
Here's a link to the recipe.
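A minimal sketch of the arithmetic behind item 1, assuming the behaviour described in the calibre recipe documentation: JPEG images are recompressed toward roughly (w * h) / compress_news_images_auto_size bytes. The function name and the image dimensions below are illustrative assumptions, not calibre code.

```python
def jpeg_target_bytes(width, height, auto_size):
    """Approximate byte budget for one JPEG, assuming the documented
    (w * h) / compress_news_images_auto_size rule."""
    return (width * height) // auto_size


# Hypothetical image dimensions, chosen only for illustration.
w, h = 1000, 800
for factor in (5, 16, 40):
    # A larger factor gives a smaller byte budget per image.
    print(factor, jpeg_target_bytes(w, h, factor))
```

If that reading of the documentation is right, a factor of 5 allows more than three times the bytes that calibre's default of 16 does, so setting it to 5 would not be expected to shrink the download. A larger factor, or a cap via compress_news_images_max_size (in KB), looks like the more likely lever.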
Old 04-02-2017, 09:40 AM   #2
duhduhduh
Hi, here is the recipe with those settings added.

Code:
#!/usr/bin/env  python2
# -*- mode: python -*-
# -*- coding: utf-8 -*-

__license__ = 'GPL v3'
__copyright__ = '2010-2017, Darko Miletic <darko.miletic at gmail.com>'
'''
www.ft.com/international-edition
'''

from calibre.web.feeds.news import BasicNewsRecipe
from collections import OrderedDict
try:
    from urllib.parse import unquote  # Python 3
except ImportError:
    from urllib import unquote  # Python 2


def classes(classes):
    q = frozenset(classes.split(' '))
    return dict(attrs={
        'class': lambda x: x and frozenset(x.split()).intersection(q)})


class FinancialTimes(BasicNewsRecipe):
    title = 'Financial Times (International) printed edition'
    __author__ = 'Darko Miletic'
    description = "The Financial Times (FT) is one of the world's leading business news and information organisations, recognised internationally for its authority, integrity and accuracy."  # noqa
    publisher = 'The Financial Times Ltd.'
    category = 'news, finances, politics, World'
    oldest_article = 2
    scale_news_images_to_device = True
    language = 'en'
    max_articles_per_feed = 250
    no_stylesheets = True
    use_embedded_content = False
    needs_subscription = True
    encoding = 'utf8'
    publication_type = 'newspaper'
    handle_gzip = True
    LOGIN = 'https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F'
    LOGOUT = 'https://myaccount.ft.com/logout'
    INDEX = 'http://www.ft.com/international-edition'
    PREFIX = 'http://www.ft.com'
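    # Not referenced anywhere else in this recipe, so it has no effect here.
    useHighResImages = False
    compress_news_images = True
    # JPEGs are targeted at roughly (w * h) / factor bytes, so a larger
    # factor means smaller files.
    compress_news_images_auto_size = 5
    # Defined here, but parse_index() below never reads it.
    excludeSections = ['life-arts']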

    keep_only_tags = [
        classes('article__header--wrapper article__time-byline article__body n-content-image barrier-grid__heading')
    ]

    remove_tags = [
        classes('n-content-related-box tour-tip')
    ]

    remove_attributes = ['width', 'height', 'lang', 'style']

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        br.open(self.INDEX)
        if self.username is not None and self.password is not None:
            br.open(self.LOGIN)
            br.select_form(name='enter-email-form')
            br['email'] = self.username
            br.submit()
            br.select_form(name='enter-password-form')
            br['password'] = self.password
            br.submit()
        return br

    def parse_index(self):
        feeds = OrderedDict()
        soup = self.index_to_soup(self.INDEX)
        section_title = 'Untitled'

        for column in soup.findAll('div', attrs={'class': 'feedBoxes clearfix'}):
            for section in column.findAll('div', attrs={'class': 'feedBox'}):
                sectiontitle = self.tag_to_string(section.find('h4'))
                if '...' not in sectiontitle:
                    section_title = sectiontitle
                for article in section.ul.findAll('li'):
                    title = self.tag_to_string(article.a)
                    url = article.a['href']
                    # Group each article under the most recent named section.
                    feeds.setdefault(section_title, []).append(
                        {'title': title, 'url': url, 'description': '', 'date': ''})

        # The OrderedDict keeps sections in the order they appear on the index page.
        return list(feeds.items())

    def preprocess_html(self, soup):
        for img in soup.findAll('img', srcset=True):
            src = img['srcset'].split(',')[0].strip()
            src = unquote(src.rpartition('/')[2].partition('?')[0])
            img['src'] = src
        return soup

    def cleanup(self):
        self.browser.open(self.LOGOUT)
Did I put the settings in the wrong place?
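On removing sections: the recipe defines excludeSections, but parse_index never consults it, so nothing is filtered. Below is a minimal sketch of one way to drop sections from the list of (section title, articles) pairs that parse_index returns. The helper names slugify and drop_sections, the slug matching, and the sample section titles are assumptions for illustration, since the thread does not show the actual headings on the FT index page.

```python
import re


def slugify(title):
    """Lower-case a section title and join its words with hyphens,
    so that 'Life & Arts' becomes 'life-arts'."""
    return '-'.join(re.findall(r'[a-z0-9]+', title.lower()))


def drop_sections(feeds, exclude):
    """Return feeds without any section whose slug appears in exclude."""
    excluded = frozenset(exclude)
    return [(title, articles) for title, articles in feeds
            if slugify(title) not in excluded]


# Hypothetical parse_index() output: a list of (section title, articles).
feeds = [
    ('World', [{'title': 'a', 'url': 'u1', 'description': '', 'date': ''}]),
    ('Life & Arts', [{'title': 'b', 'url': 'u2', 'description': '', 'date': ''}]),
]
kept = drop_sections(feeds, ['life-arts'])
print([title for title, _ in kept])
```

In the recipe itself, parse_index would pass its result through drop_sections together with self.excludeSections before returning it. Fewer sections means fewer articles and images to download, which should reduce the file size regardless of the compression settings.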