Old 09-08-2015, 08:33 AM   #1
Phoebus
Responsive images and the Stylist

I'm creating my first Calibre recipe but the site I'm trying to compile is causing me two problems.

1 - it has responsive images
2 - something is causing it to cut off large parts of the article

I have looked around and cobbled together an RSS feed. The recipe (which I believe is Python based) finds the articles, but I can't get any images, and the cut-off point isn't consistent (i.e. if I run it again, it may cut off the article at a different point).

I looked at using BeautifulSoup but that doesn't seem to have helped. Any pointers welcome. The recipe is:
Code:
    #!/usr/bin/env python2
    # vim:fileencoding=utf-8
    from __future__ import unicode_literals, division, absolute_import, print_function
    from calibre.web.feeds.news import BasicNewsRecipe
    from calibre.ebooks.BeautifulSoup import BeautifulSoup, Tag
    import re
    import urllib
    
    class AdvancedUserRecipe1439386553(BasicNewsRecipe):
        title          = 'The Stylist'
        oldest_article = 7
        max_articles_per_feed = 100
        auto_cleanup   = True
        auto_cleanup_keep = '//div[@class="inline-image inline-image--full inline-image--center"]'
        auto_cleanup_keep = '//section[@class="widget widget--html"]'
        feeds          = [
            ('Stylist  People', 'http://feed43.com/2568072464117534.xml'),
        ]
    
        def preprocess_html(self, soup):
            for img in soup.findAll('img'):
                img.img['src'] = img.find('source',attrs={'sizes':'(max-width: 1023px) 100vw'})['srcset']
                img.replaceWith(img.img)
            return soup
This is just the latest of several iterations of the recipe, so it still contains leftovers such as img.img from things I'd tried.
Old 09-08-2015, 08:53 AM   #2
kovidgoyal
creator of calibre
There are many examples of recipes that deal with responsive images.

First you need to remove auto_cleanup and set
use_embedded_content = False
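
As a rough illustration of that first step, the original recipe's settings might look like the sketch below once auto_cleanup is gone. The class name StylistSketch is hypothetical, and the try/except around the import is an assumption added only so the snippet can be loaded outside calibre; a real recipe would import BasicNewsRecipe directly.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Assumption: fallback so this sketch can be imported outside calibre.
    BasicNewsRecipe = object


class StylistSketch(BasicNewsRecipe):  # hypothetical class name
    title = 'The Stylist'
    oldest_article = 7
    max_articles_per_feed = 100
    # auto_cleanup and both auto_cleanup_keep lines are removed entirely.
    # Download the full page linked from each feed entry instead of
    # using whatever content is embedded in the feed itself.
    use_embedded_content = False
    feeds = [
        ('Stylist  People', 'http://feed43.com/2568072464117534.xml'),
    ]
```

With auto_cleanup off, the downloaded page arrives unfiltered, so it is likely that some selection of the article body (for example via keep_only_tags) would be needed afterwards; which tags to keep depends on the site's markup and is not shown here.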

Then examine the downloaded HTML and find the responsive images, which typically have something like a data-src or srcset attribute holding the real image URL, plus a dummy src attribute. Once you understand the structure, write preprocess_html to fix the images. See for example the preprocess_html function in the CNN or National Geographic recipes.
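
A minimal sketch of what such a fix could look like is below. It is not taken from the CNN or National Geographic recipes. It assumes the page uses either a picture element wrapping source and img tags, or an img tag carrying data-src or srcset; the actual structure of the Stylist pages has to be checked in the downloaded HTML. The function names largest_srcset_url and fix_responsive_images are hypothetical helpers, and only findAll, find, get and item assignment are used on the soup.

```python
def largest_srcset_url(srcset):
    """Return the URL of the widest candidate in a srcset string, or None.

    Simplification: candidates are split on commas, so URLs that
    themselves contain commas are not handled.
    """
    best_url, best_size = None, -1.0
    for candidate in srcset.split(','):
        parts = candidate.strip().split()
        if not parts:
            continue
        size = 0.0
        # A descriptor looks like "640w" (width) or "2x" (pixel density).
        if len(parts) > 1 and parts[1][-1:] in ('w', 'x'):
            try:
                size = float(parts[1][:-1])
            except ValueError:
                size = 0.0
        if size > best_size:
            best_url, best_size = parts[0], size
    return best_url


def fix_responsive_images(soup):
    """Copy the real image URL into src so the downloader can fetch it."""
    # Assumed case 1: <picture><source srcset="..."><img src="dummy"></picture>
    for picture in soup.findAll('picture'):
        img = picture.find('img')
        source = picture.find('source', attrs={'srcset': True})
        if img is not None and source is not None:
            url = largest_srcset_url(source['srcset'])
            if url:
                img['src'] = url
    # Assumed case 2: <img src="dummy" data-src="..."> or <img srcset="...">
    for img in soup.findAll('img'):
        url = img.get('data-src') or largest_srcset_url(img.get('srcset') or '')
        if url:
            img['src'] = url
    return soup
```

Inside a recipe this could be wired in by having preprocess_html(self, soup) return fix_responsive_images(soup). Picking the widest candidate is a design choice; for a small e-ink screen a narrower candidate may be enough.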
Old 09-09-2015, 05:19 AM   #3
Phoebus
Thank you kovidgoyal, I'll do that.

Tags
recipe, responsive
