09-08-2015, 07:33 AM   #1
Phoebus
Member
Posts: 22
Karma: 10
Join Date: Aug 2015
Device: Kobo Aura H2O
Responsive images and the Stylist

I'm creating my first Calibre recipe but the site I'm trying to compile is causing me two problems.

1 - it has responsive images
2 - something is causing it to cut off large parts of the article

I have looked around and cobbled together an RSS feed. The recipe (which I believe is Python-based) finds the articles, but I can't get any images, and the cut-off point isn't consistent (i.e. if I run it again, it may cut off the article at a different point).

I looked at using BeautifulSoup, but that doesn't seem to have helped. Any pointers welcome. The recipe is:
Code:
    #!/usr/bin/env python2
    # vim:fileencoding=utf-8
    from __future__ import unicode_literals, division, absolute_import, print_function
    from calibre.web.feeds.news import BasicNewsRecipe
    from calibre.ebooks.BeautifulSoup import BeautifulSoup, Tag
    import re
    import urllib
    
    class AdvancedUserRecipe1439386553(BasicNewsRecipe):
        title          = 'The Stylist'
        oldest_article = 7
        max_articles_per_feed = 100
        auto_cleanup   = True
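        # auto_cleanup_keep takes a single XPath expression; a second assignment
        # would overwrite the first, so the two paths are joined with "|".
        auto_cleanup_keep = (
            '//div[@class="inline-image inline-image--full inline-image--center"]'
            '|//section[@class="widget widget--html"]'
        )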
        feeds          = [
            ('Stylist  People', 'http://feed43.com/2568072464117534.xml'),
        ]
    
        def preprocess_html(self, soup):
            for img in soup.findAll('img'):
                img.img['src'] = img.find('source', attrs={'sizes': '(max-width: 1023px) 100vw'})['srcset']
                img.replaceWith(img.img)
            return soup
This is just the latest of a few iterations of the recipe, so it still has things like img.img and the like that I'd tried.
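
The responsive-image part of the problem comes down to turning a srcset value into one plain URL that can go into an img src. Below is a minimal sketch of that step on its own, using only plain Python. The function name widest_srcset_url is hypothetical, and it assumes the site's srcset values are comma-separated "URL width" pairs (e.g. "a.jpg 320w, b.jpg 1024w") with no commas inside the URLs. I have not checked that against the actual pages.

```python
def widest_srcset_url(srcset):
    """Return the URL of the widest candidate in a srcset string, or None.

    Hypothetical helper: assumes candidates are separated by commas and
    that the URLs themselves contain no commas.
    """
    best_url, best_width = None, -1
    for candidate in srcset.split(','):
        parts = candidate.split()
        if not parts:
            # Empty candidate (e.g. empty string or trailing comma).
            continue
        url = parts[0]
        width = 0  # Candidates without a "<n>w" descriptor count as width 0.
        if len(parts) > 1 and parts[1].endswith('w'):
            try:
                width = int(parts[1][:-1])
            except ValueError:
                width = 0
        # Strictly greater, so the first candidate wins on ties.
        if width > best_width:
            best_url, best_width = url, width
    return best_url
```

Inside preprocess_html, the returned URL would then be written to the src of the img element that belongs to the same source element. Whether the site actually nests the two inside a picture element is an assumption that needs checking in the page source.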