I'm creating my first Calibre recipe but the site I'm trying to compile is causing me two problems.
1 - It has responsive images.
2 - Something is causing large parts of the article to be cut off.
I have looked around and cobbled together an RSS feed. The recipe (which I believe is Python-based) finds the articles, but I can't get any images, and the cut-off point isn't consistent (i.e. if I run it again it may cut off the article at a different point).
I looked at using BeautifulSoup but that doesn't seem to have helped. Any pointers welcome. The recipe is:
Code:
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup, Tag
import re
import urllib


class AdvancedUserRecipe1439386553(BasicNewsRecipe):
    title = 'The Stylist'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = True
    auto_cleanup_keep = '//div[@class="inline-image inline-image--full inline-image--center"]'
    auto_cleanup_keep = '//section[@class="widget widget--html"]'
    feeds = [
        ('Stylist People', 'http://feed43.com/2568072464117534.xml'),
    ]

    def preprocess_html(self, soup):
        for img in soup.findAll('img'):
            img.img['src'] = img.find('source',attrs={'sizes':'(max-width: 1023px) 100vw'})['srcset']
            img.replaceWith(img.img)
        return soup
This is just the latest of several iterations of the recipe, so it still has leftovers such as img.img from things I'd tried.
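On the responsive images: as far as I understand it, a srcset value is a comma-separated list of candidates (a URL followed by a width such as 640w or a density such as 2x), so copying the whole string into src probably can't work. Below is a minimal standalone sketch of picking one URL out of such a string. The function name pick_from_srcset is made up for illustration, and I'm assuming the site's srcset uses plain URLs with no commas inside them.

```python
def pick_from_srcset(srcset):
    """Return the URL of the largest candidate in a srcset string, or None."""
    best_url, best_size = None, -1.0
    for candidate in srcset.split(','):
        parts = candidate.split()
        if not parts:
            continue  # skip empty pieces, e.g. from a trailing comma
        url = parts[0]
        size = 1.0  # a candidate with no descriptor is treated as 1x
        if len(parts) > 1:
            try:
                # Descriptors look like "640w" or "2x"; drop the unit letter.
                size = float(parts[1][:-1])
            except ValueError:
                size = 1.0  # unrecognised descriptor: fall back to 1x
        if size > best_size:
            best_url, best_size = url, size
    return best_url
```

In the recipe I'd then expect something like img['src'] = pick_from_srcset(source['srcset']) inside preprocess_html, where source is the source tag that belongs to the image, though I haven't confirmed how the site's markup is actually nested.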