MobileRead Forums - Trying to write a recipe for Aeon.com

FacetiousKnave · 01-08-2023, 05:10 PM

My coding skills are rather deplorable.

I must have rewritten this code 50 times, but it never works.

The recipe should be quite simple. These are the requirements:

1. RSS feed source: https://aeon.co/feed.rss
2. Exclude videos.
3. Output EPUB.
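Requirements 1 and 2 can be tried outside calibre with nothing but the standard library. The sketch below parses a small made-up sample instead of downloading the live feed, and it assumes (not confirmed here) that Aeon's video entries link to URLs under `/videos/`. In this sample the video's title never contains the word "video", which is why the filter looks at the URL rather than the title.

```python
import xml.etree.ElementTree as ET

# A tiny made-up stand-in for https://aeon.co/feed.rss (not real feed data).
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <item><title>An essay about time</title>
        <link>https://aeon.co/essays/an-essay-about-time</link></item>
  <item><title>A short film about whales</title>
        <link>https://aeon.co/videos/a-short-film-about-whales</link></item>
</channel></rss>"""


def non_video_items(rss_text):
    """Return (title, url) pairs for every <item> whose URL is not a video page."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter('item'):
        title = item.findtext('title', default='').strip()
        url = item.findtext('link', default='').strip()
        # Assumption: videos live under /videos/, so filter on the URL, not the title.
        if '/videos/' not in url:
            items.append((title, url))
    return items


print(non_video_items(SAMPLE_RSS))
# → [('An essay about time', 'https://aeon.co/essays/an-essay-about-time')]
```

Because the feed is parsed as XML here, `<link>` keeps its text content, which is not guaranteed when RSS is fed through an HTML parser.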

Python code:

import re
from calibre.ebooks.conversion.plumber import Plumber
from calibre.web.feeds.recipes import BasicNewsRecipe


class AeonRecipe(BasicNewsRecipe):
    title = 'Aeon'
    __author__ = 'FacetiousKnave'
    description = 'This recipe fetches articles from Aeon and outputs an EPUB file'
    use_embedded_content = False
    remove_tags = [
        dict(name='iframe')
    ]

    def parse_index(self):
        items = self.index_to_soup('https://aeon.co/feed.rss').find_all('item')
        for item in items:
            title = item.title.text
            if 'video' not in title.lower():
                url = item.link.text
                date = item.pubdate.text
                self.add_article(title, url, date, text=self.fetch_article(url))

    def postprocess_html(self, soup, first_fetch):
        # Remove unwanted tags
        for tag in self.remove_tags:
            for t in soup.find_all(**tag):
                t.decompose()
        return soup

What am I doing wrong?
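A likely culprit is `parse_index`: calibre expects it to return a list of `(section title, list of article dicts)` tuples, and neither `add_article` nor a one-argument `fetch_article` is part of the documented recipe API, so the loop probably fails on its first non-video item. With an HTML parser, `item.link.text` is also likely to come back empty, because `<link>` is treated as an empty element. Since the source is a plain RSS feed, a simpler route may be to list it in `feeds`, let calibre fetch it, and drop video entries in `parse_feeds`. The sketch below assumes Aeon's videos live under `/videos/` (not confirmed here); `is_video_url` is a hypothetical helper, the `oldest_article` and `max_articles_per_feed` values are suggestions, and the `ImportError` fallback exists only so the helper can be tried without calibre installed. It imports `BasicNewsRecipe` from `calibre.web.feeds.news`, the path the calibre manual uses.

```python
# aeon.recipe: a sketch of a calibre recipe for Aeon (not tested against the live site).
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback only so is_video_url() can be exercised without calibre;
    # when calibre runs the recipe, the real base class is imported above.
    BasicNewsRecipe = object


def is_video_url(url):
    # Hypothetical helper. Assumption: Aeon serves videos under /videos/.
    return '/videos/' in (url or '')


class AeonRecipe(BasicNewsRecipe):
    title = 'Aeon'
    __author__ = 'FacetiousKnave'
    description = 'Articles from Aeon, excluding videos'
    language = 'en'
    oldest_article = 30            # days; adjust as needed
    max_articles_per_feed = 50
    use_embedded_content = False   # download each article page instead of the feed summary
    remove_tags = [dict(name='iframe')]  # calibre applies this automatically

    # Let calibre download and parse the RSS feed itself.
    feeds = [('Aeon', 'https://aeon.co/feed.rss')]

    def parse_feeds(self):
        # Build the usual Feed objects, then drop every video entry by URL.
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            feed.articles = [a for a in feed.articles if not is_video_url(a.url)]
        return feeds
```

Saved as, say, `aeon.recipe`, it can be turned into an EPUB with `ebook-convert aeon.recipe aeon.epub` (adding `--test` fetches only a couple of articles while debugging). The output format is chosen at conversion time rather than inside the recipe, so the `Plumber` and `re` imports can go; and because calibre applies `remove_tags` on its own, the custom `postprocess_html` should not be needed either.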
