11-12-2018, 09:10 AM   #1
Phoebus
Member
Posts: 22
Karma: 10
Join Date: Aug 2015
Device: Kobo Aura H2O
Reddit feed with comments

Hello, I thought I could set up a Reddit feed to get the top results of the past week for a key phrase. I used the basic feature in Calibre to fetch the feed and the original post, but it doesn't capture the other users' comments. Any tips on what I should change?

I've also put the RSS feed through Feedburner (http://feeds.feedburner.com/Redditco...esults-Testing), but it makes no difference.

Thanks

Code:
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1542031690(BasicNewsRecipe):
    title          = 'Reddit testing'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup   = False

    feeds          = [
        ('Reddit testing', 'https://www.reddit.com/search.xml?q=testing&sort=top&t=week'),
    ]

Last edited by Phoebus; 11-12-2018 at 11:19 AM.
11-12-2018, 11:00 PM   #2
kovidgoyal
creator of calibre
Posts: 43,748
Karma: 22446736
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Does the RSS feed actually include the comments? If not, you would need to get your recipe to scrape the actual Reddit website.
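
One way to check is to look at what a feed entry actually carries. The sketch below parses a small hand-written sample and lists each entry's title, link and embedded content. The sample is a stand-in, not real Reddit output, and it assumes the feed is Atom-style; the helper name summarise_entries is made up for illustration. It is written for Python 3.

```python
import xml.etree.ElementTree as ET

ATOM = '{http://www.w3.org/2005/Atom}'  # Atom XML namespace

# Hypothetical sample standing in for a downloaded feed; not real Reddit output.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>search results</title>
  <entry>
    <title>Example post</title>
    <link href="https://www.reddit.com/r/example/comments/abc123/example_post/"/>
    <content type="html">&lt;p&gt;Original post text only&lt;/p&gt;</content>
  </entry>
</feed>"""


def summarise_entries(feed_xml):
    """Return a list of dicts with the title, link and embedded content of each entry."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall(ATOM + 'entry'):
        link = entry.find(ATOM + 'link')
        entries.append({
            'title': entry.findtext(ATOM + 'title', default=''),
            'link': link.get('href') if link is not None else None,
            'content': entry.findtext(ATOM + 'content', default=''),
        })
    return entries


if __name__ == '__main__':
    for item in summarise_entries(SAMPLE_FEED):
        print(item['title'], '->', item['link'])
        print(item['content'])
```

If the embedded content holds only the original post, as in this sample, the comments can only come from the page each link points to.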
11-13-2018, 06:22 AM   #3
Phoebus
No, it doesn't. Thanks, I did not realise. I wasn't sure whether Calibre scraped the RSS itself or used it as a source of links, like the feed http://feeds.feedburner.com/CrackedRSS/ used in this recipe.

That recipe uses feeds = [(u'Articles', u'http://feeds.feedburner.com/CrackedRSS/')] but changing my feed to that format didn't help.

Last edited by Phoebus; 11-13-2018 at 06:30 AM.
11-13-2018, 09:16 AM   #4
kovidgoyal
The field use_embedded_content in the recipe controls whether content is read from the feed or the linked page is scraped.
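
A minimal sketch of that setting, reusing the search feed from the first post. The class name is made up, and the ImportError fallback is only there so the snippet can be loaded and checked outside calibre; inside calibre the real BasicNewsRecipe is used.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch can be loaded outside calibre; not part of a real recipe.
    class BasicNewsRecipe(object):
        pass


class RedditLinkedPagesSketch(BasicNewsRecipe):  # hypothetical class name
    title = 'Reddit testing'
    oldest_article = 7  # days
    max_articles_per_feed = 100

    # False: treat the feed as a list of links and download each linked page.
    # True: build each article from the content embedded in the feed itself.
    # None (the default): let calibre guess from the embedded content.
    use_embedded_content = False

    feeds = [
        ('Reddit testing', 'https://www.reddit.com/search.xml?q=testing&sort=top&t=week'),
    ]
```

With the value set to False the downloaded pages are the Reddit post pages, so the recipe still has to select the comment markup from them.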
11-13-2018, 03:58 PM   #5
Phoebus
Thanks
11-16-2018, 06:43 AM   #6
Phoebus
Thanks again for your help. Here is an alpha version of the code. Known bugs:
  • a subreddit's AutoModerator rules appear at the start of each post
  • in-page links to images (e.g. those to imgur, i.reddit) are not pulled in, though that may be for the best
  • some of the code is junk, as I've cannibalised it from other recipes, and may not need to be there
  • the subreddit name is not displayed in the title

Usage: you must build your feed links as described in these guides: https://www.reddit.com/wiki/rss or https://www.reddit.com/r/pathogendav...ss_and_reddit/

For example, I use it as a search to get results for horror stories, but you can use it for any search, subreddit, post, comments or user, as per the links above.

I've set it up for a weekly search, but you can change this.
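
As a rough illustration of building such a link, the sketch below assembles a search feed URL in the same shape as the one in the first post (search.xml with q, sort and t parameters). The function name build_search_feed and the 'month' value are assumptions for illustration; check the guides above for the values Reddit accepts. It is written for Python 3.

```python
from urllib.parse import urlencode


def build_search_feed(query, sort='top', period='week'):
    """Build a Reddit search feed URL shaped like the one in the first post."""
    # A list of pairs keeps the parameters in a fixed order.
    params = [('q', query), ('sort', sort), ('t', period)]
    return 'https://www.reddit.com/search.xml?' + urlencode(params)


if __name__ == '__main__':
    print(build_search_feed('testing'))
    print(build_search_feed('horror stories', period='month'))
```

The resulting string is what goes in place of INSERT YOUR RSS LINK in the recipe's feeds list.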

Code:
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1542030622(BasicNewsRecipe):
    title = 'Reddit weekly - alpha'
    auto_cleanup = False
    __author__ = 'phoebus'
    language = 'en'
    description = "Tales from the internet"
    publisher = 'Reddit users'
    category = 'reddit'  # assumed placeholder tag; change as required
    oldest_article = 7  # days - change as required
    max_articles_per_feed = 50  # change as required
    no_stylesheets = True
    encoding = 'utf-8'
    remove_javascript = True
    use_embedded_content = False  # scrape the linked pages, not the feed text
    recursions = 11
    remove_attributes = ['size', 'style']

    # See https://www.reddit.com/wiki/rss or
    # https://www.reddit.com/r/pathogendavid/comments/tv8m9/pathogendavids_guide_to_rss_and_reddit/
    feeds = [
        (u'Articles', u'INSERT YOUR RSS LINK'),
    ]

    conversion_options = {
        'comment': description, 'tags': category, 'publisher': publisher, 'language': language
    }

    # Parts of the page to keep: post title, source domain, post body and comments.
    keep_only_tags = [
        dict(name='p', attrs={'class': ['title']}),
        dict(name='span', attrs={'class': ['domain']}),
        dict(name='div', attrs={'class': ['expando']}),
        dict(name='h1', attrs={'class': ['hover redditname']}),
        dict(name='meta', attrs={'property': ['og:title']}),
        # Assumed intent: any <meta> tag that carries a title attribute.
        dict(name='meta', attrs={'title': True}),
        dict(name='div', attrs={'class': [
            'entry unvoted',
            'usertext-body may-blank-within md-container ',
            'usertext-body may-blank-within md-container',
            'md',
        ]}),
        dict(name='div', attrs={'data-test-id': ['post-content']}),
        dict(name='div', attrs={'class': ['s10usnt7-0 gxtxxZ']}),
    ]

    # Parts to drop from what was kept: buttons, flair, AutoModerator posts, icons.
    remove_tags = [
        dict(name='button'),
        dict(name='span', attrs={'class': ['flair', 'flair ']}),
        dict(name='div', attrs={'data-author': ['AutoModerator']}),
        dict(name='ul', attrs={'class': ['flat-list buttons']}),
        dict(name='input', attrs={'type': ['hidden']}),
        dict(name='svg'),
    ]

    def is_link_wanted(self, url, a):
        # Follow only "next" links inside the pagination nav. The class attribute
        # may be missing, a string or a list, depending on the BeautifulSoup version.
        classes = a.get('class') or []
        if not isinstance(classes, (list, tuple)):
            classes = classes.split()
        return 'next' in classes and a.findParent('nav', attrs={'class': 'PaginationContent'}) is not None

    def postprocess_html(self, soup, first_fetch):
        # Remove anything posted by AutoModerator that survived remove_tags.
        for div in soup.findAll(attrs={'data-author': 'AutoModerator'}):
            div.extract()
        return soup

Last edited by Phoebus; 11-19-2018 at 04:24 AM.

All times are GMT -4.