#1 |
|
Member
Posts: 22
Karma: 10
Join Date: Aug 2015
Device: Kobo Aura H2O
|
Reddit feed with comments
Hello, I thought I could set up a Reddit feed to get the top results from the past week for a key phrase. I used the basic feature in Calibre to fetch the feed. It picks up the original post but doesn't capture the other users' comments. Any tips on what I should change?
I've also put the RSS feed through Feedburner, using http://feeds.feedburner.com/Redditco...esults-Testing but it makes no difference. Thanks.
Code:
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1542031690(BasicNewsRecipe):
    title = 'Reddit testing'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = False
    feeds = [
        ('Reddit testing', 'https://www.reddit.com/search.xml?q=testing&sort=top&t=week'),
    ]
Last edited by Phoebus; 11-12-2018 at 12:19 PM.
|
|
|
|
|
#2 |
|
creator of calibre
Posts: 45,609
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Does the RSS feed actually include the comments? If not, you would need to get your recipe to scrape the actual Reddit website.
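One way to answer that question is to look at what each feed entry actually carries. The following is a minimal sketch, assuming the feed is Atom-formatted; the sample XML is hand-written for illustration (not real Reddit output), and `summarize_entries` is a hypothetical helper name. It is written for Python 3 and uses only the standard library.

```python
import xml.etree.ElementTree as ET

# Namespace prefix used by Atom elements.
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical, hand-written sample shaped like a one-entry Atom feed.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>search results - testing</title>
  <entry>
    <title>Example post</title>
    <link href="https://www.reddit.com/r/example/comments/abc123/example_post/"/>
    <content type="html">&lt;p&gt;Only the original post text.&lt;/p&gt;</content>
  </entry>
</feed>
"""


def summarize_entries(feed_xml):
    """Return (title, link, embedded_content_length) for each Atom entry."""
    root = ET.fromstring(feed_xml.encode("utf-8"))
    rows = []
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        link = entry.find(ATOM + "link")
        href = link.get("href", "") if link is not None else ""
        content = entry.findtext(ATOM + "content", default="")
        rows.append((title, href, len(content)))
    return rows


if __name__ == "__main__":
    for title, href, size in summarize_entries(SAMPLE_FEED):
        print("%s | %s | %d chars embedded" % (title, href, size))
```

If the embedded content only ever holds the original post, the comments have to come from the linked page rather than from the feed.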
|
|
#3 |
|
Member
Posts: 22
Karma: 10
Join Date: Aug 2015
Device: Kobo Aura H2O
|
No, it doesn't. Thanks, I did not realise. I wasn't sure whether it scraped the RSS itself or used the RSS as a source of links, like the feed http://feeds.feedburner.com/CrackedRSS/ used in this recipe.
That recipe uses feeds = [(u'Articles', u'http://feeds.feedburner.com/CrackedRSS/')] but changing my feed to that format didn't help.
Last edited by Phoebus; 11-13-2018 at 07:30 AM.
|
|
|
|
|
#4 |
|
creator of calibre
Posts: 45,609
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The field use_embedded_content in the recipe controls whether content is read from the feed or the linked page is scraped.
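As a rough illustration of that setting, here is the first recipe from this thread with use_embedded_content set explicitly. This is a sketch, not a tested recipe: the class name `RedditLinksOnly` is made up, and the `ImportError` fallback is only a stand-in so the file can be loaded outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in base class so this sketch can be loaded outside calibre.
    # It is not part of calibre and does nothing.
    class BasicNewsRecipe(object):
        pass


class RedditLinksOnly(BasicNewsRecipe):  # hypothetical recipe name
    title = 'Reddit testing'
    oldest_article = 7
    max_articles_per_feed = 100
    # None  -> let calibre guess from the length of the feed content (default)
    # True  -> use the text embedded in the feed itself
    # False -> treat the feed as a list of links and download each linked page
    use_embedded_content = False
    feeds = [
        ('Reddit testing', 'https://www.reddit.com/search.xml?q=testing&sort=top&t=week'),
    ]
```

With False, what ends up in the book depends on what the linked Reddit page contains and on how the recipe filters it, for example with keep_only_tags and remove_tags.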
|
|
|
|
|
|
#5 |
|
Member
Posts: 22
Karma: 10
Join Date: Aug 2015
Device: Kobo Aura H2O
|
Thanks
|
|
#6 |
|
Member
Posts: 22
Karma: 10
Join Date: Aug 2015
Device: Kobo Aura H2O
|
Thanks again for your help. Here is an alpha version of the code.
Bugs:
Usage: you must build your feed link as described in one of these guides:
https://www.reddit.com/wiki/rss
https://www.reddit.com/r/pathogendav...ss_and_reddit/
For example, I use it as a search to get results for horror stories, but you can use it for any search, subreddit, post, comment thread or user, as per the links above. I've set it up for a weekly search, but you can change this.
Code:
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1542030622(BasicNewsRecipe):
    title = 'Reddit weekly - alpha'
    auto_cleanup = False
    __author__ = 'phoebus'
    language = 'en'
    description = "Tales from the internet"
    publisher = 'Reddit users'
    # ASSUMED value: the original used `category` in conversion_options without defining it
    category = 'reddit'
    oldest_article = 7  # days - change as required
    max_articles_per_feed = 50  # change as required
    no_stylesheets = True
    encoding = 'utf-8'
    remove_javascript = True
    use_embedded_content = False  # scrape the linked page instead of the feed text
    recursions = 11  # link levels to follow; filtered by is_link_wanted below
    remove_attributes = ['size', 'style']
    # see https://www.reddit.com/wiki/rss or https://www.reddit.com/r/pathogendavid/comments/tv8m9/pathogendavids_guide_to_rss_and_reddit/
    feeds = [
        (u'Articles', u'INSERT YOUR RSS LINK'),
    ]
    conversion_options = {
        'comment': description, 'tags': category, 'publisher': publisher, 'language': language
    }
    # Elements to keep: post title, source domain, post body and comment bodies
    keep_only_tags = [
        dict(name='p', attrs={'class': ['title']}),
        dict(name='span', attrs={'class': ['domain']}),
        dict(name='div', attrs={'class': ['expando']}),
        dict(name='h1', attrs={'class': ['hover redditname']}),
        dict(name='meta', attrs={'property': ['og:title']}),
        # The original had attrs={'title'} (a set, not a dict). ASSUMPTION:
        # the intent was "any meta tag that has a title attribute".
        dict(name='meta', attrs={'title': True}),
        dict(name='div', attrs={'class': [
            'entry unvoted',
            'usertext-body may-blank-within md-container ',
            'usertext-body may-blank-within md-container',
            'md',
        ]}),
        dict(name='div', attrs={'data-test-id': ['post-content']}),
        dict(name='div', attrs={'class': ['s10usnt7-0 gxtxxZ']}),
    ]
    # Elements to drop: buttons, flair, AutoModerator comments, link bars, hidden inputs, icons
    remove_tags = [
        dict(name='button'),
        dict(name='span', attrs={'class': ['flair', 'flair ']}),
        dict(name='div', attrs={'data-author': ['AutoModerator']}),
        dict(name='ul', attrs={'class': ['flat-list buttons']}),
        dict(name='input', attrs={'type': ['hidden']}),
        dict(name='svg'),
    ]

    def is_link_wanted(self, url, a):
        # Follow only the "next" pagination link. a.get() avoids the KeyError
        # that a['class'] raises on links without a class attribute.
        classes = a.get('class') or []
        return 'next' in classes and a.findParent('nav', attrs={'class': 'PaginationContent'}) is not None

    def postprocess_html(self, soup, first_fetch):
        # Drop anything posted by AutoModerator that survived the filters above
        for div in soup.findAll(attrs={'data-author': 'AutoModerator'}):
            div.extract()
        return soup
Last edited by Phoebus; 11-19-2018 at 05:24 AM.
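For the search case, the feed link that goes into the recipe can be assembled from its parts. This is a small sketch that reproduces the search.xml URL pattern from the first post; `reddit_search_feed` is a hypothetical helper name, the code is written for Python 3, and other feed types (subreddits, users, comment threads) follow the guides linked above rather than this function.

```python
from urllib.parse import urlencode


def reddit_search_feed(query, sort="top", period="week"):
    """Build a Reddit search feed URL in the form used earlier in this thread."""
    params = [("q", query), ("sort", sort), ("t", period)]
    return "https://www.reddit.com/search.xml?" + urlencode(params)


if __name__ == "__main__":
    # Same link as in the first recipe of the thread
    print(reddit_search_feed("testing"))
    # Spaces in the key phrase are encoded for the query string
    print(reddit_search_feed("horror stories"))
```

The resulting string is what would replace INSERT YOUR RSS LINK in the feeds list.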