#1
Enthusiast
Posts: 44
Karma: 3034
Join Date: Mar 2012
Device: Boox Note Air 2 Plus, Samsung Galaxy S23 (base), Samsung Galaxy Tab S3
Greetings! I would be very grateful for some assistance with this recipe. I am trying to create a recipe that fetches the next N days of readings from a Bible reading website (so I'm not concerned with historical data, only future data). The end goal is to always have a week or so of readings available offline on my eink device so I can read when away from wifi.
The site in question (morning readings example) uses JavaScript to load the content, so I had to find the correct API call (by watching the request headers), which gives me this API JSON result. Since there is no RSS feed for this site, I have to build the feed list by iterating through dates. Through a mix of poring over the calibre examples and some Google AI assistance I'm partway there, but the code below is wrong in some way that is beyond me. I think parse_feeds is the function to use, and I am successfully getting at the data I want, but I'm running into errors that are outside of (though most certainly caused by) my code. My best guess is that I'm not passing data the right way and/or I should be using a different function.
Within the JSON tree (of which there is one tree per day) I am only interested in the entries under Services->Morning Prayer->full and Services->Evening Prayer->full. You'll also note in my code that I'm excluding items with a cycle value of 30. I may also end up needing some additional cleanup of the resulting HTML, but at the moment I'm just focused on getting the code running without errors. Any help is greatly appreciated! Code:
import json
import re  # regular expressions, kept for later cleanup of the returned HTML
from contextlib import closing
from datetime import datetime, timedelta
from urllib.parse import urlparse

from calibre.web.feeds.recipes import BasicNewsRecipe
# from calibre.web.feeds import Feed, feed_from_xml, feeds_from_index, templates


class DailyOffice(BasicNewsRecipe):
    title = 'The Daily Office Readings'
    __author__ = 'Anglican Church in North America'
    description = 'ACNA Book of Common Prayer Daily Readings'
    remove_tags = [dict(attrs={'class': ['el-switch', 'asterisk']}),
                   dict(name=['script', 'noscript', 'style'])]
    days_number = 6
    max_articles_per_feed = days_number
    # Example request:
    # https://api.dailyoffice2019.com/api/v1/readings/2025-12-21?absolution=lay&bible_translation=nasb&canticle_rotation=default&chrysostom=on&collects=rotating&confession=long-on-fast&ep_great_litany=ep_litany_off&family-creed=family-creed-no&family-opening-sentence=family-opening-sentence-fixed&family_collect=time_of_day&family_reading_audio=off&family_readings=brief&format=json&general_thanksgiving=on&grace=rotating&language_style=contemporary&language_style_for_our_father=traditional&lectionary=daily-office-readings&morning_prayer_invitatory=invitatory_traditional&mp_great_litany=mp_litany_off&national_holidays=us&o_antiphons=literal&psalm_style=whole_verse&psalm_translation=contemporary&psalms=contemporary&psalter=60&reading_audio=off&reading_cycle=1&reading_headings=on&reading_length=full&style=unison&suffrages=rotating&translation=nasb
    daily_office_settings = '?absolution=lay&bible_translation=nasb&canticle_rotation=default&chrysostom=on&collects=rotating&confession=long-on-fast&ep_great_litany=ep_litany_off&family-creed=family-creed-no&family-opening-sentence=family-opening-sentence-fixed&family_collect=time_of_day&family_reading_audio=off&family_readings=brief&format=json&general_thanksgiving=on&grace=rotating&language_style=contemporary&language_style_for_our_father=traditional&lectionary=daily-office-readings&morning_prayer_invitatory=invitatory_traditional&mp_great_litany=mp_litany_off&national_holidays=us&o_antiphons=literal&psalm_style=whole_verse&psalm_translation=contemporary&psalms=contemporary&psalter=60&reading_audio=off&reading_cycle=1&reading_headings=on&reading_length=full&style=unison&suffrages=rotating&translation=nasb'

    print('BEGIN!!!')

    # EDIT: I'm actually not sure about this section either, now. Upon further digging it seems
    # like this is creating multiple feeds, when really I just want ONE feed with a list of URLs
    # to follow. I can create that array just fine, but I cannot figure out what I am supposed to
    # pass it to so that calibre will then process the array of links like a feed list. I have
    # tried a bunch of different things so far and none work out.
    # Override get_feeds to generate links programmatically.
    def get_feeds(self):
        feeds = []
        today = datetime.now()
        print('GET FEED LIST!!!')
        # Generate URLs for the configured number of days
        for i in range(self.days_number):
            current_date = today + timedelta(days=i)
            date_str = current_date.strftime('%Y-%m-%d')  # date format the website's URL requires
            # Full day.
            url = 'https://api.dailyoffice2019.com/api/v1/readings/{}'.format(date_str) + self.daily_office_settings
            feed_title = current_date.strftime('Daily Prayer Readings for %B %d, %Y')  # unique title per day
            # feed_title += ' (' + url + ')'  # for debugging
            feeds.append((feed_title, url))  # append the feed as a (title, url) tuple
        return feeds  # BasicNewsRecipe's parse_feeds will then process each URL in the list

    # https://manual.calibre-ebook.com/_modules/calibre/web/feeds/news.html#BasicNewsRecipe.parse_feeds
    def parse_feeds(self):
        # Create a list of articles from the feeds returned by get_feeds().
        # Should return a list of Feed objects.
        feeds = self.get_feeds()
        parsed_feeds = []
        br = self.browser
        i = 0
        for obj in feeds:
            i += 1
            print('CURRENTLY PARSING:', i)
            if isinstance(obj, (str, bytes)):
                title, url = None, obj
            else:
                title, url = obj
            if isinstance(title, bytes):
                title = title.decode('utf-8')
            if isinstance(url, bytes):
                url = url.decode('utf-8')
            if url.startswith('feed://'):
                url = 'http' + url[4:]
            self.report_progress(0, _('FETCHING FEED: ') + f' {title} {url}...')
            purl = urlparse(url, allow_fragments=False)
            if purl.username or purl.password:
                hostname = purl.hostname
                if purl.port:
                    hostname += f':{purl.port}'
                url = purl._replace(netloc=hostname).geturl()
                if purl.username and purl.password:
                    br.add_password(url, purl.username, purl.password)
            with closing(br.open_novisit(url, timeout=self.timeout)) as f:
                raw = f.read()
            print('NEW JSON!!!')
            json_data = json.loads(raw.decode('utf-8'))  # decode and parse the JSON string
            new_feed_content = '<h1>Morning Prayer</h1>'
            morning_prayer = json_data.get("services", {}).get("Morning Prayer", {}).get("readings", [])
            for item in morning_prayer:
                full = item.get("full", {})
                if full.get("cycle") != "30":  # skip 30-day-cycle items
                    new_feed_content += f"<h2>{full.get('name')}</h2>"
                    text = full.get("text")
                    text = text.replace("<html><head></head><body>", "")  # strip per-reading wrapper
                    text = text.replace("</body></html>", "")
                    text = text.replace("\\", "")
                    new_feed_content += text
            new_feed_content += '<h1>Evening Prayer</h1>'
            evening_prayer = json_data.get("services", {}).get("Evening Prayer", {}).get("readings", [])
            for item in evening_prayer:
                full = item.get("full", {})
                if full.get("cycle") != "30":  # skip 30-day-cycle items
                    new_feed_content += f"<h2>{full.get('name')}</h2>"
                    text = full.get("text")
                    text = text.replace("<html><head></head><body>", "")
                    text = text.replace("</body></html>", "")
                    text = text.replace("\\", "")
                    new_feed_content += text
            print('FEED ITEM CONTENT (morn and eve): ' + new_feed_content)
            # THE BELOW IS WHAT I AM MOST UNSURE ABOUT, BUT CAN'T FIND CLARITY ON WHAT TO DO DIFFERENTLY
            parsed_feeds.append({
                'title': 'Daily Prayer for... ',
                'url': url,
                'date': json_data.get("calendarDate", {}).get("date", {}),
                'description': 'TEST',
                'content': new_feed_content,
            })
            # except Exception as err:
            #     feed = Feed()
            #     msg = f'Failed feed: {title if title else url}'
            #     feed.populate_from_preparsed_feed(msg, [])
            #     feed.description = as_unicode(err)
            #     parsed_feeds.append(feed)
            #     self.log.exception(msg)
            # delay = self.get_url_specific_delay(url)
            # if delay > 0:
            #     time.sleep(delay)
        # remove = [fl for fl in parsed_feeds if len(fl) == 0 and self.remove_empty_feeds]
        # for f in remove:
        #     parsed_feeds.remove(f)
        return parsed_feeds
Last edited by readabit; 12-20-2025 at 09:32 PM.
#2
Enthusiast
Posts: 44
Karma: 3034
Join Date: Mar 2012
Device: Boox Note Air 2 Plus, Samsung Galaxy S23 (base), Samsung Galaxy Tab S3
I've cracked it!
I still have some bugs to work out (some dates are not returning anything, for reasons I haven't pinned down yet), but I am actually getting content now! Code:
import json
from datetime import datetime, timedelta

from calibre.web.feeds.recipes import BasicNewsRecipe


class DailyOffice(BasicNewsRecipe):
    title = 'The Daily Office Readings'
    __author__ = 'Anglican Church in North America'
    description = 'ACNA Book of Common Prayer Daily Readings'
    # timefmt = ' [%a, %d %b, %Y]'
    remove_tags = [dict(attrs={'class': ['el-switch', 'asterisk']}),
                   dict(name=['script', 'noscript', 'style'])]
    # no_stylesheets = True
    # extra_css = 'h1 {font: sans-serif large;}\n.byline {font:monospace;}'
    # auto_cleanup = True
    # auto_cleanup_keep = '//*[@class="readingsPanel"]'  # keep only content inside a specific class
    days_number = 6
    # max_articles_per_feed = days_number * 2
    max_articles_per_feed = days_number
    # Example request:
    # https://api.dailyoffice2019.com/api/v1/readings/2025-12-21?absolution=lay&bible_translation=nasb&canticle_rotation=default&chrysostom=on&collects=rotating&confession=long-on-fast&ep_great_litany=ep_litany_off&family-creed=family-creed-no&family-opening-sentence=family-opening-sentence-fixed&family_collect=time_of_day&family_reading_audio=off&family_readings=brief&format=json&general_thanksgiving=on&grace=rotating&language_style=contemporary&language_style_for_our_father=traditional&lectionary=daily-office-readings&morning_prayer_invitatory=invitatory_traditional&mp_great_litany=mp_litany_off&national_holidays=us&o_antiphons=literal&psalm_style=whole_verse&psalm_translation=contemporary&psalms=contemporary&psalter=60&reading_audio=off&reading_cycle=1&reading_headings=on&reading_length=full&style=unison&suffrages=rotating&translation=nasb
    daily_office_settings = '?absolution=lay&bible_translation=nasb&canticle_rotation=default&chrysostom=on&collects=rotating&confession=long-on-fast&ep_great_litany=ep_litany_off&family-creed=family-creed-no&family-opening-sentence=family-opening-sentence-fixed&family_collect=time_of_day&family_reading_audio=off&family_readings=brief&format=json&general_thanksgiving=on&grace=rotating&language_style=contemporary&language_style_for_our_father=traditional&lectionary=daily-office-readings&morning_prayer_invitatory=invitatory_traditional&mp_great_litany=mp_litany_off&national_holidays=us&o_antiphons=literal&psalm_style=whole_verse&psalm_translation=contemporary&psalms=contemporary&psalter=60&reading_audio=off&reading_cycle=1&reading_headings=on&reading_length=full&style=unison&suffrages=rotating&translation=nasb'

    print('BEGIN!!!')

    # Build the article list once, at class-definition time.
    my_articles = []
    today = datetime.now()
    print('CREATE ARTICLE LIST!!!')
    # Generate URLs for the configured number of days
    for i in range(days_number):
        print('ARTICLE LIST #', i)
        current_date = today + timedelta(days=i)
        date_str = current_date.strftime('%Y-%m-%d')  # date format the website's URL requires
        # Full day.
        url = 'https://api.dailyoffice2019.com/api/v1/readings/{}'.format(date_str) + daily_office_settings
        article_title = current_date.strftime('Daily Prayer Readings for %B %d, %Y')  # unique title per day
        # article_title += ' (' + url + ')'  # for debugging
        my_articles.append({
            'title': article_title,
            'url': url,
            # 'date': date_str,
            # 'description': 'Daily Prayer',
            # 'content': ''
        })
    print('ARTICLE LIST COMPLETED')
    print(my_articles)

    def parse_index(self):
        # One feed containing an article per day
        feeds = []
        feeds.append((self.title, self.my_articles))
        return feeds

    def preprocess_raw_html(self, raw_html, url):
        # raw_html holds the raw downloaded content, which here is JSON, not HTML
        print('BEGIN PROCESSING!!!')
        json_data = json.loads(raw_html)  # parse the JSON string
        # Process the JSON data and build HTML
        new_html_content = '<html><body>'
        new_html_content += '<h1>Morning Prayer</h1>'
        morning_prayer = json_data.get("services", {}).get("Morning Prayer", {}).get("readings", [])
        print('Begin Morning Prayer...')
        for item in morning_prayer:
            full = item.get("full", {})
            if full.get("cycle") != "30":  # skip 30-day-cycle items
                print('NAME (morning): ' + full.get('name'))
                new_html_content += f"<h2>{full.get('name')}</h2>"
                text = full.get("text")
                text = text.replace("<html><head></head><body>", "")  # strip per-reading wrapper
                text = text.replace("</body></html>", "")
                text = text.replace("\\", "")
                new_html_content += text
        new_html_content += '<h1>Evening Prayer</h1>'
        evening_prayer = json_data.get("services", {}).get("Evening Prayer", {}).get("readings", [])
        for item in evening_prayer:
            full = item.get("full", {})
            if full.get("cycle") != "30":  # skip 30-day-cycle items
                print('NAME (evening): ' + full.get('name'))
                new_html_content += f"<h2>{full.get('name')}</h2>"
                text = full.get("text")
                text = text.replace("<html><head></head><body>", "")
                text = text.replace("</body></html>", "")
                text = text.replace("\\", "")
                new_html_content += text
        new_html_content += '</body></html>'
        print('FEED ITEM CONTENT (morn and eve): ' + new_html_content)
        return new_html_content
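The class-level loop that builds my_articles can also be sanity-checked standalone. This sketch mirrors that loop, with a fixed start date instead of datetime.now() and the long daily_office_settings query string replaced by a short stand-in (`?format=json` here is a placeholder, not the real settings):

```python
from datetime import datetime, timedelta

DAYS = 6
SETTINGS = '?format=json'  # stand-in for the full daily_office_settings string


def build_articles(start, days=DAYS):
    """Return one {'title', 'url'} dict per day, starting from `start`."""
    articles = []
    for i in range(days):
        d = start + timedelta(days=i)
        date_str = d.strftime('%Y-%m-%d')  # date format the API path expects
        articles.append({
            'title': d.strftime('Daily Prayer Readings for %B %d, %Y'),
            'url': f'https://api.dailyoffice2019.com/api/v1/readings/{date_str}{SETTINGS}',
        })
    return articles


arts = build_articles(datetime(2025, 12, 21))
print(arts[0]['url'])
# → https://api.dailyoffice2019.com/api/v1/readings/2025-12-21?format=json
```

Pinning the start date makes the generated URLs deterministic, which is handy when checking that parse_index is being handed exactly the article dicts you expect.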
#3
Guru
Posts: 937
Karma: 1004662
Join Date: Sep 2017
Location: Buenos Aires, Argentina
Device: moon+ reader, kindle paperwhite
Let's see if this is what you're looking for
Last edited by dunhill; Yesterday at 10:28 PM.