12-20-2025, 05:37 PM   #1
readabit
Enthusiast
Requesting Assistance with Recipe for JSON Content

Greetings! I would be very grateful for some assistance with this recipe. I am trying to create a recipe that fetches the next N days of readings from a Bible reading website (so I'm not concerned with historical data, only future data). The end goal is to always have a week or so of readings available offline on my e-ink device so I can read when away from Wi-Fi.

The site in question (morning readings example) uses JavaScript to load the content, so I had to find the correct API call (by watching the request headers it sends), which gives me this API JSON result.

Since there is no RSS feed for this site, I have to create the feed list by iterating through dates. Through a mix of poring over the calibre examples and some Google AI assistance I'm partway there, but the code below is wrong in some way that is beyond me. I think parse_feeds is the function to use, and I am successfully getting at the data I want, but I'm running into errors that are outside of (but almost certainly caused by) my code. My best guess is that I'm not passing data the right way and/or I should be using a different function.

Within the JSON tree (of which there is one tree per day) I am only interested in entries for Services->Morning Prayer->full and Services->Evening Prayer->full. You'll also note in my code that I'm excluding items with a cycle value of 30.

I may also end up needing some additional cleanup of the resulting HTML, but at the moment I'm just focused on getting the code working without errors.

Any help is greatly appreciated!
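
For reference, the extraction described above boils down to something like the plain-Python sketch below (no calibre involved). The trimmed query string and the bare urllib fetch are simplifications for illustration only; the recipe code that follows passes the full settings string.

Code:
import json
import urllib.request
from datetime import datetime, timedelta

# NOTE: query string trimmed for readability (an assumption); the recipe below
# passes the full daily_office_settings string instead.
BASE_URL = 'https://api.dailyoffice2019.com/api/v1/readings/{}?format=json&reading_length=full'

def readings_for(date_str):
    # Fetch one day's JSON and pull out the Morning/Evening Prayer 'full' readings,
    # skipping the 30-day cycle entries (cycle == "30").
    with urllib.request.urlopen(BASE_URL.format(date_str), timeout=30) as f:
        data = json.loads(f.read().decode('utf-8'))
    readings = []
    for service in ('Morning Prayer', 'Evening Prayer'):
        for item in data.get('services', {}).get(service, {}).get('readings', []):
            full = item.get('full', {})
            if full.get('cycle') != '30':
                readings.append((service, full.get('name'), full.get('text')))
    return readings

today = datetime.now()
for i in range(6):  # matches days_number in the recipe
    date_str = (today + timedelta(days=i)).strftime('%Y-%m-%d')
    print(date_str, len(readings_for(date_str)), 'readings')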

Code:
import json
import string, re # string constants and regular expressions, for text matching and cleanup
from datetime import datetime, timedelta
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup
from urllib.parse import urlparse, urlsplit
from contextlib import closing
#from calibre.web.feeds import Feed, feed_from_xml, feeds_from_index, templates


class DailyOffice(BasicNewsRecipe):
	title       = 'The Daily Office Readings'
	__author__  = 'Anglican Church in North America'
	description = 'ACNA Book of Common Prayer Daily Readings'
	
	remove_tags = [dict(attrs={'class':['el-switch', 'asterisk']}),
		dict(name=['script', 'noscript', 'style'])]
	
	days_number = 6
	max_articles_per_feed = days_number
	# https://api.dailyoffice2019.com/api/v1/readings/2025-12-21?absolution=lay&bible_translation=nasb&canticle_rotation=default&chrysostom=on&collects=rotating&confession=long-on-fast&ep_great_litany=ep_litany_off&family-creed=family-creed-no&family-opening-sentence=family-opening-sentence-fixed&family_collect=time_of_day&family_reading_audio=off&family_readings=brief&format=json&general_thanksgiving=on&grace=rotating&language_style=contemporary&language_style_for_our_father=traditional&lectionary=daily-office-readings&morning_prayer_invitatory=invitatory_traditional&mp_great_litany=mp_litany_off&national_holidays=us&o_antiphons=literal&psalm_style=whole_verse&psalm_translation=contemporary&psalms=contemporary&psalter=60&reading_audio=off&reading_cycle=1&reading_headings=on&reading_length=full&style=unison&suffrages=rotating&translation=nasb
	daily_office_settings = '?absolution=lay&bible_translation=nasb&canticle_rotation=default&chrysostom=on&collects=rotating&confession=long-on-fast&ep_great_litany=ep_litany_off&family-creed=family-creed-no&family-opening-sentence=family-opening-sentence-fixed&family_collect=time_of_day&family_reading_audio=off&family_readings=brief&format=json&general_thanksgiving=on&grace=rotating&language_style=contemporary&language_style_for_our_father=traditional&lectionary=daily-office-readings&morning_prayer_invitatory=invitatory_traditional&mp_great_litany=mp_litany_off&national_holidays=us&o_antiphons=literal&psalm_style=whole_verse&psalm_translation=contemporary&psalms=contemporary&psalter=60&reading_audio=off&reading_cycle=1&reading_headings=on&reading_length=full&style=unison&suffrages=rotating&translation=nasb'
	
	print('BEGIN!!!')
	
# EDIT: I'm actually not sure about this section either, now. Upon further digging it seems like this is creating multiple feeds, when really I just want to make ONE feed with a list of urls to follow. I can create that array just fine, but I cannot figure out what I am supposed to pass it to so that calibre will process the array of links like a feed list. I have tried a bunch of different things so far and none of them work. (A short note on what ended up working follows this code block.)
# Override get_feeds to generate links programmatically
	def get_feeds(self):
		feeds = []
		today = datetime.now()
		print('GET FEED LIST!!!')
		
    # Generate URLs for the configured number of days
		for i in range(self.days_number):
			current_date = today + timedelta(days=i)
			date_str = current_date.strftime('%Y-%m-%d') # Format the date into the URL format required by the website
		# Full Day.
			url = 'https://api.dailyoffice2019.com/api/v1/readings/{}'.format(date_str) + self.daily_office_settings
			feed_title = current_date.strftime('Daily Prayer Readings for %B %d, %Y') # Create a unique title for each feed item
			# feed_title += ' (' + url + ')' # For Debugging.
			feeds.append((feed_title, url)) # Append the feed as a tuple: (title, url)
			# print('GETTING: ' + feed_title + ': ' + url)
			
		return feeds # The BasicNewsRecipe's parse_feeds will then process each URL in the list
		
	
# https://manual.calibre-ebook.com/_modules/calibre/web/feeds/news.html#BasicNewsRecipe.parse_feeds
	def parse_feeds(self):
		# Create a list of articles from the list of feeds returned by :meth:`BasicNewsRecipe.get_feeds`.
		# Return a list of :class:`Feed` objects.
		
		feeds = self.get_feeds()
		parsed_feeds = []
		br = self.browser
		i = 0
		for obj in feeds:
			i += 1
			print('CURRENTLY PARSING: ')
			print(i)
			if isinstance(obj, (str, bytes)):
				title, url = None, obj
			else:
				title, url = obj
			if isinstance(title, bytes):
				title = title.decode('utf-8')
			if isinstance(url, bytes):
				url = url.decode('utf-8')
			if url.startswith('feed://'):
				url = 'http'+url[4:]
			# self.report_progress(0, _('FETCHING FEED: ')+f' {title if title else url}...')
			self.report_progress(0, _('FETCHING FEED: ')+f' {title} {url}...')
			# try:
			purl = urlparse(url, allow_fragments=False)
			if purl.username or purl.password:
				hostname = purl.hostname
				if purl.port:
					hostname += f':{purl.port}'
				url = purl._replace(netloc=hostname).geturl()
				if purl.username and purl.password:
					br.add_password(url, purl.username, purl.password)
			with closing(br.open_novisit(url, timeout=self.timeout)) as f:
				raw = f.read()

			print('NEW JSON!!!')
			json_data = json.loads(raw.decode('utf-8')) # Decode and parse the JSON string
			# print(json_data)
			new_feed_content = f"<h1>Morning Prayer</h1>"
			print(new_feed_content)
			morning_prayer = json_data.get("services", {}).get("Morning Prayer", {}).get("readings", [])
			# print('MORNING PRAYER: ')
			# print(morning_prayer)
			for item in morning_prayer:
				full = item.get("full", {})
				# print(full)
				if full.get("cycle") != "30": # Skip 30 Day Cycle Items.
					# print('NAME (morning): ' + full.get('name'))
					new_feed_content += f"<h2>{full.get('name')}</h2>"
					text = full.get("text")
					text = text.replace("<html><head></head><body>", "")
					text = text.replace("</body></html>", "")
					text = text.replace("\\", "")
					# print('TEXT (morning): ' + text)
					new_feed_content += f"{text}"
			evening_prayer = json_data.get("services", {}).get("Evening Prayer", {}).get("readings", [])
			new_feed_content += f"<h1>Evening Prayer</h1>"
			for item in evening_prayer:
				full = item.get("full", {})
				if full.get("cycle") != "30": # Skip 30 Day Cycle Items.
					# print('NAME (evening): ' + full.get('name'))
					new_feed_content += f"<h2>{full.get('name')}</h2>"
					text = full.get("text")
					text = text.replace("<html><head></head><body>", "")
					text = text.replace("</body></html>", "")
					text = text.replace("\\", "")
					# print('TEXT (evening): ' + text)
					new_feed_content += f"{text}"
			print('FEED ITEM CONTENT (morn and eve): ' + new_feed_content)
			# parsed_feeds.append(new_feed_content)
# THE BELOW IS WHAT I AM MOST UNSURE ABOUT, BUT CAN'T FIND CLARITY ON WHAT TO DO DIFFERENT
			parsed_feeds.append({
				'title': 'Daily Prayer for... ',
				'url': url,
				'date': json_data.get("calendarDate", {}).get("date", {}),
				'description' : 'TEST',
				'content': new_feed_content
            })
						
				
			# except Exception as err:
				# feed = Feed()
				# msg = f'Failed feed: {title if title else url}'
				# feed.populate_from_preparsed_feed(msg, [])
				# feed.description = as_unicode(err)
				# parsed_feeds.append(feed)
				# self.log.exception(msg)
			# delay = self.get_url_specific_delay(url)
			# if delay > 0:
				# time.sleep(delay)

		# remove = [fl for fl in parsed_feeds if len(fl) == 0 and self.remove_empty_feeds]
		# for f in remove:
			# parsed_feeds.remove(f)

		return parsed_feeds
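
For reference, the pattern that ends up working (shown in full in the next post) is to drop get_feeds/parse_feeds entirely and return a single feed from parse_index, with one article dict per day. This is only a rough sketch, reusing the names from the recipe above:

Code:
	def parse_index(self):
		# One feed, one article dict per day; calibre fetches each URL itself
		# and hands the raw JSON to preprocess_raw_html() for conversion to HTML.
		articles = []
		today = datetime.now()
		for i in range(self.days_number):
			day = today + timedelta(days=i)
			url = 'https://api.dailyoffice2019.com/api/v1/readings/{}'.format(day.strftime('%Y-%m-%d')) + self.daily_office_settings
			articles.append({
				'title': day.strftime('Daily Prayer Readings for %B %d, %Y'),
				'url': url,
			})
		return [(self.title, articles)]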

Last edited by readabit; 12-20-2025 at 09:32 PM.
12-20-2025, 09:49 PM   #2
readabit
Enthusiast

I've cracked it!

Still have some more bugs to work out (some dates are not returning anything for some reason), but I am actually getting content now!
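
A quick way to chase down the dates that come back empty is to run the recipe from the command line with verbose output, which prints the fetch log for each URL (daily_office.recipe is just an assumed file name; note that --test limits the run to the first couple of articles, so drop it to fetch all of the days):

Code:
ebook-convert daily_office.recipe daily_office.epub --test -vv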

Code:
import json
import string, re # string constants and regular expressions, for text matching and cleanup
from datetime import datetime, timedelta
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup
from urllib.parse import urlparse, urlsplit
from contextlib import closing
#from calibre.web.feeds import Feed, feed_from_xml, feeds_from_index, templates
from calibre.web.feeds import Article, Feed


class DailyOffice(BasicNewsRecipe):
	title       = 'The Daily Office Readings'
	__author__  = 'Anglican Church in North America'
	description = 'ACNA Book of Common Prayer Daily Readings'
	#timefmt = ' [%a, %d %b, %Y]'
	
	remove_tags = [dict(attrs={'class':['el-switch', 'asterisk']}),
		dict(name=['script', 'noscript', 'style'])]
	# no_stylesheets = True
	#extra_css = 'h1 {font: sans-serif large;}\n.byline {font:monospace;}'
	
	# auto_cleanup   = True
	# auto_cleanup_keep = '//*[@class="readingsPanel"]' # This is the key line to keep only content inside a specific class
	
	days_number = 6
	# max_articles_per_feed = days_number * 2
	max_articles_per_feed = days_number
	# https://api.dailyoffice2019.com/api/v1/readings/2025-12-21?absolution=lay&bible_translation=nasb&canticle_rotation=default&chrysostom=on&collects=rotating&confession=long-on-fast&ep_great_litany=ep_litany_off&family-creed=family-creed-no&family-opening-sentence=family-opening-sentence-fixed&family_collect=time_of_day&family_reading_audio=off&family_readings=brief&format=json&general_thanksgiving=on&grace=rotating&language_style=contemporary&language_style_for_our_father=traditional&lectionary=daily-office-readings&morning_prayer_invitatory=invitatory_traditional&mp_great_litany=mp_litany_off&national_holidays=us&o_antiphons=literal&psalm_style=whole_verse&psalm_translation=contemporary&psalms=contemporary&psalter=60&reading_audio=off&reading_cycle=1&reading_headings=on&reading_length=full&style=unison&suffrages=rotating&translation=nasb
	daily_office_settings = '?absolution=lay&bible_translation=nasb&canticle_rotation=default&chrysostom=on&collects=rotating&confession=long-on-fast&ep_great_litany=ep_litany_off&family-creed=family-creed-no&family-opening-sentence=family-opening-sentence-fixed&family_collect=time_of_day&family_reading_audio=off&family_readings=brief&format=json&general_thanksgiving=on&grace=rotating&language_style=contemporary&language_style_for_our_father=traditional&lectionary=daily-office-readings&morning_prayer_invitatory=invitatory_traditional&mp_great_litany=mp_litany_off&national_holidays=us&o_antiphons=literal&psalm_style=whole_verse&psalm_translation=contemporary&psalms=contemporary&psalter=60&reading_audio=off&reading_cycle=1&reading_headings=on&reading_length=full&style=unison&suffrages=rotating&translation=nasb'
	
	print('BEGIN!!!')
	
	
	my_articles = []
	today = datetime.now()
	print('CREATE ARTICLE LIST!!!')
	
# Generate URLs for the configured number of days
	for i in range(days_number):
		print('ARTICLE LIST #')
		print(i)
		current_date = today + timedelta(days=i)
		date_str = current_date.strftime('%Y-%m-%d') # Format the date into the URL format required by the website
	# Full Day.
		url = 'https://api.dailyoffice2019.com/api/v1/readings/{}'.format(date_str) + daily_office_settings
		article_title = current_date.strftime('Daily Prayer Readings for %B %d, %Y') # Create a unique title for each feed item
		# article_title += ' (' + url + ')' # For Debugging.
		#my_articles.append((article_title, url)) # Append the feed as a tuple: (title, url)
		my_articles.append({
			'title'       : article_title,
			'url'         : url
			#'date'        : format(date_str),
			#'description' : 'Daily Prayer',
			#'content'     : ''
        })
		# print('GETTING: ' + article_title + ': ' + url)
		
	print('ARTICLE LIST COMPLETED')
	print(my_articles)


	def parse_index(self):
		#print(self.title)
		#print(self.my_articles)
		feeds = []
		feeds.append((self.title, self.my_articles))
		return feeds

	
	def preprocess_raw_html(self, raw_html, url):
    # The 'soup' object initially holds the raw downloaded content
		#json_data = json.loads(soup.encode('utf-8')) # Decode and parse the JSON string
		print('BEGIN PROCESSING!!!')
		json_data = json.loads(raw_html) # Decode and parse the JSON string
				
    # Process the JSON data and build HTML
		new_html_content = "<html><body>"
		
		morning_prayer = json_data.get("services", {}).get("Morning Prayer", {}).get("readings", [])
		print('MORNING PRAYER: ')
		print(morning_prayer)
		new_html_content += f"<h1>Morning Prayer</h1>"
		print('Begin Morning Prayer...')
		for item in morning_prayer:
			#new_html_content += f"<h2>{item['title']}</h2>"
			full = item.get("full", {})
			if full.get("cycle") != "30": # Skip 30 Day Cycle Items.
				print('NAME (morning): ' + full.get('name'))
				new_html_content += f"<h2>{full.get('name')}</h2>"
				text = full.get("text")
				text = text.replace("<html><head></head><body>", "")
				text = text.replace("</body></html>", "")
				#html.unescape(element)
				text = text.replace("\\", "")
				print('TEXT (morning): ' + text)
				new_html_content += f"{text}"
		
		evening_prayer = json_data.get("services", {}).get("Evening Prayer", {}).get("readings", [])
		new_html_content += f"<h1>Evening Prayer</h1>"
		for item in evening_prayer:
			full = item.get("full", {})
			if full.get("cycle") != "30": # Skip 30 Day Cycle Items.
				print('NAME (evening): ' + full.get('name'))
				new_html_content += f"<h2>{full.get('name')}</h2>"
				text = full.get("text")
				text = text.replace("<html><head></head><body>", "")
				text = text.replace("</body></html>", "")
				#html.unescape(element)
				text = text.replace("\\", "")
				print('TEXT (evening): ' + text)
				new_html_content += f"{text}"
			
		new_html_content += "</body></html>"
		print('FEED ITEM CONTENT (morn and eve): ' + new_html_content)
		#return BeautifulSoup(new_html_content, 'html.parser') # Return a new BeautifulSoup object with the HTML content
		#return self.index_to_soup(new_html_content)
		return new_html_content
12-21-2025, 10:01 PM   #3
dunhill
Guru
 
Let's see if this is what you're looking for
Attached Files:
daily_office.recipe (3.9 KB)
daily_office_single.recipe (3.5 KB)
daily_office_multi.recipe (3.5 KB)

Last edited by dunhill; 12-21-2025 at 10:28 PM.