Old 12-20-2025, 05:37 PM   #1
readabit
Requesting Assistance with Recipe for JSON Content

Greetings! I would be very grateful for some assistance with this recipe. I am trying to create a recipe that fetches the next N days of readings from a Bible reading website (so I'm not concerned with historical data, only future data). The end goal is to always have a week or so of readings available offline on my e-ink device so I can read when away from wifi.

The site in question (morning readings example) uses JavaScript to load its content, so I had to find the correct API call (by watching the request headers), which gives me this API JSON result.

Since there is no RSS feed for this site, I have to build the feed list by iterating through dates. Through a mix of poring over the calibre examples and some Google AI assistance I'm partway there, but the code below is wrong in some way that is beyond me. I think parse_feeds is the function to use, and I am successfully getting at the data I want, but I'm running into errors outside of (though most certainly caused by) my code. My best guess is that I'm not passing the data the right way and/or I should be using a different function.

Within each day's JSON tree I am only interested in the entries under services -> Morning Prayer -> readings -> full and services -> Evening Prayer -> readings -> full. You'll also note in my code that I'm excluding items with a cycle value of 30.
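
To make the shape concrete, here is the extraction I'm describing as a standalone sketch, written against a hand-built stand-in for the API response (the key names are taken from the real JSON as I see it; the sample data itself is made up, and whether cycle is always a string is my assumption from what I've seen so far):

```python
def readings_to_html(json_data):
    """Convert one day's API response into a simple HTML fragment.

    Keeps only the 'full' entry of each reading under Morning Prayer and
    Evening Prayer, and skips items whose cycle is "30".
    """
    html = []
    services = json_data.get("services", {})
    for service_name in ("Morning Prayer", "Evening Prayer"):
        html.append(f"<h1>{service_name}</h1>")
        for item in services.get(service_name, {}).get("readings", []):
            full = item.get("full", {})
            if full.get("cycle") == "30":  # skip 30-day cycle items
                continue
            html.append(f"<h2>{full.get('name')}</h2>")
            text = full.get("text", "")
            # Each reading seems to arrive wrapped in its own html/body shell.
            text = text.replace("<html><head></head><body>", "")
            text = text.replace("</body></html>", "")
            html.append(text)
    return "\n".join(html)
```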

I may also end up needing some additional cleanup of the resulting HTML, but at the moment I'm just focused on getting the code working without errors.

Any help is greatly appreciated!

Code:
import json
from contextlib import closing
from datetime import datetime, timedelta
from urllib.parse import urlparse

from calibre.web.feeds.recipes import BasicNewsRecipe


class DailyOffice(BasicNewsRecipe):
	title       = 'The Daily Office Readings'
	__author__  = 'Anglican Church in North America'
	description = 'ACNA Book of Common Prayer Daily Readings'
	
	remove_tags = [dict(attrs={'class':['el-switch', 'asterisk']}),
		dict(name=['script', 'noscript', 'style'])]
	
	days_number = 6
	max_articles_per_feed = days_number
	# https://api.dailyoffice2019.com/api/v1/readings/2025-12-21?absolution=lay&bible_translation=nasb&canticle_rotation=default&chrysostom=on&collects=rotating&confession=long-on-fast&ep_great_litany=ep_litany_off&family-creed=family-creed-no&family-opening-sentence=family-opening-sentence-fixed&family_collect=time_of_day&family_reading_audio=off&family_readings=brief&format=json&general_thanksgiving=on&grace=rotating&language_style=contemporary&language_style_for_our_father=traditional&lectionary=daily-office-readings&morning_prayer_invitatory=invitatory_traditional&mp_great_litany=mp_litany_off&national_holidays=us&o_antiphons=literal&psalm_style=whole_verse&psalm_translation=contemporary&psalms=contemporary&psalter=60&reading_audio=off&reading_cycle=1&reading_headings=on&reading_length=full&style=unison&suffrages=rotating&translation=nasb
	daily_office_settings = '?absolution=lay&bible_translation=nasb&canticle_rotation=default&chrysostom=on&collects=rotating&confession=long-on-fast&ep_great_litany=ep_litany_off&family-creed=family-creed-no&family-opening-sentence=family-opening-sentence-fixed&family_collect=time_of_day&family_reading_audio=off&family_readings=brief&format=json&general_thanksgiving=on&grace=rotating&language_style=contemporary&language_style_for_our_father=traditional&lectionary=daily-office-readings&morning_prayer_invitatory=invitatory_traditional&mp_great_litany=mp_litany_off&national_holidays=us&o_antiphons=literal&psalm_style=whole_verse&psalm_translation=contemporary&psalms=contemporary&psalter=60&reading_audio=off&reading_cycle=1&reading_headings=on&reading_length=full&style=unison&suffrages=rotating&translation=nasb'
	
	print('BEGIN!!!')
	
# EDIT: I'm actually not sure about this section either, now. Upon further digging it seems this creates multiple feeds, when really I just want ONE feed with a list of URLs to follow. I can build that array just fine, but I cannot figure out what to pass it to so that calibre will then process the array of links as a feed list. I have tried a number of different things so far and none have worked.
# Override get_feeds to generate links programmatically
	def get_feeds(self):
		feeds = []
		today = datetime.now()
		print('GET FEED LIST!!!')
		
		# Generate URLs for the configured number of days
		for i in range(self.days_number):
			current_date = today + timedelta(days=i)
			date_str = current_date.strftime('%Y-%m-%d') # Format the date into the URL format required by the website
		# Full Day.
			url = 'https://api.dailyoffice2019.com/api/v1/readings/{}'.format(date_str) + self.daily_office_settings
			feed_title = current_date.strftime('Daily Prayer Readings for %B %d, %Y') # Create a unique title for each feed item
			# feed_title += ' (' + url + ')' # For Debugging.
			feeds.append((feed_title, url)) # Append the feed as a tuple: (title, url)
			# print('GETTING: ' + feed_title + ': ' + url)
			
		return feeds # The BasicNewsRecipe's parse_feeds will then process each URL in the list
		
	
# https://manual.calibre-ebook.com/_modules/calibre/web/feeds/news.html#BasicNewsRecipe.parse_feeds
	def parse_feeds(self):
		# Create a list of articles from the list of feeds returned by :meth:`BasicNewsRecipe.get_feeds`.
		# Return a list of :class:`Feed` objects.
		
		feeds = self.get_feeds()
		parsed_feeds = []
		br = self.browser
		i = 0
		for obj in feeds:
			i += 1
			print('CURRENTLY PARSING: ')
			print(i)
			if isinstance(obj, (str, bytes)):
				title, url = None, obj
			else:
				title, url = obj
			if isinstance(title, bytes):
				title = title.decode('utf-8')
			if isinstance(url, bytes):
				url = url.decode('utf-8')
			if url.startswith('feed://'):
				url = 'http'+url[4:]
			# self.report_progress(0, _('FETCHING FEED: ')+f' {title if title else url}...')
			self.report_progress(0, _('FETCHING FEED: ')+f' {title} {url}...')
			# try:
			purl = urlparse(url, allow_fragments=False)
			if purl.username or purl.password:
				hostname = purl.hostname
				if purl.port:
					hostname += f':{purl.port}'
				url = purl._replace(netloc=hostname).geturl()
				if purl.username and purl.password:
					br.add_password(url, purl.username, purl.password)
			with closing(br.open_novisit(url, timeout=self.timeout)) as f:
				raw = f.read()

			print('NEW JSON!!!')
			json_data = json.loads(raw.decode('utf-8')) # Decode and parse the JSON string
			# print(json_data)
			new_feed_content = f"<h1>Morning Prayer</h1>"
			print(new_feed_content)
			morning_prayer = json_data.get("services", {}).get("Morning Prayer", {}).get("readings", [])
			# print('MORNING PRAYER: ')
			# print(morning_prayer)
			for item in morning_prayer:
				full = item.get("full", {})
				# print(full)
				if full.get("cycle") != "30": # Skip 30 Day Cycle Items.
					# print('NAME (morning): ' + full.get('name'))
					new_feed_content += f"<h2>{full.get('name')}</h2>"
					text = full.get("text")
					text = text.replace("<html><head></head><body>", "")
					text = text.replace("</body></html>", "")
					text = text.replace("\\", "")
					# print('TEXT (morning): ' + text)
					new_feed_content += f"{text}"
			evening_prayer = json_data.get("services", {}).get("Evening Prayer", {}).get("readings", [])
			new_feed_content += f"<h1>Evening Prayer</h1>"
			for item in evening_prayer:
				full = item.get("full", {})
				if full.get("cycle") != "30": # Skip 30 Day Cycle Items.
					# print('NAME (evening): ' + full.get('name'))
					new_feed_content += f"<h2>{full.get('name')}</h2>"
					text = full.get("text")
					text = text.replace("<html><head></head><body>", "")
					text = text.replace("</body></html>", "")
					text = text.replace("\\", "")
					# print('TEXT (evening): ' + text)
					new_feed_content += f"{text}"
			print('FEED ITEM CONTENT (morn and eve): ' + new_feed_content)
			# parsed_feeds.append(new_feed_content)
# THE BELOW IS WHAT I AM MOST UNSURE ABOUT, BUT CAN'T FIND CLARITY ON WHAT TO DO DIFFERENT
			parsed_feeds.append({
				'title': 'Daily Prayer for... ',
				'url': url,
				'date': json_data.get("calendarDate", {}).get("date", {}),
				'description' : 'TEST',
				'content': new_feed_content
			})
						
				
			# except Exception as err:
				# feed = Feed()
				# msg = f'Failed feed: {title if title else url}'
				# feed.populate_from_preparsed_feed(msg, [])
				# feed.description = as_unicode(err)
				# parsed_feeds.append(feed)
				# self.log.exception(msg)
			# delay = self.get_url_specific_delay(url)
			# if delay > 0:
				# time.sleep(delay)

		# remove = [fl for fl in parsed_feeds if len(fl) == 0 and self.remove_empty_feeds]
		# for f in remove:
			# parsed_feeds.remove(f)

		return parsed_feeds
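
For reference, my current best guess from the manual is that overriding parse_index (rather than parse_feeds) is the intended hook for sites without a feed: it returns a plain list of ('feed title', list-of-article-dicts) tuples, and the per-article JSON-to-HTML conversion could then live in preprocess_raw_html(raw_html, url). Here is a minimal standalone sketch of just the index-building part; build_index and SETTINGS are my own names, and SETTINGS stands in for the full daily_office_settings query string above:

```python
from datetime import datetime, timedelta

API_BASE = 'https://api.dailyoffice2019.com/api/v1/readings/{}'
SETTINGS = '?format=json'  # stand-in for the full daily_office_settings string

def build_index(days, today=None):
    """Return the structure parse_index() expects: one feed, one article per day."""
    today = today or datetime.now()
    articles = []
    for i in range(days):
        day = today + timedelta(days=i)
        articles.append({
            'title': day.strftime('Daily Prayer Readings for %B %d, %Y'),
            'url': API_BASE.format(day.strftime('%Y-%m-%d')) + SETTINGS,
            'date': day.strftime('%Y-%m-%d'),
            'description': 'ACNA Daily Office readings',
        })
    # parse_index() in the recipe would simply be: return build_index(self.days_number)
    return [('The Daily Office Readings', articles)]
```

If that's the right shape, preprocess_raw_html would then parse the downloaded JSON for each article URL and return the assembled HTML string, which would replace most of my parse_feeds attempt above. I'd welcome confirmation that this is the intended pattern.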

Last edited by readabit; 12-20-2025 at 09:32 PM.