12-20-2025, 05:37 PM   #1
readabit
Enthusiast
Requesting Assistance with Recipe for JSON Content

Greetings! I would be very grateful for some assistance with this recipe. I am trying to create a recipe that fetches the next N days of readings from a Bible reading website (so I'm not concerned with historical data, only future data). The end goal is to always have a week or so of readings available offline on my e-ink device so I can read when away from Wi-Fi.

The site in question (morning readings example) uses JavaScript to load the content, so I had to find the correct API call (by watching the request headers it sends), which gives me this API JSON result.

Since there is no RSS feed for this site, I have to create the feed list by iterating through dates. Through a mix of poring over the calibre examples and some Google AI assistance I'm partway there, but the code below is wrong in some way that is beyond me. I think parse_feeds is the function to use, and I am successfully getting at the data I want, but I'm running into errors that are outside of (but almost certainly caused by) my code. My best guess is that I'm not passing data the right way and/or I should be using a different function.

Within the JSON tree (of which there is one tree per day) I am only interested in entries for Services->Morning Prayer->full and Services->Evening Prayer->full. You'll also note in my code that I'm excluding items with a cycle value of 30.

I may also end up needing some additional cleanup of the resulting HTML, but at the moment I'm just focused on getting the code working without errors.

Any help is greatly appreciated!
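
For reference, the extraction described above boils down to something like the plain-Python sketch below (no calibre involved). The trimmed query string and the bare urllib fetch are simplifications for illustration only; the recipe code that follows passes the full settings string.

Code:
import json
import urllib.request
from datetime import datetime, timedelta

# NOTE: query string trimmed for readability (an assumption); the recipe below
# passes the full daily_office_settings string instead.
BASE_URL = 'https://api.dailyoffice2019.com/api/v1/readings/{}?format=json&reading_length=full'

def readings_for(date_str):
    # Fetch one day's JSON and pull out the Morning/Evening Prayer 'full' readings,
    # skipping the 30-day cycle entries (cycle == "30").
    with urllib.request.urlopen(BASE_URL.format(date_str), timeout=30) as f:
        data = json.loads(f.read().decode('utf-8'))
    readings = []
    for service in ('Morning Prayer', 'Evening Prayer'):
        for item in data.get('services', {}).get(service, {}).get('readings', []):
            full = item.get('full', {})
            if full.get('cycle') != '30':
                readings.append((service, full.get('name'), full.get('text')))
    return readings

today = datetime.now()
for i in range(6):  # matches days_number in the recipe
    date_str = (today + timedelta(days=i)).strftime('%Y-%m-%d')
    print(date_str, len(readings_for(date_str)), 'readings')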

Code:
import json
import string, re # string constants and regular expressions, for text matching and cleanup
from datetime import datetime, timedelta
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup
from urllib.parse import urlparse, urlsplit
from contextlib import closing
#from calibre.web.feeds import Feed, feed_from_xml, feeds_from_index, templates


class DailyOffice(BasicNewsRecipe):
	title       = 'The Daily Office Readings'
	__author__  = 'Anglican Church in North America'
	description = 'ACNA Book of Common Prayer Daily Readings'
	
	remove_tags = [dict(attrs={'class':['el-switch', 'asterisk']}),
		dict(name=['script', 'noscript', 'style'])]
	
	days_number = 6
	max_articles_per_feed = days_number
	# https://api.dailyoffice2019.com/api/v1/readings/2025-12-21?absolution=lay&bible_translation=nasb&canticle_rotation=default&chrysostom=on&collects=rotating&confession=long-on-fast&ep_great_litany=ep_litany_off&family-creed=family-creed-no&family-opening-sentence=family-opening-sentence-fixed&family_collect=time_of_day&family_reading_audio=off&family_readings=brief&format=json&general_thanksgiving=on&grace=rotating&language_style=contemporary&language_style_for_our_father=traditional&lectionary=daily-office-readings&morning_prayer_invitatory=invitatory_traditional&mp_great_litany=mp_litany_off&national_holidays=us&o_antiphons=literal&psalm_style=whole_verse&psalm_translation=contemporary&psalms=contemporary&psalter=60&reading_audio=off&reading_cycle=1&reading_headings=on&reading_length=full&style=unison&suffrages=rotating&translation=nasb
	daily_office_settings = '?absolution=lay&bible_translation=nasb&canticle_rotation=default&chrysostom=on&collects=rotating&confession=long-on-fast&ep_great_litany=ep_litany_off&family-creed=family-creed-no&family-opening-sentence=family-opening-sentence-fixed&family_collect=time_of_day&family_reading_audio=off&family_readings=brief&format=json&general_thanksgiving=on&grace=rotating&language_style=contemporary&language_style_for_our_father=traditional&lectionary=daily-office-readings&morning_prayer_invitatory=invitatory_traditional&mp_great_litany=mp_litany_off&national_holidays=us&o_antiphons=literal&psalm_style=whole_verse&psalm_translation=contemporary&psalms=contemporary&psalter=60&reading_audio=off&reading_cycle=1&reading_headings=on&reading_length=full&style=unison&suffrages=rotating&translation=nasb'
	
	print('BEGIN!!!')
	
# EDIT: I'm actually not sure about this section either, now. Upon further digging it seems like this is creating multiple feeds, when really I just want to make ONE feed with a list of urls to follow. I can create that array just fine, but I cannot figure out what I am supposed to pass it to so that calibre will process the array of links like a feed list. I have tried a bunch of different things so far and none of them work. (A short note on what ended up working follows this code block.)
# Override get_feeds to generate links programmatically
	def get_feeds(self):
		feeds = []
		today = datetime.now()
		print('GET FEED LIST!!!')
		
    # Generate URLs for the configured number of days
		for i in range(self.days_number):
			current_date = today + timedelta(days=i)
			date_str = current_date.strftime('%Y-%m-%d') # Format the date into the URL format required by the website
		# Full Day.
			url = 'https://api.dailyoffice2019.com/api/v1/readings/{}'.format(date_str) + self.daily_office_settings
			feed_title = current_date.strftime('Daily Prayer Readings for %B %d, %Y') # Create a unique title for each feed item
			# feed_title += ' (' + url + ')' # For Debugging.
			feeds.append((feed_title, url)) # Append the feed as a tuple: (title, url)
			# print('GETTING: ' + feed_title + ': ' + url)
			
		return feeds # The BasicNewsRecipe's parse_feeds will then process each URL in the list
		
	
# https://manual.calibre-ebook.com/_modules/calibre/web/feeds/news.html#BasicNewsRecipe.parse_feeds
	def parse_feeds(self):
		# Create a list of articles from the list of feeds returned by :meth:`BasicNewsRecipe.get_feeds`.
		# Return a list of :class:`Feed` objects.
		
		feeds = self.get_feeds()
		parsed_feeds = []
		br = self.browser
		i = 0
		for obj in feeds:
			i += 1
			print('CURRENTLY PARSING: ')
			print(i)
			if isinstance(obj, (str, bytes)):
				title, url = None, obj
			else:
				title, url = obj
			if isinstance(title, bytes):
				title = title.decode('utf-8')
			if isinstance(url, bytes):
				url = url.decode('utf-8')
			if url.startswith('feed://'):
				url = 'http'+url[4:]
			# self.report_progress(0, _('FETCHING FEED: ')+f' {title if title else url}...')
			self.report_progress(0, _('FETCHING FEED: ')+f' {title} {url}...')
			# try:
			purl = urlparse(url, allow_fragments=False)
			if purl.username or purl.password:
				hostname = purl.hostname
				if purl.port:
					hostname += f':{purl.port}'
				url = purl._replace(netloc=hostname).geturl()
				if purl.username and purl.password:
					br.add_password(url, purl.username, purl.password)
			with closing(br.open_novisit(url, timeout=self.timeout)) as f:
				raw = f.read()

			print('NEW JSON!!!')
			json_data = json.loads(raw.decode('utf-8')) # Decode and parse the JSON string
			# print(json_data)
			new_feed_content = f"<h1>Morning Prayer</h1>"
			print(new_feed_content)
			morning_prayer = json_data.get("services", {}).get("Morning Prayer", {}).get("readings", [])
			# print('MORNING PRAYER: ')
			# print(morning_prayer)
			for item in morning_prayer:
				full = item.get("full", {})
				# print(full)
				if full.get("cycle") != "30": # Skip 30 Day Cycle Items.
					# print('NAME (morning): ' + full.get('name'))
					new_feed_content += f"<h2>{full.get('name')}</h2>"
					text = full.get("text")
					text = text.replace("<html><head></head><body>", "")
					text = text.replace("</body></html>", "")
					text = text.replace("\\", "")
					# print('TEXT (morning): ' + text)
					new_feed_content += f"{text}"
			evening_prayer = json_data.get("services", {}).get("Evening Prayer", {}).get("readings", [])
			new_feed_content += f"<h1>Evening Prayer</h1>"
			for item in evening_prayer:
				full = item.get("full", {})
				if full.get("cycle") != "30": # Skip 30 Day Cycle Items.
					# print('NAME (evening): ' + full.get('name'))
					new_feed_content += f"<h2>{full.get('name')}</h2>"
					text = full.get("text")
					text = text.replace("<html><head></head><body>", "")
					text = text.replace("</body></html>", "")
					text = text.replace("\\", "")
					# print('TEXT (evening): ' + text)
					new_feed_content += f"{text}"
			print('FEED ITEM CONTENT (morn and eve): ' + new_feed_content)
			# parsed_feeds.append(new_feed_content)
# THE BELOW IS WHAT I AM MOST UNSURE ABOUT, BUT CAN'T FIND CLARITY ON WHAT TO DO DIFFERENT
			parsed_feeds.append({
				'title': 'Daily Prayer for... ',
				'url': url,
				'date': json_data.get("calendarDate", {}).get("date", {}),
				'description' : 'TEST',
				'content': new_feed_content
            })
						
				
			# except Exception as err:
				# feed = Feed()
				# msg = f'Failed feed: {title if title else url}'
				# feed.populate_from_preparsed_feed(msg, [])
				# feed.description = as_unicode(err)
				# parsed_feeds.append(feed)
				# self.log.exception(msg)
			# delay = self.get_url_specific_delay(url)
			# if delay > 0:
				# time.sleep(delay)

		# remove = [fl for fl in parsed_feeds if len(fl) == 0 and self.remove_empty_feeds]
		# for f in remove:
			# parsed_feeds.remove(f)

		return parsed_feeds
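
For reference, the pattern that ends up working (shown in full in the next post) is to drop get_feeds/parse_feeds entirely and return a single feed from parse_index, with one article dict per day. This is only a rough sketch, reusing the names from the recipe above:

Code:
	def parse_index(self):
		# One feed, one article dict per day; calibre fetches each URL itself
		# and hands the raw JSON to preprocess_raw_html() for conversion to HTML.
		articles = []
		today = datetime.now()
		for i in range(self.days_number):
			day = today + timedelta(days=i)
			url = 'https://api.dailyoffice2019.com/api/v1/readings/{}'.format(day.strftime('%Y-%m-%d')) + self.daily_office_settings
			articles.append({
				'title': day.strftime('Daily Prayer Readings for %B %d, %Y'),
				'url': url,
			})
		return [(self.title, articles)]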

Last edited by readabit; 12-20-2025 at 09:32 PM.
12-20-2025, 09:49 PM   #2
readabit
Enthusiast

I've cracked it!

Still have some more bugs to work out (some dates are not returning anything for some reason), but I am actually getting content now!
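
A quick way to chase down the dates that come back empty is to run the recipe from the command line with verbose output, which prints the fetch log for each URL (daily_office.recipe is just an assumed file name; note that --test limits the run to the first couple of articles, so drop it to fetch all of the days):

Code:
ebook-convert daily_office.recipe daily_office.epub --test -vv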

Code:
import json
import string, re # string constants and regular expressions, for text matching and cleanup
from datetime import datetime, timedelta
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup
from urllib.parse import urlparse, urlsplit
from contextlib import closing
#from calibre.web.feeds import Feed, feed_from_xml, feeds_from_index, templates
from calibre.web.feeds import Article, Feed


class DailyOffice(BasicNewsRecipe):
	title       = 'The Daily Office Readings'
	__author__  = 'Anglican Church in North America'
	description = 'ACNA Book of Common Prayer Daily Readings'
	#timefmt = ' [%a, %d %b, %Y]'
	
	remove_tags = [dict(attrs={'class':['el-switch', 'asterisk']}),
		dict(name=['script', 'noscript', 'style'])]
	# no_stylesheets = True
	#extra_css = 'h1 {font: sans-serif large;}\n.byline {font:monospace;}'
	
	# auto_cleanup   = True
	# auto_cleanup_keep = '//*[@class="readingsPanel"]' # This is the key line to keep only content inside a specific class
	
	days_number = 6
	# max_articles_per_feed = days_number * 2
	max_articles_per_feed = days_number
	# https://api.dailyoffice2019.com/api/v1/readings/2025-12-21?absolution=lay&bible_translation=nasb&canticle_rotation=default&chrysostom=on&collects=rotating&confession=long-on-fast&ep_great_litany=ep_litany_off&family-creed=family-creed-no&family-opening-sentence=family-opening-sentence-fixed&family_collect=time_of_day&family_reading_audio=off&family_readings=brief&format=json&general_thanksgiving=on&grace=rotating&language_style=contemporary&language_style_for_our_father=traditional&lectionary=daily-office-readings&morning_prayer_invitatory=invitatory_traditional&mp_great_litany=mp_litany_off&national_holidays=us&o_antiphons=literal&psalm_style=whole_verse&psalm_translation=contemporary&psalms=contemporary&psalter=60&reading_audio=off&reading_cycle=1&reading_headings=on&reading_length=full&style=unison&suffrages=rotating&translation=nasb
	daily_office_settings = '?absolution=lay&bible_translation=nasb&canticle_rotation=default&chrysostom=on&collects=rotating&confession=long-on-fast&ep_great_litany=ep_litany_off&family-creed=family-creed-no&family-opening-sentence=family-opening-sentence-fixed&family_collect=time_of_day&family_reading_audio=off&family_readings=brief&format=json&general_thanksgiving=on&grace=rotating&language_style=contemporary&language_style_for_our_father=traditional&lectionary=daily-office-readings&morning_prayer_invitatory=invitatory_traditional&mp_great_litany=mp_litany_off&national_holidays=us&o_antiphons=literal&psalm_style=whole_verse&psalm_translation=contemporary&psalms=contemporary&psalter=60&reading_audio=off&reading_cycle=1&reading_headings=on&reading_length=full&style=unison&suffrages=rotating&translation=nasb'
	
	print('BEGIN!!!')
	
	
	my_articles = []
	today = datetime.now()
	print('CREATE ARTICLE LIST!!!')
	
# Generate URLs for the configured number of days
	for i in range(days_number):
		print('ARTICLE LIST #')
		print(i)
		current_date = today + timedelta(days=i)
		date_str = current_date.strftime('%Y-%m-%d') # Format the date into the URL format required by the website
	# Full Day.
		url = 'https://api.dailyoffice2019.com/api/v1/readings/{}'.format(date_str) + daily_office_settings
		article_title = current_date.strftime('Daily Prayer Readings for %B %d, %Y') # Create a unique title for each feed item
		# article_title += ' (' + url + ')' # For Debugging.
		#my_articles.append((article_title, url)) # Append the feed as a tuple: (title, url)
		my_articles.append({
			'title'       : article_title,
			'url'         : url
			#'date'        : format(date_str),
			#'description' : 'Daily Prayer',
			#'content'     : ''
        })
		# print('GETTING: ' + article_title + ': ' + url)
		
	print('ARTICLE LIST COMPLETED')
	print(my_articles)


	def parse_index(self):
		#print(self.title)
		#print(self.my_articles)
		feeds = []
		feeds.append((self.title, self.my_articles))
		return feeds

	
	def preprocess_raw_html(self, raw_html, url):
    # The 'soup' object initially holds the raw downloaded content
		#json_data = json.loads(soup.encode('utf-8')) # Decode and parse the JSON string
		print('BEGIN PROCESSING!!!')
		json_data = json.loads(raw_html) # Decode and parse the JSON string
				
    # Process the JSON data and build HTML
		new_html_content = "<html><body>"
		
		morning_prayer = json_data.get("services", {}).get("Morning Prayer", {}).get("readings", [])
		print('MORNING PRAYER: ')
		print(morning_prayer)
		new_html_content += f"<h1>Morning Prayer</h1>"
		print('Begin Morning Prayer...')
		for item in morning_prayer:
			#new_html_content += f"<h2>{item['title']}</h2>"
			full = item.get("full", {})
			if full.get("cycle") != "30": # Skip 30 Day Cycle Items.
				print('NAME (morning): ' + full.get('name'))
				new_html_content += f"<h2>{full.get('name')}</h2>"
				text = full.get("text")
				text = text.replace("<html><head></head><body>", "")
				text = text.replace("</body></html>", "")
				#html.unescape(element)
				text = text.replace("\\", "")
				print('TEXT (morning): ' + text)
				new_html_content += f"{text}"
		
		evening_prayer = json_data.get("services", {}).get("Evening Prayer", {}).get("readings", [])
		new_html_content += f"<h1>Evening Prayer</h1>"
		for item in evening_prayer:
			full = item.get("full", {})
			if full.get("cycle") != "30": # Skip 30 Day Cycle Items.
				print('NAME (evening): ' + full.get('name'))
				new_html_content += f"<h2>{full.get('name')}</h2>"
				text = full.get("text")
				text = text.replace("<html><head></head><body>", "")
				text = text.replace("</body></html>", "")
				#html.unescape(element)
				text = text.replace("\\", "")
				print('TEXT (evening): ' + text)
				new_html_content += f"{text}"
			
		new_html_content += "</body></html>"
		print('FEED ITEM CONTENT (morn and eve): ' + new_html_content)
		#return BeautifulSoup(new_html_content, 'html.parser') # Return a new BeautifulSoup object with the HTML content
		#return self.index_to_soup(new_html_content)
		return new_html_content
12-21-2025, 10:01 PM   #3
dunhill
Guru
 
Let's see if this is what you're looking for
Attached Files:
daily_office.recipe (3.9 KB)
daily_office_single.recipe (3.5 KB)
daily_office_multi.recipe (3.5 KB)

Last edited by dunhill; 12-21-2025 at 10:28 PM.