Old 09-03-2022, 03:44 AM   #1
michaelbr
Extract the first line of the HTML/text file that each TOC entry points to in an EPUB

I have a script that extracts the TOC from an EPUB file. Is there a way to also retrieve the first line, or the first x characters, from the HTML file that each TOC entry points to? My understanding is that each TOC entry points to a particular HTML file, so I'd like to open that file and retrieve the first line or the first x characters of its first paragraph. I can already get the TOC with the following script (thanks to cas):
Code:
#! /bin/bash

# This script needs InfoZIP's unzip program
# and the xml2 tool from http://ofb.net/~egnor/xml2/
# and sed, of course.

EPUB_LIST=(my*.epub)

for f in "${EPUB_LIST[@]}"; do
    echo "$f:"
    unzip -p "$f" OEBPS/toc.ncx | 
        xml2 | 
        sed -n -e 's:^/ncx/navMap/navPoint/navLabel/text=:  :p'
    echo
done
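
One possible direction, as a rough sketch rather than a finished solution: in the xml2 output each navPoint should also carry a content/@src line naming the file it points to, so the same loop can remember the label, pull that file out of the archive, and print the start of its text. The helper name first_chars and the variable N are made up for illustration. The sketch assumes the src paths are relative to OEBPS/ (as toc.ncx is in the script above), handles top-level navPoints only, and does not decode HTML entities or %-escapes in file names. Note that cut -c may count bytes rather than characters on some systems.

```shell
#!/bin/bash
# Sketch only: same tools as above (unzip, xml2, sed) plus tr and cut.

N=80   # hypothetical: how many characters of text to show per TOC entry

# first_chars COUNT: read HTML on stdin, print the first COUNT characters
# of the text that follows the <body> tag.
first_chars() {
    # Join lines, drop everything up to <body ...>, strip tags, squeeze spaces.
    tr '\n\r\t' '   ' |
        sed -e 's/.*<body[^>]*>//' -e 's/<[^>]*>/ /g' -e 's/  */ /g' -e 's/^ //' |
        cut -c "1-$1"
}

for f in my*.epub; do
    [ -e "$f" ] || continue          # skip if the glob matched nothing
    echo "$f:"
    unzip -p "$f" OEBPS/toc.ncx | xml2 |
    while IFS= read -r line; do
        case "$line" in
            /ncx/navMap/navPoint/navLabel/text=*)
                label=${line#*=}     # remember the entry's label
                ;;
            /ncx/navMap/navPoint/content/@src=*)
                src=${line#*=}       # file the entry points to
                src=${src%%#*}       # drop any "#fragment" part
                # Assumption: src is relative to OEBPS/, like toc.ncx itself.
                text=$(unzip -p "$f" "OEBPS/$src" </dev/null | first_chars "$N")
                printf '  %s: %s\n' "$label" "$text"
                ;;
        esac
    done
    echo
done
```

The helper can be tried on its own, for example `printf '<body><p>Hello <b>world</b></p></body>' | first_chars 5`. If the start of the file is a heading rather than a paragraph, the heading text is what comes out first; picking out the first `<p>` specifically would need a real HTML parser rather than sed.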