09-03-2022, 03:44 AM   #1
michaelbr
Extract first line of the HTML/text file pointed to by the TOC of an EPUB

I have a script that extracts the TOC from an EPUB file. Is there a way to also retrieve the first line, or the first x characters, of the HTML file each TOC entry points to? My understanding is that each TOC entry points to a certain HTML file, so I'd like to open that file and retrieve the first line or the first x characters of its first paragraph. I can already get the TOC with the following script (thanks to cas):
Code:
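#!/bin/bash

# This script needs InfoZIP's unzip program
# and the xml2 tool from http://ofb.net/~egnor/xml2/
# and sed, of course.

# Without nullglob, an unmatched pattern would stay in the list as literal text.
shopt -s nullglob
EPUB_LIST=(my*.epub)

for f in "${EPUB_LIST[@]}"; do
    echo "$f:"
    # Stream the NCX out of the archive, flatten it to path=value lines,
    # and print the label text of each top-level navPoint.
    unzip -p "$f" OEBPS/toc.ncx |
        xml2 |
        sed -n -e 's:^/ncx/navMap/navPoint/navLabel/text=:  :p'
    echo
done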
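One possible way to get at the text, sketched under the same assumption as the script above (the NCX is at OEBPS/toc.ncx and the content files sit next to it): read the src attribute of each content element in the NCX, stream that file out of the archive, strip the markup, and keep the first N characters. The helper names `ncx_targets` and `html_first_chars` are made up for this illustration, and the tag stripping uses plain regular expressions rather than a real XML parser, so treat it as a starting point only.

```shell
#!/bin/bash
# Sketch only: assumes the NCX is at OEBPS/toc.ncx, that src values are
# relative to OEBPS/, double-quoted, and not percent-encoded.

# Hypothetical helper: read an NCX on stdin, print one src target per line.
ncx_targets() {
    tr '\n' ' ' |
        grep -o '<content[^>]*src="[^"]*"' |
        sed -e 's/.*src="//' -e 's/"$//'
}

# Hypothetical helper: read (X)HTML on stdin, print the first $1 characters
# (default 80) of the text after the <body> tag. Entities such as &nbsp;
# are not decoded, and some awk implementations count bytes, not characters.
html_first_chars() {
    tr '\n\r\t' '   ' |
        sed -e 's/.*<body[^>]*>//' -e 's/<[^>]*>/ /g' -e 's/  */ /g' -e 's/^ //' |
        awk -v n="${1:-80}" '{ print substr($0, 1, n) }'
}

# Usage: ./script.sh book1.epub book2.epub ...
for f in "$@"; do
    echo "$f:"
    unzip -p "$f" OEBPS/toc.ncx | ncx_targets |
        while IFS= read -r src; do
            file=${src%%#*}   # drop any #fragment; text is taken from the top of the file
            text=$(unzip -p "$f" "OEBPS/$file" </dev/null | html_first_chars 80)
            printf '  %s: %s\n' "$src" "$text"
        done
    echo
done
```

Because the fragment is dropped, an entry that points into the middle of a file still gets the text from the top of that file. Several entries may also point to the same file.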
09-03-2022, 09:31 AM   #2
jhowell
Are you doing this for a specific EPUB, or are you trying to make something more general purpose? The TOC file doesn’t always have a specific name, and in EPUB 3 it might not even be present. Also, TOC entries don’t always point to the beginning of an XHTML file.

You may want to consider implementing this as a plugin for an EPUB editor such as calibre or Sigil.
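For a script that stays outside an editor, the point about the TOC file name can be handled by looking the NCX up instead of hard-coding its path. META-INF/container.xml names the OPF package file, and the OPF manifest lists the NCX as the item with media type application/x-dtbncx+xml. A rough sketch follows. The helper names `opf_path` and `ncx_href` are invented for this example, and the regex matching assumes double-quoted attributes and unprefixed element names.

```shell
#!/bin/bash
# Sketch only: regex-based lookup, not a real XML parser.

# Hypothetical helper: read container.xml on stdin, print the full-path
# of the first rootfile element (the OPF package file).
opf_path() {
    tr '\n' ' ' |
        grep -o '<rootfile[^>]*full-path="[^"]*"' |
        sed -e 's/.*full-path="//' -e 's/"$//' |
        head -n 1
}

# Hypothetical helper: read an OPF on stdin, print the href of the first
# manifest item whose media type marks it as an NCX.
ncx_href() {
    tr '\n' ' ' |
        grep -o '<item [^>]*>' |
        grep 'media-type="application/x-dtbncx+xml"' |
        sed -n -e 's/.*href="\([^"]*\)".*/\1/p' |
        head -n 1
}

# Usage: ./script.sh book1.epub book2.epub ...
for f in "$@"; do
    opf=$(unzip -p "$f" META-INF/container.xml | opf_path)
    href=$(unzip -p "$f" "$opf" | ncx_href)
    if [ -z "$href" ]; then
        echo "$f: no NCX in the manifest (possibly an EPUB 3 book with only a nav document)"
        continue
    fi
    # The href is relative to the directory that holds the OPF file.
    case $opf in
        */*) dir=${opf%/*}/ ;;
        *)   dir= ;;
    esac
    echo "$f: $dir$href"
done
```

The printed path could then replace the fixed OEBPS/toc.ncx in the original script. A book with no NCX would need its EPUB 3 navigation document parsed instead, which this sketch does not attempt.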
Tags
epub, html ebook, shell script

