#1
Connoisseur
Posts: 81
Karma: 10
Join Date: Aug 2010
Location: Murcia/Spain
Device: Android 12
Extract the first line of the HTML/text file pointed to by a TOC entry in an EPUB
I have a script that can extract the TOC from an EPUB file. Is there a way to retrieve the first line, or the first x characters, of the HTML file that each TOC entry points to? My understanding is that each TOC entry points to a certain HTML file, so I'd like to open that HTML file and retrieve the first line, or the first x characters of its first paragraph. I'm able to get the TOC using the following script (thanks to cas):
Code:
```bash
#! /bin/bash
# This script needs InfoZIP's unzip program
# and the xml2 tool from http://ofb.net/~egnor/xml2/
# and sed, of course.

EPUB_LIST=(my*.epub)

for f in "${EPUB_LIST[@]}"; do
    echo "$f:"
    unzip -p "$f" OEBPS/toc.ncx | xml2 | sed -n -e 's:^/ncx/navMap/navPoint/navLabel/text=: :p'
    echo
done
```
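One way to get at the text itself is sketched below. It is a rough sketch, not cas's method and not a real XML parser. Assumptions: the NCX lives at OEBPS/toc.ncx as in the script above, each `<content src="...">` tag sits on one line, and stripping tags with sed is good enough (entities such as `&amp;` are left as-is). `head -c` counts bytes rather than characters, so a multi-byte UTF-8 character can be cut in half. When a `src` carries a `#fragment`, it starts from the element with that id instead of the top of the body. The names `ncx_targets`, `first_text`, `main` and `NCHARS` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print the first NCHARS bytes of text at each TOC entry of an EPUB.
# Assumptions (hypothetical): NCX at OEBPS/toc.ncx, one <content src="..."> per
# line, crude tag stripping with sed (entities like &amp; are not decoded).

NCX_PATH="OEBPS/toc.ncx"
NCHARS=${NCHARS:-80}

# Read an NCX document on stdin; print each content/@src in document order.
ncx_targets() {
    grep -o '<content[^>]*src="[^"]*"' | sed -e 's/.*src="//' -e 's/"$//'
}

# Read an XHTML document on stdin; print the first N bytes of its body text.
# If a fragment id is given, start at the element carrying that id instead.
first_text() {
    local n=$1 frag=${2:-}
    local -a sed_args=(-e 's/^.*<body[^>]*>//')
    # The id is pasted into the regex unescaped: fine for typical ids only.
    [[ -n $frag ]] && sed_args+=(-e "s|^.*<[^>]*id=\"$frag\"[^>]*>||")
    # Replace tags with spaces, squeeze runs of spaces, trim both ends.
    sed_args+=(-e 's/<[^>]*>/ /g' -e 's/  */ /g' -e 's/^ //' -e 's/ $//')
    # Join all lines first so tags and paragraphs spanning lines are handled.
    tr '\n\r\t' '   ' | sed "${sed_args[@]}" | head -c "$n"
    echo
}

main() {
    local f ncx_dir target file frag path
    ncx_dir=$(dirname "$NCX_PATH")
    for f in "$@"; do
        echo "$f:"
        unzip -p "$f" "$NCX_PATH" | ncx_targets |
            while IFS= read -r target; do
                file=${target%%#*}                               # part before '#'
                frag=""
                [[ $target == *"#"* ]] && frag=${target#*#}      # part after '#'
                # src paths are relative to the NCX's own folder.
                path=$file
                [[ $ncx_dir != "." ]] && path="$ncx_dir/$file"
                printf '  %s: ' "$target"
                unzip -p "$f" "$path" | first_text "$NCHARS" "$frag"
            done
        echo
    done
}

# Run only when executed directly, so the functions can also be sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    shopt -s nullglob
    main my*.epub
fi
```

Run it in the folder holding the my*.epub files, e.g. `NCHARS=120 ./toc-snippets.sh` (the file name is just an example). Entries whose `src` paths use `../` or percent-encoding are not normalized, so unzip may not find them.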
#2
Grand Sorcerer
Posts: 7,007
Karma: 89771343
Join Date: Nov 2011
Location: Charlottesville, VA
Device: Kindles
Are you doing this for a specific EPUB, or are you trying to make something more general-purpose? In general, the TOC file doesn’t always have a fixed name, and an EPUB 3 book might not include an NCX at all. Also, TOC entries don’t always point to the beginning of an XHTML file: an entry can target an anchor partway through it (e.g. `chapter1.xhtml#section2`).
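Because the TOC file name isn't fixed, the usual route is to read META-INF/container.xml, which names the OPF package file, then look in the OPF manifest for the item with media type `application/x-dtbncx+xml` (the EPUB 2 NCX) or, failing that, the item whose `properties` include `nav` (the EPUB 3 navigation document). A rough grep/sed sketch of that lookup follows, assuming each tag sits on one line and attributes use double quotes; `opf_path`, `item_href`, `ncx_href`, `nav_href` and `toc_path` are hypothetical names. Note that the EPUB 3 nav document is XHTML with `<a href="...">` links, not NCX, so an xml2/sed pipeline written for toc.ncx would not read it unchanged.

```bash
#!/usr/bin/env bash
# Sketch: find the TOC inside an EPUB without assuming its file name.
# container.xml -> OPF package -> manifest item for the NCX (EPUB 2) or,
# if there is none, the EPUB 3 navigation document. Assumes one tag per
# line and double-quoted attributes. Function names are hypothetical.

# Read META-INF/container.xml on stdin; print the first rootfile's full-path.
opf_path() {
    grep -o '<rootfile[^>]*full-path="[^"]*"' | head -n 1 |
        sed -e 's/.*full-path="//' -e 's/"$//'
}

# Print the href attribute of the first manifest <item> line on stdin.
item_href() {
    head -n 1 | sed -e 's/.*href="//' -e 's/".*//'
}

# Read an OPF on stdin; print the href of the NCX item, if any.
ncx_href() {
    grep -o '<item [^>]*>' | grep 'media-type="application/x-dtbncx+xml"' | item_href
}

# Read an OPF on stdin; print the href of the item whose properties include "nav".
nav_href() {
    grep -o '<item [^>]*>' | grep -E 'properties="([^"]* )?nav( [^"]*)?"' | item_href
}

# Print the in-archive path of the TOC file for one EPUB, or fail.
toc_path() {
    local epub=$1 opf dir href
    opf=$(unzip -p "$epub" META-INF/container.xml | opf_path)
    [[ -z $opf ]] && return 1
    dir=$(dirname "$opf")
    href=$(unzip -p "$epub" "$opf" | ncx_href)
    [[ -z $href ]] && href=$(unzip -p "$epub" "$opf" | nav_href)
    [[ -z $href ]] && return 1
    # Manifest hrefs are relative to the OPF's folder.
    if [[ $dir == "." ]]; then echo "$href"; else echo "$dir/$href"; fi
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    for f in "$@"; do
        printf '%s: %s\n' "$f" "$(toc_path "$f" || echo '(no TOC found)')"
    done
fi
```

Run against a typical EPUB 2 file, e.g. `./find-toc.sh book.epub` (example name), this should print something like `book.epub: OEBPS/toc.ncx`, which could then replace the hard-coded OEBPS/toc.ncx path.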
You may want to consider implementing this as a plugin for an EPUB editor such as calibre or Sigil.
Tags: epub, html ebook, shell script
| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| HTML to Epub conversion isn't generating TOC through command line | marcosalles | Conversion | 1 | 02-20-2014 10:09 PM |
| Online HTML book -> epub: TOC from single file | dancal | Conversion | 0 | 01-27-2014 01:45 PM |
| Extract TOC from EPUB | vishnu.kumar | Conversion | 11 | 08-08-2012 06:54 AM |
| Kindler previewer not recognizing toc.ncx file, my html toc, or the start point... | petercrowell | Kindle Formats | 2 | 05-01-2012 08:14 AM |
| HTML input plugin stripping text within toc tags in child html file | nimblebooks | Conversion | 3 | 02-21-2012 03:24 PM |