MobileRead Forums - View Single Post

st_albert · 02-27-2011, 07:45 PM

OK, I'm almost embarrassed to post this, but here is a very simple perl script that will read the epub's toc.ncx and create an html block (not a complete html file -- there is no <head> section) that provides a bare-bones inline TOC when you include it in your epub.

Code:

#!/usr/bin/perl
# parse a toc.ncx file and output xhtml statements for an in-line TOC file
#
#  usage:  parse_ncx.pl < toc.ncx > toc.txt
#
### sample of data we are interested in:
###
#<?xml version="1.0" encoding="UTF-8"?>
#<!DOCTYPE ncx PUBLIC "-//NISO//DTD ncx 2005-1//EN" "http://www.daisy.org/z3986/2005/ncx-2005-1.dtd">
#<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" xml:lang="en" version="2005-1">
#<head>
#<meta name="dtb:uid" content="awp2010-06-03T15:07:06Z"/>
#<meta name="dtb:depth" content="2"/>
#<meta name="dtb:totalPageCount" content="0"/>
#<meta name="dtb:maxPageNumber" content="0"/>
#</head>
#<docTitle>
#<text>The Cresperian Alliance</text>
#</docTitle>
#<navMap>
#<navPoint id="navpoint-1" playOrder="1">
#<navLabel>
#<text>Title page</text>   => use for $item_text
#</navLabel>
#<content src="1.html"/>   =>  use for $target
#</navPoint>
# etc for all <navPoint> ... </navPoint> entries.
#</navMap>
#</ncx>


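use strict;
use warnings;

my $item_text = "X";  # most recent <text> label seen
my $target    = "X";  # most recent <content src="..."> value seen

$| = 1;  # don't buffer writes to stdout

print "<h2>Table of Contents</h2>\n";

while (<STDIN>) {  # read a line and look for info
    next if /^#/;  # skip lines that start with '#'
    if (m{<text>(.+?)</text>}) {
        # remember the label (the <docTitle> text also matches here,
        # but it is overwritten by the first <navLabel> text)
        $item_text = $1;
    } elsif (m{<content src="(.+?)"/>}) {
        # <content> comes after its <navLabel>, so the link can be written now
        $target = $1;
        print '<div><a href="../', $target, '">', $item_text, "</a></div>\n";
#   } elsif (m{</navPoint>}) {
#       print "<p><a href='../", $target, "'>", $item_text, "</a></p>\n";
    }
}  # end while
print "\n<!-- Done -->\n";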

Yes, it's really crude. It doesn't preserve the structure of multi-level TOCs, though that could easily be added. So could parameters for the input and output filenames. That is left as an exercise for the student.
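For anyone who wants a head start on that exercise, here is a rough sketch of how both additions might look. It is not part of the original script. It assumes the same one-tag-per-line layout as the sample above, it shows nesting with a simple left margin rather than nested lists, and the names ncx_to_html and parse_ncx_nested.pl are made up for illustration.

```perl
#!/usr/bin/perl
# Sketch only: nested inline TOC from toc.ncx, using the same
# line-by-line matching as parse_ncx.pl (assumes one tag per line).
use strict;
use warnings;

# ncx_to_html(\@lines, $prefix) returns the html block as one string.
# $prefix is put in front of every link target (default "../").
sub ncx_to_html {
    my ($lines, $prefix) = @_;
    $prefix = '../' unless defined $prefix;
    my $html  = "<h2>Table of Contents</h2>\n";
    my $depth = 0;    # number of <navPoint> elements currently open
    my $label = '';   # most recent <text> label seen
    for my $line (@$lines) {
        if ($line =~ m{<navPoint\b}) {
            $depth++;
        } elsif ($line =~ m{</navPoint>}) {
            $depth--;
        } elsif ($line =~ m{<text>(.+?)</text>}) {
            $label = $1;
        } elsif ($depth > 0 && $line =~ m{<content src="(.+?)"/>}) {
            my $indent = 2 * ($depth - 1);   # 2em of margin per nesting level
            $html .= qq{<div style="margin-left: ${indent}em"><a href="$prefix$1">$label</a></div>\n};
        }
    }
    return $html;
}

# Hypothetical usage:  perl parse_ncx_nested.pl toc.ncx toc.txt
# (does nothing unless both filenames are given)
if (@ARGV == 2) {
    my ($in, $out) = @ARGV;
    open my $ifh, '<', $in  or die "Cannot read $in: $!\n";
    open my $ofh, '>', $out or die "Cannot write $out: $!\n";
    print {$ofh} ncx_to_html([<$ifh>]);
    close $ofh or die "Cannot close $out: $!\n";
}
```

Top-level entries get a margin of 0em and each level below that gets another 2em. Whether a given reader or converter honors the inline margin style is something to check on your own device.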

For now, just copy the code section and save it as "parse_ncx.pl".

As it stands, it is used as a "filter", like so:

First, unpack the epub, or at least extract the toc.ncx file into your working directory. Then, from a command prompt, run the following (if the script isn't executable or on your PATH, put "perl " in front of the command):

parse_ncx.pl < toc.ncx > toc.txt

This reads the toc.ncx file and writes the html equivalent of the ncx table of contents to a new file called "toc.txt".

Then edit the epub (say, with Sigil): create a new blank toc.xhtml file, paste the contents of toc.txt into the <body> of that file, assign the "table of contents" semantic to the new toc.xhtml file, and save. Now you have an inline xhtml TOC file.
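If you would rather build the whole toc.xhtml outside the editor, the paste step could be sketched like this. It is not part of the original workflow. The XHTML 1.1 skeleton is an assumption about what an epub 2 content file typically looks like, and the names wrap_toc and wrap_toc.pl are made up for illustration.

```perl
#!/usr/bin/perl
# Sketch only: wrap the html block from toc.txt in a complete xhtml page.
use strict;
use warnings;

# wrap_toc($fragment, $title) returns a full page with $fragment in <body>.
sub wrap_toc {
    my ($fragment, $title) = @_;
    $title = 'Table of Contents' unless defined $title;
    return <<"END_XHTML";
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>$title</title>
</head>
<body>
$fragment</body>
</html>
END_XHTML
}

# Hypothetical usage:  perl wrap_toc.pl toc.txt > toc.xhtml
# (does nothing unless a filename is given)
if (@ARGV) {
    local $/;                      # slurp mode: read the whole file at once
    print wrap_toc(scalar <>);
}
```

You would still need to add the resulting file to the epub and assign it the "table of contents" semantic, as described above.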

If you then use kindlegen to convert the epub to mobi format, you will at least have the inline TOC that you need. Of course, there are other issues with the epub -> mobi conversion that will need to be dealt with as well. (Another exercise for the student.)

This assumes you have perl on your operating system. If you are on Windows, you may need to install perl in order for this to work. Do a Google search for "Active Perl" and go for it. It's free.

And as always, if you don't understand what's going on here at all, this post isn't for you. If you have specific questions, or comments / criticisms of the code, then have at it!

I only posted this schlock program because nobody else seemed to want to jump in with a better solution. This is what works for me.
