Old 02-27-2013, 10:13 PM   #1
Pranananda
mktoc.pl: create table of contents in HTML file

While there is software that creates a table of contents for ebooks (calibre creates a metadata table of contents, and Sigil and other converters support this as well), I mostly create my ebooks from HTML files and use calibre to convert them to epub or mobi. I have written a script that creates a table of contents wherever you want it at the top level of the book. For chapters and sections that have sub-chapters and sub-sections, it also creates a smaller table of contents that links to the child sections, each of which gets a link back to the parent table of contents. Here is the documentation from the script's -m option:

mktoc.pl creates a doubly-linked table of contents from an HTML file. The input is read from STDIN and the output is written to STDOUT. The command line syntax is:
./mktoc.pl < input.html > output.html

Do NOT use the same file name for the input and output file: the shell empties the output file before mktoc.pl gets a chance to read it.

You will need to make the script executable before using it. On Windows, you will have to use the following command line syntax instead (after downloading Perl from http://www.perl.org):
perl mktoc.pl < input.html > output.html

The entries in the table of contents come from the contents between a header tag and its end tag, i.e. <h1>...</h1>, <h2>...</h2>, <h3>...</h3>, and <h4>...</h4>.
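To make the rule above concrete, here is a rough sketch in Python (not the actual Perl from mktoc.pl) of collecting the entries from h1 to h4 tags. The function name extract_headers and the regular expression are my own assumptions for illustration, and it expects each header to open and close on one line.

```python
import re

# Matches one <h1>..</h4> element that opens and closes on the same line.
# The \1 backreference makes the closing tag use the same level digit.
HEADER_RE = re.compile(r"<h([1-4])\b[^>]*>(.*?)</h\1\s*>", re.IGNORECASE)

def extract_headers(html):
    """Return (level, inner_html) pairs for h1-h4 headers, in document order."""
    entries = []
    for line in html.splitlines():
        match = HEADER_RE.search(line)  # only the first header on a line is seen
        if match:
            entries.append((int(match.group(1)), match.group(2).strip()))
    return entries

sample = """<body>
<h1>Part One</h1>
<p>Intro text.</p>
<h2>Chapter 1</h2>
<h5>Not collected</h5>
</body>"""
print(extract_headers(sample))
```

The h5 line is ignored because only h1 through h4 feed the table of contents.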

The input ought to be run through tidy before using this script, though this is not strictly necessary. This script won't work if there are multiple header tags on one line, for example:
<h1>Hello</h1><h2>This</h2><h3>Is a Test</h3> <!-- INCORRECT -->
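If you are not sure whether your file has this problem, a small pre-check can find the offending lines before you run the script. This is only a sketch in Python, not part of mktoc.pl, and the function name lines_with_multiple_headers is made up for the example.

```python
import re

# Opening tags for the header levels that mktoc.pl uses (h1 to h4).
HEADER_OPEN_RE = re.compile(r"<h[1-4]\b", re.IGNORECASE)

def lines_with_multiple_headers(html):
    """Return 1-based line numbers that contain more than one h1-h4 opening tag."""
    return [
        number
        for number, line in enumerate(html.splitlines(), start=1)
        if len(HEADER_OPEN_RE.findall(line)) > 1
    ]

sample = "<h1>Hello</h1><h2>This</h2><h3>Is a Test</h3>\n<h1>Fine</h1>\n"
print(lines_with_multiple_headers(sample))
```

Any line number it reports needs its headers split onto separate lines, which running the file through tidy will usually take care of.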

mktoc.pl will also not produce good results if header tags are used for anything other than marking the beginning of a chapter or section. If header tags are used purely for formatting rather than document structure, the generated table of contents will be nonsensical.

mktoc.pl can be run multiple times on the same file. For example, if you do the following:
./mktoc.pl < input.html > output.html
./mktoc.pl < output.html > newoutput.html
the second run removes the traces of the first run before applying its changes.

Also, traces of previous runs of mktoc.pl can be removed as follows:
./mktoc.pl -c < input.html > output.html
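The documentation does not spell out exactly what the traces are, so the following Python sketch is only a partial illustration: it removes the div class="toc" blocks described further down, and nothing else. Any anchors or back links that mktoc.pl adds are not handled here, and strip_toc_divs is a name I made up. Use the -c option for real cleanup.

```python
import re

# A generated table of contents is wrapped in <div class="toc">...</div>.
# Assumption: the block holds only <p> lines, so there is no nested <div>.
TOC_DIV_RE = re.compile(r'<div class="toc">.*?</div>\s*', re.DOTALL)

def strip_toc_divs(html):
    """Remove every <div class="toc"> block and the whitespace that follows it."""
    return TOC_DIV_RE.sub("", html)

sample = '<body>\n<div class="toc">\n<p>Chapter 1</p>\n</div>\n<h1>Chapter 1</h1>\n</body>'
print(strip_toc_divs(sample))
```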

If a header has subheaders after it, those subheaders will be placed into a small table of contents after that header in the output file.
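One way to work out which subheaders belong to which header is to walk the headers in order while keeping a stack of the headers that are still "open". The Python sketch below shows that idea. It is not taken from mktoc.pl, the name direct_subheaders is hypothetical, and it assumes a small table of contents lists only the direct children of a header, which the documentation does not state.

```python
def direct_subheaders(headers):
    """Map each header index to the indices of its direct subheaders.

    `headers` is a list of (level, title) pairs in document order.
    """
    children = {index: [] for index in range(len(headers))}
    stack = []  # indices of the currently open ancestors, outermost first
    for index, (level, _title) in enumerate(headers):
        # A new header closes every open header at the same or a deeper level.
        while stack and headers[stack[-1]][0] >= level:
            stack.pop()
        if stack:
            children[stack[-1]].append(index)
        stack.append(index)
    return children

headers = [
    (1, "Part One"),
    (2, "Chapter 1"),
    (3, "Section 1.1"),
    (2, "Chapter 2"),
    (1, "Part Two"),
]
for index, kids in direct_subheaders(headers).items():
    if kids:
        print(headers[index][1], "->", [headers[k][1] for k in kids])
```

Here "Part One" gets a small table of contents with the two chapters, "Chapter 1" gets one with its single section, and the headers without subheaders get none.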

You can specify where the top level table of contents goes by including the following comment in your HTML input file:
<!-- toc -->
If you don't have this comment in your HTML input file, the top level table of contents will go right after the <body> tag.
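The placement rule can be sketched like this in Python. Again this is an illustration and not the script's own code: place_top_level_toc is a made-up name, and keeping the comment in the output (so that a later run can find it again) is my assumption.

```python
import re

TOC_MARKER = "<!-- toc -->"
BODY_RE = re.compile(r"<body\b[^>]*>", re.IGNORECASE)

def place_top_level_toc(html, toc_html):
    """Insert toc_html after the toc comment, or after <body> if there is none."""
    if TOC_MARKER in html:
        # Assumption: the marker is kept and the table of contents follows it.
        return html.replace(TOC_MARKER, TOC_MARKER + "\n" + toc_html, 1)
    match = BODY_RE.search(html)
    if match is None:
        raise ValueError("no <!-- toc --> comment or <body> tag found")
    return html[:match.end()] + "\n" + toc_html + html[match.end():]

toc = '<div class="toc"><p>Chapter 1</p></div>'
print(place_top_level_toc("<body>\n<p>Preface</p>\n<!-- toc -->\n<h1>Chapter 1</h1>", toc))
print(place_top_level_toc("<body>\n<h1>Chapter 1</h1>", toc))
```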

Each table of contents placed in the output file will be surrounded by a <div class="toc">. Each line of the table of contents is a simple <p> tag. You may want to put an entry into your CSS for those elements, for example (and you can make your own style up here):
div.toc { margin: 1em 1.1em; }
div.toc p { margin: 0; text-indent: 0; }
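For reference, the markup those CSS rules target looks roughly like what this Python sketch prints. The anchor ids (sec1, sec2) and the exact form of the links are hypothetical; mktoc.pl chooses its own, and render_toc is not a function from the script.

```python
def render_toc(entries):
    """Build a <div class="toc"> block with one <p> line per entry.

    `entries` is a list of (anchor_id, title_html) pairs. The anchor ids are
    placeholders for illustration only.
    """
    lines = ['<div class="toc">']
    for anchor_id, title_html in entries:
        lines.append(f'<p><a href="#{anchor_id}">{title_html}</a></p>')
    lines.append("</div>")
    return "\n".join(lines)

print(render_toc([("sec1", "Chapter 1"), ("sec2", "Chapter 2")]))
```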

If you would like to see an example of a table of contents that was created with this script (well, an older version), see The Life Science Health System.
Attached Files
File Type: pl mktoc.pl (15.6 KB, 76 views)