Manually trimming the metadata.opf and toc.ncx file


ProDigit
04-15-2012, 10:10 AM
So, instead of starting totally from scratch, I thought it might do me good to use an existing epub as a template and edit its parameters to fit my needs.

I've noticed there's a lot of code in the metadata.opf and toc.ncx files that might be useless, especially when I create an ebook to be read on an ebook reader that has no access to the internet.

The files I have contain the following data (which I think I can trim somewhat):

metadata.opf:
<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="calibre_id">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf"
xmlns:calibre="http://calibre.kovidgoyal.net/2009/metadata">
<dc:title>Holy Bible</dc:title>
<dc:creator opf:role="aut" opf:file-as="Version, King James">King James Version</dc:creator>
<dc:contributor opf:role="bkp" opf:file-as="calibre">calibre (0.5.14) [http://calibre.kovidgoyal.net]</dc:contributor>
<dc:identifier opf:scheme="calibre" id="calibre_id">95e823ba-8f88-4c44-9f9d-b22ff04d5358</dc:identifier>
<dc:date>2009-06-16T04:03:49</dc:date>
<dc:language>UND</dc:language>
<meta name="calibre:series_index" content="1"/>
<meta name="calibre:rating" content="0"/>
</metadata>

After this, the file mainly links to internal files and id refs (the manifest).

The second file, toc.ncx:
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1"
xml:lang="en">
<head>
<meta name="dtb:uid"
content="95e823ba-8f88-4c44-9f9d-b22ff04d5358" />
<meta name="dtb:depth" content="3" />
<meta name="dtb:generator" content="calibre" />
<meta name="dtb:totalPageCount" content="0" />
<meta name="dtb:maxPageNumber" content="0" />
</head>
<docTitle>
<text>Table of Contents</text>
</docTitle>
<navMap>
<navPoint id="d362620c-c3f8-45e2-8e63-2a62a2757f81"
playOrder="1">
<navLabel>
<text>The Holy Bible</text>
</navLabel>


I've included the first entry it displays too, because there's something I don't understand yet: the 'navPoint id' with its long string.


My question is, can I trim the first file to look like this:
<?xml version="1.0" encoding="UTF-8"?>
<package version="2.0">
<metadata>
<dc:title>Holy Bible</dc:title>
<dc:creator opf:role="aut"
opf:file-as="Version, King James">King James Version</dc:creator>
<dc:identifier opf:scheme="calibre"
id="calibre_id">95e823ba-8f88-4c44-9f9d-b22ff04d5358</dc:identifier>
<dc:language>UND</dc:language>
</metadata>

...with the exception that I also want to remove calibre. I'd also like to find out whether the dc:identifier is needed or can be removed too. Or would I have trimmed the file too much? Or could I trim even more data from the heading/header (or whatever you call it)?

And the second file:
<ncx version="1"
xml:lang="en">
<head>
content="95e823ba-8f88-4c44-9f9d-b22ff04d5358" />
</head>
<docTitle>
<text>Table of Contents</text>
</docTitle>
<navMap>
<navPoint id="d362620c-c3f8-45e2-8e63-2a62a2757f81"
playOrder="1">
<navLabel>
<text>The Holy Bible</text>
</navLabel>
I want to trim it like this, but perhaps also remove the ncx version and any hexadecimal string I find in the document (like the navPoint id and the content value).
Is that possible, or does an epub need the navPoint id and that string?

I'm not interested in copyright issues, as the file is out of copyright, and I'll also be developing my own versions (for personal use).

DiapDealer
04-15-2012, 11:07 AM
I'm not exactly sure why you think it's necessary to "trim" things that you may not understand, but I can assure you that removing the namespace declaration(s) for an NCX or OPF file will definitely render them useless to most reading systems. The URIs in those XML namespace declarations are not "accessing the internet," and you can't just remove them simply because your epub won't ever "access the internet." That's not what they're for.

Also, the navPoint id is required by the spec, and removing the <content /> tags would rather defeat the point of creating an NCX file in the first place. No src = no functionality.
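As a rough illustration of those two points (every navPoint needs an id, and it only does something if its <content> has a src), here is a small Python sketch using the standard-library ElementTree parser. check_nav_points is a hypothetical helper, not FlightCrew or any real validator: it only looks at id, playOrder and content src, and the sample NCX (with made-up file names) deliberately omits the <head> and <docTitle> a real one needs.

```python
import xml.etree.ElementTree as ET

# Tags in an NCX live in this namespace, so lookups must include it.
NCX_NS = "{http://www.daisy.org/z3986/2005/ncx/}"


def check_nav_points(ncx_text):
    """Return a list of problems found in the navPoints of an NCX document."""
    root = ET.fromstring(ncx_text)
    problems = []
    # iter() also walks nested navPoints, in document order.
    for i, point in enumerate(root.iter(NCX_NS + "navPoint"), start=1):
        if not point.get("id"):
            problems.append(f"navPoint #{i}: missing id")
        if not point.get("playOrder"):
            problems.append(f"navPoint #{i}: missing playOrder")
        content = point.find(NCX_NS + "content")
        if content is None or not content.get("src"):
            problems.append(f"navPoint #{i}: no content src, nothing to jump to")
    return problems


sample = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
<navMap>
<navPoint id="np1" playOrder="1">
<navLabel><text>Genesis</text></navLabel>
<content src="Text/genesis.xhtml"/>
</navPoint>
<navPoint playOrder="2">
<navLabel><text>Exodus</text></navLabel>
</navPoint>
</navMap>
</ncx>"""

print(check_nav_points(sample))
# ['navPoint #2: missing id', 'navPoint #2: no content src, nothing to jump to']
```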

Why the "Trimming" obsession, anyway? I'm genuinely curious. These two files don't really have any "fat" in them to begin with (with the exception of maybe a few items between the <metadata></metadata> and <guide></guide> tags of the OPF). The rest is really quite critical—unless you want to eliminate the NCX completely. There's no requirement that you have to have an NCX, but removing it will probably require an alteration to your OPF file... if you're copying them from preexisting, functional ePubs.

I've noticed there's a lot of code that might be useless in the metadata.opf and toc.ncx file
Using this assumption as the basis for your learning/experimentation is probably not the best idea. Wouldn't it be better to assume that portions of code are required—until you learn otherwise?

OPF specs (http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm)
NCX specs (http://www.niso.org/workrooms/daisy/Z39-86-2005.html#NCX)

ProDigit
04-15-2012, 01:09 PM
I find that an epub should work perfectly without these lines of code.
HTML can, so why MUST an epub have an author or a title?
What if the Author does not want to put his name there?

About trimming: I can do in HTML what I can do in epub, only 3 times smaller.
I find a lot of the coding in epubs inefficient: repeating the same thing twice, needing lengthy tags to define something, and especially the hex strings on the navPoint IDs, which I find useless.
The only purpose it serves is to provide nice hyperlinks, background scanning, and database sorting capabilities. It serves a purpose somewhere, I suppose, but I don't think it should automatically be a necessary part of the code.
It reminds me of Windows Vista compared to Windows XP/98. Windows 98/XP just do what needs to be done. Windows Vista is a memory and power hog that does unnecessary things in the background to optimize time and reduce latency, to compensate for the time it loses doing those unnecessary things in the first place.

If I can do something simple in HTML, why make it complex in epub?
Why not make ePub compatible with HTML, and leave it simple where it needs to be simple?

Like, this would be a nice toc for me:

<title>Table of contents</title>
link/to/1st.document
link/to/2nd.document
link/to/3rd.document
link/to/4th.document
link/to/5th.document


That would be an example of very efficient code that automatically assumes the first document is the first to be displayed in the book, is called "chapter 1", and knows the location of that chapter.

Now compare that to this:
<ncx version="1"
xml:lang="en">
<head>
content="95e823ba-8f88-4c44-9f9d-b22ff04d5358" />
</head>
<docTitle>
<text>Table of Contents</text>
</docTitle>
<navMap>
<navPoint id="d362620c-c3f8-45e2-8e63-2a62a2757f81"
playOrder="1">
<navLabel>
<text>Book title</text>
</navLabel>
<content src="content/CompleteA_revised_split_0.html" />
</navPoint>
<navPoint id="5809ab0e-a3b1-446b-b4d8-ad487a1e546b"
playOrder="2">
<navLabel>
<text>Chapter</text>
</navLabel>
<content src="content/CompleteA_revised_split_2.html" />
<navPoint id="1c4e5abf-96dd-42a3-9604-0936f9c535e0"
playOrder="3">
<navLabel>
<text>Chapter 1</text>
</navLabel>
<content src="content/CompleteA_revised_split_2.html" />
</navPoint>
<navPoint id="4e14c23c-a836-414f-850f-ce1484f98b4a"
playOrder="4">
<navLabel>
<text>Chapter 2</text>
</navLabel>
<content src="content/CompleteA_revised_split_3.html" />
</navPoint>
<navPoint id="bd77b9a0-c3f9-400e-b36c-290f896ac923"
playOrder="5">
<navLabel>
<text>Chapter 3</text>
</navLabel>
<content src="content/CompleteA_revised_split_4.html" />
</navPoint>


Looking at the very basics, both say the same thing, and on an ebook reader both can show exactly the same thing on screen: a toc pointing to the first 5 chapters, which is also used to play back chapters 1 through 5 in order after the toc. But look at the amount of code current epub needs to reach that result!

Trimming code may not make a lot of sense for regular books, but it does for Bibles, dictionaries, and encyclopedias with tons of chapters, pages, and reference notes.

DiapDealer
04-15-2012, 01:40 PM
I find that an epub should work perfectly without these lines of code.
I frankly don't know how to respond to that. Or your entire post. That's imbecilic. I thought you had a real interest in learning how ePubs are structured, but I can see your experiment is simply an excuse to wax philosophical about what you think ePub should be. Carry on.

ProDigit
04-15-2012, 01:56 PM
I am. And I could call you 'imbecilic' too, the way you so generously spread that word around, but I'll refrain from using such words!

But I'm just saying that it makes no sense to make things complicated when they could have invented a very good, optimized code format, especially for mobile devices, where every line of code just consumes unnecessary CPU!

And aside from that, I'm still interested in which lines one can safely remove without breaking the epub. I don't really care about having an epub with all the bells and whistles, since I'll mainly be using these epubs on hardware ebook readers rather than on a PC, which supports external links and other advanced features like library organization.

On my ebook reader, I open books from the file structure, not by author.

And I'll repeat:
What is the use of including external http links when the device can't connect to the internet anyway? Unless these lines are purely informative, there's no reason to keep them in the book, and they certainly should not be made a requirement for ebooks.

ProDigit
04-15-2012, 02:01 PM
Using this assumption as the basis for your learning/experimentation is probably not the best idea.

My 'assumption' is just a voice to be heard, an opinion, not an assumption for learning, as you say it is. It's an opinion that must be expressed: if I don't express it, someone else will, and if nobody does, pretty soon we'll end up with ebooks containing 80% code and 20% book, while in HTML I can keep it to 1-5% code and 95-99% book.


Wouldn't it be better to assume that portions of code are required—until you learn otherwise?

Portions of the code are required indeed, but definitely not all the code!
And part of this thread is to figure out what is, and what is not.

Just as I've already stopped asking about the way the toc handles the chapters, because that's just how epubs operate, I am merely challenging some of the header code to see whether one can do without it.

DiapDealer
04-15-2012, 02:04 PM
And I'll repeat:
What is the use of including external http links when the device can't connect to the internet anyway?
And I'll repeat, just this once: XML namespace declarations are NOT external http links. No matter how much they might look like they are. If you remove them you've broken the OPF file and the NCX file. Period.
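A quick way to convince yourself of this, using Python's standard ElementTree parser (nothing below touches the network): with the declaration, every element's name carries the NCX namespace URI; without it, the same markup produces plain, un-namespaced names, so code that looks for the NCX's navMap finds nothing. This is a sketch of how namespace-aware XML parsing generally behaves; what a particular reader's firmware does with such a file may differ.

```python
import xml.etree.ElementTree as ET

# The parser never fetches this URI; it only uses it as part of each element's name.
NCX_URI = "http://www.daisy.org/z3986/2005/ncx/"
NAV_MAP = "{" + NCX_URI + "}navMap"

with_ns = f'<ncx xmlns="{NCX_URI}" version="2005-1"><navMap/></ncx>'
without_ns = '<ncx version="2005-1"><navMap/></ncx>'

for label, text in (("with xmlns", with_ns), ("without xmlns", without_ns)):
    root = ET.fromstring(text)  # parses offline either way
    # A namespace-aware reader asks for the qualified name, not a bare "navMap".
    found = root.find(NAV_MAP) is not None
    print(f"{label}: root tag = {root.tag!r}, navMap found = {found}")

# with xmlns: root tag = '{http://www.daisy.org/z3986/2005/ncx/}ncx', navMap found = True
# without xmlns: root tag = 'ncx', navMap found = False
```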

ePub is not HTML. It never will be. Wishing it were, won't help you in the least. Now I'm really done. I wish you luck in whatever it is you think you're trying to accomplish.

JSWolf
04-15-2012, 02:14 PM
ProDigit, what you are trying to do is find shortcuts to do what you want to do when those shortcuts won't work because what you want to remove from toc.ncx are things that HAVE to be there. Why not just leave them be since they HAVE to be there?

Also, why are you so concerned with saving a few bytes here and there? Will any of your readers fail to open the ePub? Nope, they won't. Also, making an internal ToC (a page with a list of links) is not more efficient than a properly made toc.ncx, even if the internal ToC is smaller in size.

One way to optimize things (that works) is to remove all the tabs/spaces in front of all the lines.

So instead of having this
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1"
    xml:lang="en">
  <head>
    <meta name="dtb:uid"
        content="95e823ba-8f88-4c44-9f9d-b22ff04d5358" />
    <meta name="dtb:depth" content="3" />
    <meta name="dtb:generator" content="calibre" />
    <meta name="dtb:totalPageCount" content="0" />
    <meta name="dtb:maxPageNumber" content="0" />
  </head>
  <docTitle>
    <text>Table of Contents</text>
  </docTitle>
  <navMap>
    <navPoint id="d362620c-c3f8-45e2-8e63-2a62a2757f81"
        playOrder="1">
      <navLabel>
        <text>The Holy Bible</text>
      </navLabel>

you have this
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1"
xml:lang="en">
<head>
<meta name="dtb:uid"
content="95e823ba-8f88-4c44-9f9d-b22ff04d5358" />
<meta name="dtb:depth" content="3" />
<meta name="dtb:generator" content="calibre" />
<meta name="dtb:totalPageCount" content="0" />
<meta name="dtb:maxPageNumber" content="0" />
</head>
<docTitle>
<text>Table of Contents</text>
</docTitle>
<navMap>
<navPoint id="d362620c-c3f8-45e2-8e63-2a62a2757f81" playOrder="1">
<navLabel>
<text>The Holy Bible</text>
</navLabel>

You could also shorten the id as long as the id is unique.
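If you'd rather not strip the indentation by hand, something like the following Python sketch does it for a whole file. strip_indentation is a hypothetical helper, and the sample fragment is made up. It assumes no element's text content starts on its own line: it would also eat leading whitespace inside such text, so check files like the OPF above (where calibre split dc:identifier across two lines) before running it.

```python
def strip_indentation(xml_text):
    """Remove leading tabs/spaces from every line of an XML file."""
    # Caveat: this also alters element text that begins on its own line,
    # so only use it where text content stays on one line with its tags.
    return "\n".join(line.lstrip(" \t") for line in xml_text.splitlines())


indented = """<navMap>
\t<navPoint id="np1" playOrder="1">
\t\t<navLabel>
\t\t\t<text>The Holy Bible</text>
\t\t</navLabel>
\t\t<content src="Text/titlepage.xhtml"/>
\t</navPoint>
</navMap>"""

flat = strip_indentation(indented)
saved = len(indented.encode("utf-8")) - len(flat.encode("utf-8"))
print(f"saved {saved} bytes")  # saved 11 bytes
```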

Elfwreck
04-15-2012, 02:21 PM
I find that an epub should work perfectly without these lines of code.
HTML can, so why MUST an epub have an author or a title?
What if the Author does not want to put his name there?

It needs a title; it doesn't need an author.

The author can use "Unknown" or "Redacted" or "anonymous" or "decline to state" or a pseudonym of choice--or not put in an author. But the book needs a title so that the software can identify it.

It needs a title in the same way that a digital file needs a name. HTML doesn't, because it's not designed to be read by software that uses "title" to list & sort files.

The Epub format construction guide (http://www.hxa.name/articles/content/epub-guide_hxa7241_2007.html) goes into detail about what is and is not required.
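To make the title/author point concrete: per the OPF 2.0.1 spec DiapDealer linked, the Dublin Core elements required in <metadata> are dc:title, dc:identifier and dc:language, while dc:creator (the author) is optional. The Python sketch below uses a hypothetical helper, missing_metadata (not any existing tool), to report which of those three are absent. It also shows that dropping the xmlns declarations, as in the trimmed OPF proposed at the start of the thread, leaves the dc: prefix undeclared, so the file is no longer even well-formed XML.

```python
import xml.etree.ElementTree as ET

OPF_NS = "{http://www.idpf.org/2007/opf}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"
# The Dublin Core elements OPF 2.0.1 requires; dc:creator (author) is optional.
REQUIRED_DC = ("title", "identifier", "language")


def missing_metadata(opf_text):
    """Return the required dc: elements that are absent or empty."""
    root = ET.fromstring(opf_text)
    metadata = root.find(OPF_NS + "metadata")
    if metadata is None:
        return list(REQUIRED_DC)
    missing = []
    for name in REQUIRED_DC:
        element = metadata.find(DC_NS + name)
        if element is None or not (element.text or "").strip():
            missing.append(name)
    return missing


# No dc:creator at all, and nothing is reported missing.
no_author = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>Holy Bible</dc:title>
<dc:identifier id="bookid">95e823ba-8f88-4c44-9f9d-b22ff04d5358</dc:identifier>
<dc:language>UND</dc:language>
</metadata>
</package>"""
print(missing_metadata(no_author))  # []

# Without the xmlns declarations, "dc:" is an unbound prefix and parsing fails.
try:
    ET.fromstring("<package><metadata><dc:title>Holy Bible</dc:title></metadata></package>")
except ET.ParseError as err:
    print("not well-formed:", err)
```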

About trimming: I can do in HTML what I can do in epub, only 3 times smaller.

Then why aren't you just making HTML files and releasing those? ("Because most ebook readers won't read HTML, and when they do, they don't support all the features I want to include." Which, um, leaves you with "use a format designed to work for most books," which includes features that aren't directly important to your book.)

Epubs were designed to include both metadata and navigation options that HTML has problems with, and to run on remote devices with strict memory limitations.

The only purpose it serves is to provide nice hyperlinks, background scanning, and database sorting capabilities. It serves a purpose somewhere, I suppose, but I don't think it should automatically be a necessary part of the code.

Hyperlinks, background scanning, and database sorting options are not bloatware options in ebooks.

If I can do something simple in HTML, why make it complex in epub?
Why not make ePub compatible with HTML, and leave it simple where it needs to be simple?

Because it needs to support options that aren't important to you.

Like, this would be a nice toc for me:

<title>Table of contents</title>
link/to/1st.document
link/to/2nd.document
link/to/3rd.document
link/to/4th.document
link/to/5th.document


That would be an example of very efficient code that automatically assumes the first document is the first to be displayed in the book, is called "chapter 1", and knows the location of that chapter.

That's basically what it is. It takes more characters to describe because it supports options you don't care about--subset categories, links within documents, readable metadata.

Now compare that to this:
<ncx version="1"
xml:lang="en">
<head>
content="95e823ba-8f88-4c44-9f9d-b22ff04d5358" />
</head>

That, you can change. Pick a unique DocID system of your own. You don't have to use the random string of numbers and letters. Many people use ISBNs for the DocID.
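Whatever DocID scheme you pick, as I understand the OPF 2.0.1 spec the NCX's dtb:uid should carry the same value as the OPF dc:identifier that <package unique-identifier="..."> points to. The calibre files at the top of the thread do exactly that (both use 95e823ba-8f88-4c44-9f9d-b22ff04d5358). Here is a Python sketch for checking that the two still agree after you change one; package_uid and ncx_uid are hypothetical helpers, and "my-kjv-bible-001" is a made-up DocID.

```python
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"
NCX_NS = "{http://www.daisy.org/z3986/2005/ncx/}"


def package_uid(opf_text):
    """Text of the dc:identifier that <package unique-identifier="..."> points to."""
    root = ET.fromstring(opf_text)
    wanted = root.get("unique-identifier")
    for ident in root.iter(DC_NS + "identifier"):
        if ident.get("id") == wanted:
            # strip() copes with identifiers split across lines, as calibre did above
            return (ident.text or "").strip()
    return None


def ncx_uid(ncx_text):
    """The content of <meta name="dtb:uid"> in the NCX head."""
    root = ET.fromstring(ncx_text)
    for meta in root.iter(NCX_NS + "meta"):
        if meta.get("name") == "dtb:uid":
            return (meta.get("content") or "").strip()
    return None


opf = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:identifier id="bookid">my-kjv-bible-001</dc:identifier>
</metadata>
</package>"""
ncx = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
<head><meta name="dtb:uid" content="my-kjv-bible-001"/></head>
</ncx>"""

print(package_uid(opf) == ncx_uid(ncx))  # True
```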

<docTitle>
<text>Table of Contents</text>
</docTitle>
<navMap>
<navPoint id="d362620c-c3f8-45e2-8e63-2a62a2757f81"
playOrder="1">
<navLabel>
<text>Book title</text>
</navLabel>
<content src="content/CompleteA_revised_split_0.html" />
</navPoint>
<navPoint id="5809ab0e-a3b1-446b-b4d8-ad487a1e546b"
playOrder="2">
<navLabel>
<text>Chapter</text>
</navLabel>

You don't need the long random-number strings. You can rename the ID points to make sense & be easy to follow:

<docTitle>
<text>Deliver Us</text>
</docTitle>
<navMap>
<navPoint id="navPoint-1" playOrder="1">
<navLabel>
<text>Deliver Us</text>
</navLabel>
<content src="Text/03Titlepage.xhtml"/>
</navPoint>
<navPoint id="navPoint-2" playOrder="2">
<navLabel>
<text>Disclaimer</text>
</navLabel>
<content src="Text/05Disclaimer.xhtml"/>
</navPoint>
...and so on.
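Since the ids only need to be unique and the playOrder values just count up, the verbose part can be generated from something very close to ProDigit's one-link-per-line list. A minimal Python sketch, assuming a flat, single-level TOC: build_nav_map is a hypothetical helper that follows the "navPoint-N" naming above, and it only produces the <navMap>, not the <head> or <docTitle> the NCX also needs.

```python
from xml.sax.saxutils import escape, quoteattr


def build_nav_map(entries):
    """Turn (label, src) pairs into a flat NCX navMap with short, readable ids."""
    lines = ["<navMap>"]
    for n, (label, src) in enumerate(entries, start=1):
        # id just has to be unique; playOrder is simply the reading position.
        lines.append(f'<navPoint id="navPoint-{n}" playOrder="{n}">')
        lines.append(f"<navLabel><text>{escape(label)}</text></navLabel>")
        # quoteattr() adds the surrounding quotes and escapes &, <, > and quotes.
        lines.append(f"<content src={quoteattr(src)}/>")
        lines.append("</navPoint>")
    lines.append("</navMap>")
    return "\n".join(lines)


toc = [
    ("Deliver Us", "Text/03Titlepage.xhtml"),
    ("Disclaimer", "Text/05Disclaimer.xhtml"),
]
print(build_nav_map(toc))
```

Nested entries (like the "Chapter" parent with chapters inside it, earlier in the thread) would need a recursive version; this sketch deliberately keeps everything at one level.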

Looking at the very basics, both say the same thing, and on an ebook reader both can show exactly the same thing on screen: a toc pointing to the first 5 chapters, which is also used to play back chapters 1 through 5 in order after the toc. But look at the amount of code current epub needs to reach that result!

Because not everyone builds them the same way you have. Saying "it should be simpler because I don't care about the other options" is pointless. A person might want multiple navLabel sections inside a navPoint, which you can't do with a single-level "ToC Label: URL" arrangement.

Trimming code may not make a lot of sense for regular books, but it does for Bibles, dictionaries, and encyclopedias with tons of chapters, pages, and reference notes.

Very long and complex books have always been problematic for publishing. This is not a sign that the publishing standards are flawed; it's a sign that you're working on a book that pushes the limits of the format.

Bibles are often printed in small type on very thin paper because if they were printed on normal paper, they'd be several volumes long. Trying to cram a small-encyclopedia-length work into a single volume is going to be troublesome.

If it helps, there's a build-epub-from-scratch tutorial at the Spontaneous Derivation wiki (https://sites.google.com/site/spontaneousderivation/an-epub-tutorial). It goes through the bare minimum requirements for an epub's structure.

ProDigit
04-15-2012, 02:26 PM
One way to optimize things (that works) is to remove all the tabs/spaces in front of all the lines.

You could also shorten the id as long as the id is unique.

Yeah, I was planning on doing that, but I want to know WHY I can't remove any of the header items (other than 'just because I say so' or 'because it's written here somewhere').
Wouldn't it be nice to know which lines are mandatory, and which aren't?

Sometimes people say all lines are mandatory, but if you can figure out that most devices can read the file even without those lines, that's valuable information.

It's like how I can display an HTML file in most browsers even without a head or data like the example below. In fact, in most of my HTML files I just remove this info and work without (or with minimal) classes, CSS and all that. I only use them if my code benefits (if it saves me writing code without adding too much complexity).

Here's an example of some lousy HTML code that can easily be removed; with a small rewrite of the markup within the body, the style of the page would not suffer significantly:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.2 Document//EN" "http://openebook.org/dtds/oeb-1.2/oebdoc12.dtd">
<html xml:lang="en-us" xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="Content-Type" content="text/html;" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title> - Intelligent Design by Sharon Lee and Steve Miller</title>
<meta name="Publisher" content="Baen Books" />
<meta name="Copyright" content="2011 by Patrick Lundrigan, Larry Correia, Travis S. Taylor, Robert Buettner, Sharon Lee & Steve Miller" />
<meta name="Author" content="Patrick Lundrigan, Larry Correia, Travis S. Taylor, Robert Buettner" />
<style type="text/css">
p {text-indent:2em;margin-top:0;margin-bottom:2px}
h1 {page-break-before:left}
p.chapter {margin: 135.0px 0.0px 30.0px 0.0px; line-height: 24.1px; font: 28.0px 'Times New Roman'; color: #2e2829; font-weight:bold; text-align:center;}
p.p4 {margin: 0.0px 0.0px 0.0px 0.0px; text-align: center}
span.s1 {font-style: italic}
span.s2 {text-decoration: underline}
</style>
<script type="text/javascript" language="javascript"><!--
function setStyle()
{
if (parent.control) if (parent.control.mainLoad) parent.control.mainLoad(document);
if (window.focus) window.focus();
}
function PNo(PgNo)
{
if (parent.control) if (parent.control.SetPage) parent.control.document.forms[0].PageNo.value = PgNo;
}
setStyle();
//--></script>
</head>



I think the search for smaller, cleaner code is not a bad one, and it's often overlooked.
Smaller code is the difference between the now nearly extinct Excite, AltaVista and Yahoo search engines and Google.

JSWolf
04-15-2012, 02:32 PM
There is nothing wrong with wanting to be as efficient in your coding as possible. But the things that need to be there need to be there, and removing them is not going to work.

ProDigit
04-15-2012, 02:33 PM
Again,
I don't want to remove lines that are NEEDED, but merely question what is really NEEDED, and what is just filler... (the main reason I wrote this thread)

I had actually hoped to save some time, but it seems I would have saved more time doing the testing myself and then writing up my conclusions on this website.

JSWolf
04-15-2012, 02:37 PM
The problem is that some reading systems might ignore the errors (missing elements) and appear to work, but other software could fail. So if you make an ePub that's not in spec but works now, you could later have to go back and fix it for some different or newer reading software. It's not worth it to remove what's needed.

But one thing you can do is use FlightCrew to verify the ePub. It will tell you what is missing that you need.

ProDigit
04-15-2012, 02:50 PM
Unfortunately I only have 2 remaining devices that read epub. Otherwise I could take my time, do the research, and post the results on this site.
If it turns out that most ebook readers ignore certain errors, without glitches, then it's good news for me.
Many reading devices have the same software or even hardware inside, and should operate in a similar manner.

So far it's not clear how they respond to all these variations. From the little time I've spent playing with epub, I know that not everything can be removed, and that reading devices are pretty strict about their code.

And I hope some programmers read this thread and decide that it's indeed better not to make everything mandatory, but to build defaults into the reader software/firmware, so that if a specific line of code is not present, a standard pattern is followed.
I think they should have done that from the start. Why does every epub need a mimetype file and a container.xml, if for most books these files are identical?

I presume the mimetype file is for Mac/Linux, which read the first bytes of a file to determine which program should open it (probably also why you can't compress that file), but container.xml is different.
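For the record, the OCF container spec is strict about mimetype: it must be the first file in the zip, stored without compression, and contain exactly application/epub+zip. That fixed layout lets any program recognise an epub by reading bytes at a known position near the start of the file without unzipping anything, which is also why it can't be compressed. container.xml exists so the OPF can live at any path. Because both are identical boilerplate for most books, they're the easiest part to generate rather than hand-type. A sketch with Python's zipfile module; start_epub is a hypothetical helper and the OPF path OEBPS/content.opf is an assumption, so use whatever your book actually uses.

```python
import io
import zipfile

# Standard OCF container.xml: its only job is to say where the OPF lives.
# "OEBPS/content.opf" is an assumed path.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
<rootfiles>
<rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
</rootfiles>
</container>"""


def start_epub(buffer):
    """Write the two boilerplate parts an epub zip begins with."""
    with zipfile.ZipFile(buffer, "w") as zf:
        # mimetype first and uncompressed, with no trailing newline.
        zf.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        # ... the OPF, NCX and content files would be added here.


buf = io.BytesIO()
start_epub(buf)
data = buf.getvalue()
# A zip local file header is 30 bytes, followed by the name, then the stored data.
print(data[30:58])  # b'mimetypeapplication/epub+zip'
```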

Jellby
04-15-2012, 03:28 PM
As a general answer, most of the things you find unnecessary may indeed be unnecessary. But the ePub format is not something created from scratch just for books; it uses file formats and conventions that already existed, to make it easier to create reading applications and books from existing code, to make them easier to parse (because there are already tools that can deal with those things), etc. And although those pieces may look like garbage to you, they are there to let software know what it's dealing with. Their effect on the final file size will most likely be minimal.

I would advise you to have a look at some of the books I've uploaded here. They're all coded "by hand", and have pretty minimal markup overhead, I believe. Of course, they have some things you think unnecessary, like author, TOC, illustrations or description.

ATDrake
04-15-2012, 03:47 PM
Wouldn't it be nice to know which lines are mandatory, and which aren't?

You could just save time by reading the official ePub specifications (http://idpf.org/epub) which do actually detail which things are mandatory and which things are optional in terms of reader device/software support for a fully-compliant ePub file.

Now, if you just want to see how much of the mandatory stuff you can get away with leaving out before whatever error-correction exists in various readers starts to choke on not having them there, that's a different matter.

ETA: If you really want to make a truly minimal ePub, start from scratch because adapting other people's templates means you end up with whatever features they decided to use, which you may or may not consider totally superfluous to your own book.

Keroberos
04-15-2012, 08:46 PM
And I hope some programmers read this thread and decide that it's indeed better not to make everything mandatory, but to build defaults into the reader software/firmware, so that if a specific line of code is not present, a standard pattern is followed.
I think they should have done that from the start. Why does every epub need a mimetype file and a container.xml, if for most books these files are identical?

This makes no sense: you want to remove xhtml from the epub to make it use fewer resources, but you want the e-reader programmers to add more code to their readers to correct for the missing bits you think aren't necessary. That kind of code uses resources, probably more than you could save by removing the xhtml restrictions. Web browsers allow for incorrect html because they have a lot of code in them to help correct broken/buggy html.

The epub spec is stricter so that epubs can be opened with the most minimal hardware/software requirements.

ProDigit
04-16-2012, 01:31 AM
ETA: If you really want to make a truly minimal ePub, start from scratch because adapting other people's templates means you end up with whatever features they decided to use, which you may or may not consider totally superfluous to your own book.

That's basically what I'm doing. Through a process of elimination, I remove what I don't want or need in my book, though I've got to spend some time tomorrow doing more testing on my device to see what works and what doesn't.

Toxaris
04-16-2012, 02:16 AM
Just stick to the specifications. Even if most readers currently let you get away with invalid ePUBs, those files are still wrong, and there is no guarantee they will work on other readers.
The ePUB format is already quite slim (contrary to what you claim) and compression makes it even smaller. By tinkering with the opf/ncx you might save 10kb, but probably a lot less. It will take a lot of effort and will gain you almost nothing and possibly generate faulty ePUBs.
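To get a feel for that size argument, here is a rough Python sketch. It builds a hypothetical 1,000-entry navPoint list twice, once with calibre-style GUID ids and indentation and once trimmed (short ids, no indentation), then compares raw sizes with deflate-compressed sizes using zlib, the same deflate algorithm a normal epub zip uses for its entries. The entry count and file names are made up and zip headers are ignored, so treat whatever it prints as ballpark only; the gap that remains after compression is roughly what trimming would actually save.

```python
import zlib


def compressed_size(text):
    """Size in bytes after deflate, roughly what the epub zip would store."""
    return len(zlib.compress(text.encode("utf-8"), 9))


# Verbose version: long GUID-style ids and tab indentation.
verbose = "".join(
    f'<navPoint id="d362620c-c3f8-45e2-8e63-{n:012x}" playOrder="{n}">\n'
    f"\t<navLabel>\n\t\t<text>Chapter {n}</text>\n\t</navLabel>\n"
    f'\t<content src="Text/chapter{n}.xhtml"/>\n</navPoint>\n'
    for n in range(1, 1001)
)

# Trimmed version of the same entries: short ids, no indentation.
trimmed = "".join(
    f'<navPoint id="n{n}" playOrder="{n}">'
    f"<navLabel><text>Chapter {n}</text></navLabel>"
    f'<content src="Text/chapter{n}.xhtml"/></navPoint>\n'
    for n in range(1, 1001)
)

for name, text in (("verbose", verbose), ("trimmed", trimmed)):
    print(f"{name}: {len(text.encode())} bytes raw, {compressed_size(text)} compressed")
```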

The 'why is this needed' question is an academic one and should be treated as such, not as a guide for publishing books.

ProDigit
04-16-2012, 08:53 AM
I personally do not believe ePub is optimized code at all!
Hence my complaint: if I had invented epub, I would have given it the options it has now, but I would have optimized the code to look more like the code in this post:
http://www.mobileread.com/forums/showpost.php?p=2043727&postcount=3

and not make any hyperlinks and headers mandatory for coding the page.

If you understand anything about coding an HTML document, I've included two HTML files for you to take a look at.
This is what I call "optimal coding".
One is the smallest chapter I could find in the New Testament. I chose it because what matters here is not the text of the Bible but the coding used to create the page.
The other is a Bible framework I used back when I only had the Sony PRS-505 reader, which could only read LRF files in those days.
LRF files do not support an internal toc, so I had to create one from scratch in HTML format.

If you look at both pages, you'll see it's possible, but very difficult, to make the coding overhead any smaller than this (see files below).
I had to compress the file, because this site does not allow uploading of HTML files.

Doitsu
04-16-2012, 09:54 AM
Usually, I don't feed trolls, but in your case I couldn't resist.

I personally do not believe ePub is optimized code at all!
Nobody said it was and nobody forces you to use advanced ePub features. Make your book as small and as plain as possible, remove as many mandatory ePub elements as you can get away with and enjoy the glorious feeling that your ebook will open a few fractions of a second faster.
Don't like header tags or hyperlinks? Simply remove them and use the search function of your reader instead. It will probably take at least twice as long, but at least you'll be reading an "optimized" ebook.

IMHO, optimizing an ePub for size instead of layout or functionality doesn't make any sense at all. OTOH, if you have too much time on your hands, who am I to tell you what to do with it.

huebi
04-16-2012, 10:28 AM
If you understand anything about coding an HTML document, I've included two HTML files for you to take a look at.


NCX and OPF files are NOT HTML. They are XML, and in XML some things are simply mandatory. You have to live with that, otherwise you are producing invalid ePubs. Please read the specs BEFORE you start fantasizing or making assumptions based only on your own thoughts.

ePub is a standard, and some things are simply defined the way they are. It's ridiculous to argue about that.

DiapDealer
04-16-2012, 11:01 AM
and not make any hyperlinks and headers mandatory for coding the page.
You keep saying this, but you keep ignoring the fact that you've been told over and over that there are no mandatory hyperlinks in the OPF/NCX files. What you keep mistakenly referring to as "hyperlinks" are the XML namespace declarations. Those namespace declarations could be named anything that uniquely identifies them. But the various developers/groups that have defined those namespaces use familiar URLs as their unique names, so that it's easier for people who have no idea what those namespace specifications are to research those specs. The namespace URI is not used by the parser to look up information. But it needs to be there so parsers know which namespace scheme you're using.
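To illustrate that the URI is only a unique name: in this Python ElementTree sketch, the prefix you pick makes no difference, only the exact URI string does. urn:example:not-dublin-core is a made-up URI used purely for contrast.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

docs = {
    # The usual prefix...
    "dc prefix": '<m xmlns:dc="http://purl.org/dc/elements/1.1/"><dc:title>Holy Bible</dc:title></m>',
    # ...a different prefix bound to the same URI means exactly the same thing...
    "x prefix": '<m xmlns:x="http://purl.org/dc/elements/1.1/"><x:title>Holy Bible</x:title></m>',
    # ...but the usual prefix bound to a different URI is a different element entirely.
    "other URI": '<m xmlns:dc="urn:example:not-dublin-core"><dc:title>Holy Bible</dc:title></m>',
}

for label, text in docs.items():
    title = ET.fromstring(text).find(DC + "title")
    print(label, "->", None if title is None else title.text)

# dc prefix -> Holy Bible
# x prefix -> Holy Bible
# other URI -> None
```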