Manually trimming the metadata.opf and toc.ncx file

ProDigit · 04-15-2012, 09:10 AM

So, instead of starting totally from scratch, I thought it might do me good to use an existing epub as a template and edit its parameters to fit my needs for an epub.

I've noticed there's a lot of code that might be useless in the metadata.opf and toc.ncx file, especially for when I create an ebook to be read on an ebook reader that has no access to the internet.

The files I have contain the following data (which I think I can trim somewhat):

metadata.opf:

Code:

<?xml version="1.0"  encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="calibre_id">
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf"
         xmlns:calibre="http://calibre.kovidgoyal.net/2009/metadata">
        <dc:title>Holy Bible</dc:title>
        <dc:creator opf:role="aut" opf:file-as="Version, King James">King James Version</dc:creator>
        <dc:contributor opf:role="bkp" opf:file-as="calibre">calibre (0.5.14) 
         [http://calibre.kovidgoyal.net]</dc:contributor>
        <dc:identifier opf:scheme="calibre" id="calibre_id">
          95e823ba-8f88-4c44-9f9d-b22ff04d5358</dc:identifier>
        <dc:date>2009-06-16T04:03:49</dc:date>
        <dc:language>UND</dc:language>
        <meta name="calibre:series_index" content="1"/>
        <meta name="calibre:rating" content="0"/>
    </metadata>

After this the file mainly links to internal files and id refs (aka manifest).

the second file,
toc.ncx

Code:

<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1"
xml:lang="en">
  <head>
    <meta name="dtb:uid"
    content="95e823ba-8f88-4c44-9f9d-b22ff04d5358" />
    <meta name="dtb:depth" content="3" />
    <meta name="dtb:generator" content="calibre" />
    <meta name="dtb:totalPageCount" content="0" />
    <meta name="dtb:maxPageNumber" content="0" />
  </head>
  <docTitle>
    <text>Table of Contents</text>
  </docTitle>
  <navMap>
    <navPoint id="d362620c-c3f8-45e2-8e63-2a62a2757f81"
    playOrder="1">
      <navLabel>
        <text>The Holy Bible</text>
      </navLabel>

I've included the first page that it displays too, because there's something I don't yet understand: the 'navPoint id' with its string.

My question is, can I trim the first file to look like this:

Code:

<?xml version="1.0"  encoding="UTF-8"?>
<package version="2.0">
    <metadata>
        <dc:title>Holy Bible</dc:title>
        <dc:creator opf:role="aut" 
          opf:file-as="Version, King James">King James Version</dc:creator>
        <dc:identifier opf:scheme="calibre" 
          id="calibre_id">95e823ba-8f88-4c44-9f9d-b22ff04d5358</dc:identifier>
        <dc:language>UND</dc:language>
    </metadata>

with the exception that I want to remove calibre, and perhaps find out whether the dc:identifier is needed or can be removed too? Or would I have trimmed the file too much? Or can I perhaps trim even more data from the heading/header (or whatever you may call it)?

And second file:

Code:

<ncx version="1"
xml:lang="en">
  <head>
    content="95e823ba-8f88-4c44-9f9d-b22ff04d5358" />
  </head>
  <docTitle>
    <text>Table of Contents</text>
  </docTitle>
  <navMap>
    <navPoint id="d362620c-c3f8-45e2-8e63-2a62a2757f81"
    playOrder="1">
      <navLabel>
        <text>The Holy Bible</text>
      </navLabel>

I wanted to trim like this, but perhaps remove ncx version, and any hexadecimal string I find in the document (like the navpoint id, and content)?
Is that possible, or do I need the navPoint id, and string mentioned in an epub?

I'm not interested in copyright issues, as the file is out of copyright, and I'll also be developing my own versions (for personal use).

DiapDealer · 04-15-2012, 10:07 AM

I'm not exactly sure why you think it's necessary to "trim" things that you may not understand, but I can assure you that removing the namespace declaration(s) for an NCX or OPF file will definitely render them useless to most reading systems. The URIs in those XML namespace declarations are not "accessing the internet," and you can't just remove them simply because your epub won't ever "access the internet." That's not what they're for.
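To make the namespace point concrete, here is a minimal Python sketch (not from the thread; the two fragments are cut-down versions of the OPF snippets posted above, and the helper names are made up for illustration). It assumes only the standard library parser. The trimmed fragment uses the dc: prefix without declaring it, so a namespace-aware parser rejects it outright, while the declared one parses without anything being fetched from the network.

```python
import xml.etree.ElementTree as ET

# Trimmed fragment: the dc: prefix is used but never declared.
TRIMMED = """<package version="2.0">
  <metadata>
    <dc:title>Holy Bible</dc:title>
  </metadata>
</package>"""

# Same fragment with the namespace declarations left in place.
DECLARED = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Holy Bible</dc:title>
  </metadata>
</package>"""

def parses(xml_text):
    """Return True if the text is accepted by a namespace-aware XML parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def title_of(xml_text):
    """Find dc:title by its namespace name; the URI is only compared as a string."""
    root = ET.fromstring(xml_text)
    node = root.find(".//{http://purl.org/dc/elements/1.1/}title")
    return None if node is None else node.text

print(parses(TRIMMED))
print(parses(DECLARED))
print(title_of(DECLARED))
```

The URI is just a label that tells the parser which vocabulary an element belongs to; how a particular reading system reacts to the trimmed version is not something this sketch can show.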

Also the navPoint ID is required by spec... and removing the <content /> tags would make the point of creating an NCX file in the first place rather moot. No src = no functionality.

Why the "Trimming" obsession, anyway? I'm genuinely curious. These two files don't really have any "fat" in them to begin with (with the exception of maybe a few items between the <metadata></metadata> and <guide></guide> tags of the OPF). The rest is really quite critical—unless you want to eliminate the NCX completely. There's no requirement that you have to have an NCX, but removing it will probably require an alteration to your OPF file... if you're copying them from preexisting, functional ePubs.

Quote:

I've noticed there's a lot of code that might be useless in the metadata.opf and toc.ncx file

Using this assumption as the basis for your learning/experimentation is probably not the best idea. Wouldn't it be better to assume that portions of code are required—until you learn otherwise?

OPF specs
NCX specs

ProDigit · 04-15-2012, 12:09 PM

I find that an epub should work perfectly without these lines of code.
HTML can, why MUST epub have an author or a title?
What if the Author does not want to put his name there?

About trimming, I can do in HTML what I can do in epub, only 3 times smaller.
I find a lot of the coding of epubs inefficient. Repeating the same thing twice, needing lengthy tags to define something, and especially the hex strings on the navPoint IDs I find useless.
The only purpose it serves is to have nice hyperlinks and background scanning, and database sorting capabilities. It serves a purpose somewhere I suppose, but I don't think it should automatically be a necessary part of the code.
It reminds me of Windows Vista, compared to Windows XP/98. Windows 98/XP just do what needs to be done. Windows Vista is a memory and power hog that consumes power and does unnecessary things in the background to optimize time and reduce latency, to compensate for the time it loses doing those unnecessary things in the first place.

If I can do something simple in HTML, why make it complex in epub?
Why not make ePub compatible with HTML, and leave it simple where it needs to be simple?

Like, this would be a nice toc for me:

Code:

<title>Table of contents</title>
[1:"Chapter 1"]link/to/1st.document[/1]
[2:"Chapter 2"]link/to/2nd.document[/2]
[3:"Chapter 3"]link/to/3rd.document[/3]
[4:"Chapter 4"]link/to/4th.document[/4]
[5:"Chapter 5"]link/to/5th.document[/5]

That would be an example of very efficient code that automatically assumes the first document is the first to be displayed in the book, is called "Chapter 1", and knows the location of that chapter.

Now compare that to this:

Code:

<ncx version="1"
xml:lang="en">
  <head>
    content="95e823ba-8f88-4c44-9f9d-b22ff04d5358" />
  </head>
  <docTitle>
    <text>Table of Contents</text>
  </docTitle>
  <navMap>
    <navPoint id="d362620c-c3f8-45e2-8e63-2a62a2757f81"
    playOrder="1">
      <navLabel>
        <text>Book title</text>
      </navLabel>
      <content src="content/CompleteA_revised_split_0.html" />
    </navPoint>
    <navPoint id="5809ab0e-a3b1-446b-b4d8-ad487a1e546b"
    playOrder="2">
      <navLabel>
        <text>Chapter</text>
      </navLabel>
      <content src="content/CompleteA_revised_split_2.html" />
      <navPoint id="1c4e5abf-96dd-42a3-9604-0936f9c535e0"
      playOrder="3">
        <navLabel>
          <text>Chapter 1</text>
        </navLabel>
        <content src="content/CompleteA_revised_split_2.html" />
      </navPoint>
      <navPoint id="4e14c23c-a836-414f-850f-ce1484f98b4a"
      playOrder="4">
        <navLabel>
          <text>Chapter 2</text>
        </navLabel>
        <content src="content/CompleteA_revised_split_3.html" />
      </navPoint>
      <navPoint id="bd77b9a0-c3f9-400e-b36c-290f896ac923"
      playOrder="5">
        <navLabel>
          <text>Chapter 3</text>
        </navLabel>
        <content src="content/CompleteA_revised_split_4.html" />
      </navPoint>

Looking at the very basics, it's saying the same thing, and in an ebook reader both can show exactly the same on the screen: namely, that I want it to display a toc directing to the first 5 chapters, and use that toc to play back chapter 5 after 4 after 3 after 2 after 1, after the toc. But see the amount of code that's been implemented to reach such a result in the current version of epub!

Trimming code may not make a lot of sense for regular books, but it does for bibles, dictionaries, and encyclopedias with tons of chapters, pages, and reference notes.

DiapDealer · 04-15-2012, 12:40 PM

Quote:

I find that an epub should work perfectly without these lines of code.

I frankly don't know how to respond to that. Or your entire post. That's imbecilic. I thought you had a real interest in learning how ePubs are structured, but I can see your experiment is simply an excuse to wax philosophical about what you think ePub should be. Carry on.

ProDigit · 04-15-2012, 12:56 PM

I am, just as I could call you 'imbecilic' too, like you so generously spread around; but I'll refrain myself from using those words!!

But I'm just saying that it makes no sense to make things complicated when they could have invented a very good and optimized code format, especially if it's for mobile devices where every line of code just consumes unnecessary CPU!

And aside from that; I'm still interested in what lines one can safely remove without breaking the epub, meaning I don't really care of not having an epub with all bells and whistles, since I am mainly going to use the epubs in hardware Ebook readers instead of on a pc which supports external links and all other advanced stuff like library organizations etc...

On my ebook reader I open books from the file structure, not by author.

And I'll repeat:
What is the use of including external http links when the device can't connect to the internet anyway? Unless these lines are purely informative, there's no reason to keep them in the book, and certainly should not be made a requirement for ebooks.

ProDigit · 04-15-2012, 01:01 PM

Quote:

Originally Posted by DiapDealer

Using this assumption as the basis for your learning/experimentation is probably not the best idea.

My 'assumption' is just a voice to be heard, an opinion, not an assumption for learning, as you say it is. An opinion which must be expressed; for if not, then someone else will, or if not, then pretty soon we'll end up with ebooks containing 80% code, and 20% book, while I am able to keep 1-5% code, and 95-99% book in HTML.

Quote:

Wouldn't it be better to assume that portions of code are required—until you learn otherwise?

Portions of the code are required indeed, but definitely not all the code!
And part of this thread is to figure out what is, and what is not.

Just like I've already stopped asking about the way the toc handles the chapters, because that's just the way epubs operate, I am merely challenging some of the header code to see if one can't do without it.

DiapDealer · 04-15-2012, 01:04 PM

Quote:

And I'll repeat:
What is the use of including external http links when the device can't connect to the internet anyway?

And I'll repeat, just this once: XML namespace declarations are NOT external http links. No matter how much they might look like they are. If you remove them you've broken the OPF file and the NCX file. Period.

ePub is not HTML. It never will be. Wishing it were, won't help you in the least. Now I'm really done. I wish you luck in whatever it is you think you're trying to accomplish.

JSWolf · 04-15-2012, 01:14 PM

ProDigit, what you are trying to do is find shortcuts to do what you want to do when those shortcuts won't work because what you want to remove from toc.ncx are things that HAVE to be there. Why not just leave them be since they HAVE to be there?

Also, why are you so concerned with saving a few bytes here and there? Will any of your readers fail to open the ePub? Nope, they won't. Also, making an internal ToC (a page with a list of links) is not more efficient than a properly made toc.ncx, even if the internal ToC is smaller in size.

One way to optimize things (that works) is to remove all the tabs/spaces in front of all the lines.

So instead of having this

Code:

<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1"
xml:lang="en">
  <head>
    <meta name="dtb:uid"
    content="95e823ba-8f88-4c44-9f9d-b22ff04d5358" />
    <meta name="dtb:depth" content="3" />
    <meta name="dtb:generator" content="calibre" />
    <meta name="dtb:totalPageCount" content="0" />
    <meta name="dtb:maxPageNumber" content="0" />
  </head>
  <docTitle>
    <text>Table of Contents</text>
  </docTitle>
  <navMap>
    <navPoint id="d362620c-c3f8-45e2-8e63-2a62a2757f81"
    playOrder="1">
      <navLabel>
        <text>The Holy Bible</text>
      </navLabel>

you have this

Code:

<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1"
xml:lang="en">
<head>
<meta name="dtb:uid"
content="95e823ba-8f88-4c44-9f9d-b22ff04d5358" />
<meta name="dtb:depth" content="3" />
<meta name="dtb:generator" content="calibre" />
<meta name="dtb:totalPageCount" content="0" />
<meta name="dtb:maxPageNumber" content="0" />
</head>
<docTitle>
<text>Table of Contents</text>
</docTitle>
<navMap>
<navPoint id="d362620c-c3f8-45e2-8e63-2a62a2757f81" playOrder="1">
<navLabel>
<text>The Holy Bible</text>
</navLabel>

You could also shorten the id as long as the id is unique.
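As a rough illustration of the whitespace suggestion, here is a small Python sketch (not from the thread; `strip_indent` is a made-up helper and the sample is a cut-down copy of the NCX above). It assumes the indentation is purely cosmetic, meaning no element relies on leading whitespace inside its text. It strips the leading spaces and tabs, then checks that the parser still sees the same elements and attributes.

```python
import xml.etree.ElementTree as ET

INDENTED = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1"
xml:lang="en">
  <head>
    <meta name="dtb:uid"
    content="95e823ba-8f88-4c44-9f9d-b22ff04d5358" />
    <meta name="dtb:depth" content="3" />
  </head>
  <docTitle>
    <text>Table of Contents</text>
  </docTitle>
</ncx>"""

def strip_indent(xml_text):
    """Drop leading spaces and tabs from every line; line breaks are kept."""
    return "\n".join(line.lstrip(" \t") for line in xml_text.split("\n"))

def structure(xml_text):
    """List (tag, attributes) for every element, in document order."""
    return [(el.tag, dict(el.attrib)) for el in ET.fromstring(xml_text).iter()]

stripped = strip_indent(INDENTED)

# Same elements and attributes before and after; only the byte count changes.
print(structure(stripped) == structure(INDENTED))
print(len(INDENTED) - len(stripped), "characters saved")
```

Since the files inside an epub are usually compressed, the saving in the final archive is likely to be smaller than the raw character count suggests.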

Elfwreck · 04-15-2012, 01:21 PM

Quote:

Originally Posted by ProDigit

I find that an epub should work perfectly without these lines of code.
HTML can, why MUST epub have an author or a title?
What if the Author does not want to put his name there?

It needs a title; it doesn't need an author.

The author can use "Unknown" or "Redacted" or "anonymous" or "decline to state" or a pseudonym of choice--or not put in an author. But the book needs a title so that the software can identify it.

It needs a title in the same way that a digital file needs a name. HTML doesn't, because it's not designed to be read by software that uses "title" to list & sort files.

The Epub format construction guide goes into detail about what is and is not required.

Quote:

About trimming, I can do in HTML what I can do in epub, only 3 times smaller.

Then why aren't you just making HTML files and releasing those? ("Because most ebook readers won't read HTML, and when they do, they don't support all the features I want to include." Which, um, leaves you with "use a format designed to work for most books," which includes features that aren't directly important to your book.)

Epubs were designed to include both metadata and navigation options that HTML has problems with, and to run on remote devices with strict memory limitations.

Quote:

The only purpose it serves is have nice hyperlinks and background scanning, and database sorting capabilities. It serves a purpose somewhere I suppose, but I don't think it should automatically be a necessary part of the code.

Hyperlinks, background scanning, and database sorting options are not bloatware options in ebooks.

Quote:

If I can do something simple in HTML, why make it complex in epub?
Why not make ePub compatible with HTML, and leave it simple where it needs to be simple?

Because it needs to support options that aren't important to you.

Quote:

Like, this would be a nice toc for me:

Code:

<title>Table of contents</title>
[1:"Chapter 1"]link/to/1st.document[/1]
[2:"Chapter 2"]link/to/2nd.document[/2]
[3:"Chapter 3"]link/to/3rd.document[/3]
[4:"Chapter 4"]link/to/4th.document[/4]
[5:"Chapter 5"]link/to/5th.document[/5]

That would be an example of very efficient code that automatically assumes the first document is the first to be displayed in the book, is called "Chapter 1", and knows the location of that chapter.

That's basically what it is. It takes more characters to describe because it supports options you don't care about--subset categories, links within documents, readable metadata.

Quote:

Now compare that to this:

Code:

<ncx version="1"
xml:lang="en">
<head>
content="95e823ba-8f88-4c44-9f9d-b22ff04d5358" />
</head>

That, you can change. Pick a unique DocID system of your own. You don't have to use the random string of numbers and letters. Many people use ISBNs for the DocID.

Quote:

<docTitle>
<text>Table of Contents</text>
</docTitle>
<navMap>
<navPoint id="d362620c-c3f8-45e2-8e63-2a62a2757f81"
playOrder="1">
<navLabel>
<text>Book title</text>
</navLabel>
<content src="content/CompleteA_revised_split_0.html" />
</navPoint>

<navPoint id="5809ab0e-a3b1-446b-b4d8-ad487a1e546b"
playOrder="2">
<navLabel>
<text>Chapter</text>
</navLabel>

You don't need the long random-number strings. You can rename the ID points to make sense & be easy to follow:

Code:

<docTitle>
    <text>Deliver Us</text>
  </docTitle>
  <navMap>
    <navPoint id="navPoint-1" playOrder="1">
      <navLabel>
        <text>Deliver Us</text>
      </navLabel>
      <content src="Text/03Titlepage.xhtml"/>
    </navPoint>
    <navPoint id="navPoint-2" playOrder="2">
      <navLabel>
        <text>Disclaimer</text>
      </navLabel>
      <content src="Text/05Disclaimer.xhtml"/>
    </navPoint>

...and so on.
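For a book with hundreds of entries, the renaming could be scripted. The following Python sketch is only an illustration (not from the thread; `shorten_ids` is a made-up helper and the sample is a shortened copy of the NCX quoted above). It assumes nothing else in the book refers to the old navPoint ids, and it leaves playOrder and the content targets untouched.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"

SAMPLE = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="d362620c-c3f8-45e2-8e63-2a62a2757f81" playOrder="1">
      <navLabel><text>Book title</text></navLabel>
      <content src="content/CompleteA_revised_split_0.html"/>
    </navPoint>
    <navPoint id="5809ab0e-a3b1-446b-b4d8-ad487a1e546b" playOrder="2">
      <navLabel><text>Chapter</text></navLabel>
      <content src="content/CompleteA_revised_split_2.html"/>
    </navPoint>
  </navMap>
</ncx>"""

def shorten_ids(ncx_text):
    """Rename every navPoint id to navPoint-N, numbered in document order."""
    # Keep the NCX namespace as the default namespace when writing back out.
    ET.register_namespace("", NCX_NS)
    root = ET.fromstring(ncx_text)
    for n, point in enumerate(root.iter("{%s}navPoint" % NCX_NS), start=1):
        point.set("id", "navPoint-%d" % n)
    return ET.tostring(root, encoding="unicode")

result = shorten_ids(SAMPLE)
ids = [p.get("id") for p in ET.fromstring(result).iter("{%s}navPoint" % NCX_NS)]
print(ids)
```

Numbering in document order keeps the ids unique by construction, which is the only property the discussion above asks of them.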

Quote:

Looking at the very basics, it's saying the same thing, and in an ebook reader both can show exactly the same on the screen: namely, that I want it to display a toc directing to the first 5 chapters, and use that toc to play back chapter 5 after 4 after 3 after 2 after 1, after the toc. But see the amount of code that's been implemented to reach such a result in the current version of epub!

Because not everyone builds them the same way you have. Saying "it should be simpler because I don't care about the other options" is pointless. A person might want multiple navLabel sections inside a navPoint, which you can't do with a single-level "ToC Label: URL" arrangement.

Quote:

Trimming code may not make a lot of sense for regular books, but it does for bibles, dictionaries, and encyclopedias with tons of chapters, pages, and reference notes.

Very long and complex books have always been problematic for publishing. This is not a sign that the publishing standards are flawed; it's a sign that you're working on a book that pushes the limits of the format.

Bibles are often printed in small type on very thin paper because if they were printed on normal paper, they'd be several volumes long. Trying to cram a small-encyclopedia-length work into a single volume is going to be troublesome.

If it helps, there's a build-epub-from-scratch tutorial at the Spontaneous Derivation wiki. It goes through the bare minimum requirements for an epub's structure.

ProDigit · 04-15-2012, 01:26 PM

Quote:

Originally Posted by JSWolf

One way to optimize things (that works) is to remove all the tabs/spaces in front of all the lines.

You could also shorten the id as long as the id is unique.

Yeah, I was planning on doing that, but I want to know WHY it is that I can't remove any of the header things (other than 'just because I say so', or 'because it's written here somewhere.')?
Wouldn't it be nice that you know which lines are mandatory, and which aren't?

Sometimes they say all lines are, but if you can figure most devices can read it even without those lines it's valuable information.

Just like I can display an HTML file in most browsers, even without a head or data like the example below. In fact, in most of my HTML files I just remove this info and work without (or with minimal) class, CSS and all that... I only use it if my code benefits from it (if I save writing code, without adding too much complexity).

Example of some lousy HTML code that can easily be removed from the HTML, and with a small rewrite of the HTML codes within the body, will not significantly reduce style of the HTML:

Code:

<?xml version="1.0" encoding="ISO-8859-1"?> 
<!DOCTYPE html PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.2 Document//EN" "http://openebook.org/dtds/oeb-1.2/oebdoc12.dtd"> 
<html xml:lang="en-us" xmlns="http://www.w3.org/1999/xhtml"> 
<head><meta http-equiv="Content-Type" content="text/html;" /> 
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> 
<title>  - Intelligent Design by Sharon Lee and Steve Miller</title> 
<meta name="Publisher" content="Baen Books" /> 
<meta name="Copyright" content="2011 by Patrick Lundrigan, Larry Correia, Travis S. Taylor, Robert Buettner, Sharon Lee & Steve Miller" /> 
<meta name="Author" content="Patrick Lundrigan, Larry Correia, Travis S. Taylor, Robert Buettner" /> 
<style type="text/css"> 
p {text-indent:2em;margin-top:0;margin-bottom:2px} 
h1 {page-break-before:left} 
p.chapter {margin: 135.0px 0.0px 30.0px 0.0px; line-height: 24.1px; font: 28.0px 'Times New Roman'; color: #2e2829; font-weight:bold; text-align:center;} 
p.p4 {margin: 0.0px 0.0px 0.0px 0.0px; text-align: center} 
span.s1 {font-style: italic} 
span.s2 {text-decoration: underline} 
</style> 
<script type="text/javascript" language="javascript"><!-- 
function setStyle() 
{ 
if (parent.control) if (parent.control.mainLoad) parent.control.mainLoad(document); 
if (window.focus) window.focus(); 
} 
function PNo(PgNo) 
{ 
if (parent.control) if (parent.control.SetPage) parent.control.document.forms[0].PageNo.value = PgNo; 
} 
setStyle(); 
//--></script> 
</head>

I think the search for smaller, cleaner code is not a bad one, and it is often overlooked.
Smaller code is the difference between the now nearly extinct Excite, AltaVista and Yahoo search engines, and Google.

JSWolf · 04-15-2012, 01:32 PM

There is nothing wrong with wanting to be as efficient in your coding as possible. But the things that need to be there need to be there, and your wanting to remove them is not going to work.

ProDigit · 04-15-2012, 01:33 PM

Again,
I don't want to remove lines that are NEEDED, but merely question what is really NEEDED, and what is just filler... (the main reason I wrote this thread)

I had actually hoped to save some time, but it seems I would have saved more time doing the testing myself, and then write my conclusions on this website.

JSWolf · 04-15-2012, 01:37 PM

The problem is that some reading systems might ignore the errors (missing elements) and appear to work. But other software could fail. So if you make it such that it's not in spec but works now, you could later on have to go in and fix it for some different/newer reading software. It's not worth it to remove what's needed.

But one thing you can do is use FlightCrew to verify the ePub. It will tell you what is missing that you need.

ProDigit · 04-15-2012, 01:50 PM

Unfortunately I only have 2 remaining devices that read epub. Otherwise I could take my time, do the research, and post the results on this site.
If it turns out that most ebook readers ignore certain errors, without glitches, then it's good news for me.
Many reading devices have the same software or even hardware inside, and should operate in a similar manner.

So far it's not clear how they respond to all these variations. I know from the little time I had playing with epub, that not everything can be removed. I know that reading devices are pretty strict on their code.

And I hope some programmers might read this thread and decide it indeed benefits to not make everything mandatory, but just imply a code within reader software/firmware, that if some specific line of code is not present, a standard pattern will be followed.
I think they should have done that from the start. Like, why does every epub need to have a mimetype file and a container.xml, if for most books these files are identical?

I presume the mimetype file is for mac/linux, which read the first bytes of a file to determine what program is needed to open it (probably also why you can't compress that file); but container.xml is different.
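For what it's worth, the container rules ask for mimetype to be the first entry in the ZIP and to be stored uncompressed, which puts its text at fixed byte offsets near the start of the file. Here is a minimal Python sketch of just that part (not from the thread; `write_container_skeleton` is a made-up name, and the container.xml assumes the package file sits at the archive root as metadata.opf, the name used in this thread). It does not produce a complete, readable book.

```python
import io
import zipfile

# Assumption: the OPF lives at the archive root as metadata.opf.
CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="metadata.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

def write_container_skeleton(target):
    """Write only mimetype and container.xml; a real book also needs the OPF and content."""
    with zipfile.ZipFile(target, "w") as zf:
        # mimetype goes first and is stored, not deflated, so its bytes stay readable.
        zf.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)

buffer = io.BytesIO()
write_container_skeleton(buffer)
raw = buffer.getvalue()

# A local ZIP header is 30 bytes, followed by the file name, then (here) the data.
print(raw[30:38])
print(raw[38:58])
```

container.xml plays a different role: it tells the reading system where the OPF is, which is why it can look identical across books that share the same layout.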

Jellby · 04-15-2012, 02:28 PM

As a general answer, most of the things you find unnecessary may well be unnecessary. But the ePub format is not something created from scratch just for books: it uses some file formats and conventions that already existed, to make it easier to create reading applications and books from already existing code, to make it easier to parse them (because there are already tools that can deal with those things), etc. And although those pieces may look like garbage to you, they are there to let software know what it's dealing with. The effect they will have on the final file size will most likely be minimal.

I would advise you to have a look at some of the books I've uploaded here. They're all coded "by hand", and have pretty minimal markup overhead, I believe. Of course, they have some things you think unnecessary, like author, TOC, illustrations or description.



