MobileRead Forums - View Single Post - Manually trimming the metadata.opf and toc.ncx file

ProDigit · 04-15-2012, 10:10 AM

So, instead of starting totally from scratch, I thought it might do me good to use an existing epub as a template and edit its parameters to fit my needs.

I've noticed there's a lot of code in the metadata.opf and toc.ncx files that might be useless, especially when I create an ebook to be read on an ebook reader that has no access to the internet.

The files I have contain the following data (which I think I can trim somewhat):

metadata.opf:

Code:

<?xml version="1.0"  encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="calibre_id">
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf"
         xmlns:calibre="http://calibre.kovidgoyal.net/2009/metadata">
        <dc:title>Holy Bible</dc:title>
        <dc:creator opf:role="aut" opf:file-as="Version, King James">King James Version</dc:creator>
        <dc:contributor opf:role="bkp" opf:file-as="calibre">calibre (0.5.14) 
         [http://calibre.kovidgoyal.net]</dc:contributor>
        <dc:identifier opf:scheme="calibre" id="calibre_id">
          95e823ba-8f88-4c44-9f9d-b22ff04d5358</dc:identifier>
        <dc:date>2009-06-16T04:03:49</dc:date>
        <dc:language>UND</dc:language>
        <meta name="calibre:series_index" content="1"/>
        <meta name="calibre:rating" content="0"/>
    </metadata>

After this, the file mainly lists the internal files and their id references (the manifest).

The second file, toc.ncx:

Code:

<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1"
xml:lang="en">
  <head>
    <meta name="dtb:uid"
    content="95e823ba-8f88-4c44-9f9d-b22ff04d5358" />
    <meta name="dtb:depth" content="3" />
    <meta name="dtb:generator" content="calibre" />
    <meta name="dtb:totalPageCount" content="0" />
    <meta name="dtb:maxPageNumber" content="0" />
  </head>
  <docTitle>
    <text>Table of Contents</text>
  </docTitle>
  <navMap>
    <navPoint id="d362620c-c3f8-45e2-8e63-2a62a2757f81"
    playOrder="1">
      <navLabel>
        <text>The Holy Bible</text>
      </navLabel>

I've also included the first entry it displays, because there's something I don't yet understand: the 'navPoint id' with its long string.

My question is, can I trim the first file to look like this:

Code:

<?xml version="1.0"  encoding="UTF-8"?>
<package version="2.0">
    <metadata>
        <dc:title>Holy Bible</dc:title>
        <dc:creator opf:role="aut" 
          opf:file-as="Version, King James">King James Version</dc:creator>
        <dc:identifier opf:scheme="calibre" 
          id="calibre_id">95e823ba-8f88-4c44-9f9d-b22ff04d5358</dc:identifier>
        <dc:language>UND</dc:language>
    </metadata>

The exception is that I also want to remove the calibre references, and I'd like to find out whether the dc:identifier is needed or can be removed too. Would that be trimming the file too much, or can I trim even more data from the heading/header (or whatever you may call it)?
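
One thing that can be tested without a reader is whether a trimmed header still parses as XML at all. The following is only a minimal sketch in Python, using the standard library's xml.etree.ElementTree. The two sample strings are shortened versions of the headers above, closed off with a `</package>` tag so they form complete documents, and the helper name `parses` is made up for illustration. It only checks well-formedness, not whether an ebook reader would accept the file.

```python
import xml.etree.ElementTree as ET

# Trimmed header as proposed above (shortened), with the xmlns declarations removed.
TRIMMED = """<?xml version="1.0" encoding="UTF-8"?>
<package version="2.0">
    <metadata>
        <dc:title>Holy Bible</dc:title>
        <dc:language>UND</dc:language>
    </metadata>
</package>"""

# Same header, but keeping the namespace declarations from the original file.
WITH_NS = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Holy Bible</dc:title>
        <dc:language>UND</dc:language>
    </metadata>
</package>"""


def parses(xml_text):
    """Return True if the text is well-formed, namespace-aware XML."""
    try:
        ET.fromstring(xml_text.encode("utf-8"))
        return True
    except ET.ParseError:
        # Raised, for example, when a prefix such as dc: has no xmlns:dc declaration.
        return False


if __name__ == "__main__":
    print("trimmed parses:", parses(TRIMMED))
    print("with namespaces parses:", parses(WITH_NS))
```

In this sketch the version without the xmlns declarations is rejected by the parser because the `dc:` prefix is never declared, which suggests those declarations cannot simply be dropped while the `dc:` and `opf:` prefixes stay in use.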

And the second file:

Code:

<ncx version="1"
xml:lang="en">
  <head>
    content="95e823ba-8f88-4c44-9f9d-b22ff04d5358" />
  </head>
  <docTitle>
    <text>Table of Contents</text>
  </docTitle>
  <navMap>
    <navPoint id="d362620c-c3f8-45e2-8e63-2a62a2757f81"
    playOrder="1">
      <navLabel>
        <text>The Holy Bible</text>
      </navLabel>

I wanted to trim it like this, but perhaps also remove the ncx version and any hexadecimal string I find in the document (like the navPoint id and the content value).
Is that possible, or does an epub need the navPoint id and the string mentioned above?
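
The same hexadecimal string shows up in both files: as the dc:identifier text in metadata.opf and as the dtb:uid content in toc.ncx. That suggests the two are meant to match. Here is a rough sketch in Python (standard library only) for checking this before and after trimming. The sample strings are cut-down copies of the originals above, and the function names `opf_identifier` and `ncx_uid` are made up for this example. It only compares the two values and does not validate the epub.

```python
import xml.etree.ElementTree as ET

# Cut-down copies of the original (untrimmed) headers from this post.
OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="calibre_id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:identifier opf:scheme="calibre" id="calibre_id">
      95e823ba-8f88-4c44-9f9d-b22ff04d5358</dc:identifier>
  </metadata>
</package>"""

NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head>
    <meta name="dtb:uid" content="95e823ba-8f88-4c44-9f9d-b22ff04d5358" />
  </head>
</ncx>"""

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "ncx": "http://www.daisy.org/z3986/2005/ncx/",
}


def opf_identifier(opf_text):
    """Return the text of the dc:identifier that unique-identifier points to."""
    root = ET.fromstring(opf_text)
    uid_ref = root.get("unique-identifier")
    for ident in root.iterfind(".//dc:identifier", NS):
        if ident.get("id") == uid_ref:
            return (ident.text or "").strip()
    return None


def ncx_uid(ncx_text):
    """Return the content of the dtb:uid meta element, if present."""
    root = ET.fromstring(ncx_text)
    for meta in root.iterfind("ncx:head/ncx:meta", NS):
        if meta.get("name") == "dtb:uid":
            return meta.get("content")
    return None


if __name__ == "__main__":
    print("identifiers match:", opf_identifier(OPF) == ncx_uid(NCX))
```

Running the same two functions on a trimmed pair of files would show whether the link between the two strings survived the edit.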

I'm not interested in copyright issues, as the text is out of copyright, and I'll also be developing my own versions (for personal use).