04-15-2012, 09:10 AM | #1 |
Karmaniac
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
|
Manually trimming the metadata.opf and toc.ncx file
So, instead of starting totally from scratch, I thought it might do me good to use an existing epub as a template and edit its parameters to fit my needs.
I've noticed there's a lot of code that might be useless in the metadata.opf and toc.ncx files, especially when I create an ebook to be read on an ebook reader that has no access to the internet. The files I have contain the following data (which I think I can trim somewhat):
metadata.opf: Code:
<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="calibre_id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf" xmlns:calibre="http://calibre.kovidgoyal.net/2009/metadata">
    <dc:title>Holy Bible</dc:title>
    <dc:creator opf:role="aut" opf:file-as="Version, King James">King James Version</dc:creator>
    <dc:contributor opf:role="bkp" opf:file-as="calibre">calibre (0.5.14) [http://calibre.kovidgoyal.net]</dc:contributor>
    <dc:identifier opf:scheme="calibre" id="calibre_id"> 95e823ba-8f88-4c44-9f9d-b22ff04d5358</dc:identifier>
    <dc:date>2009-06-16T04:03:49</dc:date>
    <dc:language>UND</dc:language>
    <meta name="calibre:series_index" content="1"/>
    <meta name="calibre:rating" content="0"/>
  </metadata>
The second file, toc.ncx: Code:
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1" xml:lang="en">
  <head>
    <meta name="dtb:uid" content="95e823ba-8f88-4c44-9f9d-b22ff04d5358" />
    <meta name="dtb:depth" content="3" />
    <meta name="dtb:generator" content="calibre" />
    <meta name="dtb:totalPageCount" content="0" />
    <meta name="dtb:maxPageNumber" content="0" />
  </head>
  <docTitle>
    <text>Table of Contents</text>
  </docTitle>
  <navMap>
    <navPoint id="d362620c-c3f8-45e2-8e63-2a62a2757f81" playOrder="1">
      <navLabel>
        <text>The Holy Bible</text>
      </navLabel>
My question is, can I trim the first file to look like this: Code:
<?xml version="1.0" encoding="UTF-8"?>
<package version="2.0">
  <metadata>
    <dc:title>Holy Bible</dc:title>
    <dc:creator opf:role="aut" opf:file-as="Version, King James">King James Version</dc:creator>
    <dc:identifier opf:scheme="calibre" id="calibre_id">95e823ba-8f88-4c44-9f9d-b22ff04d5358</dc:identifier>
    <dc:language>UND</dc:language>
  </metadata>
And the second file: Code:
<ncx version="1" xml:lang="en">
  <head>
    content="95e823ba-8f88-4c44-9f9d-b22ff04d5358" />
  </head>
  <docTitle>
    <text>Table of Contents</text>
  </docTitle>
  <navMap>
    <navPoint id="d362620c-c3f8-45e2-8e63-2a62a2757f81" playOrder="1">
      <navLabel>
        <text>The Holy Bible</text>
      </navLabel>
Is that possible, or does an epub need the navPoint id and its long ID string? I'm not interested in copyright issues, as the file is out of copyright, and I'll also be developing my own versions (for personal use). Last edited by ProDigit; 04-15-2012 at 09:13 AM. |
04-15-2012, 10:07 AM | #2 | |
Grand Sorcerer
Posts: 27,586
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I'm not exactly sure why you think it's necessary to "trim" things that you may not understand, but I can assure you that removing the namespace declaration(s) for an NCX or OPF file will definitely render them useless to most reading systems. The URIs in those XML namespace declarations are not "accessing the internet," and you can't just remove them simply because your epub won't ever "access the internet." That's not what they're for.
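A small sketch of the point about namespaces (a standalone, hypothetical fragment, not a complete OPF): under the W3C Namespaces in XML rules, an `xmlns` value is only a name that the parser compares character by character. It is never fetched, so it costs nothing on an offline reader. Both title elements below therefore belong to the same Dublin Core namespace even though their prefixes differ. Deleting the declarations instead leaves the `dc:` prefix unbound, which a namespace-aware parser reports as an error.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical fragment: the xmlns values are identifiers, not links.
     The parser only compares the strings; nothing is downloaded. -->
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:x="http://purl.org/dc/elements/1.1/">
  <!-- Both elements resolve to {http://purl.org/dc/elements/1.1/}title -->
  <dc:title>Holy Bible</dc:title>
  <x:title>Holy Bible</x:title>
</metadata>
```

In a real OPF it is still safest to keep the conventional `dc:` prefix, since that is what authoring tools and validators expect to see.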
Also, the navPoint ID is required by spec... and removing the <content /> tags would make the point of creating an NCX file in the first place rather moot. No href = no functionality. Why the "trimming" obsession, anyway? I'm genuinely curious. These two files don't really have any "fat" in them to begin with (with the exception of maybe a few items between the <metadata></metadata> and <guide></guide> tags of the OPF). The rest is really quite critical, unless you want to eliminate the NCX completely. There's no requirement that you have to have an NCX, but removing it will probably require an alteration to your OPF file... if you're copying them from preexisting, functional ePubs. Quote:
OPF specs NCX specs Last edited by DiapDealer; 04-15-2012 at 11:38 AM. |
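To make the required parts concrete, here is a minimal EPUB 2 NCX sketch. It assumes a single hypothetical chapter file, `Text/genesis.xhtml`, and reuses the UUID from the calibre file quoted at the top of the thread. What stays: the NCX namespace and `version`, the `dtb:uid` meta (which should match the OPF's unique identifier), the other three `dtb:` meta entries that, as I read it, the OPF 2.0.1 spec lists for the NCX head, and, for every entry, an `id`, a `playOrder`, a label and a `content src`. The calibre-specific `dtb:generator` line is one of the few things that can go.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1" xml:lang="en">
  <head>
    <!-- Should carry the same value as the OPF dc:identifier named by unique-identifier -->
    <meta name="dtb:uid" content="95e823ba-8f88-4c44-9f9d-b22ff04d5358"/>
    <!-- Nesting depth of the navMap; 1 because there are no nested navPoints here -->
    <meta name="dtb:depth" content="1"/>
    <meta name="dtb:totalPageCount" content="0"/>
    <meta name="dtb:maxPageNumber" content="0"/>
  </head>
  <docTitle>
    <text>Holy Bible</text>
  </docTitle>
  <navMap>
    <!-- id only has to be unique within the file; playOrder sets the reading order -->
    <navPoint id="navPoint-1" playOrder="1">
      <navLabel>
        <text>Genesis</text>
      </navLabel>
      <!-- Without src the entry points nowhere (hypothetical file name) -->
      <content src="Text/genesis.xhtml"/>
    </navPoint>
  </navMap>
</ncx>
```

Because the `id` only needs to be a unique XML name, short readable values such as `navPoint-1` work as well as calibre's random strings and save a few bytes per entry.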
|
04-15-2012, 12:09 PM | #3 |
Karmaniac
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
|
I find that an epub should work perfectly without these lines of code.
HTML can, so why MUST an epub have an author or a title? What if the author does not want to put his name there? About trimming: I can do in HTML what I can do in epub, only three times smaller. I find a lot of the coding of epubs inefficient: repeating the same thing twice, needing lengthy tags to define something, and especially the hex strings on the navPoint IDs, which I find useless. The only purpose they serve is nice hyperlinks, background scanning, and database sorting. They serve a purpose somewhere, I suppose, but I don't think they should automatically be a necessary part of the code. It reminds me of Windows Vista compared to Windows XP/98. Windows 98/XP just do what needs to be done. Windows Vista is a memory and power hog that does unnecessary things in the background to optimize time and reduce latency, to compensate for the time it loses doing those unnecessary things in the first place. If I can do something simple in HTML, why make it complex in epub? Why not make ePub compatible with HTML, and leave it simple where it needs to be simple? Like, this would be a nice toc for me: Code:
<title>Table of contents</title>
[1:"Chapter 1"]link/to/1st.document[/1]
[2:"Chapter 2"]link/to/2nd.document[/2]
[3:"Chapter 3"]link/to/3rd.document[/3]
[4:"Chapter 4"]link/to/4th.document[/4]
[5:"Chapter 5"]link/to/5th.document[/5]
Now compare that to this: Code:
<ncx version="1" xml:lang="en">
  <head>
    content="95e823ba-8f88-4c44-9f9d-b22ff04d5358" />
  </head>
  <docTitle>
    <text>Table of Contents</text>
  </docTitle>
  <navMap>
    <navPoint id="d362620c-c3f8-45e2-8e63-2a62a2757f81" playOrder="1">
      <navLabel>
        <text>Book title</text>
      </navLabel>
      <content src="content/CompleteA_revised_split_0.html" />
    </navPoint>
    <navPoint id="5809ab0e-a3b1-446b-b4d8-ad487a1e546b" playOrder="2">
      <navLabel>
        <text>Chapter</text>
      </navLabel>
      <content src="content/CompleteA_revised_split_2.html" />
      <navPoint id="1c4e5abf-96dd-42a3-9604-0936f9c535e0" playOrder="3">
        <navLabel>
          <text>Chapter 1</text>
        </navLabel>
        <content src="content/CompleteA_revised_split_2.html" />
      </navPoint>
      <navPoint id="4e14c23c-a836-414f-850f-ce1484f98b4a" playOrder="4">
        <navLabel>
          <text>Chapter 2</text>
        </navLabel>
        <content src="content/CompleteA_revised_split_3.html" />
      </navPoint>
      <navPoint id="bd77b9a0-c3f9-400e-b36c-290f896ac923" playOrder="5">
        <navLabel>
          <text>Chapter 3</text>
        </navLabel>
        <content src="content/CompleteA_revised_split_4.html" />
      </navPoint>
Trimming code may not make a lot of sense for regular books, but it does for bibles, dictionaries, and encyclopedias with tons of chapters, pages, and reference notes. Last edited by ProDigit; 04-15-2012 at 12:14 PM. |
04-15-2012, 12:40 PM | #4 | |
Grand Sorcerer
Posts: 27,586
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
Last edited by DiapDealer; 04-15-2012 at 12:47 PM. |
|
04-15-2012, 12:56 PM | #5 |
Karmaniac
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
|
I am, just as I could call you 'imbecilic' too, like you so generously spread around; but I'll refrain from using those words!!
But I'm just saying that it makes no sense to make things complicated when they could have invented a very good, optimized code format, especially for mobile devices, where every line of code consumes CPU unnecessarily! Aside from that, I'm still interested in which lines one can safely remove without breaking the epub. I don't really care about not having an epub with all the bells and whistles, since I am mainly going to use the epubs on hardware ebook readers instead of on a PC, which supports external links and other advanced stuff like library organization etc. On my ebook reader I open books from the file structure, not by author. And I'll repeat: what is the use of including external http links when the device can't connect to the internet anyway? Unless these lines are purely informative, there's no reason to keep them in the book, and they certainly should not be made a requirement for ebooks. Last edited by ProDigit; 04-15-2012 at 01:04 PM. |
|
04-15-2012, 01:01 PM | #6 | ||
Karmaniac
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
|
Quote:
Quote:
And part of this thread is to figure out what is and what is not. Just as I've already stopped asking about the way the toc handles the chapters, because that's just the way epubs operate, I am merely challenging some of the header code to see whether one can do without it. Last edited by ProDigit; 04-15-2012 at 01:04 PM. |
||
04-15-2012, 01:04 PM | #7 | |
Grand Sorcerer
Posts: 27,586
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
ePub is not HTML. It never will be. Wishing it were, won't help you in the least. Now I'm really done. I wish you luck in whatever it is you think you're trying to accomplish. |
|
04-15-2012, 01:14 PM | #8 |
Resident Curmudgeon
Posts: 74,483
Karma: 129668758
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
ProDigit, you are trying to find shortcuts to do what you want, but those shortcuts won't work, because what you want to remove from toc.ncx are things that HAVE to be there. Why not just leave them be, since they HAVE to be there?
Also, why are you so concerned with saving a few bytes here and there? Will any of your readers fail to open the ePub? Nope, they won't. Also, making an internal ToC (a page with a list of links) is not more efficient than a properly made toc.ncx, even if the internal ToC is smaller in size. One way to optimize things (that works) is to remove all the tabs/spaces in front of all the lines. So instead of having this: Code:
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1" xml:lang="en">
  <head>
    <meta name="dtb:uid" content="95e823ba-8f88-4c44-9f9d-b22ff04d5358" />
    <meta name="dtb:depth" content="3" />
    <meta name="dtb:generator" content="calibre" />
    <meta name="dtb:totalPageCount" content="0" />
    <meta name="dtb:maxPageNumber" content="0" />
  </head>
  <docTitle>
    <text>Table of Contents</text>
  </docTitle>
  <navMap>
    <navPoint id="d362620c-c3f8-45e2-8e63-2a62a2757f81" playOrder="1">
      <navLabel>
        <text>The Holy Bible</text>
      </navLabel>
use this: Code:
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1" xml:lang="en">
<head>
<meta name="dtb:uid" content="95e823ba-8f88-4c44-9f9d-b22ff04d5358" />
<meta name="dtb:depth" content="3" />
<meta name="dtb:generator" content="calibre" />
<meta name="dtb:totalPageCount" content="0" />
<meta name="dtb:maxPageNumber" content="0" />
</head>
<docTitle>
<text>Table of Contents</text>
</docTitle>
<navMap>
<navPoint id="d362620c-c3f8-45e2-8e63-2a62a2757f81" playOrder="1">
<navLabel>
<text>The Holy Bible</text>
</navLabel> |
04-15-2012, 01:21 PM | #9 | |||||||||
Grand Sorcerer
Posts: 5,185
Karma: 25133758
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Pocketbook Touch HD3 (Past: Kobo Mini, PEZ, PRS-505, Clié)
|
Quote:
The author can use "Unknown" or "Redacted" or "anonymous" or "decline to state" or a pseudonym of choice--or not put in an author. But the book needs a title so that the software can identify it. It needs a title in the same way that a digital file needs a name. HTML doesn't, because it's not designed to be read by software that uses "title" to list & sort files. The Epub format construction guide goes into detail about what is and is not required. Quote:
Epubs were designed to include both metadata and navigation options that HTML has problems with, and to run on remote devices with strict memory limitations. Quote:
Quote:
Quote:
Quote:
Quote:
playOrder="2"> <navLabel> <text>Chapter</text> </navLabel>
You don't need the long random-number strings. You can rename the IDs to make sense & be easy to follow: Code:
<docTitle>
  <text>Deliver Us</text>
</docTitle>
<navMap>
  <navPoint id="navPoint-1" playOrder="1">
    <navLabel>
      <text>Deliver Us</text>
    </navLabel>
    <content src="Text/03Titlepage.xhtml"/>
  </navPoint>
  <navPoint id="navPoint-2" playOrder="2">
    <navLabel>
      <text>Disclaimer</text>
    </navLabel>
    <content src="Text/05Disclaimer.xhtml"/>
  </navPoint>
Quote:
Quote:
Bibles are often printed in small type on very thin paper because if they were printed on normal paper, they'd be several volumes long. Trying to cram a small-encyclopedia-length work into a single volume is going to be troublesome. If it helps, there's a build-epub-from-scratch tutorial at the Spontaneous Derivation wiki. It goes through the bare minimum requirements for an epub's structure. |
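Following on from the "bare minimum" point, this is roughly what a minimal EPUB 2 OPF looks like as I read the OPF 2.0.1 spec. `dc:title`, `dc:identifier` and `dc:language` are the required metadata, so `dc:creator` (the author), `dc:date` and calibre's own `meta` tags are optional. The manifest entry for the NCX plus the spine's `toc` attribute are what tie the NCX to the book. The file names, IDs and language code here are hypothetical; the UUID is the one from the calibre file at the top of the thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="BookId">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:opf="http://www.idpf.org/2007/opf">
    <!-- Required: a title, an identifier and a language -->
    <dc:title>Holy Bible</dc:title>
    <!-- The id must match the package's unique-identifier attribute -->
    <dc:identifier id="BookId" opf:scheme="UUID">95e823ba-8f88-4c44-9f9d-b22ff04d5358</dc:identifier>
    <dc:language>en</dc:language>
    <!-- No dc:creator: the author is optional -->
  </metadata>
  <manifest>
    <!-- Every file in the book is listed once, with its media type -->
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="genesis" href="Text/genesis.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <!-- toc names the manifest item that holds the NCX -->
  <spine toc="ncx">
    <itemref idref="genesis"/>
  </spine>
</package>
```

When trimming, check cross-references before deleting anything: `unique-identifier` must name the `id` of the `dc:identifier`, and the NCX's `dtb:uid` should carry the same value.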
|||||||||
04-15-2012, 01:26 PM | #10 | |
Karmaniac
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
|
Quote:
Wouldn't it be nice to know which lines are mandatory and which aren't? Sometimes they say all lines are, but if you can figure out that most devices can read it even without those lines, that's valuable information. Just as I can display an HTML file in most browsers even without a head or data like the example below. In fact, in most of my HTML files I just remove this info and work without (or with minimal) classes, CSS and all that... I only use them if my code benefits (if they save me writing code without adding too much complexity). Here is an example of some lousy HTML code that can easily be removed from the HTML and that, with a small rewrite of the code within the body, will not significantly reduce the style of the HTML: Code:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.2 Document//EN" "http://openebook.org/dtds/oeb-1.2/oebdoc12.dtd">
<html xml:lang="en-us" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html;" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title> - Intelligent Design by Sharon Lee and Steve Miller</title>
<meta name="Publisher" content="Baen Books" />
<meta name="Copyright" content="2011 by Patrick Lundrigan, Larry Correia, Travis S. Taylor, Robert Buettner, Sharon Lee & Steve Miller" />
<meta name="Author" content="Patrick Lundrigan, Larry Correia, Travis S. Taylor, Robert Buettner" />
<style type="text/css">
p {text-indent:2em;margin-top:0;margin-bottom:2px}
h1 {page-break-before:left}
p.chapter {margin: 135.0px 0.0px 30.0px 0.0px; line-height: 24.1px; font: 28.0px 'Times New Roman'; color: #2e2829; font-weight:bold; text-align:center;}
p.p4 {margin: 0.0px 0.0px 0.0px 0.0px; text-align: center}
span.s1 {font-style: italic}
span.s2 {text-decoration: underline}
</style>
<script type="text/javascript" language="javascript"><!--
function setStyle() {
  if (parent.control) if (parent.control.mainLoad) parent.control.mainLoad(document);
  if (window.focus) window.focus();
}
function PNo(PgNo) {
  if (parent.control) if (parent.control.SetPage) parent.control.document.forms[0].PageNo.value = PgNo;
}
setStyle();
//--></script>
</head>
I think the search for smaller, cleaner code is not a bad one, and it is often overlooked. Smaller code is the difference between the now nearly extinct Excite, AltaVista and Yahoo search engines and Google. Last edited by ProDigit; 04-15-2012 at 01:32 PM. |
|
04-15-2012, 01:32 PM | #11 |
Resident Curmudgeon
Posts: 74,483
Karma: 129668758
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
There is nothing wrong with wanting to be as efficient in your coding as possible. But the things that need to be there need to be there, and your wanting to remove them is not going to work.
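For comparison with the Baen header in post #10, here is a sketch of roughly the smallest EPUB 2 content document I would expect to stay within spec (the title and text are placeholders). The XHTML namespace and the `head`/`title` pair are required by XHTML itself, and the OEB 1.2 DOCTYPE gives way to the XHTML 1.1 one. The duplicated Content-Type meta tags, the publisher/author meta tags and the JavaScript are not needed; a stylesheet is only needed if you want the styling.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<!-- XHTML requires head and title; no meta, style or script is needed -->
<title>Genesis</title>
</head>
<body>
<h1>Genesis</h1>
<p>In the beginning God created the heaven and the earth.</p>
</body>
</html>
```

Leaving out the indentation, as suggested in post #8, shaves a little more without touching anything the reading system needs.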
|
04-15-2012, 01:33 PM | #12 |
Karmaniac
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
|
Again,
I don't want to remove lines that are NEEDED, but merely question what is really NEEDED and what is just filler... (the main reason I wrote this thread). I had actually hoped to save some time, but it seems I would have saved more time doing the testing myself and then writing up my conclusions on this website.
04-15-2012, 01:37 PM | #13 |
Resident Curmudgeon
Posts: 74,483
Karma: 129668758
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
The problem is that some reading systems might ignore the errors (missing elements) and appear to work. But other software could fail. So if you make it such that it's not in spec but works now, you could later have to go in and fix it for some different/newer reading software. It's not worth it to remove what's needed.
But one thing you can do is use FlightCrew to verify the ePub. It will tell you what is missing that you need. |
04-15-2012, 01:50 PM | #14 |
Karmaniac
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
|
Unfortunately I only have two remaining devices that read epub. Otherwise I could take my time, do the research, and post the results on this site.
If it turns out that most ebook readers ignore certain errors without glitches, then it's good news for me. Many reading devices have the same software or even hardware inside, and should operate in a similar manner. So far it's not clear how they respond to all these variations. I know from the little time I've had playing with epub that not everything can be removed. I know that reading devices are pretty strict about their code. And I hope some programmers might read this thread and decide that it would indeed be better not to make everything mandatory, but to build a default into the reader software/firmware: if some specific line of code is not present, a standard pattern is followed. I think they should have done that from the start. Like, why does every epub need to have a mimetype file and a container.xml, if for most books these files are identical? I presume the mimetype file is for Mac/Linux, which read the first bytes of a file to determine what program they need to open it (probably also why you can't compress that file); but container.xml is different.
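On the container.xml question: the OPF can have any name and live in any folder, so `META-INF/container.xml` is the one fixed place a reading system looks to find it. In the sketch below the OPF path `OEBPS/content.opf` is an assumption; that `full-path` value is the part that usually differs between books. The `mimetype` file is not XML at all: it holds just the text `application/epub+zip`, and it is stored first and uncompressed in the ZIP so that any system, not only Mac or Linux, can recognize an epub from the first bytes of the archive.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- META-INF/container.xml: tells the reading system where the OPF package file is -->
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <!-- full-path is relative to the root of the ZIP, not to META-INF (path is hypothetical) -->
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
```

So the two files are identical between books only when the books happen to put the OPF in the same place; the container is what lets them differ.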
04-15-2012, 02:28 PM | #15 |
frumious Bandersnatch
Posts: 7,516
Karma: 19000001
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
As a general answer, most of the things you find unnecessary, maybe are unnecessary. But the ePub format is not something created from scratch just for books, it uses some file formats and conventions that already existed, to make it easier to create reading applications and books from already existing code, to make it easier to parse them (because there are already tools that can deal with those things), etc. And although those pieces may look like garbage to you, they are there to let software know what it's dealing with. The effect they will have in the final filesize will most likely be minimal.
I would advise you to have a look at some of the books I've uploaded here. They're all coded "by hand", and have pretty minimal markup overhead, I believe. Of course, they have some things you think unnecessary, like author, TOC, illustrations or description. |
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
[Old Thread] calibre not creating content.opf or toc.ncx files during conversion | foxxywith2xs | Calibre | 7 | 12-16-2012 07:49 PM |
NCX file generator (and html ToC and opf) | GiorgioC | Workshop | 0 | 07-12-2011 06:55 AM |
Use Regex to Code an Inline TOC, from an External TOC's .ncx File | mostlynovels | ePub | 2 | 03-16-2011 12:15 PM |
Saving with old toc.ncx file | Haderlump | Sigil | 1 | 12-28-2010 12:11 PM |
Compiling HTML,NCX and OPF file | pakiyabhai | Calibre | 8 | 12-25-2009 11:12 AM |