|  05-18-2009, 12:17 AM | #1 | 
| Banned        Posts: 475 Karma: 796 Join Date: Sep 2008 Location: Honolulu Device: Nokia 770 (fbreader) |  (x)html ebook specification 
			
I'd like to get some brains on this subject.

I'm (slowly) assembling and editing a giant library of HTML ebooks. I've been using an idiosyncratic mix of HTML 3.2 and XHTML that I've picked up over the years. I use an obsolete reader -- but a very functional one! (The REB1100.) I'm looking to upgrade, though, soon. And I'd like to do this only once, the editing/organizing.

I use an awesome text editor, NoteTab Pro, which lets you assemble libraries of macros -- anything you can do to text, you can do with the macros, it's got an enormous language. So I've built a library with a hundred or two macros, that do everything from regex to boilerplate to file manipulation and database entries.

So I need some advice on a better 'spec' for the format -- I should be able to rewrite the macros to the new one, and write a few that auto-adapt the old stuff I've done already. Creating macros to write CSS for any reader should be dead-simple, or writing converters to straight HTML, also -- once the format is set and consistent.

There are a lot of ideas out there, and I have my own, which I'll start with: This spec should use XHTML, and CSS. But the document markup should be as simple as possible.

Here are the elements that I think are important in an ebook, primarily fiction books -- a mix of meta-data and structure -- the meta is often explicitly expressed in the book: please add on if I've missed something.

Book Meta: Author(s), Illustrator(s), Publisher, ISBN, Publishing Date, Publishing City, Copyright Owner, Copyright Date, Series Name, Title, Sub-Title

File Meta: Version Number, Version Date, Original Conversion Date, Scanner, Proofreader(s), Original Source

Structure: Cover, Front Matter, Title Page, Verso Page (book meta info page), Inscription, Acknowledgments, Preface, Foreword, Table of Illustrations/Maps, Table of Contents, Prologue, Parts, Chapters, Epigrams, Sections, Sub-Sections, Paragraphs, Epilogue, Afterword, Endnotes, Glossary, Index, End Matter

If I've missed anything, please add or suggest. In my next post, I'm going to add my current methods, and ask for advice on improvements.

Thanks for reading! m a r |
|   |   | 
|  05-18-2009, 12:54 AM | #2 | |
| Guru            Posts: 610 Karma: 4150 Join Date: Mar 2008 Device: Sony Reader PRS-T3, Kobo Libra H2O | Quote: 
 As for organizing everything neatly, you may want to look at my "Calibre preprocessor" H2LRF. | |
|   |   | 
|  05-18-2009, 05:50 AM | #3 | 
| Banned        Posts: 475 Karma: 796 Join Date: Sep 2008 Location: Honolulu Device: Nokia 770 (fbreader) | 
			
Read your thread/link. It's funny, I already do something like that, too, with all the meta-data. More than what you've got. Anal-retentively more. 'Course, I'm not using Calibre (yet.) And the utility I use is my text editor, not a pre-processor to call Calibre...

I think what I'm shooting for here is a utility/software/reader neutral way to present an ebook in simplest HTML -- consistency being the key. Then anyone can take it and convert it pretty easily. Heck, you could use it for your h2lrf, almost without effort -- just a minor change to the meta you look for.

As for the CSS macros -- I think what I'm talking about is that each ebook reader (hardware) should probably have its own CSS, right? I mean, what looks good on a 5" JetBook, probably doesn't look as good on an 11" DS1000. So you'd want to answer a few questions in a dialog (well, I would) about what sort of reader you're trying to make an EPUB for. Then, boom, CSS created. Maybe it calls some common defaults in a common.css file or such.

Now that I re-read my thoughts above, I think I'm talking about two separate things. Common, single CSS will be all that's necessary. Later in the process, when I want to make a package for a hardware reader, then an additional CSS macro might be necessary. You're right, as usual pepak.

Anyway, gonna dig out my old Barsoom folder, and grab A Princess of Mars to use as an example for the next stuff.

Back later, m a r |
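The "answer a few questions, then, boom, CSS created" idea above could be sketched roughly as follows. This is only an illustration in Python rather than a NoteTab macro: the `Device` fields, the 7-inch cutoff, and the specific margin and indent values are all assumptions made up for the example, and `common.css` is simply the file name floated in the post.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Answers to the 'few questions' about a reader (all fields hypothetical)."""
    name: str
    screen_inches: float
    base_font_pt: int


def device_css(device: Device, common_file: str = "common.css") -> str:
    """Return a small stylesheet that imports shared defaults, then overrides them."""
    # Assumption: screens under 7 inches get tighter margins and a smaller indent.
    small = device.screen_inches < 7
    margin = "0.5em" if small else "2em"
    indent = "1em" if small else "1.5em"
    lines = [
        f"/* Generated for {device.name} */",
        f'@import "{common_file}";',
        f"body {{ font-size: {device.base_font_pt}pt; margin: {margin}; }}",
        f"p {{ text-indent: {indent}; margin: 0; }}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The two readers mentioned in the post; the font sizes are guesses.
    print(device_css(Device("JetBook", 5.0, 10)))
    print(device_css(Device("DS1000", 11.0, 12)))
```

Keeping the shared rules in one imported file and emitting only the per-device overrides matches the "two separate things" conclusion: the book itself needs one common stylesheet, and the device-specific part is generated late, at packaging time.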
|   |   | 
|  05-18-2009, 08:38 AM | #4 | 
| Resident Curmudgeon            Posts: 80,677 Karma: 150249619 Join Date: Nov 2006 Location: Roslindale, Massachusetts Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3 | 
			
			Since you are coding XHTML, why not have a look at ePub? It would do quite well for you (IMHO).
		 | 
|   |   | 
|  05-18-2009, 10:01 AM | #5 | 
| Banned        Posts: 475 Karma: 796 Join Date: Sep 2008 Location: Honolulu Device: Nokia 770 (fbreader) | 
			
			Yeah, I can't seem to find a clear set of guidelines/tutorial for ePub -- they all (so far) seem to assume a level of familiarity with XML that I don't have.  I'm never happy if I just mimic without understanding -- and that takes a while. I think I might use this thread to teach myself how to do it (ePub) properly, and modularly, with great metadata, and good in-book navigation. Just do it a piece at a time, and hope folks chime in when I'm screwing up. It's the XML spine, etc. where I start to get truly lost. 'Course, if I figure it out once, I can just macro the heck out of it. m a r | 
|   |   | 
|  05-18-2009, 11:39 AM | #6 | 
| Grand Sorcerer            Posts: 11,470 Karma: 13095790 Join Date: Aug 2007 Location: Grass Valley, CA Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7 | 
			
			The wiki can help in your research. It has most of the topics you have expressed interest in and can provide a starting point. If you find any deficiencies you can correct them! or ask for help. Dale | 
|   |   | 
|  05-18-2009, 03:34 PM | #7 | 
| Guru            Posts: 610 Karma: 4150 Join Date: Mar 2008 Device: Sony Reader PRS-T3, Kobo Libra H2O | 
			
			You can convert XHTML to anything easily. You can't do that with EPUB, even though it uses XHTML as its basis (e.g. with EPUB your converter needs to be able to handle multiple source files).
		 | 
|   |   | 
|  05-18-2009, 06:36 PM | #8 | 
| Wizzard            Posts: 1,402 Karma: 2000000 Join Date: Nov 2007 Location: UK Device: iPad 2, iPhone 6s, Kindle Voyage & Kindle PaperWhite | 
			
			The way I've been doing it when creating ePubs has been to just create the XHTML as below and run it through Calibre to create the actual ePub, as that way  I can work with a single source but use Calibre to do the file-splitting & 'twiddly bits' to create a valid ePub.
		 | 
|   |   | 
|  05-19-2009, 01:42 AM | #9 | 
| Banned        Posts: 475 Karma: 796 Join Date: Sep 2008 Location: Honolulu Device: Nokia 770 (fbreader) | 
				
				I think this is a discussion of META!  But not a meta-discussion (but this title is!)
			 
			
Okay, yeah, after being swayed by the wind, I'm back in the simple (x)HTML camp. I want to produce single-file ebooks (other than images/sounds, of course.)

So, let me get started: I dug out my old file of A Princess of Mars (it's in the public domain) and updated it -- it was still in my old format (easy but not trivial to fix.) I'm going to go through it a piece at a time (I won't post entire chapters, just relevant stuff.)

Here's the current start of the file, through the head:

Code:
<html>
<head>

<!-- Conversion Started May/20/2004 -->
<!-- Revision # 0.80 on May/18/2009 -->

<!-- META INFO USED BY THE REB1100 FOR DISPLAY ON THE ABOUT PAGE -->
<title>A Princess of Mars</title>
<meta name="author" content="Burroughs, Edgar Rice">
<meta name="publisher" content="Found Text">
<meta name="genre" content="Science Fiction::General">
<meta name="ISBN" content="Found Text: #0085 v. 0.80">

<!-- META INFO USED BY THE REB1100 NOTETAB CLIPBOOK -->
<meta name="theme" content="Negative">
<meta name="number" content="0085">
<meta name="name" content="APrincessOfMars">
<meta name="version" content="0.80">
<meta name="title" content="A Princess of Mars">
<meta name="subtitle" content="Barsoom #01">
<meta name="series" content="Barsoom">
<meta name="seriesnumber" content="01">
<meta name="authorlast" content="Burroughs">
<meta name="authorfirst" content="Edgar">
<meta name="authormiddle" content="Rice">
<meta name="authorfull" content="Burroughs, Edgar Rice">
<meta name="rebgenre" content="Science Fiction::General">
<meta name="conversiondate" content="May/20/2004">
<meta name="source" content="University of Virginia Electronic Text Center">
<meta name="scanner" content="Judy Boss">
<meta name="proofer" content="Kelly Tetterton, Peter-John Byrnes, Found Text">
<meta name="revisiondate" content="May/18/2009">
<meta name="shortpath" content="REB1100\eBookProjects\Found Text\Burroughs_Edgar_Rice\Barsoom\01_APrincessOfMars\APrincessOfMars.html">

<!-- ENDNOTE COUNT -->
<meta name="endnotecount" content="0">

<!-- GOTO MENU -->
<meta name="rocket-menu" content="Table of Contents=#toc">
<meta name="rocket-menu" content="About this Version=#verso">

</head>

Code:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<HTML>
<HEAD>
    <META NAME="Author" CONTENT="Konrath, J.A"/>
    <style type="text/css">
        <!--
        p { text-indent: 1em; margin: 0; }
        h1 { page-break-before: always; font-style: italic; }
        div.drink { page-break-before: always; }
        div.drink p { text-indent: 8em; }
        -->
    </style>
</HEAD>

Let's start with a couple of questions:

1) Is there a reason to prefer XHTML 1.0 over XHTML 1.1?

2) How to choose the encoding? i.e., what's best, most universal, least hassle? I'm an English speaker (my Thai is terrible! my Japanese has faded, my French really sucks, but I can read a little of all of them -- I could be talked into some single way of seeing everything, as long as it won't hamper sharing, or over-complicate.)

3) gwynevans -- did you whip this out as an example, or was it something you have? I ask because it has no title, for instance.

4) I'm thinking to move all CSS to a separate file, any reason I shouldn't? (BTW, the page-break-before: always; part -- is that specific to ebooks, or part of XHTML? 'Cause I've been wondering about how to hard-code that.)

And now to rip apart my gunk:

A) It has no DTD or -- what is it called when you specify html vs xml, etc?

B) For readability of the source (something that is very important to me) I keep a lot of sectioning with vertical space. I see (and have seen elsewhere) horizontal tabbing as a visual aid. Any good reason to prefer one over the other? Or to not combine them? I understand why the tabbing is there, but glancing at a page it is hard to find related sections -- they just don't stand out. And when there are a lot of sections to a document, I find that I have to right-scroll a lot, or that word wrap wrecks the layout.

C) Lower case tags: correct usage for XHTML, right?

D) Version information in comments -- I think this is a good practice, but for sharing would there be a better method? For instance, I don't name where to find the history/source, or what the numbers mean. I do have a set of guidelines for the numbers, should I include the guidelines? Or something else? It's going to be repeated later, but isn't it nice to open a file, and see the version, boom!, right there? Should I keep a list of all updates, instead of just first and latest?

E) REB1100 meta-info: well the <title> has to stay! And the next four <meta> tags are staying too, I think. (The ISBN tag is one I hijacked to display the collection number -- it would simply be returned to its original function.) Just gonna get rid of the comment, and merge the <meta> tags into a more general Meta section. Reasonable, right?

F) NoteTab ClipBook meta-info: For the goals of this thread, I am certain these <meta> tags are mixed up, and that as a pure source file, some should not be there. Just a comment on the meta/genre tag -- I found a collection of what looked like standard, official book-seller classifications online, and I wrote a macro to give me a drop-down list of genres. So it's got some universal sensibility, not just my personal conceits.

So what do you think? I know this is a super-long post: don't feel that you have to respond to everything, just whatever you think is good or bad. I'm looking to develop a best-practice here, and I don't see a lot of discussion about embedding meta-info.

Thanks for reading, m a r |
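Since the point of embedding meta-info in the head is that any tool can pull it back out, here is a minimal sketch of reading it with Python's standard `html.parser`. The class and function names (`HeadMetaParser`, `read_head`) are made up for the example, and the sample is a trimmed copy of the head posted above. The parser does not need well-formed XHTML, so it copes with both the unclosed lower-case `<meta ...>` tags and the upper-case self-closed `<META ... />` style.

```python
from html.parser import HTMLParser


class HeadMetaParser(HTMLParser):
    """Collect the <title> text and every <meta name=... content=...> pair."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.metas = []  # list of (name, content); duplicate names are kept
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names (not attribute values).
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            a = dict(attrs)
            if "name" in a:
                self.metas.append((a["name"], a.get("content") or ""))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def read_head(html_text):
    """Return (title, [(name, content), ...]) for an (X)HTML document."""
    parser = HeadMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip(), parser.metas


SAMPLE = """<html> <head>
<title>A Princess of Mars</title>
<meta name="author" content="Burroughs, Edgar Rice">
<meta name="rocket-menu" content="Table of Contents=#toc">
<meta name="rocket-menu" content="About this Version=#verso">
</head>"""

if __name__ == "__main__":
    title, metas = read_head(SAMPLE)
    print(title)
    for name, content in metas:
        print(f"{name}: {content}")
```

Returning a list of pairs rather than a dictionary is deliberate: the two `rocket-menu` entries share a name, and a plain dictionary would silently keep only one of them.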
|   |   | 
|  05-19-2009, 04:24 AM | #10 | 
| Wizzard            Posts: 1,402 Karma: 2000000 Join Date: Nov 2007 Location: UK Device: iPad 2, iPhone 6s, Kindle Voyage & Kindle PaperWhite | 
			
> 1) Is there a reason to prefer XHTML 1.0 over XHTML 1.1?

None that I know of - I guess I just had the 1.0 header to hand & for this particular use, I don't think there was any difference between 1.0 & 1.1.

> 3) gwynevans -- did you whip this out as an example, or was it something you have? I ask because it has no title, for instance.

At the time I'd not considered custom metadata & pre-processing, so just had a 'build_ePub.bat' in the folder which set some of the metadata via the Calibre command-line, e.g.

html2epub --margin-right=10 --level1-toc="//h2" --chapter="//h2" --cover="Konrath, J.A - Jack Daniels 01 - Whiskey Sour.png" -t "Whiskey Sour" -a "Konrath, J.A" "Konrath, J.A - Jack Daniels 01 - Whiskey Sour.html"

> 4) I'm thinking to move all CSS to a separate file, any reason I shouldn't?

If you've come up with a standard set of styles that you want to reuse, then it's worth considering, although the main reason to do so in the web-site case is to allow global changes by editing the one file, which may be less of an issue in this particular usage.

> (BTW, the page-break-before: always; part -- is that specific to ebooks, or part of XHTML? 'Cause I've been wondering about how to hard-code that.

Standard, but it's less well known as it's focussing on the print side of things - http://www.w3schools.com/Css/pr_print_pagebb.asp |
|   |   | 
|  05-19-2009, 04:41 AM | #11 | ||||||
| Guru            Posts: 610 Karma: 4150 Join Date: Mar 2008 Device: Sony Reader PRS-T3, Kobo Libra H2O | 
			
XHTML 1.1 is a bit "cleaner" (from the technical point of view), which makes it a bit more restrictive. That's a good thing, IMHO.
 | ||||||
|   |   | 
|  05-19-2009, 05:10 AM | #12 | |
| The Grand Mouse 高貴的老鼠            Posts: 74,412 Karma: 318076944 Join Date: Jul 2007 Location: Norfolk, England Device: Kindle Oasis | Quote: 
 | |
|   |   | 
|  05-19-2009, 06:07 AM | #13 | 
| Banned        Posts: 475 Karma: 796 Join Date: Sep 2008 Location: Honolulu Device: Nokia 770 (fbreader) | 
			
Okay, thanks for the link, I've started to read the tutorial on CSS. I've found this example of a 1.1 header elsewhere:

Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

I understand about using Calibre. That makes a lot of sense as a way to just get it done quickly and adequately. 'Course, I'm goin' all ballistic on this right now...

So, looking at my first post, and my second long post, and a new idea or two, here's my proposed start of a new file with good meta-info (using the old data, and faking where necessary):

Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>A Princess of Mars</title>

<!-- BEGIN: DOCUMENT HISTORY -->
<!-- Created on 20/May/2004 -->
<!-- Revision # 0.10 on 30/Jun/2004 -->
<!-- Revision # 0.20 on 28/Dec/2004 -->
<!-- Revision # 0.40 on 13/Apr/2005 -->
<!-- Revision # 0.70 on 30/Sep/2006 -->
<!-- Current Revision # 0.80 on 18/May/2009 -->
<!-- END: DOCUMENT HISTORY -->

<!-- BEGIN: EBOOK META INFORMATION -->
<meta name="filename" content="APrincessOfMars">
<meta name="fileid" content="FoundText0085">
<meta name="filecreationdate" content="20/May/2004">
<meta name="fileversion" content="0.80">
<meta name="filerevisiondate" content="18/May/2009">
<meta name="filesource" content="University of Virginia Electronic Text Center">
<meta name="filescanner" content="Judy Boss">
<meta name="fileproofer" content="Kelly Tetterton, Peter-John Byrnes, Found Text">
<meta name="title" content="A Princess of Mars">
<meta name="subtitle" content="Barsoom #01">
<meta name="series" content="Barsoom">
<meta name="seriesnumber" content="01">
<meta name="genre" content="Science Fiction::General">
<meta name="author" content="Edgar Rice Burroughs">
<meta name="authorlast" content="Burroughs">
<meta name="authorfirst" content="Edgar">
<meta name="authormiddle" content="Rice">
<meta name="authoralpha" content="Burroughs, Edgar Rice">
<meta name="illustrator" content="Frank Frazetta">
<meta name="illustratorlast" content="Frazetta">
<meta name="illustratorfirst" content="Frank">
<meta name="illustratormiddle" content="">
<meta name="illustratoralpha" content="Frazetta, Frank">
<meta name="publisher" content="Found Text">
<meta name="publicationdate" content="08/Jul/2010">
<meta name="publicationcity" content="Honolulu">
<meta name="copyrightholder" content="">
<meta name="copyrightdate" content="">
<meta name="isbn" content="">
<!-- END: EBOOK META INFORMATION -->

</head>

If there is more than one author or illustrator, append ## to the name attribute: i.e., author01 for the 2nd author, illustrator03 for the 4th illustrator. I think 100 authors and illustrators is enough. Let the parser figure it out. Or should I start with author01? Or author00? I don't think so, but...

The only inconsistency this leaves is with the proofer: I can't see a reason though why you might need more than a simple, comma-separated list. Can anyone? And, I guess, sometimes publishers have more than one city -- but a simple list would do there, too, wouldn't it?

All dates in dd/mmm/yyyy format. Use leading 0's for all numbers less than 10.

I really do appreciate any input to this typing-out-loud, thanks, m a r |
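To make "let the parser figure it out" concrete, here is a rough Python sketch of the two conventions proposed above: the bare name for the first entry with `author01`, `author02`, ... for later ones, and dd/mmm/yyyy dates. The function names (`numbered`, `parse_date`, `format_date`) are invented for the example, and the three-letter English month table is an assumption about what "mmm" means; it is spelled out by hand so the result does not depend on the machine's locale.

```python
import re
from datetime import date

# Assumption: "mmm" means three-letter English month abbreviations.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def numbered(meta, base):
    """Return the values for base, base01, base02, ... in that order.

    The bare name is treated as entry 0, so 'author01' is the 2nd author.
    Names such as 'authorlast' are ignored because only two digits may follow.
    """
    pattern = re.compile(rf"^{re.escape(base)}(\d\d)?$")
    found = []
    for name, content in meta.items():
        m = pattern.match(name)
        if m:
            index = int(m.group(1)) if m.group(1) else 0
            found.append((index, content))
    return [content for _, content in sorted(found)]


def parse_date(text):
    """Parse a dd/mmm/yyyy string such as '18/May/2009' into a date."""
    day, month, year = text.split("/")
    return date(int(year), MONTHS.index(month) + 1, int(day))


def format_date(d):
    """Format a date as dd/mmm/yyyy with a leading zero on the day."""
    return f"{d.day:02d}/{MONTHS[d.month - 1]}/{d.year:04d}"


if __name__ == "__main__":
    # "Second Author" and "Third Author" are placeholders, not real metadata.
    meta = {
        "author01": "Second Author",
        "author": "Edgar Rice Burroughs",
        "authorlast": "Burroughs",
        "author02": "Third Author",
    }
    print(numbered(meta, "author"))
    print(format_date(parse_date("18/May/2009")))
```

One consequence worth noting: because the month is a name rather than a number, dates in this format do not sort correctly as plain text, so any tool that orders revisions has to parse them first.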
|   |   | 
|  05-19-2009, 06:17 AM | #14 | 
| Banned        Posts: 475 Karma: 796 Join Date: Sep 2008 Location: Honolulu Device: Nokia 770 (fbreader) | 
			
Hmmm, another thought. Does it make sense to include the following two things?

#1: versioning info. Often when I get a file, the version number is basically meaningless. Here's the versioning ranks I use:

Code:
0.10 Initial Conversion
0.20 Cover and Frontispiece
0.30 Sections, Chapters and TOC
0.40 Endnotes and/or Blockquotes
0.50 Initial Spellcheck
0.60 Mdashes and Hyphens and Ellipses
0.70 Italics, Bold, and Pre-Formatted Text
0.80 Reading Proof
0.90 Checked against Canonical Source
1.00 Touched Up and Packaged For Release

Code:
Title: h1 class="title"
Subtitle: h3 class="subtitle"
Chapter: h3 class="chapter"
Paragraph: p class="normal"
Epigram: p class="epigram"
etc. etc. |
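If the ranks travel with the file, a tool can turn a bare version number back into something meaningful. A small Python sketch of that lookup follows; the table is copied from the list above, while the function name `stage_for` and the rule for in-between numbers (round down to the nearest listed rank) are assumptions, since the post does not say how a value like 0.85 should be read.

```python
from decimal import Decimal

# The ten ranks from the post, in ascending order.
STAGES = [
    (Decimal("0.10"), "Initial Conversion"),
    (Decimal("0.20"), "Cover and Frontispiece"),
    (Decimal("0.30"), "Sections, Chapters and TOC"),
    (Decimal("0.40"), "Endnotes and/or Blockquotes"),
    (Decimal("0.50"), "Initial Spellcheck"),
    (Decimal("0.60"), "Mdashes and Hyphens and Ellipses"),
    (Decimal("0.70"), "Italics, Bold, and Pre-Formatted Text"),
    (Decimal("0.80"), "Reading Proof"),
    (Decimal("0.90"), "Checked against Canonical Source"),
    (Decimal("1.00"), "Touched Up and Packaged For Release"),
]


def stage_for(version):
    """Return the highest rank at or below a version string such as '0.80'.

    Returns None for versions below 0.10. Rounding down for in-between
    values is an assumption, not something the post specifies.
    """
    v = Decimal(version)
    reached = [label for threshold, label in STAGES if threshold <= v]
    return reached[-1] if reached else None


if __name__ == "__main__":
    print(stage_for("0.80"))
```

`Decimal` is used instead of `float` so that a version read from a meta tag as the text "0.80" compares exactly against the table.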
|   |   | 
|  05-19-2009, 06:21 AM | #15 | ||
| Guru            Posts: 610 Karma: 4150 Join Date: Mar 2008 Device: Sony Reader PRS-T3, Kobo Libra H2O | Quote: 
 Do you have any specific reason why you don't want to use multiple meta's?

Code:
<meta name="proofer" content="Person A" />
<meta name="proofer" content="Person B" />
<meta name="proofer" content="Person C" />
 | ||
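For comparison with the numbered-name and comma-separated approaches, here is a rough Python sketch of how a tool might read repeated meta tags like the ones above. It assumes the file is well-formed XHTML (self-closed tags, the XHTML namespace on the root element), which is what lets the standard `xml.etree.ElementTree` parser handle it; the function name `grouped_meta` and the sample document are made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# ElementTree prefixes every tag with its namespace in braces.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def grouped_meta(xhtml_text):
    """Map each meta name to the list of its content values, in document order."""
    root = ET.fromstring(xhtml_text)
    groups = defaultdict(list)
    for meta in root.iter(XHTML_NS + "meta"):
        name = meta.get("name")
        if name is not None:
            groups[name].append(meta.get("content", ""))
    return dict(groups)


SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>A Princess of Mars</title>
<meta name="proofer" content="Person A" />
<meta name="proofer" content="Person B" />
<meta name="proofer" content="Person C" />
<meta name="series" content="Barsoom" />
</head>
<body><p>text</p></body>
</html>"""

if __name__ == "__main__":
    for name, values in grouped_meta(SAMPLE).items():
        print(name, values)
```

With repeated tags the reader needs no naming rule and no splitting on commas, so a name containing a comma cannot be misread as two people.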
|   |   | 