Old 05-19-2009, 01:42 AM   #9
rogue_ronin
I think this is a discussion of META! But not a meta-discussion (but this title is!)

Okay, yeah, after being swayed by the wind, I'm back in the simple (X)HTML camp. I want to produce single-file ebooks (other than images and sounds, of course).

So, let me get started: I dug out my old file of A Princess of Mars (it's in the public domain) and updated it -- it was still in my old format (easy, but not trivial, to fix).

I'm going to go through it a piece at a time (I won't post entire chapters, just relevant stuff.)

Here's the current start of the file, through the head:



<!-- Conversion Started May/20/2004 -->
<!-- Revision # 0.80 on May/18/2009 -->


<title>A Princess of Mars</title>
<meta name="author" content="Burroughs, Edgar Rice">
<meta name="publisher" content="Found Text">
<meta name="genre" content="Science Fiction::General">
<meta name="ISBN" content="Found Text: #0085 v. 0.80">


<meta name="theme" content="Negative">
<meta name="number" content="0085">
<meta name="name" content="APrincessOfMars">
<meta name="version" content="0.80">
<meta name="title" content="A Princess of Mars">
<meta name="subtitle" content="Barsoom #01">
<meta name="series" content="Barsoom">
<meta name="seriesnumber" content="01">
<meta name="authorlast" content="Burroughs">
<meta name="authorfirst" content="Edgar">
<meta name="authormiddle" content="Rice">
<meta name="authorfull" content="Burroughs, Edgar Rice">
<meta name="rebgenre" content="Science Fiction::General">
<meta name="conversiondate" content="May/20/2004">
<meta name="source" content="University of Virginia Electronic Text Center">
<meta name="scanner" content="Judy Boss">
<meta name="proofer" content="Kelly Tetterton, Peter-John Byrnes, Found Text">
<meta name="revisiondate" content="May/18/2009">
<meta name="shortpath" content="REB1100\eBookProjects\Found Text\Burroughs_Edgar_Rice\Barsoom\01_APrincessOfMars\APrincessOfMars.html">


<meta name="endnotecount" content="0">

<!-- GOTO MENU -->

<meta name="rocket-menu" content="Table of Contents=#toc">
<meta name="rocket-menu" content="About this Version=#verso">

I looked at gwynevans' source code, which is sweet-and-clean:

<?xml version="1.0" encoding="utf-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    <META NAME="Author" CONTENT="Konrath, J.A"/>
    <style type="text/css">
        p { text-indent: 1em; margin: 0; }
        h1 { page-break-before: always; font-style: italic; }
        div.drink { page-break-before: always; }
        div.drink p { text-indent: 8em; }

Now, of course, my stuff has a bunch of kludges in there, specific to the REB1100. But it also has some good stuff, I think. And I have a few comments about gwynevans' source (not critical of you, gwynevans, everyone has their own preferences) to use as a jumping-off point.

Let's start with a couple of questions:

1) Is there a reason to prefer XHTML 1.0 over XHTML 1.1?

2) How to choose the encoding? I.e., what's best, most universal, least hassle? I'm an English speaker (my Thai is terrible, my Japanese has faded, and my French really sucks, but I can read a little of all of them), so I could be talked into some single way of seeing everything, as long as it won't hamper sharing or over-complicate things.

3) gwynevans -- did you whip this out as an example, or was it something you have? I ask because it has no title, for instance.

4) I'm thinking of moving all the CSS to a separate file; any reason I shouldn't? (BTW, the page-break-before: always; part -- is that specific to ebooks, or part of XHTML? 'Cause I've been wondering about how to hard-code that.)
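
To make questions 1, 2 and 4 concrete, here is roughly the kind of file start I have in mind. It's only a sketch, not a recommendation: the XHTML 1.0 Strict doctype and the utf-8 encoding are assumptions borrowed from gwynevans' excerpt, the stylesheet name ebook.css is made up, and the body is placeholder text. As far as I can tell, page-break-before is a CSS property for paged media rather than part of XHTML itself, so it would move into the stylesheet with the other rules; whether a given reader honors it is a separate question.

```html
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <!-- Repeats the encoding for software that ignores the XML declaration. -->
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>A Princess of Mars</title>
    <!-- Hypothetical filename; the rules from the style block would live here. -->
    <link rel="stylesheet" type="text/css" href="ebook.css" />
</head>
<body>
    <h1>Chapter title</h1>
    <p>Chapter text.</p>
</body>
</html>
```

The obvious trade-off is that the book is then no longer strictly a single file, since the stylesheet has to travel with it like the images do.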

And now to rip apart my gunk:

A) It has no DTD or -- what is it called when you specify HTML vs. XML, etc.?

B) For readability of the source (something that is very important to me) I keep a lot of sectioning with vertical space. I see (and have seen elsewhere) horizontal tabbing as a visual aid. Any good reason to prefer one over the other? Or to not combine them? I understand why the tabbing is there, but glancing at a page it is hard to find related sections -- they just don't stand out. And when there are a lot of sections to a document, I find that I have to right-scroll a lot, or that word wrap wrecks the layout.

C) Lower case tags: correct usage for XHTML, right?

D) Version information in comments -- I think this is a good practice, but for sharing would there be a better method? For instance, I don't name where to find the history/source, or what the numbers mean. I do have a set of guidelines for the numbers, should I include the guidelines? Or something else? It's going to be repeated later, but isn't it nice to open a file, and see the version, boom!, right there? Should I keep a list of all updates, instead of just first and latest?

E) REB1100 meta-info: well the <title> has to stay! And the next four <meta> tags are staying too, I think. (The ISBN tag is one I hijacked to display the collection number -- it would simply be returned to its original function.) Just gonna get rid of the comment, and merge the <meta> tags into a more general Meta section. Reasonable, right?

F) NoteTab ClipBook meta-info: For the goals of this thread, I am certain these <meta> tags are mixed up, and that as a pure source file, some should not be there. Just a comment on the meta/genre tag -- I found a collection of what looked like standard, official book-seller classifications online, and I wrote a macro to give me a drop-down list of genres. So it's got some universal sensibility, not just my personal conceits.
  • i) <meta name="theme"... : I have sets of icons and images that are used as links (i.e., next chapter, previous chapter, toc, an end-of-book image that "closes" the book by linking to the cover, etc.) This just names the set for the macro-library, and could be used to repair a broken folder, although I don't do that now. It should go, it's not necessary. But I still want to include such images!
  • ii) <meta name="number"... : this is a project number for the "publisher", in this example named Found Text. The "publisher" is just a conceit, but it makes it convenient to group books by genre or author, whatever. Still, not necessary? Or could it be adapted into a DocumentID, UID or something like that?
  • iii) <meta name="name"... : Both the filename(.html) and the project name in my database/filetree. I don't know. I can sort of see this either way. Useful? Or redundant? Need a better name itself? I always camel-case the title of the book and remove spaces.
  • iv) <meta version, title, subtitle, series, seriesnumber... : I think these stay. Can't see any reason not to have them. Maybe they need better names?
  • v) <meta authorlast, authormiddle, authorfirst, authorfull ... : Why do I not just use "author"? Well, sometimes you need to manipulate the name for display, other times you need to sort by last name. I think FBReader, for one, lets you choose the sorting tag. Also, the macros I write let me collect names as I add them, so why have to re-type "Edgar" when I add Poe to my collection? I could see adding an author <meta>, or re-jiggering authorfull, and changing the name of the current authorfull to something else (author-by-last, authoralphabetical?) My current rule is that when you have an initial, you don't use a period. I let the macros sort it out. But that may not be best. Does this all make sense? Regardless, I don't think a simple <meta name="author" content="Edgar Rice Burroughs"> is enough.
  • vi) <meta rebgenre ... : It's just here because I wanted to keep the REB1100 functional stuff separated from the NoteTab Clip stuff. It made it a lot easier to parse the file when updating. Redundant, gone.
  • vii) <meta conversiondate, source, scanner, proofer, revisiondate ... : All necessary, I think. Maybe they need better names? For example, should conversiondate be initialconversiondate? Or xhtmlconversiondate? Or something else? Are there others to add?
  • viii) <meta shortpath ... : this is specific to the filetree, and makes setting things up in the macros easier. Unnecessary. Gone.
  • ix) The entire endnote part should go, I think. I keep the number to allow for adding new endnotes as the text is processed. It's really just a backup (in a way, all these meta-info are, given that I keep it all in a database, too.) Gone.
  • x) GOTO-MENU section: Gone. This is a section for the REB1100 (and specifically for rbmake, I believe); it allows for up to seven (that's right, folks, a whole seven!) pop-up TOC links.
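
Putting E) and F) together, the head after the clean-up might look something like the sketch below. It only applies the decisions above to the values already in my file: theme, rebgenre, shortpath, endnotecount and the rocket-menu lines are gone, the names are unchanged pending better suggestions, and number and name stay in for now because I haven't decided. The self-closing " />" form and the comment labels are my assumption of what XHTML would want, not something I've tested on a device.

```html
<head>
<!-- Revision # 0.80 on May/18/2009 -->
<title>A Princess of Mars</title>

<!-- META -->
<meta name="author" content="Burroughs, Edgar Rice" />
<meta name="publisher" content="Found Text" />
<meta name="genre" content="Science Fiction::General" />
<!-- The ISBN tag would go here once it holds a real ISBN again. -->

<!-- Still undecided: see ii) and iii) -->
<meta name="number" content="0085" />
<meta name="name" content="APrincessOfMars" />

<meta name="version" content="0.80" />
<meta name="title" content="A Princess of Mars" />
<meta name="subtitle" content="Barsoom #01" />
<meta name="series" content="Barsoom" />
<meta name="seriesnumber" content="01" />

<meta name="authorlast" content="Burroughs" />
<meta name="authorfirst" content="Edgar" />
<meta name="authormiddle" content="Rice" />
<meta name="authorfull" content="Burroughs, Edgar Rice" />

<meta name="conversiondate" content="May/20/2004" />
<meta name="source" content="University of Virginia Electronic Text Center" />
<meta name="scanner" content="Judy Boss" />
<meta name="proofer" content="Kelly Tetterton, Peter-John Byrnes, Found Text" />
<meta name="revisiondate" content="May/18/2009" />
</head>
```

Seen this way, author and authorfull carry the same content, which is part of why I wonder about renaming one of them.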

So what do you think? I know this is a super-long post: don't feel that you have to respond to everything, just whatever you think is good or bad. I'm looking to develop a best-practice here, and I don't see a lot of discussion about embedding meta-info.

Thanks for reading,

m a r