Originally Posted by bowerbird
actually, hadrien, i am extremely familiar with project gutenberg e-texts.
and the one thing i can tell you is that they're _consistently_ inconsistent.
so yes, some early books used all-caps for italics, rather than underscores.
and along the way, a variety of characters were used besides underscores...
and up until 2003 or so, when i became a severe pain-in-the-neck to them
on these issues, they didn't even feel any need to mark italics consistently...
even worse, they used all-caps for bold as well, and likewise felt no need
to be consistent with that either. (sometimes they didn't mark bold at all.)
Amen to all of that. Though be grateful that the text is out there at all and you don't have to OCR it yourself! You can also see the issue from the original transcribers' point of view. For example, I've just been restoring the italics in the PG text of Nostromo, and very often the transcriber uses initial caps for a word that was originally in italics, which is probably a more elegant and reader-friendly solution than using forward slashes for italicized words.
i've invented a form of non-markup markup -- i call it "zen markup language",
or z.m.l. (it's two steps more advanced than x.m.l.) -- where such structural
information is represented by a simple set of unobtrusive light-markup rules.
for instance, a regular chapter-header is preceded by 4 blank lines and followed
by 2 blank lines, thus allowing a viewer-application (which i've also programmed)
to automatically form a table of contents that is auto-hot-linked to the chapters...
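
The chapter-header rule just described can be sketched in a few lines of Python. This is only an illustration, not the actual z.m.l. viewer: the function name `find_chapter_headers` is hypothetical, and it assumes "4 blank lines before, 2 blank lines after" means at least four before and at least two after, which the real rules may define differently.

```python
def find_chapter_headers(text):
    """Return (line_index, title) pairs for lines that look like chapter headers.

    Assumption: a header is a non-blank line with at least 4 blank lines
    before it and at least 2 blank lines directly after it.
    """
    lines = text.split("\n")
    headers = []
    blanks_before = 0  # consecutive blank lines seen just before the current line
    for i, line in enumerate(lines):
        if line.strip() == "":
            blanks_before += 1
            continue
        following = lines[i + 1:i + 3]
        followed_by_two_blanks = (
            len(following) == 2 and all(l.strip() == "" for l in following)
        )
        if blanks_before >= 4 and followed_by_two_blanks:
            headers.append((i, line.strip()))
        blanks_before = 0
    return headers


sample = "\n".join([
    "Some front matter.",
    "", "", "", "",
    "Chapter 1",
    "", "",
    "It was a dark night.",
    "Still dark.",
    "", "", "", "",
    "Chapter 2",
    "", "",
    "Morning came.",
])

# Print a simple table of contents with the line index of each header.
for index, title in find_chapter_headers(sample):
    print(f"{title} (line {index})")
```

A viewer could use the returned line indexes as jump targets for a linked table of contents.
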
other simple rules -- easy enough to be understood by a fourth-grader --
underlie all of the other structures that are commonly found in books...
you can see work that i've done, in action, by visiting this web-page:
you'll be particularly interested in the "test-suite" and "rules" examples...
i believe intelligent viewer-programs interpreting plain-ascii input e-texts
and presenting them in typographically-sophisticated ways is _the_ future.
the publishing companies, of course, in an attempt to raise the cost of entry,
will try to force e-books into the complexity of heavy-markup, but i believe
the revolution into self-publishing will push back with light-markup systems.
authors don't want to battle steep learning curves. they just want to write...
I don't understand why you would need a new mark-up. Correctly used, html mark-up [e.g. h1 for the book title, h2 for the part or section title, and h3 for the chapter] gives you all the semantic information you need. (Poetry is another story.) Personally, I believe that plain vanilla html (or its baby siblings markdown, textile, etc.) is the new ascii.
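
As a rough illustration of that point, here is a minimal Python sketch that pulls a book outline out of h1/h2/h3 headings using the standard library's `html.parser`. The class name `OutlineParser`, the helper `extract_outline`, and the sample markup are all made up for this example, and it assumes headings are not nested inside one another.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect (level, text) pairs for h1, h2 and h3 elements."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None  # heading level currently open, if any
        self._parts = []    # text fragments of the open heading

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.outline.append((self._level, "".join(self._parts).strip()))
            self._level = None


def extract_outline(html):
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline


sample_html = (
    "<h1>Book Title</h1>"
    "<h2>Part One</h2>"
    "<h3>Chapter 1</h3>"
    "<p>Some <i>italic</i> text.</p>"
)

# Indent each entry by its heading level to show the structure.
for level, title in extract_outline(sample_html):
    print("  " * (level - 1) + title)
```

The same outline could feed a table of contents, which is the job the blank-line convention is doing in the plain-text approach.
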