actually, hadrien, i am extremely familiar with project gutenberg e-texts.
and the one thing i can tell you is that they're _consistently_ inconsistent.
so yes, some early books used all-caps for italics, rather than underscores.
and along the way, a variety of characters were used besides underscores...
and up until 2003 or so, when i became a severe pain-in-the-neck to them
on these issues, they didn't even feel any need to mark italics consistently...
even worse, they used all-caps for bold as well, and likewise felt no need
to be consistent with that either. (sometimes they didn't mark bold at all.)
i know all this because i have been working for some time now on means of
interpreting the p.g. e-texts in a way that restores the structural information.
the same type of work you do when you put texts into your database, except
i leave them as text. (so ordinary humans can continue to work with them...)
i've invented a form of non-markup markup -- i call it "zen markup language",
or z.m.l. (it's two steps more advanced than x.m.l.) -- where such structural
information is represented by a simple set of unobtrusive light-markup rules.
for instance, a regular chapter-header is preceded by 4 blank lines and followed
by 2 blank lines, thus allowing a viewer-application (which i've also programmed)
to automatically form a table of contents that is auto-hot-linked to the chapters...
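
as a rough illustration of that chapter-header rule, here is a minimal python sketch. it is not the code of the actual viewer-application. the function name `find_chapter_headers` is made up for this example, and it assumes that "4 blank lines" and "2 blank lines" mean exactly that many, and that a header is a single line.

```python
def find_chapter_headers(text):
    """Return (line_index, title) pairs for lines that look like chapter headers.

    Assumed rule (an interpretation, not a spec): a single non-blank line
    preceded by exactly 4 blank lines and followed by exactly 2 blank lines.
    """
    lines = text.split("\n")
    headers = []
    for i, line in enumerate(lines):
        if not line.strip():
            continue  # blank lines are never headers

        # count the run of blank lines immediately before this line
        before = 0
        j = i - 1
        while j >= 0 and not lines[j].strip():
            before += 1
            j -= 1

        # count the run of blank lines immediately after this line
        after = 0
        k = i + 1
        while k < len(lines) and not lines[k].strip():
            after += 1
            k += 1

        if before == 4 and after == 2:
            headers.append((i, line.strip()))
    return headers


sample = (
    "intro text\n\n\n\n\n"
    "chapter 1\n\n\n"
    "body one\nmore body\n\n\n\n\n"
    "chapter 2\n\n\n"
    "body two\n"
)

# build a plain table of contents from the detected headers
for line_index, title in find_chapter_headers(sample):
    print(f"{title} (line {line_index + 1})")
```

the returned line positions are what a viewer could use as link targets when it builds the table of contents.
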
other simple rules -- easy enough to be understood by a fourth-grader --
underlie all of the other structures that are commonly found in books...
you can see work that i've done, in action, by visiting this web-page:
you'll be particularly interested in the "test-suite" and "rules" examples...
i believe intelligent viewer-programs that interpret plain-ascii input e-texts
and present them in typographically-sophisticated ways are _the_ future.
the publishing companies, of course, in an attempt to raise the cost of entry,
will try to force e-books into the complexity of heavy-markup, but i believe
the revolution in self-publishing will push back with light-markup systems.
authors don't want to battle steep learning curves. they just want to write...