I have just found this thread, and have only skimmed through a portion of it - I will read it more carefully this afternoon. Forgive these comments if they are off-topic.
Clearly marking up is the answer, but should it be dictated by ebook formats at all?
Gutenberg, the biggest project of its kind, is not and should not be seen simply as a resource for current ebooks. It is a resource of incredible value for many uses yet to be seen. The problem is that it is anchored in its past. Other collections are in html, but the variety of ways it is applied proves problematic.
If you think novels are a problem, think about plays and poetry collections. Think also of the need to transform text into voice-synthesised readings, the problem of reference quoting, etc. The list of what people may want to read, hear or otherwise use only gets more complex and unpredictable as readers become more widespread and other means of dealing with literature are developed.
I would propose that the Gutenberg problem does not lie in marking up for ebooks, but rather in finding a markup that allows easy translation to things like epub (a very good move).
It is not a matter of light vs heavy markup.
It is a matter of finding a light markup that can be transformed coherently and consistently into heavy markup. That may include voice markup, reference markup, and complete structural markup, potentially well beyond what any present reader can handle.
Yet at the same time it can be used in a minimalist fashion and allows greater complexity to be added by future editors.
I would suggest that TEI (Text Encoding Initiative) is the only candidate.
However, anyone looking at it would faint at the apparent complexity of what could be done.
TEI Lite is only lite from a scholar's perspective.
However, it should be possible to prepare a consistent subset of the standard that translates cleanly to epub, for instance.
So why bother? Why not just use something like epub?
The reason is that as a document is edited over time, and more and more elements are placed in it, the thing has to stay consistent. It is easy to substitute the main element names etc. with those of, say, epub, and it is just as easy to ignore everything else (element-wise) by simple filtering.
It is not so simple to add elements into a more restrictive scheme - that is the primary problem. It must be a system that allows for growing complexity over time.
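To make the renaming and filtering point concrete, here is a rough sketch in Python. It is only an illustration under assumptions of my own: the mapping table is hypothetical, the sample omits the TEI namespace for brevity, and real epub output needs packaging well beyond this XHTML fragment. Known elements are renamed, and any element the target does not understand is dropped while its text is kept.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical mapping from a tiny TEI-like subset to XHTML names.
# Anything not listed here is treated as "richer" markup and filtered out.
TEI_TO_XHTML = {"div": "div", "head": "h2", "p": "p", "hi": "em"}


def to_xhtml(elem):
    """Rename known elements; unwrap unknown ones but keep their text."""
    inner = escape(elem.text or "")
    for child in elem:
        inner += to_xhtml(child)
        inner += escape(child.tail or "")
    name = TEI_TO_XHTML.get(elem.tag)
    if name is None:
        # Unknown element: ignore the tag, pass the content through.
        return inner
    return f"<{name}>{inner}</{name}>"


# <persName> stands in for markup added by a later editor.
SAMPLE = (
    "<div><head>Chapter 1</head>"
    "<p>It was <persName>Ahab</persName> who spoke <hi>first</hi>.</p></div>"
)

if __name__ == "__main__":
    print(to_xhtml(ET.fromstring(SAMPLE)))
```

The richer source loses nothing by being filtered on the way out, whereas there would be nowhere to put the extra element if the stored file were already the restricted format.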
I believe there is only one candidate. However, it needs simply implemented templates, and there is no reason why the base markup should not be designed specifically for translation into existing ebook formats, or indeed good formats not yet in use.
Now if this is done well there is no reason why the source text markup cannot be translated on site as part of the download process. So instead of keeping multiple file types, a project like Gutenberg keeps one file type (TEI ultralite) and translates on the fly into whatever a reader may like to use (including varieties of plain text).
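As a rough idea of what translating on the fly might look like, here is a minimal Python sketch. Everything in it is an assumption for illustration: the format names, the `translate` function and the two-element source subset are hypothetical, not anything Gutenberg actually runs. One stored source is converted at request time into whichever output the reader asks for, including two varieties of plain text.

```python
import textwrap
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def blocks(root):
    """Yield (tag, text) for each heading or paragraph, whitespace-normalised."""
    for elem in root.iter():
        if elem.tag in ("head", "p"):
            yield elem.tag, " ".join("".join(elem.itertext()).split())


def to_plain_text(root, width=72):
    """Plain text: wrapped blocks separated by blank lines."""
    wrapped = [textwrap.fill(text, width=width) for _, text in blocks(root)]
    return "\n\n".join(wrapped) + "\n"


def to_xhtml(root):
    """XHTML fragment: headings become <h2>, paragraphs stay <p>."""
    names = {"head": "h2", "p": "p"}
    lines = [f"<{names[tag]}>{escape(text)}</{names[tag]}>" for tag, text in blocks(root)]
    return "\n".join(lines) + "\n"


# Hypothetical registry: one stored source, many download formats.
FORMATS = {
    "txt": to_plain_text,
    "txt-narrow": lambda root: to_plain_text(root, width=40),
    "xhtml": to_xhtml,
}


def translate(source_xml, fmt):
    """Convert the single stored source into the requested format."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return FORMATS[fmt](ET.fromstring(source_xml))


SOURCE = "<text><head>Chapter 1</head><p>Call me <hi>Ishmael</hi>.</p></text>"

if __name__ == "__main__":
    print(translate(SOURCE, "txt"))
    print(translate(SOURCE, "xhtml"))
```

Adding a new output format then means adding one converter to the registry, without touching any stored text.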