Old 11-09-2007, 07:34 PM   #136
GregS
Zealot
GregS has a complete set of Star Wars action figures.
 
Posts: 107
Karma: 308
Join Date: Oct 2007
Location: Perth Australia
Device: EZ Reader 5", Iliad
bowerbird
"that _sounds_ good. until you realize that -- depending on
how one defines "properly structured", and how one considers
"the widest possible uses", not to mention the crystal-ball on
"the future" -- doing heavy markup might be _very_ expensive."

I do not understand this at all.

Full markup in TEI is not being suggested. I am only suggesting a lightweight standard, no more difficult than XHTML or epub, but adapted for text repositories rather than display (though with CSS this poses no problem whatsoever; epub requires only a trivial translation).

It was after having a look at Gutenberg Marker (which solves a lot of problems really very well) that I realised a couple of extra steps would make such software very useful for establishing a standard ultra-light TEI.

As for future proofing, you seem to miss the point altogether. Scholars have already developed concepts of textual structural analysis. If the structure can be unambiguously marked, the text is future proofed, because one way or another it is the structure that has been the most elusive aspect of text, and of handling it for different purposes.

No magic involved, just tags progressively added by editors who know what they are doing (TEI is, like the texts themselves, inherently hard to do, as the complexity of the markup meets the complexity of the text itself).

I have taken at least seven or eight Shakespeare plays from Gutenberg, turned them into word processing documents, cleaned them, re-edited them through stylesheets and finally, after a lot of effort (plays are really hard to do compared to novels), produced a PDF to print out copies for my students.

In short, though the tools are wrong for the purpose, I have been doing just this sort of thing because I had no choice, but the end result did not give anyone else anything useful. I could just as well have been properly marking up the play (using TEI-derived tags) and placing back in Gutenberg something useful to others.

This is what I mean by progressively tagging repository texts. Academics occasionally resort to Gutenberg, but whatever they do to the text is lost to the repository. Students no doubt use the texts for study in their literature degrees; whatever they do is lost as well.

I have dealt with HTML and text versions of a variety of literature. For some purposes just being in HTML makes things very easy, but it can also make things a lot harder.

The other thing is that just reading texts (or printing them) is only one aspect of digitizing texts. Storing them as virtual texts (with their structure preserved and readable) is vitally important for the preservation of literature. This is not just academic prejudice; it is what makes the texts adjustable to unpredictable future usage.

A chapter is not a title (a small criticism of Gutenberg Marker); it is a division that may or may not have a title. Hence I may in the future, for whatever reason, desire to quickly retrieve Chapter Seven of "Pride and Prejudice". How can this be done unless the computer has a means of finding exactly what I asked it to find?
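To make the point concrete, here is a minimal sketch in Python of what "finding exactly what I asked it to find" could look like once a chapter is marked as a division. The `div`, `head` and `p` elements with `type` and `n` attributes follow TEI conventions, but the sample document, its placeholder text and the function names `find_chapter` and `chapter_text` are my own assumptions for illustration, not anything Gutenberg Marker or any existing repository produces.

```python
import xml.etree.ElementTree as ET

# Hypothetical ultra-light, TEI-style sample. The chapter text is placeholder
# prose, not the real novel. Note that chapter 7 has no <head> (title) at all.
SAMPLE = """\
<text>
  <body>
    <div type="chapter" n="6">
      <head>Chapter 6</head>
      <p>Placeholder text for chapter six.</p>
    </div>
    <div type="chapter" n="7">
      <p>Placeholder text for chapter seven, which has no title.</p>
    </div>
  </body>
</text>
"""


def find_chapter(xml_text, number):
    """Return the <div type="chapter"> whose n attribute matches, or None."""
    root = ET.fromstring(xml_text)
    for div in root.iter("div"):
        if div.get("type") == "chapter" and div.get("n") == str(number):
            return div
    return None


def chapter_text(div):
    """Return the paragraphs of a chapter as whitespace-normalised strings."""
    return [" ".join("".join(p.itertext()).split()) for p in div.iter("p")]


chapter = find_chapter(SAMPLE, 7)
print(chapter_text(chapter))
```

The lookup depends only on the division and its number, so it works whether or not the chapter happens to carry a title.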

You don't need a crystal ball, just an understanding of text itself from a scholarly point of view. These people have not been wasting their time; their precision is not useless but vital, and their knowledge (a part of which resides in the very code of TEI) cannot be ignored.

And I repeat: an ultra-light version of TEI need be no more difficult than what we are already using, but it is not closed off like XHTML/epub or any other display technology. Being XML, it is probably just as displayable in most contexts anyhow.

I looked at your references and had seen them, but as I could only see html markup in the source I went looking elsewhere. Sorry for the mistake.

My description of this thread as being about the Second Digital Revolution was not misplaced. The whole problem with Gutenberg at the moment is that it is rooted in the First, hence the compounding problems and the variety of solutions being proposed.

I have no prejudice against your system, except of course that, until I trawl through this long thread, I have no clear idea of what it is. Based on other readers' comments, I am not too sure I will be that much clearer if I do.

"you're welcome to look at it, but i can pretty much tell you now that it
won't be a good fit, because your head wants an "ideal" markup system
-- which anticipates "any possible use, now or in the future" -- whereas
z.m.l. is fully grounded in the tradeoffs that a cost-benefit ratio demands."


For me this just places it amongst display technologies, which is no bad place to be. The problem of text repositories is a different thing altogether.

Consider a text that is fully marked up in TEI (i.e. more tags than text). A huge labour, but one that can accumulate over time in a systematic and reliable way. What is needed to translate it? "Find these tags and change them thus...", "Ignore every other tag": the end result could be anything. As I wrote (with some help) something very similar in REBOL, I know that such a script would be less than a page of code and adds only a tiny delay to simply copying the file to a new location.
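As a rough illustration of "find these tags and change them thus, ignore every other tag", here is a minimal sketch in Python. It is not the REBOL script mentioned above. The tag mapping in `TAG_MAP`, the sample text and the function names are assumptions chosen for the example; a real translation table would be agreed on as part of the standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a few TEI-style tags to XHTML display tags.
# Any tag not listed here is "ignored": the tag is dropped, its text is kept.
TAG_MAP = {"body": "body", "div": "div", "head": "h2", "p": "p", "hi": "em"}


def _append_text(target, text):
    """Add text at the current end of target's content."""
    if not text:
        return
    if len(target):
        last = target[-1]
        last.tail = (last.tail or "") + text
    else:
        target.text = (target.text or "") + text


def translate(elem, parent):
    """Copy elem into parent, renaming mapped tags and unwrapping the rest.

    Attributes are deliberately dropped in this sketch.
    """
    new_tag = TAG_MAP.get(elem.tag)
    if new_tag is not None:
        target = ET.SubElement(parent, new_tag)
        target.text = elem.text
    else:
        target = parent
        _append_text(target, elem.text)
    for child in elem:
        translate(child, target)
        _append_text(target, child.tail)


def to_xhtml(tei_text):
    """Translate a TEI-style string into a bare XHTML-like string."""
    out = ET.Element("html")
    translate(ET.fromstring(tei_text), out)
    return ET.tostring(out, encoding="unicode")


SAMPLE = (
    '<text><body><div type="chapter" n="1"><head>Chapter 1</head>'
    '<p>She wrote to <name type="person">Jane</name> in <hi>haste</hi>.</p>'
    "</div></body></text>"
)

print(to_xhtml(SAMPLE))
```

The scholarly `name` tag stays in the repository copy and simply disappears from the display copy, which is the whole idea: the rich markup costs the display translation nothing.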

I also hold out some hope that students around the world may in the mid-term future look forward to having a device that is a notebook/reader capable of displaying TEI-encoded documents in an academically useful way. However, the technology has to develop, and for that it needs to establish a good market. For that, the most important step is to establish standards such as epub, which may not be the most efficient or versatile, but which make it possible to buy and keep literature with some assurance that on future devices it is either directly readable or can be made so.

My opinion is that efficient display codes such as you propose are not the real problem. I don't doubt anything you say about it, but I have serious reservations about whether it answers the right question.

However, as a display technology it may well have a place. I would add the proviso that if it can easily translate from epub and to epub, then this would be a vital attribute in its acceptance, especially if it is as easy as you say to code with it.


To everyone else, please carefully consider the separate problem of text repositories, though it be far removed from MobileRead.

For my part, when time permits, I intend to look carefully at epub and try and make a version of TEI to fit it, and then write a small program to translate it one way and the other.

If this looks any good in the end, I will set up a site, make people aware of it here, at various text repositories I know of and of course the TEI consortium. However, it is at least a month before I can seriously sit down with it, and maybe not even then.

If anyone is in a better position to do the same thing, I will help in any way I can. Ideally, if it works well, epub software might easily be adapted to display it as well and thus solve the problem in one blow.


Greg Schofield
Perth Australia
(An English High School teacher)