Old 04-02-2009, 03:20 AM   #86
cerement
Groupie
Posts: 170
Karma: 2000
Join Date: Apr 2008
Location: San José, CA
Device: Amazon Kindle 1, Sony PRS-300, Amazon Kindle 3
Quote:
Originally Posted by kovidgoyal
What would be the advantage for PG in adopting a highly structured master format?
A "master format" drastically simplifies and speeds up automated conversion. Adobe Photoshop doesn't have a separate process for converting from each colorspace to every other colorspace. Photoshop's "master format" is Lab: every color conversion goes from the source colorspace to Lab, and then from Lab to the new colorspace.

Generating plaintext from XML is trivial, and generating pretty-printed plaintext from XML is just as easy. Generating any XML variant from plaintext is harder and relies on the volunteers. But when you already have volunteers churning out half a dozen formats, some automated, some not, choosing a secondary "master format" (since the primary is plaintext) would focus the volunteer work and allow easier automatic generation of multiple output formats (a la Feedbooks).
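To make the "trivial" direction concrete, here is a minimal sketch in Python using only the standard library. The element names (`text`, `head`, `p`, `emph`) are made up for illustration and are not taken from PGTEI or any real PG file; it also assumes each child of the root is one block of text.

```python
import textwrap
import xml.etree.ElementTree as ET

# Hypothetical sample markup, NOT real PGTEI.
SAMPLE = "<text><head>Chapter 1</head><p>It was a <emph>dark</emph> night.</p></text>"

def to_plaintext(xml_source: str, width: int = 72) -> str:
    """Strip all markup and return wrapped plaintext, one paragraph per block."""
    root = ET.fromstring(xml_source)
    blocks = []
    for block in root:
        # itertext() yields the text of the element and all its descendants;
        # split/join collapses runs of whitespace left over from the markup.
        text = " ".join("".join(block.itertext()).split())
        if text:
            blocks.append(textwrap.fill(text, width=width))
    return "\n\n".join(blocks)

print(to_plaintext(SAMPLE))
```

Going the other way (plaintext to XML) has no equivalent few-line solution, because the structure has to be guessed or supplied by a person.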

As an example: imagine Calibre supported 6 formats, each as both input and generated output. With an internal "master format", you need 12 conversion templates: 6 from input format to master format, and 6 from master format to output format. Without a master format, you would need 30 conversion templates (6 × 5, excluding self-to-self conversions). With a master format, adding a 7th format means adding only 2 new conversion templates. Without one, it means adding 12.
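The arithmetic above generalizes to any number of formats. A small Python sketch (the function names are mine, purely for illustration):

```python
def templates_with_master(n: int) -> int:
    """One input template plus one output template per format."""
    return 2 * n

def templates_pairwise(n: int) -> int:
    """One direct template for every ordered pair of distinct formats."""
    return n * (n - 1)

for n in (6, 7):
    print(n, templates_with_master(n), templates_pairwise(n))
# prints:
# 6 12 30
# 7 14 42
```

The master-format count grows linearly while the pairwise count grows quadratically, so the gap widens with every format added.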

Quote:
Originally Posted by kovidgoyal
Without specialised tools to generate the master format, expecting volunteers to produce it would be, to put it mildly, overly optimistic.
That was the initial idea behind Marcello creating the PGTEI DTD, based on TEI-Lite. PGTEI was close enough to HTML to be familiar to most PG users, and could either be handcoded easily or converted from (X)HTML with minimal search-and-replace.

And as mikecook mentioned, there have certainly been plenty of arguments already, pro and con, for PG to adopt a master format. Currently, it looks like that master format is HTML, used for generating ePub, Mobi, and Plucker, but the quality of PG's HTML files varies even more than the quality of their plaintext.