04-27-2009, 04:20 PM   #1
Robotech_Master
Evangelist
Posts: 422
Karma: 351205
Join Date: May 2006
How to deal with irregular hard-wrapping on a large scale?

Lately I've been writing some columns on TeleRead looking back at writing groups that were cranking out Internet fiction years before the term "e-book" even entered common usage. Here are the first two:

http://www.teleread.org/2009/04/26/supergu/
http://www.teleread.org/2009/04/27/t...-of-netheroes/

These groups (and the others I'll be covering in future entries) have copious archives. At last count, Superguy had over 12.5 million words in its archives. It even has a handy CGI script to retrieve just those archives that need reading.

That quantity of material just cries out for reading on an e-book device, rather than sitting at a computer screen. But it also raises the question: how? It comes from the era of green- or amber-screened monospace ASCII terminals, text-only e-mail and USENET, and hard wrapping at the end of every line. You can't put that through an e-book converter without unwrapping it first somehow.

And to make matters worse, the method of paragraph separation isn't consistent. A lot of writers indent paragraphs with no blank line between them, some separate paragraphs with blank lines, and some do both. Section separators aren't consistent either. (Some of them even use ASCII art or logos, but we needn't worry about those. Also, the writers tend to put two spaces after a period instead of one, following the typographical conventions of monospace fonts.)

A friend whipped up some Perl scripts for me that can unwrap indented or space-separated text, or kill the indents in something that uses both styles, but those rely on being able to run Perl scripts and knowing how (which I do, but not every potential reader would), and they rely on everything you're unwrapping being in the same style.
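To make the problem concrete, here's a rough sketch in Python (not my friend's Perl scripts, just an illustration) of what unwrapping mixed-style text might look like. It assumes a new paragraph starts either after a blank line or at a line that begins with whitespace, and that every other line continues the current paragraph. The function name `unwrap` is made up for this example.

```python
import re


def unwrap(text: str) -> str:
    """Join hard-wrapped lines into paragraphs.

    Assumption (not from any existing tool): a paragraph begins after a
    blank line OR at an indented line; any other line is a continuation.
    """
    paragraphs = []
    current = []  # lines of the paragraph being collected

    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            # Blank line: close the paragraph in progress, if any.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        if line[0].isspace() and current:
            # Indented line: treat it as the start of a new paragraph.
            paragraphs.append(" ".join(current))
            current = []
        current.append(stripped)

    if current:
        paragraphs.append(" ".join(current))

    # Collapse runs of spaces, including the two-spaces-after-a-period habit.
    cleaned = [re.sub(r" {2,}", " ", p) for p in paragraphs]
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    sample = (
        "   First paragraph starts\n"
        "here and wraps.  Two spaces.\n"
        "   Second one is\n"
        "indented only.\n"
        "\n"
        "Third follows a blank\n"
        "line.\n"
    )
    print(unwrap(sample))
```

A heuristic like this would still mangle ASCII art, centered titles, and anything else that starts with whitespace without being a paragraph, and it still requires running a script.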

Anyone know of a simpler solution accessible to more people? (Or, failing that, that can at least be used on mixed-style archives and still unwrap properly?)