What "Cleaning Up" Do Project Gutenberg Texts Need [closed] - Page 5

jbenny · 11-04-2007, 08:15 PM

Quote:

Originally Posted by bowerbird

jbenny said:
> He doesn't seem to be open to discussion or suggestions,
> but only in promoting his own way of doing things.

what, precisely, is it that you think you have "taught" me?

i've been working on this for many _years_ now, and i know
what my system does. and you've got -- at the very best! --
a sketchy understanding. yet somehow, you think you can
come up with something that i haven't considered? my word...

heavy-markup advocates like yourself have been attacking me
from the very first time i ever uttered a word about this system,
and they've stayed in attack-mode for -- quite literally -- _years_,
and yet you think you've come up with something unique? what?

that's rich. i mean, that's really _rich_...

this is a serious question: what is it you think i've "discounted"?

-bowerbird

Another post that not only doesn't address any of the points he is supposedly responding to, but also adds his own persecution-complex-induced version of what he thinks was said.

And if you think you can get six figures for your system and code, then why are you over here, hassling everyone who you think knows less than you do? Sounds like it is time for you to take your medication again. Rant on. I for one will be ignoring your ravings from now on and hoping you go away.

bowerbird · 11-04-2007, 09:06 PM

kovidgoyal said:
> 1) Light markup has minimal features.

you make that assertion, but you do absolutely nothing to support it.
what features are found in books that are lacking from my test-suite?

> 2) If authors use a GUI to generate ebooks, then they don't care about the markup,
> which then negates your argument for lightweight markup from the perspective of authors.

then it should be the case that i will have no users for my format.
so let's see if that's what happens. if it is, no sweat off your nose.
so why do you care?

> 3) Lightweight markup is suitable for people who digitize books (like p.g.)
> but not for people who create books, since people who digitize/convert books
> typically don't care about advanced features, while people who create them do.

once again, you seem to be awfully concerned about something that
should pose absolutely no threat to you, if what you're saying is true.

> Some new points:
> 1) If you aren't open sourcing your code then good bye and good luck.
> All you're doing then is defining a specification. Any 10 year old that
> spends a week thinking about the requirements for an ebook format
> could do that.

you can say and think whatever you like. but, you know, it's my time, and
i'm the one who decides how i spend it. and, for the people reading along,
i'm doing much more than "defining a specification". i'm giving you tools
to put that specification to work, turning plain-text files into e-books that
are beautiful _and_ have superior functionality. whether you care or not,
well, that's up to you... i don't expect everyone to care, maybe not anyone.

> 2) Considering that you are designing a limited specification
> with closed source authoring/viewing software
> support for changes to that format (which will have to be made
> over time) will be spotty at best.

again, if the format and the tools don't prove to be useful, right away and/or
in the long run, i suspect that there won't be a lot of people using it, correct?
so, ya'know, what's the big deal? lots of e-book formats have come and gone.
i've had a good time solving this little challenge. better than doing sudoku...

> When it comes to designing format converters, the key is the output format.
> If you choose an output format that is a superset of all input formats
> you might consider, it is then possible to use the converter to convert
> all input formats to a single output format. You do this by using
> an object model internally in the converter software, with plugins for input formats.
> And it then becomes easy to output to different formats using the object model.

and once again, i don't find that relevant to the work that i have done.
the work i _have_done_, as in _past_tense_, as in _already_completed_.

i'm not telling you that it's not useful for _you_, because it might well be.
but it's not useful for me. so you can make all the posts you want saying
_otherwise_, but you're not going to have an effect. that's all i'm saying...

> Starting with an output format that is more limited than possible input formats
> is simply ass-backwards. As I said before zml *might* be a good idea for
> conversion of txt files for p.g. but little else. And without an opensource
> converter from zml to html it is emphatically not a good idea.

look, i'm not telling anyone that they can't make an open-source converter
from .zml to .html. i've laid out a very simple spec, precisely so they _can_.
and i'll even help them if they run into any problems in attacking the task,
because i have done it, so i know how, and i believe they will find it to be as
simple and straightforward as i found it to be. i'll even give 'em a gold star,
providing they do the job right, in order to avoid confusion about the spec.
heck, if they do the thing correctly, i'll even host their converter on my site...

same goes for a converter to .pdf, or rocketbook, or mobipocket, or whatever.
if they don't, that's fine too, because i will. but anyone certainly _can_ do it...
heck, i'd even host a converter for .epub, just to show i got a generous heart,
if someone writes the silly thing.

and if someone wants to write a viewer-program, i'll help them do that as well.
and if they wanna make it closed-source, i don't care. i don't even care if they
charge people for it, since i'm giving away my viewer-program free of charge,
so if their app is so much better that someone will actually pay 'em for it, fine!
i'll even collect the darn money for them, because if they're making sales, then
it must be because they're doing _something_ right, and i want to reward them.

same goes with anyone else making any other programs that add value to .zml.

so, all in all, i'm not sure why you've got that bug up your butt... :+)

but i can take a good guess. because, like i said, i've gotten flak before...
there's a lot of technoids out there who've spent lots of time and energy
mastering the complexities of heavy-markup, and a simple system that
matches their benefits without imposing their costs is a threat to them...
it's a big threat to their expertise. so they attack me. but i'm very strong.
i have a thick skin, and i've been through it time and time and time again.

i went through _many_years_ of it over on the project gutenberg listserve.
unlike here, however, i didn't show my poker hand to people right away...
i just let them argue the points on a "theoretical" basis -- over and over --
so they ended up wagering all their credibility. over time, very gradually,
i introduced more and more evidence indicating that my system did work,
until now, when it's absolutely clear that they were wrong all along, they've
lost all of their credibility. so don't make the same mistake that they made.
there are plenty of holes in the in-progress proof-of-concept models that
i've made available. if you want to play this game, go and find those holes.
but if you wanna argue this on a "theoretical" basis, you'll lose to my demos.

as i'd tell my antagonists on the p.g. listserve, "the proof is in the pudding".
and i'm starting to dish out pudding. you can match me, or be left behind.

-bowerbird

bowerbird · 11-04-2007, 09:09 PM

jbenny said:
> And if you think you can get six figures for your system and code,
> then why are you over here, hassling everyone

i'm not "hassling" anyone. this thread was started as an inquiry into _beauty_...

but you set out to make it ugly. why? don't answer, just go, like you promised.

as for a 6-figure pricetag, that's cheap. mobipocket got _7_figures_ from amazon.

-bowerbird

Nate the great · 11-04-2007, 09:14 PM

Quote:

Originally Posted by bowerbird

jbenny said:
> And if you think you can get six figures for your system and code,
> then why are you over here, hassling everyone

i'm not "hassling" anyone. this thread was started as an inquiry into _beauty_...

but you set out to make it ugly. why? don't answer, just go, like you promised.

as for a 6-figure pricetag, that's cheap. mobipocket got _7_figures_ from amazon.

-bowerbird

I hope you aren't really comparing yourself to Mobipocket.

ebookie · 11-04-2007, 09:21 PM

I'm hesitant to join this discussion, but since I've been thinking about some of these issues as well, I figured I would put in a couple of comments.

First, the difference between semantics and presentation. HTML (as a DTD of SGML) mixed these two, on the notion that you were presenting documents in a browser of variable size. There is some notion of semantics (like H1 is a top-level heading) and some notion of presentation (like B is boldface), and no clear line between them. If the Project Gutenberg (PG) texts could be converted into something that identified just the semantics of the text, then one could build formatters/presenters to "present" it on an electronic book.

Bowerbird's attempts are notable in that they try to embed semantics into a file as transparently as possible (which is a good goal if you might find yourself reading the file directly), but that feature makes it pretty challenging to screen automatically for errors. For example, if a bit flip causes the letter 'M' (one bit different from <CR> in ASCII) to appear in one of the 5 lines between headers, what does it do? Does that screw up the presentation?
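The ASCII arithmetic behind that example can be checked directly. This is only a small Python sketch; the variable names are illustrative and nothing here comes from the ZML tools themselves.

```python
# <CR> (carriage return) is 0x0D in ASCII; 'M' is 0x4D.
CR = 0x0D
M = ord("M")

# XOR leaves a 1 in every bit position where the two values differ.
differing_bits = bin(CR ^ M).count("1")

# Flipping that single bit (value 0x40) turns a carriage return into 'M'.
flipped = chr(CR ^ 0x40)

print(differing_bits, flipped)
```

So a single flipped bit is enough to turn what should be an empty line into a line containing visible text, which is exactly the kind of error a whitespace-based convention cannot flag on its own.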

Now there is a standard way to solve this issue: use the thing that sits between SGML (very complicated) and HTML (very confused), called XML. Not XHTML, just XML. If the semantics of the book are automatically added into the PG text as XML tag pairs, then three benefits will result:

1) An XML schema checker can validate that the semantics are valid.
2) An XSLT style sheet can easily, and on the fly, convert the book to ASCII, PostScript, HTML, etc.
3) New style sheets can leverage existing annotated books to support new formats.

Given the existing support for parsing and processing XML, it would be straightforward (although perhaps not easy) to create a copy-editing tool which sucked in a book, added its best guess at what the semantics were (and there is great work to leverage from the ZML work here), and then generated an annotated result. One might hope that all copy editors/proofreaders can agree that something "is a heading" without having to agree on how headings should be presented, or treated in the book presentation.
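As a rough illustration of the idea, here is a minimal Python sketch. It is not a real schema validator or XSLT processor: it uses the standard library's `xml.etree.ElementTree` for parsing, a plain set of allowed tags as a stand-in for a schema, and two small functions as stand-ins for style sheets. The tag names (`book`, `title`, `chapter`, `heading`, `para`) are made up for the example and do not come from any PG or ZML specification.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical semantic markup; the tag vocabulary is an assumption.
BOOK_XML = """<book>
  <title>My Antonia</title>
  <chapter>
    <heading>Chapter I</heading>
    <para>First paragraph of text.</para>
  </chapter>
</book>"""

ALLOWED_TAGS = {"book", "title", "chapter", "heading", "para"}


def unknown_tags(root):
    """Stand-in for schema checking: return tags outside the vocabulary."""
    return {el.tag for el in root.iter()} - ALLOWED_TAGS


def to_text(root):
    """One 'presentation': headings in capitals, blank lines between blocks."""
    blocks = []
    for el in root.iter():
        text = (el.text or "").strip()
        if el.tag in ("title", "heading"):
            blocks.append(text.upper())
        elif el.tag == "para":
            blocks.append(text)
    return "\n\n".join(blocks)


def to_html(root):
    """Another 'presentation' built from the same semantic tags."""
    tag_map = {"title": "h1", "heading": "h2", "para": "p"}
    parts = []
    for el in root.iter():
        if el.tag in tag_map:
            html_tag = tag_map[el.tag]
            body = escape((el.text or "").strip())
            parts.append(f"<{html_tag}>{body}</{html_tag}>")
    return "\n".join(parts)


# ET.fromstring raises ET.ParseError if the tag pairs are not balanced.
root = ET.fromstring(BOOK_XML)
print(unknown_tags(root))
print(to_text(root))
print(to_html(root))
```

The point of the sketch is only that the file says "this is a heading" once, and each output format decides separately how a heading should look.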

--Chuck

jbenny · 11-04-2007, 09:27 PM

Quote:

Originally Posted by Nate the great

I hope you aren't really comparing yourself to Mobipocket.

Ah, but he is. Delusions of grandeur. Fits with his other delusions. Along with his repeated avoidance of responding to actual comments, but only responding to his twisted interpretation of what he thinks was said.

kovidgoyal · 11-04-2007, 09:31 PM

Sigh, this is a discussion about the merits of lightweight markup, not an attack on you or your pet markup system. It's about trying to figure out whether spending time and effort on creating apps that support lightweight markup is worth it.

1) Features not supported by lightweight markup:
- CSS float, boxes with custom borders, boxes with background colors for emphasis, drop caps. I could go on.

2) I care because I am trying to drill into your thick head that lightweight markup is not the best solution for ebooks.

3) Ditto.

On the new points:

1) If your tools are not open source, you're not giving them to people; you're giving people the ability to use them. A subtle, but important distinction.

2) Again, the point of this discussion is to weigh the merits of lightweight markup as a format for ebooks, not to decide whether you've spent your time wisely or not.

3) My concern was writing converters to zml, not from zml. If you want to push zml as an ebook format, considering that there are currently no ebooks in zml, you'd better worry about writing converters to zml, not from zml.

Goshzilla · 11-04-2007, 09:57 PM

There is already a converter for Project Gutenberg texts; it's called GutenMark. It takes the plain txt files and spits them out as html. The paragraphs are formatted correctly, and certain things like chapter headings are given a formatted heading to make the text stand out from the paragraph text.

This will pretty much only work well on Gutenberg texts, because that is what the program was originally written for.

Maybe I just don't understand what the original poster was getting at?

jbenny · 11-04-2007, 10:07 PM

Quote:

Originally Posted by Goshzilla

There is already a converter for Project Gutenberg texts; it's called GutenMark. It takes the plain txt files and spits them out as html. The paragraphs are formatted correctly, and certain things like chapter headings are given a formatted heading to make the text stand out from the paragraph text.

This will pretty much only work well on Gutenberg texts, because that is what the program was originally written for.

Maybe I just don't understand what the original poster was getting at?

I already suggested GutenMark. He shot it down as inferior to his method. As for understanding what he is getting at, I don't think anyone can.

Goshzilla · 11-04-2007, 10:49 PM

Well, after reading through the five pages, I'm extremely confused. Even more than when I naively suggested GutenMark.

After having to manually make my own ebooks for Palm Reader format, PDF, and Ebookwise, I'd say GutenMark is probably the best tool out there for doing that; otherwise it's a lot of time wasted in Microsoft Word removing double lines and replacing them with single ones, etc.

Let me get this right, though: the original poster is complaining about the lack of formatting when Gutenberg texts are converted straight into Microsoft Reader format? It seems to me that it's a non-issue, since using GutenMark or doing the mass-replace commands in Microsoft Word can produce a readable Microsoft Reader edition.

There are even open source ebook readers that can do this job with no editing at all including the automatic line replacing and a quick table of contents.

Maybe I got all of this wrong too. This thread is getting more confusing with each post.

bowerbird · 11-04-2007, 11:32 PM

goshzilla said:
> There is already a converter for Project Gutenberg texts,
> it's called GutenMark.

yes, gutenmark converts e-texts into .html.
if it serves your needs, that's fine with me...
you can just move along to the next thread.

> Maybe I just don't understand what the original poster was getting at?

i'm the original poster. the original post was
to elicit discussion on the various ways that
people make the ugly p.g. e-texts beautiful...

that's just a small slice of my own total aims,
so gutenmark doesn't do the job i want to do,
but as i've said, i respect both it and its creator.

i've written a lot of words here about my aims,
over and above a simple conversion to .html,
but if it's still unclear to people, then just pass,
because it does _not_ matter to my aims if you
or anyone else understands them at this time...

> Maybe I got all of this wrong too.
> This thread is getting more confusing with each post.

some people thrive on complexity, and
they will introduce it unnecessarily... :+)

-bowerbird

bowerbird · 11-04-2007, 11:43 PM

for those who _do_ want to understand my aims, they're simple.

i've created a simple format authors can use to make e-books
which -- when displayed in my corresponding viewer-program --
have high-powered functionality and also render beautifully...

one example of the functionality that's provided _automatically_
is rich navigational links, beginning with the table of contents...

this same format will also be used for project gutenberg e-texts,
and i will convert the entire p.g. library to the format by myself.

one effect of this conversion of the p.g. library into my format
-- which is called "z.m.l.", short for "zen markup language" --
is that automatic conversions to other formats will be enabled,
specifically including .html, .pdf, and .ipod, and probably others.
to the greatest extent possible, these other formats will _also_
have the same high-powered functionality, like those auto-links.

another viewing option includes a web-viewer, now prototyped at:
> http://z-m-l.com/go/babelfish019.pl

-bowerbird

bowerbird · 11-04-2007, 11:45 PM

written this morning, which now seems like ages ago...

***

since this conversation has ranged so widely, i'm gonna
fill in a few patch spots (or fill in a few spotty patches?)
before we put this thread to bed. i hope that's ok...

and if anyone has more comments on the original topic
-- things that are being done to beautify p.g. e-texts --
then do please feel quite free to throw them in as well...

-bowerbird

bowerbird · 11-04-2007, 11:47 PM

first, a few things i forgot to mention on pagenumbers.

one very important aspect of pagenumber references
is that we need to consider them in our u.r.l. naming,
and the links there must have maximal transparency...

up above, i pointed you to these references:
> http://z-m-l.com/go/myant/myantp001.html
> http://z-m-l.com/go/mabie/mabiep001.html
> http://z-m-l.com/go/sgfhb/sgfhbp001.html

take the top one, and eliminate the first part, to get:
> myant/myantp001.html

you can see that the first 5 letters are repeated, so
eliminate those as well, and strip off the suffix, for:
> myantp001

in my naming, the first 5 letters reference one book.
in this case, it's "my antonia", the book by willa cather.

the "p001" part of the u.r.l. indicates this is page 1...

and just so you know, this u.r.l.:
> http://z-m-l.com/go/myant/myantp001.html
is based on the page-scan with this name:
> http://z-m-l.com/go/myant/myantp001.png
which, once again, is the page-scan for page 1.

and i rigorously follow this convention throughout.

so this is the u.r.l. for page 123:
> http://z-m-l.com/go/myant/myantp123.html

and it's based on the page-scan with this name:
> http://z-m-l.com/go/myant/myantp123.png

thus, any competent fourth-grader is capable of
figuring out the u.r.l. for _any_ page in this book.
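The convention described above is simple enough to sketch in a few lines of Python. The function names are hypothetical, and the three-digit zero padding is an assumption read off the examples (p001, p123); how pages beyond 999 would be named is not stated in this thread.

```python
BASE = "http://z-m-l.com/go"


def page_name(book_id, page):
    """Five-letter book id + 'p' + page number padded to three digits."""
    return f"{book_id}p{page:03d}"


def page_url(book_id, page, ext="html"):
    """URL of the page itself ('html') or of its page-scan ('png')."""
    return f"{BASE}/{book_id}/{page_name(book_id, page)}.{ext}"


print(page_url("myant", 1))
print(page_url("myant", 123))
print(page_url("myant", 123, ext="png"))
```

Because the page and its scan differ only in the suffix, one function covers both.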

furthermore, this means that when i encounter
some other p-book in the historical archive that
makes references to this edition of "my antonia",
i can relate those references to my e-book easily.

for instance, let's say that a passage runs like this:
> on page 189 and 198, cather ascribes qualities
> to antonia which seem to be inconsistent with
> those which were ascribed on page 15 and 83,
> and are completely contradictory to what cather
> clearly states on page 111. however, this could
> be due to the revelation which antonia has, that
> is described in detail on pages 144 and 157.

so, based on my transparent and consistent naming,
it's a simple exercise to create links for this passage:
> http://z-m-l.com/go/myant/myantp189.html
> http://z-m-l.com/go/myant/myantp198.html
> http://z-m-l.com/go/myant/myantp015.html
> http://z-m-l.com/go/myant/myantp083.html
> http://z-m-l.com/go/myant/myantp111.html
> http://z-m-l.com/go/myant/myantp144.html
> http://z-m-l.com/go/myant/myantp157.html
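That exercise can also be automated. The Python sketch below pulls the page numbers out of the sample passage and builds the same seven links. The regular expression is a naive assumption that only handles the "page N" and "page N and M" phrasing used in this passage, and the three-digit padding is inferred from the example u.r.l.s; real scholarly references would need more care.

```python
import re

BASE = "http://z-m-l.com/go"

PASSAGE = (
    "on page 189 and 198, cather ascribes qualities "
    "to antonia which seem to be inconsistent with "
    "those which were ascribed on page 15 and 83, "
    "and are completely contradictory to what cather "
    "clearly states on page 111. however, this could "
    "be due to the revelation which antonia has, that "
    "is described in detail on pages 144 and 157."
)

# Matches "page 111" as well as "page 189 and 198" / "pages 144 and 157".
PAGE_REF = re.compile(r"pages?\s+(\d+)(?:\s+and\s+(\d+))?")


def find_pages(text):
    """Return the referenced page numbers in order of appearance."""
    pages = []
    for match in PAGE_REF.finditer(text):
        pages.extend(int(group) for group in match.groups() if group)
    return pages


def page_url(book_id, page):
    """Build the page u.r.l. following the naming convention in this post."""
    return f"{BASE}/{book_id}/{book_id}p{page:03d}.html"


pages = find_pages(PASSAGE)
links = [page_url("myant", page) for page in pages]
for link in links:
    print(link)
```

With a less regular naming scheme, the last step could not be a one-line format string; each reference would need a lookup.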

you would be _astonished_ how many cyberlibraries
have messed up their naming-schemes, such that a
simple plug-in-the-numbers strategy doesn't work.

google gets it kind-of right, but almost everyone else
gets it wrong, wrong, utterly and completely _wrong_.

and because of their confusing naming conventions,
scholars will have to go back and muddle through
_each_and_every_ reference like this, to find out how
the exact link for each one is specified in the e-book.
this is nothing less than sheer and massive stupidity...

-bowerbird

p.s. and, for the record, notice how completely useless
a p.g. e-text -- which was stripped of pagenumbers --
will be for a person who encounters the above passage.

kovidgoyal · 11-05-2007, 12:05 AM

@bowerbird
I notice you've once again produced a flood of verbiage and not bothered to answer any of my concrete points.


11-04-2007, 09:06 PM	#62
bowerbird Banned Posts: 269 Karma: -273 Join Date: Sep 2006 Location: los angeles	kovidgoyal said: > 1) Light markup has minimal features. you make that assertion, but you do absolutely nothing to support it. what features are found in books that are lacking from my test-suite? > 2) If authors use a GUI to generate ebooks, then they don't care about the markup, > which then negates your argument for lightweight markup from the perspective of authors. then it should be the case that i will have no users for my format. so let's see if that's what happens. if it is, no sweat off your nose. so why do you care? > 3) Lightweight markup is suitable for people who digitize books (like p.g.) > but not for people who create books, since people who digitize/convert books > typically don't care about advanced features, while people who create them do. once again, you seem to be awfully concerned about something that should pose absolutely no threat to you, if what you're saying is true. > Some new points: > 1) If you aren't open sourcing your code then good bye and good luck. > All you're doing then is defining a specification. Any 10 year old that > spends a week thinking about the requirements for an ebook format > could do that. you can say and think whatever you like. but, you know, it's my time, and i'm the one who decides how i spend it. and, for the people reading along, i'm doing much more than "defining a specification". i'm giving you tools to put that specification to work, turning plain-text files into e-books that are beautiful _and_ have superior functionality. whether you care or not, well, that's up to you... i don't expect everyone to care, maybe not anyone. > 2) Considering that you are designing a limited specification > with closed source authoring/viewing software > support for changes to that format (which will have to be made > over time) will be spotty at best. 
again, if the format and the tools don't prove to be useful, right away and/or in the long run, i suspect that there won't be a lot of people using it, correct? so, ya'know, what's the big deal? lots of e-book formats have come and gone. i've had a good time solving this little challenge. better than doing sudoku... > When it comes to designing format converters, the key is the output format. > If you choose an output format that is a superset of all input formats > you might consider, it is then possible to use the converter to convert > all input formats to a single output format. You do this by using > a object model internally in the converter software, with plugins for input formats. > And it them becomes easy to output to different formats using the object model. and once again, i don't find that relevant to the work that i have done. the work i _have_done_, as in _past_tense_, as in _already_completed_. i'm not telling you that it's not useful for _you_, because it might well be. but it's not useful for me. so you can make all the posts you want saying _otherwise_, but you're not going to have an effect. that's all i'm saying... > Starting with an output format that is more limited than possible input formats > is simply ass-backwards. As I said before zml might be a good idea for > conversion of txt files for p.g. but little else. And without an opensource > converter from zml to html it is emphatically not a good idea. look, i'm not telling anyone that they can't make an open-source converter from .zml to .html. i've laid out a very simple spec, precisely so they _can_. and i'll even help them if they run into any problems in attacking the task, because i have done it, so i know how, and i believe they will find it to be as simple and straightforward as i found it to be. i'll even give 'em a gold star, providing they do the job right, in order to avoid confusion about the spec. 
heck, if they do the thing correctly, i'll even host their converter on my site... same goes for a converter to .pdf, or rocketbook, or mobipocket, or whatever. if they don't, that's fine too, because i will. but anyone certainly _can_ do it... heck, i'd even host a converter for .epub, just to show i got a generous heart, if someone writes the silly thing. and if someone wants to write a viewer-program, i'll help them do that as well. and if they wanna make it closed-source, i don't care. i don't even care if they charge people for it, since i'm giving away my viewer-program free of charge, so if their app is so much better that someone will actually pay 'em for it, fine! i'll even collect the darn money for them, because if they're making sales, then it must be because they're doing _something_ right, and i want to reward them. same goes with anyone else making any other programs that add value to .zml. so, all in all, i'm not sure why you've got that bug up your butt... :+) but i can take a good guess. because, like i said, i've gotten flak before... there's a lot of technoids out there who've spent lots of time and energy mastering the complexities of heavy-markup, and a simple system that matches their benefits without imposing their costs is a threat to them... it's a big threat to their expertise. so they attack me. but i'm very strong. i have a thick skin, and i've been through it time and time and time again. i went through _many_years_ of it over on the project gutenberg listserve. unlike here, however, i didn't show my poker hand to people right away... i just let them argue the points on a "theoretical" basis -- over and over -- so they ended up wagering all their credibility. over time, very gradually, i introduced more and more evidence indicating that my system did work, until now, when it's absolutely clear that they were wrong all along, they've lost all of their credibility. so don't make the same mistake that they made. 
there are plenty of holes in the in-progress proof-of-concept models that i've made available. if you want to play this game, go and find those holes. but if you wanna argue this on a "theoretical" basis, you'll lose to my demos. as i'd tell my antagonists on the p.g. listserve, "the proof is in the pudding". and i'm starting to dish out pudding. you can match me, or be left behind. -bowerbird

11-04-2007, 09:09 PM	#63
bowerbird Banned Posts: 269 Karma: -273 Join Date: Sep 2006 Location: los angeles	jbenny said: > And if you think you can get six figures for your system and code, > then why are you over here, hassling everyone i'm not "hassling" anyone. this thread was started as an inquiry into _beauty_... but you set out to make it ugly. why? don't answer, just go, like you promised. as for a 6-figure pricetag, that's cheap. mobipocket got _7_figures_ from amazon. -bowerbird

11-04-2007, 09:21 PM	#65
ebookie Entrepreneur Posts: 36 Karma: 10 Join Date: Oct 2007 Location: California Device: Iliad v2	I'm hesitant to join into this discussion but since I've been thinking about some of these issues as well I figured I would put in a couple of comments. First, the difference between semantics and presentation. So HTML (as a DTD of SGML) mixed these two with the notion that you were presenting documents in a browser of variable size. There is some notion of semantics (like H1 is a top level heading) and some notion of presentation (like B is boldface) and not a clear line between them. If the Project Gutenberg (PG) texts could be converted into something that identified just the semantics around the text then one could build formatter/presenters to "present" it on an electronic book. Bowerbird's attempts are notable in that they attempt to embed semantics into a file as transparently as possible (which is a good goal if you might find yourself reading the file directly) but that feature makes it pretty challenging to screen automatically for errors. (For example if a bit flip causes the number 'M' (one bit different and <CR> in ASCII) to appear in one of the 5 lines between headers what does it do?) Does that screw up the presentation? Now there is a standard way to solve this issue, its by using the stuff between SGML (very complicated) and HTML (very confused) called XML. Not XHTML but just XML. If the semantics of the book are automatically added into the PG text as XML tag pairs then three benefits will result: 1) An XML schema checker can validate that the semantics are valid. 2) An XSLT style sheet can easily, and on the fly, convert the book to ASCII, PostScript, HTML, Etc. 3) New style sheets can leverage existing annotated books to support new formats. 
Given the existing support for parsing and processing XML it would be straightforward (although perhaps not easy), to create a copy editing tool which sucked in a book, added its best guess at what the semantics were (and there is great work to leverage from the ZML work here) and then generate an annotated result. One might hope that all copy editors/proof readers can agree that something "Is a heading" without having to agree on how headings should be presented, or treated in the book presentation. --Chuck

11-04-2007, 09:31 PM	#67
kovidgoyal creator of calibre Posts: 46,268 Karma: 29630732 Join Date: Oct 2006 Location: Mumbai, India Device: Various	Sigh this is a discussion about the merits of light weight markup, not an attack on you or your pet markup system. It's about trying to figure out whether spending time and effort on creating apps that support light weight markup is worth it. 1) Features not supported by light weight markup - CSS float, boxes with custom borders, boxes with background colors for emphasis. Drop caps. I could go on. 2) I care because I am trying to drill into your thick head that light weight markup is not the best solution for ebooks. 3) Ditto. 1) If your tools are not open source you're not giving them to people you're giving people the ability to use them. A subtle, but important distinction. 2) Again the point of this discussion is to weigh the merits of light weight markup as a format for ebooks, not to decide whether you've spent your time wisely or not. 3) My concern was writing converters to zml not from zml. If you want to push zml as an ebook format, considering that there are currently no ebooks in zml you'd better worry about writing converters to zml not from zml.

11-04-2007, 09:57 PM	#68
Goshzilla Zealot Posts: 104 Karma: 346 Join Date: Oct 2007 Device: Rocket Ebook 1150	There is already a converter for Project Gutenberg texts, it's called GutenMark. It takes the plain txt files and spits them out into html, the paragraphs are formatted correctly and certain things like chapter headings are given a formatted heading to make the text stand out from the paragraph text. This pretty much will only work well on Gutenberg texts because that was what the program was originally written in mind for. Maybe I just don't understand what the original poster was getting at?

11-04-2007, 10:49 PM	#70
Goshzilla Zealot Posts: 104 Karma: 346 Join Date: Oct 2007 Device: Rocket Ebook 1150	Well after reading through the five pages, I'm extremely confused. Even more than when I naively suggested Gutenmark. After having to manually make my own ebooks for Palm Reader format, PDF, and Ebookwise, Gutenmark is probably the best out there for doing that, otherwise it's alot of time wasted in Microsoft Word removing double lines and replacing with single ones, etc. etc. Let me get this right though, the original poster is complaining about no formatting on Gutenberg Texts being converted straight into Microsoft Reader format? It just seems to me that it's a non-issue since using Gutenmark or doing the mass replace commands in Microsoft Word can allow for a readable Microsoft Reader edition. There are even open source ebook readers that can do this job with no editing at all including the automatic line replacing and a quick table of contents. Maybe I got all of this wrong too. This thread is getting more confusing with each post.

11-04-2007, 11:32 PM	#71
bowerbird

goshzilla said:
> There is already a converter for Project Gutenberg texts,
> it's called GutenMark.

yes, gutenmark converts e-texts into .html.

if it serves your needs, that's fine with me...
you can just move along to the next thread.

> Maybe I just don't understand what the original poster was getting at?

i'm the original poster.

the original post was to elicit discussion on the various ways
that people make the ugly p.g. e-texts beautiful...

that's just a small slice of my own total aims, so gutenmark
doesn't do the job i want to do, but as i've said, i respect
both it and its creator.

i've written a lot of words here about my aims, over and above
a simple conversion to .html, but if it's still unclear to people,
then just pass, because it does _not_ matter to my aims if
you or anyone else understands them at this time...

> Maybe I got all of this wrong too.
> This thread is getting more confusing with each post.

some people thrive on complexity,
and they will introduce it unnecessarily... :+)

-bowerbird

11-04-2007, 11:43 PM	#72
bowerbird

for those who _do_ want to understand my aims, they're simple.

i've created a simple format authors can use to make e-books
which -- when displayed in my corresponding viewer-program --
have high-powered functionality and also render beautifully...

one example of the functionality that's provided _automatically_
is rich navigational links, beginning with the table of contents...

this same format will also be used for project gutenberg e-texts,
and i will convert the entire p.g. library to the format by myself.

one effect of this conversion of the p.g. library into my format
-- which is called "z.m.l.", short for "zen markup language" --
is that automatic conversions to other formats will be enabled,
specifically including .html, .pdf, and .ipod, and probably others.

to the greatest extent possible, these other formats will _also_
have the same high-powered functionality, like those auto-links.

another viewing option includes a web-viewer, now prototyped at:
> http://z-m-l.com/go/babelfish019.pl

-bowerbird

11-04-2007, 11:45 PM	#73
bowerbird

written this morning, which now seems like ages ago...

***

since this conversation has ranged so widely, i'm gonna
fill in a few patch spots (or fill in a few spotty patches?)
before we put this thread to bed. i hope that's ok...

and if anyone has more comments on the original topic
-- things that are being done to beautify p.g. e-texts --
then do please feel quite free to throw them in as well...

-bowerbird

11-04-2007, 11:47 PM	#74
bowerbird

first, a few things i forgot to mention on pagenumbers.

one very important aspect of pagenumber references is
that we need to consider them in our u.r.l. naming, and
the links there must have maximal transparency...

up above, i pointed you to these references:
> http://z-m-l.com/go/myant/myantp001.html
> http://z-m-l.com/go/mabie/mabiep001.html
> http://z-m-l.com/go/sgfhb/sgfhbp001.html

take the top one, and eliminate the first part, to get:
> myant/myantp001.html

you can see that the first 5 letters are repeated, so
eliminate those as well, and strip off the suffix, for:
> myantp001

in my naming, the first 5 letters reference one book.
in this case, it's "my antonia", the book by willa cather.

the "p001" part of the u.r.l. indicates this is page 1...

and just so you know, this u.r.l.:
> http://z-m-l.com/go/myant/myantp001.html

is based on the page-scan with this name:
> http://z-m-l.com/go/myant/myantp001.png

which, once again, is the page-scan for page 1.

and i rigorously follow this convention throughout.

so this is the u.r.l. for page 123:
> http://z-m-l.com/go/myant/myantp123.html

and it's based on the page-scan with this name:
> http://z-m-l.com/go/myant/myantp123.png

thus, any competent fourth-grader is capable of
figuring out the u.r.l. for _any_ page in this book.

furthermore, this means that when i encounter some
other p-book in the historical archive that makes
references to this edition of "my antonia", i can
relate those references to my e-book easily.

for instance, let's say that a passage runs like this:
> on page 189 and 198, cather ascribes qualities
> to antonia which seem to be inconsistent with
> those which were ascribed on page 15 and 83,
> and are completely contradictory to what cather
> clearly states on page 111. however, this could
> be due to the revelation which antonia has, that
> is described in detail on pages 144 and 157.

so, based on my transparent and consistent naming,
it's a simple exercise to create links for this passage:
> http://z-m-l.com/go/myant/myantp189.html
> http://z-m-l.com/go/myant/myantp198.html
> http://z-m-l.com/go/myant/myantp015.html
> http://z-m-l.com/go/myant/myantp083.html
> http://z-m-l.com/go/myant/myantp111.html
> http://z-m-l.com/go/myant/myantp144.html
> http://z-m-l.com/go/myant/myantp157.html

you would be _astonished_ how many cyberlibraries
have messed up their naming-schemes, such that a
simple plug-in-the-numbers strategy doesn't work.

google gets it kind-of right, but almost everyone else
gets it wrong, wrong, utterly and completely _wrong_.

and because of their confusing naming conventions,
scholars will have to go back and muddle through
_each_and_every_ reference like this, to find out how
the exact link for each one is specified in the e-book.

this is nothing less than sheer and massive stupidity...

-bowerbird

p.s. and, for the record, notice how completely useless
a p.g. e-text -- which was stripped of pagenumbers --
will be for a person who encounters the above passage.
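
The plug-in-the-numbers naming scheme described in this post could be sketched in Python roughly as follows. This is only an illustration, not bowerbird's actual code: the function names `page_url` and `links_for_passage` are hypothetical, the 5-letter book code and 3-digit zero-padded page number are inferred from the example u.r.l.s above, and the regular expression for spotting "page N and M" references is an assumption that would need broadening for real scholarly prose.

```python
import re

# Base address taken from the example u.r.l.s in the post.
BASE_URL = "http://z-m-l.com/go"

# Assumed pattern: "page"/"pages" followed by numbers joined by "," or "and".
PAGE_REF = re.compile(r"pages?\s+\d+(?:(?:\s*,\s*|\s+and\s+)\d+)*", re.IGNORECASE)


def page_url(book_code: str, page: int, ext: str = "html") -> str:
    """Build <base>/<code>/<code>p<NNN>.<ext> for one page of one book."""
    if len(book_code) != 5:
        raise ValueError("book code is assumed to be exactly 5 letters")
    if not 1 <= page <= 999:
        raise ValueError("page is assumed to fit in 3 digits")
    return f"{BASE_URL}/{book_code}/{book_code}p{page:03d}.{ext}"


def links_for_passage(text: str, book_code: str) -> list[str]:
    """Return page u.r.l.s for every page number cited in text, in order."""
    urls = []
    for ref in PAGE_REF.finditer(text):
        for number in re.findall(r"\d+", ref.group(0)):
            urls.append(page_url(book_code, int(number)))
    return urls


passage = """on page 189 and 198, cather ascribes qualities
to antonia which seem to be inconsistent with
those which were ascribed on page 15 and 83,
and are completely contradictory to what cather
clearly states on page 111. however, this could
be due to the revelation which antonia has, that
is described in detail on pages 144 and 157."""

for url in links_for_passage(passage, "myant"):
    print(url)
```

Because the page-scan shares the same stem, `page_url("myant", 123, "png")` gives the matching scan name; nothing beyond the book code and the page number is needed to find either file.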

11-05-2007, 12:05 AM	#75
kovidgoyal

@bowerbird I notice you've once again produced a flood of verbiage and not bothered to answer any of my concrete points.