Page numbers in ebooks for scholarly research? - Page 5

bowerbird · 11-07-2007, 05:47 AM

my guess is that, of the 7 million volumes google will scan at umichigan,
just as one example, 99.98% of them will have pagenumbers in them...

those are the books that will form the cyberlibrary of the future, and thus
those are the books that we need to find a way to make _pointers_ into...

as pagenumbers have been the pointer-system used on them up until now,
we'll need to create digital means so that we can continue to support that,
and that infrastructure will allow us to continue using pagenumber pointers.

yes, we'll have other means too, but we'll need to make pagenumbers work.
luckily, as i believe i've shown in the examples i've posted, it's not too hard.

-bowerbird

HarryT · 11-07-2007, 08:16 AM

Quote:

Originally Posted by Patricia

I teach a fair amount of Plato and Aristotle and find online texts are a problem. When referring to Plato, it is essential to use Stephanus numbers, which will identify any sentence in his entire oeuvre. These appear as marginal numbers and letters in most print versions in both English and Greek. But the numbers simply don't appear in the online versions of Plato (except for the Perseus Project version). So I can't recommend them to students and don't use online versions myself.

(This is why I've never uploaded a Plato dialogue: without the Stephanus numbers it is useless to me. But with them it is irritating to general readers.)

The same problem exists with Latin and Greek poetry, Patricia. One always refers, for example, to Iliad, Book 8, line 204 and without the line numbers of the original a text is of very limited value.

NatCh · 11-07-2007, 11:15 AM

Quote:

Originally Posted by jbenny

As to predefined page sizes, that somewhat negates the benefits of a reflowable format and still ties us to the archaic concept of a page. Besides, you might read a particular epub on anything from a smart phone, to a 22 inch widescreen monitor. When you add in the different font sizes that might be used for reading, it adds up to a lot of combinations. And if only a certain subset of these combinations was in the specification, that also limits what you can do with a reflowable format.

I see what you're saying, jbenny, and I agree. However, I was suggesting that we define a single page size/margin/font combination and use it for all references, which would get around having to handle multiple ones.

This morning, however, some of the overnight comments have got me thinking maybe we're making this too complicated.

We're talking about computers here, and computers do boring, repetitive functions fast and without complaint. Why not have the reading application generate some sort of text index? It could be as simple as a straight character count (which would get ... rather large), or it could be some sort of graduated count by chapter and then paragraph and then character. For instance, 10.3.400-475 would be chapter 10, paragraph 3 starting at character 400, running to character 475.

I'm not pushing for that specifically, just making a "top of my head" example.

The important bit is that it be an agreed upon standard, and that it be repeatable. The reading app can generate the reference and locate the point in the text from the reference. Of course, those needs will have to be met whatever the eventual system ends up being.
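
A minimal sketch of the kind of graduated reference described above, in Python. The `chapter.paragraph.start-end` syntax comes from the post; everything else is an assumption made for illustration: chapters and paragraphs are counted from 1, character offsets from 0 with an exclusive end, and the book is modeled as a list of chapters, each a list of paragraph strings. The names `TextRef`, `parse_ref`, `make_ref`, and `resolve` are hypothetical.

```python
import re
from typing import List, NamedTuple


class TextRef(NamedTuple):
    chapter: int    # 1-based (assumption)
    paragraph: int  # 1-based within the chapter (assumption)
    start: int      # 0-based character offset in the paragraph (assumption)
    end: int        # exclusive end offset (assumption)


_REF = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)$")


def parse_ref(ref: str) -> TextRef:
    """Parse a reference such as '10.3.400-475'."""
    m = _REF.match(ref)
    if m is None:
        raise ValueError(f"not a chapter.paragraph.start-end reference: {ref!r}")
    chapter, paragraph, start, end = (int(g) for g in m.groups())
    if end < start:
        raise ValueError(f"end precedes start in {ref!r}")
    return TextRef(chapter, paragraph, start, end)


def make_ref(chapter: int, paragraph: int, start: int, end: int) -> str:
    """Generate the reference string for a selection."""
    return f"{chapter}.{paragraph}.{start}-{end}"


def resolve(book: List[List[str]], ref: str) -> str:
    """Locate the referenced span in a book given as chapters of paragraphs."""
    r = parse_ref(ref)
    paragraph = book[r.chapter - 1][r.paragraph - 1]
    return paragraph[r.start:r.end]
```

Because the reference is derived from the text rather than from any page layout, two reading applications that agree on these counting rules would generate and resolve the same string regardless of screen or font size.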

NatCh · 11-07-2007, 11:47 AM

Quote:

Originally Posted by bowerbird

besides, i think "criticizing" is a _lot_ better way to get to the bottom of a topic than blowing sunshine up someone's behind. don't you?

When the comment is little more than distilled sarcasm, with no actual content, it's not criticism in my book; it comes closer to sniping.

But then, I don't regard discussing and exploring solutions in a respectful manner to be "blowing sunshine up someone's behind" either. I guess I gave up the personal illusion that I could give the Final, Infallible, and Only Answer on sweeping matters some time ago.

One of the side-effects of discussing things politely and respectfully, even when the discussers disagree, is that people continue to consider what's being said, and don't skip, blow off, or otherwise ignore comments by people who discuss things in such a fashion.

Having the best point in the world, or being absolutely right, is pretty worthless if no one will listen. And it is really rather sad if no one listens because they're tired of the tone the commenter takes with those who disagree with him.

Quote:

Originally Posted by bowerbird

plus, as if it is the case that "the only thing" i am doing is "criticizing". i invite anyone to take a look at the 3 posts of mine that i linked to. you will find more meat in them than in this "new thread" combined...

You're referring to these, I believe:

Quote:

Originally Posted by bowerbird

panurge, i feel where you're coming from. but let me run through a few thoughts.

so first, point #14 is about the embedding of pagenumbers inside of the text flow.
that's not a good idea, because they're a distraction that just needs to be removed
when we want to copy the text out for remixing. that's why point #14 is there.

my next comment -- which i say because it must be said -- is that it's not our job
to do your job. if the pagenumbers are valuable to you, it's your job to save them.
i'm sorry if that sounds cold, but that's the way it is.

having said that, however, let me move on to my next comment, which is that
i am in 100% agreement with you. even though pagenumbers are _irrelevant_,
in many senses, when we move a book to the digital sphere, i'm convinced that
we still need to retain pagenumber information, simply because so much of our
archival history uses pagenumbers as pointer-information. we cannot afford to
sacrifice that. indeed, i go one step further and argue that we should also be
retaining the _linebreak_information_ from all the paper-books that we digitize.
i won't go into all the arguments here, but in my mind, the answer is now clear.

furthermore, i put my money where my mouth is. in my digitization examples,
i maintain linebreaks and pagebreaks, and put the image-scan up next to the text,
so the end-user can verify the accuracy of my digitization if they want to do that.
i consider this checking by end-users to be the last fine line of the proofing process,
and i want them to feel like a part of the "march to perfection" that the text makes,
because i believe we need to make the public feel like "joint owners" of these books.
"the public domain belongs to _you_, the public, and you have responsibility for them,
so if there are errors here, you need to fill out an error-report so they are corrected."

to see some of my examples, check these out:
> http://z-m-l.com/go/myant/myantp001.html
> http://z-m-l.com/go/mabie/mabiep001.html
> http://z-m-l.com/go/sgfhb/sgfhbp001.html

you can thumb through these e-books just like they were the p-books,
and verify that the linebreaks and pagebreaks are exactly as they were.
and if you find an error, you can fill out an error-report right on the page.
and once someone has made a report, it's immediately visible to everyone,
even if it might take an administrator a little bit of time to fix the error...

now examine the plain-text versions of the files that created those books above:
> http://z-m-l.com/go/myant/myant.zml
> http://z-m-l.com/go/mabie/mabie.zml
> http://z-m-l.com/go/sgfhb/sgfhb.zml

you'll see how the pagebreak information was recorded in those plain-text files.
i think you'll also see how easily that pagebreak information can be eliminated,
for the situations where an end-user doesn't care about the original pagebreaks.

this is the kind of flexibility we want from our digitization efforts, so each group
gets the information they like, without inconveniencing what another group gets.

what is also useful about this format is that it's extremely close to what we get
_naturally_ when we scan a book, so it's not hard to go from scan output to final.

now, having said all _that_, let me proceed to my final point, which is a variant
on the "don't expect us to do your job for you". and it is _not_ our job to make
"a faithful representation of the print copy". we don't even _want_ to do that --
even if we could -- and we _cannot_, because any time you move a document
from one medium to a completely different one, you're creating a new edition.
whether you mean to do it or not. and like i said, at least from my perspective,
i don't even think twice about things like the correcting of typos. heck, i'll even
rework headers -- or even the _body_ of the text -- if that is what it takes to
make this _digital_version_ a _good_ digital version. i'm a republisher, who is
moving this book into a new medium for a new world in a new century, and
i'm going to do justice to the new. it's simply not my job to snapshot the old.
if you want to see what the old pages looked like, you can look at the scans.

so, anyway, there's some feedback for you to think about... :+)

-bowerbird

Quote:

Originally Posted by bowerbird

first, a few things i forgot to mention on pagenumbers.

one very important aspect of pagenumber references
is that we need to consider them in our u.r.l. naming,
and the links there must have maximal transparency...

up above, i pointed you to these references:
> http://z-m-l.com/go/myant/myantp001.html
> http://z-m-l.com/go/mabie/mabiep001.html
> http://z-m-l.com/go/sgfhb/sgfhbp001.html

take the top one, and eliminate the first part, to get:
> myant/myantp001.html

you can see that the first 5 letters are repeated, so
eliminate those as well, and strip off the suffix, for:
> myantp001

in my naming, the first 5 letters reference one book.
in this case, it's "my antonia", the book by willa cather.

the "p001" part of the u.r.l. indicates this is page 1...

and just so you know, this u.r.l.:
> http://z-m-l.com/go/myant/myantp001.html
is based on the page-scan with this name:
> http://z-m-l.com/go/myant/myantp001.png
which, once again, is the page-scan for page 1.

and i rigorously follow this convention throughout.

so this is the u.r.l. for page 123:
> http://z-m-l.com/go/myant/myantp123.html

and it's based on the page-scan with this name:
> http://z-m-l.com/go/myant/myantp123.png

thus, any competent fourth-grader is capable of
figuring out the u.r.l. for _any_ page in this book.

furthermore, this means that when i encounter
some other p-book in the historical archive that
makes references to this edition of "my antonia",
i can relate those references to my e-book easily.

for instance, let's say that a passage runs like this:
> on page 189 and 198, cather ascribes qualities
> to antonia which seem to be inconsistent with
> those which were ascribed on page 15 and 83,
> and are completely contradictory to what cather
> clearly states on page 111. however, this could
> be due to the revelation which antonia has, that
> is described in detail on pages 144 and 157.

so, based on my transparent and consistent naming,
it's a simple exercise to create links for this passage:
> http://z-m-l.com/go/myant/myantp189.html
> http://z-m-l.com/go/myant/myantp198.html
> http://z-m-l.com/go/myant/myantp015.html
> http://z-m-l.com/go/myant/myantp083.html
> http://z-m-l.com/go/myant/myantp111.html
> http://z-m-l.com/go/myant/myantp144.html
> http://z-m-l.com/go/myant/myantp157.html
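
A minimal sketch of the plug-in-the-numbers strategy described above, in Python. The URL pattern (five-letter book code, `p`, three-digit zero-padded page number) is taken from the example links in the post; the function names `page_url` and `cited_pages`, and the simple pattern used to spot "page N" citations in prose, are assumptions for illustration only.

```python
import re
from typing import List

BASE = "http://z-m-l.com/go"


def page_url(book: str, page: int, ext: str = "html") -> str:
    """Build the URL for one page of a book, e.g. myant + 123 -> .../myantp123.html."""
    return f"{BASE}/{book}/{book}p{page:03d}.{ext}"


def cited_pages(passage: str) -> List[int]:
    """Collect page numbers cited as 'page 111' or 'pages 144 and 157' (naive pattern)."""
    pages: List[int] = []
    pattern = r"\bpages?\s+(\d+(?:\s*(?:,|and)\s*\d+)*)"
    for m in re.finditer(pattern, passage):
        pages.extend(int(n) for n in re.findall(r"\d+", m.group(1)))
    return pages


def links_for(book: str, passage: str) -> List[str]:
    """One link per cited page, in the order the pages are mentioned."""
    return [page_url(book, p) for p in cited_pages(passage)]
```

Passing `ext="png"` to `page_url` gives the name of the matching page-scan, since the post uses the same stem for both files.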

you would be _astonished_ how many cyberlibraries
have messed up their naming-schemes, such that a
simple plug-in-the-numbers strategy doesn't work.

google gets it kind-of right, but almost everyone else
gets it wrong, wrong, utterly and completely _wrong_.

and because of their confusing naming conventions,
scholars will have to go back and muddle through
_each_and_every_ reference like this, to find out how
the exact link for each one is specified in the e-book.
this is nothing less than sheer and massive stupidity...

-bowerbird

p.s. and, for the record, notice how completely useless
a p.g. e-text -- which was stripped of pagenumbers --
will be for a person who encounters the above passage.

Quote:

Originally Posted by bowerbird

in that x.m.l.-based version of "my antonia" i discussed above,
i forgot to provide an example of a link direct to a paragraph.

here's one:
> http://www.openreader.org/myantonia/...nia.html#p0251
you should read the paragraph directly after that one as well...

-bowerbird

Part of the reason those posts are not getting much exposure here may be this:

Quote:

Originally Posted by bowerbird

i see no reason for a new thread, and won't repeat my posts here:

I find the sentiment expressed in that comment particularly ironic, knowing, as I do, that the thread you referenced is one created specifically for the purpose of pulling an interesting topic that you brought up in yet another thread out where it could get the exposure it seemed to deserve. I'd've done the same thing for Panurge's topic (even though it involves a bit of trouble to do), if he hadn't beaten me to the punch, so to speak.

In any case, now that the posts in question are here where the discussion is continuing, others may find in them points worth responding to.

nekokami · 11-07-2007, 01:15 PM

As a doctoral student, I'm pretty much stuck with having to reference printed page numbers, but I'd like to see a transition to paragraph numbers in the future, to better support electronic reflowable documents. I think we'll have to support both for the foreseeable future, to allow references to pre-electronic documents, even those that have been converted to digital form. Some kind of embedded semantic tagging for each of these methods of identifying text location that can be shown or hidden at will would be great.

sartori · 11-07-2007, 01:15 PM

Natch, thanks for bringing over the info from the other thread (I kind of gave up reading that thread after the 'debates' started.) After reading through the post above I have a question.

Bowerbird, can I ask the reasoning behind splitting the document into individual pages? Couldn't you point to the page content using a fragment on a single file (http://z-m-l.com/go/myant/myantp.html#189) rather than a separate file per page (http://z-m-l.com/go/myant/myantp189.html)? That way the whole content of the book is in one file and conversion to other formats would be easier. For example, how do you recognize when a paragraph splits across two pages, and how do you join them back together when converting? You might have a good reason that I haven't considered, so I would like to hear your take on it.

bowerbird · 11-07-2007, 03:01 PM

natch said:
> When the comment is little more than distilled sarcasm,
> with no actual content, it's not criticism, in my book,
> It comes closer to sniping.

"with no actual content"?

did you not get the content in that post of mine?

if so, then let me explain it to you a little bit more directly...

_lots_ of people have already spent _lots_ of time and energy
thinking about these questions, running up solutions, and
actually putting _even_more_ of their own time and energy
to code experimental solutions so that they could be tested.

the results have largely confirmed what most of us suspected,
namely that there is no reliable way to point to a piece of info
if someone (else) has the ability to change that info any time,
up to and including the option of completely _removing_ it...

because, hey, it's hard to point to something that ain't there.

a fact which -- in retrospect -- seems to be fairly "obvious",
and which might have been a tip-off from the very beginning
that maybe this was one of those problems with no solution...

because, _realistically_, that _is_ the situation which we're in.
someone (else) _is_ going to have control over the info that
we want to point to. it's called copyright, and it's our burden.

furthermore, when someone here suggests that the people
over at i.p.d.f. should pay some attention to this question,
that implies that i.p.d.f. has _not_ paid any attention to it...

when the fact of the matter is that they _have_. they've paid
more attention than you know, including enough attention
to understand (which y'all here don't seem to have grasped)
that this is one of those problems with no solution, or at least
no "really good solution".

so to imply that they "need to consider this" is _stupid_...

so here's my choice. i can either use a little bit of sarcasm,
which -- last i checked -- is considered a form of _humor_
(albeit not as happy-go-lucky and feel-good as slapstick),
or i can instead go for the "explain everything to them like
they were a bunch of second-graders, and let the fact that
they've ignored some basic reality give the solid impression
that they're not just second-graders, but kinda stupid ones,
even though that ain't the impression i _want_ to leave...".

i went for the form of humor. was that a mistake?

-bowerbird

bowerbird · 11-07-2007, 04:23 PM

sartori said:
> Bowerbird, can i ask the reasoning behind
> splitting the document into individual pages?

first of all, my e-books _can_ exist in several forms.
the individual-pages form is just one of those forms.
but i can (and do) spin out "whole-book" forms too...
(plus chapter-by-chapter forms, for some purposes.)

i pointed to the page-by-page form because this topic
-- page-based referencing within scholarly situations --
is one whose basic requirements call out for that form...

to see the "master file" for "my antonia", look here:
> http://z-m-l.com/go/myant/myant.zml
(as you see, the master itself is in whole-book form.)

that "master" _generated_ the page-by-page form...

the page-by-page form has many intended purposes.

its first major purpose is to facilitate _proofreading_...
you want to do proofreading on a page-by-page basis;
you want the page-scan to be shown alongside the text;
and you want the text to contain the original linebreaks.
this format is geared toward those proofreading needs...
(this is a "final-stage" proofing interface, where errors are
"reported", because there are very few. for earlier stages
of proofreading, where there might be many more errors,
we'll want an interface that lets us fix them more directly.)

the next major purpose of it is for _confirming_accuracy_.
we want to give people an ability to confirm our digitization,
to satisfy themselves we did that conversion job correctly...
to do so, we show them our text and the original page-scan,
so they can do a direct comparison and see for themselves...

the third major purpose is the one we're discussing here --
the ability for people to make a pointer to a specific page...
and -- as i have said -- the reason we need to facilitate that
is because our cultural heritage is full of page-based pointers.
and again, we _could_ point them to a place with just the text,
but everyone knows that text can be easily "edited", so we also
put the original page-scan up so as to increase the trust factor.
(of course, scans could _also_ be doctored, but at some point,
there's only so much you can do.)

> Couldn't you point to the page content using
> http://z-m-l.com/go/myant/myantp.html#189
> as opposed to
> http://z-m-l.com/go/myant/myantp189.html.

sure.

and sometimes that's what you'll want to do instead.

but let me show you something. stopwatch this link:
> http://z-m-l.com/go/myant/myantp189.html

now check the length of time it takes to go to this one:
> http://www.openreader.org/myantonia/...a.html#page189

unless that second page was already in your cache or
you have a superfast connection, it took _lots_ longer
to load, because you're loading in some 500k of text
-- the whole book -- instead of 1k of text and a scan.
(for the dialup users, the second file will be _painful_.)

so it depends on what you need your readers to load...
if you only need them to load one page of text, do that.
if you need them to load the whole book, then do _that_.

you'll notice that the second link doesn't include the scans
in-line in the file; you have to click a link to view each one.
(the scans run to 30 megs, so it'd be suicide to load 'em all.)

so it depends on what you need.

if you wanted to point to one page in each of 50 books,
you wouldn't want to force your reader to load each of the
50 books in full just to see that one page. and this is often
the essence of a scholarly reference section. so it depends.

this is why we need the flexibility to quickly and easily
auto-generate whatever format is needed at the time...

> That way the whole content of the book is in one file
> and conversion to other formats would be easier.

in sum, i pointed to a page-based form because of this discussion...
i can also create book-based forms when _that_ is more appropriate.

(such flexibility is one reason i invented my z.m.l. format, which is
a sidetrack topic in that other thread from which this one came...)

> For example how do you recognize when a paragraph splits across
> two pages and how do you join them back together when converting?

good question. but easy answer.

in a "master" file which has pagebreaks marked, like this one:
> http://z-m-l.com/go/myant/myant.zml
the formula for generating a version _without_ the pagebreak info is to:
1. delete the _one_ blank line _above_ the [[doublebracketed]] pagenumber,
and delete the _one_ blank line _below_ the {{doublebraced}} scan-filename...
2. if there were _two_ blank lines above and below, respectively, then that
was a paragraph break, so you should insert a blank line in the output file.

if you follow that rule, you'll find that paragraphs which cross pagebreaks
get joined together, while the ones that ended on the pagebreak still do...
for instance, in the .zml master, compare the breaks between these pages:
> http://z-m-l.com/go/myant/myantp040.html
> http://z-m-l.com/go/myant/myantp041.html
versus:
> http://z-m-l.com/go/myant/myantp061.html
> http://z-m-l.com/go/myant/myantp062.html

see how easy it was for me to point you to those pages specifically?
and also the _usefulness_ of being able to see both text _and_ scan?
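
A minimal sketch of the two-step formula above, in Python. The exact layout of a pagebreak in the .zml master is not spelled out in this thread, so the sketch assumes one: a line holding only the `[[...]]` pagenumber, immediately followed by a line holding only the `{{...}}` scan-filename, with the blank lines described in the formula above and below that pair. The function name `strip_pagebreaks` is hypothetical, and a paragraph break is emitted only when there are at least two blank lines on both sides.

```python
import re
from typing import List

PAGE_NUMBER = re.compile(r"^\[\[.*\]\]$")   # e.g. [[41]]  (assumed layout)
SCAN_NAME = re.compile(r"^\{\{.*\}\}$")     # e.g. {{myantp041.png}}  (assumed layout)


def strip_pagebreaks(master: str) -> str:
    """Remove pagebreak markers, rejoining paragraphs that cross a pagebreak."""
    lines = master.split("\n")
    out: List[str] = []
    i = 0
    while i < len(lines):
        is_marker = (
            PAGE_NUMBER.match(lines[i].strip()) is not None
            and i + 1 < len(lines)
            and SCAN_NAME.match(lines[i + 1].strip()) is not None
        )
        if not is_marker:
            out.append(lines[i])
            i += 1
            continue
        # Count and drop the blank lines already copied above the marker.
        above = 0
        while out and out[-1].strip() == "":
            out.pop()
            above += 1
        # Skip the two marker lines, then count and skip blank lines below.
        i += 2
        below = 0
        while i < len(lines) and lines[i].strip() == "":
            i += 1
            below += 1
        # Two blank lines on each side meant the page ended on a paragraph break.
        if above >= 2 and below >= 2 and out:
            out.append("")
    return "\n".join(out)
```

The original linebreaks inside each paragraph are left alone; only the marker lines and the blank lines around them are touched.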

-bowerbird

NatCh · 11-07-2007, 04:46 PM

Quote:

Originally Posted by bowerbird

so here's my choice. i can either use a little bit of sarcasm, ... or i can instead go for the "explain everything to them like they were a bunch of second-graders, and let the fact that they've ignored some basic reality give the solid impression that they're not just second-graders, but kinda stupid ones, even though that ain't the impression i _want_ to leave...".

There's another choice, bowerbird. Talk to people like they're actually functionally intelligent, and point out the point you feel they're missing, without sarcasm or abrasive phrasing, and describe the implications of that point as you see them in a similarly non-sarcastic, non-abrasive manner.

Sarcasm is indeed used humorously on the forum a great deal, but it has to be well telegraphed as humor, because things like tone of voice don't come through in text without a good deal of effort, and they can easily be taken the wrong way. Because of that it also requires a willingness to step back from it and clarify what was meant when it doesn't come across as funny, even to the point of apologizing for giving offense that was never intended.

You come across as seeming to consider anyone who doesn't see things your way to be an imbecile, and people are starting to assume that you mean to be abrasive even when you don't. I've noticed this, but if you have, you have given no sign of it.

You have managed to get more folks to put you on ignore in a week than I've seen happen in the preceding almost two years that I've been around MR.

These are the results of the absence of the respect for which you have expressed such scorn: you are driving folks away even as you claim to wish to persuade them.

I, and several others, have put significant amounts of effort into attempting to communicate this to you, but you seem to regard those efforts as aimed at getting you to shut up -- if the moderators here wanted to stifle you as you seem to believe we do, we wouldn't have resorted to talking to you to do so. The fact that we have ought to tell you something all by itself.

I've reached the point where I simply don't know what else to say to you.

bowerbird · 11-07-2007, 05:07 PM

natch said:
> Talk to people like they're actually functionally intelligent

i do! whenever people strike me as being "functionally intelligent".

in addition, if they strike me as being stupid, i talk to them like that.

but people who want me to talk to them as "functionally intelligent"
when they're holding up their end of the conversation with stupidity,
they get _sarcasm_ from me. because that's the best they _deserve_.

> even to the point of apologizing
> for giving offense that was never intended.

did you _intend_ to offend me with this sentence?

or with your post as a whole?

more importantly are you ready to _apologize_ for doing so?

> you are driving folks away even as you claim to wish to persuade them.

hold it there. i never said i "wish to persuade" _anyone_ of _anything_.
in fact, i expressly disclaim that as an intention, wholly and completely.
frankly, i don't care what anyone thinks, if they disagree or agree with me.
i speak my mind, and you can make of it whatever you wish, fine by me...

> You have managed to get more folks to put you on ignore in a week than
> I've seen happen in the preceding almost two years that I've been around MR.

some people don't want to hear anyone else speak frankly. so what?
others take offense much too easily, especially the insecure. so what?

i too ignore a lot of what i read here, because it has very little truth value.
it doesn't make sense. when i weigh it as evidence, it registers no mass...
i don't bother to filter out what people say, because i've found that it's not
generally a good idea to stick my head in the sand, but if other people want
to stick their head in the sand, i'm totally fine with that. indeed, i would prefer
that people put me on "ignore" than try to chastise me for speaking my truth.
i'm not "rude". i'm a gentle soul who believes in truth, and has enough respect
for my fellow human beings to be honest with them when they're being stupid,
honest enough to tell them directly. if you think that's a bad thing, i suggest
that you too put me on "ignore", so my words will magically be turned into
white space and you live in ignorant bliss. sincerely, i want you to be happy.

-bowerbird

sartori · 11-07-2007, 05:18 PM

Bowerbird,

Your reasoning makes sense to me (in response to my question). So some more questions, if you don't mind:

When you receive an error notification, do you just update the master file and then regenerate the paged version? Or vice versa? Or, for small updates, do you just make the change in both versions?

On page 61 (http://z-m-l.com/go/myant/myantp061.html) I noticed that a few words are hyphenated across lines. On your master view the words are correctly joined (tea-kettle & followed). Were these manually corrected or automated? If automated, did it correctly catch that tea-kettle should keep its hyphen?

I'm not sure if z.m.l. is the way I want to go with my formatting but I'm still at the early stages of formatting so I'm just checking out options (googling for ebook markup languages is hopeless as you just get a ton of responses that are actual ebooks).

Thanks,

rob

bowerbird · 11-07-2007, 05:57 PM

sartori said:
> So some more questions if you don't mind

i don't mind a bit. that's why i'm here, to discuss...

> When you receive an error notification do you just
> update the master file then regenerate the paged version?
> or vice-versa? or for small updates
> do you just make the change in both versions?

if you go to the directory now, you'll see a bunch of files:
> http://z-m-l.com/go/myant/
including all of the .html files to which i've been linking...
the .html files were generated in a batch from the master.

but eventually, all the separate .html files will disappear.

they'll be replaced by a script which intercepts links like this:
> http://z-m-l.com/go/myant/myantp061.html
and creates that .html file on-the-fly...

so yes, any correction will be made to the master, after which
the script will include it when it builds the .html file next time.
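
A minimal sketch of what such an intercepting script might do, in Python. Nothing about the real script is shown in this thread, so every name here is hypothetical: `parse_page_path` recognizes a request path of the form used in the links above, and `render_page` wraps the text of one page, supplied by the caller from the master, in a bare HTML page beside its scan. How the page text is sliced out of the master is left out, since that depends on the master's exact layout.

```python
import re
from html import escape
from typing import Optional, Tuple

# /go/<book>/<book>p<NNN>.html, with the five-character book code repeated (assumption).
PAGE_PATH = re.compile(r"^/go/([a-z0-9]{5})/\1p(\d{3})\.html$")


def parse_page_path(path: str) -> Optional[Tuple[str, int]]:
    """Return (book code, page number) for a page request, or None if it is not one."""
    m = PAGE_PATH.match(path)
    if m is None:
        return None
    return m.group(1), int(m.group(2))


def render_page(book: str, page: int, page_text: str) -> str:
    """Build the HTML for one page: the page-scan plus the text with linebreaks kept."""
    scan = f"{book}p{page:03d}.png"
    return (
        "<html><body>"
        f'<img src="{scan}" alt="scan of page {page}">'
        f"<pre>{escape(page_text)}</pre>"
        "</body></html>"
    )
```

Since the page is built at request time, a correction made to the master shows up in the next response without any regeneration step.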

> On page 61 (http://z-m-l.com/go/myant/myantp061.html)
> I noticed that a few words are hyphenated across lines.
> On your master view the words are correctly joined
> (tea-kettle & followed).

um, as far as i can tell, you're mistaken. here's the master:
> http://z-m-l.com/go/myant/myant.zml

what i see there, in the master, is this:
> Peter shuffled to his feet, caught up the tea-
> kettle and mixed him some hot water and
> whiskey. The sharp smell of spirits went
> through the room.
>
> Pavel snatched the cup and drank, then
> made Peter give him the bottle and slipped
> it under his pillow, grinning disagreeably,
> as if he had outwitted some one. His eyes fol-
> lowed Peter about the room with a contempt-
> uous, unfriendly expression. It seemed to
> me that he despised him for being so simple
> and docile.

do you really see something different? if so, that's a mystery...

> Were these manually corrected or automated?
> If automated, did it correctly catch that tea-kettle should keep its hyphen?

not all of the example-files that i have up are correct on this point yet,
but they'll be marked as to whether an end-line hyphen is kept or not...

so, if "tea-kettle" -- with the dash -- is the form used in this book
(when the word is mid-sentence), then the master will look like this:
> Peter shuffled to his feet, caught up the tea-@
> kettle and mixed him some hot water and
(i haven't decided if we'll use the at-sign, but you get the idea.)

on the other hand, if this book uses "teakettle", the master will say:
> Peter shuffled to his feet, caught up the tea-
> kettle and mixed him some hot water and

(for the record, this book does indeed use "tea-kettle" in the one
other instance where the word occurs. in the cases where there is
no other use of an end-line hyphenate, we consult the dictionary.
when there is inconsistency within a book, we edit to consistency.)
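
A minimal sketch of how a converter could apply that marking when it unwraps lines, in Python. It uses the tentative `-@` marker from the example above (the post itself says the at-sign is not final), and the function name `join_lines` is hypothetical. Treating a line that ends in a double dash as ordinary text rather than as a hyphenate is an added assumption.

```python
from typing import Iterable

KEEP_MARK = "-@"  # tentative marker for "keep this end-line hyphen"


def join_lines(lines: Iterable[str]) -> str:
    """Unwrap lines into running text, resolving end-line hyphenates."""
    text = ""
    for raw in lines:
        line = raw.rstrip()
        if line.endswith(KEEP_MARK):
            # "tea-@" + "kettle" -> "tea-kettle": drop the marker, keep the hyphen.
            text += line[:-1]
        elif line.endswith("-") and not line.endswith("--"):
            # "fol-" + "lowed" -> "followed": drop the hyphen and join directly.
            text += line[:-1]
        else:
            text += line + " "
    return text.rstrip()
```

For example, the two marked lines above come out as "caught up the tea-kettle and mixed him some hot water and", while "fol-" and "lowed" are rejoined as "followed".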

> I'm not sure if z.m.l. is the way I want to go with my formatting but
> I'm still at the early stages of formatting so I'm just checking out options

i definitely suggest light-markup. "markdown" is the current favorite,
if you want broad support. my tool-chain is approaching coherence,
so you could get the job done, but markdown gives you more reliability.
google "showdown" and "markdown" for an interesting real-time demo:
> http://www.attacklab.net/showdown-gui.html

-bowerbird

sartori · 11-07-2007, 06:09 PM

Bowerbird - sorry, I didn't mean the master; I meant the html view that you listed.

I like the showdown stuff - seems a little limited as far as layout but it looks really easy to use.

Thanks.

bowerbird · 11-07-2007, 06:29 PM

sartori said:
> I like the showdown stuff - seems a little limited as far as layout

depends on what you want to do with a book,
and what platforms you want to put it out to...

you can often exercise tight control in _one_ setting,
but then it blows up on you when you try to move it...

a good rule of thumb is that if you cannot do it with
light-markup, then you shouldn't be doing it anyway,
because it's not gonna convert well to other settings.

so living with some "limitations" from the beginning
can save you a truckload of heartburn down the road.

but, you know, your demo showed you've got chops...
so i'd encourage you to let your mind experiment fully.

-bowerbird

bowerbird · 11-07-2007, 06:32 PM

sartori said:
> sorry didn't mean the master I meant the html view that you listed.

except i still don't follow.
the individual-page .html file shows end-line hyphenates just like the scan:
> http://z-m-l.com/go/myant/myantp061.html

-bowerbird

11-07-2007, 05:07 PM	#70
bowerbird Banned Posts: 269 Karma: -273 Join Date: Sep 2006 Location: los angeles

natch said:
> Talk to people like they're actually functionally intelligent

i do! whenever people strike me as being "functionally intelligent".
in addition, if they strike me as being stupid, i talk to them like that.

but people who want me to talk to them as "functionally intelligent" when they're holding up their end of the conversation with stupidity, they get _sarcasm_ from me. because that's the best they _deserve_.

> even to the point of apologizing
> for giving offense that was never intended.

did you _intend_ to offend me with this sentence? or with your post as a whole? more importantly are you ready to _apologize_ for doing so?

> you are driving folks away even as you claim to wish to persuade them.

hold it there. i never said i "wish to persuade" _anyone_ of _anything_. in fact, i expressly disclaim that as an intention, wholly and completely. frankly, i don't care what anyone thinks, if they disagree or agree with me. i speak my mind, and you can make of it whatever you wish, fine by me...

> You have managed to get more folks to put you on ignore in a week than
> I've seen happen in the preceding almost two years that I've been around MR.

some people don't want to hear anyone else speak frankly. so what?
others take offense much too easily, especially the insecure. so what?

i too ignore a lot of what i read here, because it has very little truth value. it doesn't make sense. when i weigh it as evidence, it registers no mass...

i don't bother to filter out what people say, because i've found that it's not generally a good idea to stick my head in the sand, but if other people want to stick their head in the sand, i'm totally fine with that.

indeed, i would prefer that people put me on "ignore" than try to chastise me for speaking my truth. i'm not "rude". i'm a gentle soul who believes in truth, and has enough respect for my fellow human beings to be honest with them when they're being stupid, honest enough to tell them directly.

if you think that's a bad thing, i suggest that you too put me on "ignore", so my words will magically be turned into white space and you live in ignorant bliss. sincerely, i want you to be happy.

-bowerbird

Last edited by bowerbird; 11-07-2007 at 06:36 PM. Reason: because somebody messed with my formatting...


11-07-2007, 05:47 AM	#61
bowerbird Banned Posts: 269 Karma: -273 Join Date: Sep 2006 Location: los angeles	my guess is that, of the 7 million volumes google will scan at umichigan, just as one example, 99.98% of them will have pagenumbers in them... those are the books that will form the cyberlibrary of the future, and thus those are the books that we need to find a way to make _pointers_ into... as pagenumbers have been the pointer-system used on them up until now, we'll need to create digital means so that we can continue to support that, and that infrastructure will allow us to continue using pagenumber pointers. yes, we'll have other means too, but we'll need to make pagenumbers work. luckily, as i believe i've shown in the examples i've posted, it's not too hard. -bowerbird

11-07-2007, 01:15 PM	#65
nekokami fruminous edugeek Posts: 6,745 Karma: 551260 Join Date: Oct 2006 Location: Northeast US Device: iPad, eBw 1150	As a doctoral student, I'm pretty much stuck with having to reference printed page numbers, but I'd like to see a transition to paragraph numbers in the future, to better support electronic reflowable documents. I think we'll have to support both for the foreseeable future, to allow references to pre-electronic documents, even those that have been converted to digital form. Some kind of embedded semantic tagging for each of these methods of identifying text location that can be shown or hidden at will would be great.

11-07-2007, 01:15 PM	#66
sartori Connoisseur Posts: 54 Karma: 29 Join Date: Oct 2006	Natch, thanks for bringing over the info from the other thread (I kind of gave up reading that thread after the 'debates' started.) After reading through the post above I have a question. Bowerbird, can i ask the reasoning behind splitting the document into individual pages? Couldn't you point to the page content using http://z-m-l.com/go/myant/myantp.html#189 as opposed to http://z-m-l.com/go/myant/myantp189.html. That way the whole content of the book is in one file and conversion to other formats would be easier. For example how do you recognize when a paragraph splits across two pages and how do you join them back together when converting? You might have a good reason that I haven't considered so I would like to hear your take on it.

11-07-2007, 03:01 PM	#67
bowerbird Banned Posts: 269 Karma: -273 Join Date: Sep 2006 Location: los angeles

natch said:
> When the comment is little more than distilled sarcasm,
> with no actual content, it's not criticism, in my book,
> It comes closer to sniping.

"with no actual content"? did you not get the content in that post of mine? if so, then let me explain it to you a little bit more directly...

_lots_ of people have already spent _lots_ of time and energy thinking about these questions, running up solutions, and actually putting _even_more_ of their own time and energy to code experimental solutions so that they could be tested.

the results have largely confirmed what most of us suspected, namely that there is no reliable way to point to a piece of info if someone (else) has the ability to change that info any time, up to and including the option of completely _removing_ it... because, hey, it's hard to point to something that ain't there.

a fact which -- in retrospect -- seems to be fairly "obvious", and which might have been a tip-off from the very beginning that maybe this was one of those problems with no solution...

because, realistically, that _is_ the situation which we're in. someone (else) _is_ going to have control over the info that we want to point to. it's called copyright, and it's our burden.

furthermore, when someone here suggests that the people over at i.d.p.f. should pay some attention to this question, that implies that i.d.p.f. has _not_ paid any attention to it... when the fact of the matter is that they _have_. they've paid more attention than you know, including enough attention to understand (which y'all here don't seem to have grasped) that this is one of those problems with no solution, or at least no "really good solution".

so to imply that they "need to consider this" is _stupid_...

so here's my choice. i can either use a little bit of sarcasm, which -- last i checked -- is considered a form of _humor_ (albeit not as happy-go-lucky and feel-good as slapstick), or i can instead go for the "explain everything to them like they were a bunch of second-graders, and let the fact that they've ignored some basic reality give the solid impression that they're not just second-graders, but kinda stupid ones, even though that ain't the impression i _want_ to leave...".

i went for the form of humor. was that a mistake?

-bowerbird

11-07-2007, 04:23 PM	#68
bowerbird Banned Posts: 269 Karma: -273 Join Date: Sep 2006 Location: los angeles

sartori said:
> Bowerbird, can i ask the reasoning behind
> splitting the document into individual pages?

first of all, my e-books _can_ exist in several forms. the individual-pages form is just one of those forms. but i can (and do) spin out "whole-book" forms too... (plus chapter-by-chapter forms, for some purposes.)

i pointed to the page-by-page form because this topic -- page-based referencing within scholarly situations -- is one whose basic requirements call out for that form...

to see the "master file" for "my antonia", look here:
> http://z-m-l.com/go/myant/myant.zml

(as you see, the master itself is in whole-book form.)

that "master" _generated_ the page-by-page form...

the page-by-page form has many intended purposes.

its first major purpose is to facilitate _proofreading_... you want to do proofreading on a page-by-page basis; you want the page-scan to be shown alongside the text; and you want the text to contain the original linebreaks. this format is geared toward those proofreading needs...

(this is a "final-stage" proofing interface, where errors are "reported", because there are very few. for earlier stages of proofreading, where there might be many more errors, we'll want an interface that lets us fix them more directly.)

the next major purpose of it is for _confirming_accuracy_. we want to give people an ability to confirm our digitization, to satisfy themselves we did that conversion job correctly... to do so, we show them our text and the original page-scan, so they can do a direct comparison and see for themselves...

the third major purpose is the one we're discussing here -- the ability for people to make a pointer to a specific page... and -- as i have said -- the reason we need to facilitate that is because our cultural heritage is full of page-based pointers.

and again, we _could_ point them to a place with just the text, but everyone knows that text can be easily "edited", so we also put the original page-scan up so as to increase the trust factor. (of course, scans could _also_ be doctored, but at some point, there's only so much you can do.)

> Couldn't you point to the page content using
> http://z-m-l.com/go/myant/myantp.html#189
> as opposed to
> http://z-m-l.com/go/myant/myantp189.html.

sure. and sometimes that's what you'll want to do instead.

but let me show you something. stopwatch this link:
> http://z-m-l.com/go/myant/myantp189.html

now check the length of time it takes to go to this one:
> http://www.openreader.org/myantonia/...a.html#page189

unless that second page was already in your cache or you have a superfast connection, it took _lots_ longer to load, because you're loading in some 500k of text -- the whole book -- instead of 1k of text and a scan. (for the dialup users, the second file will be _painful_.)

so it depends on what you need your readers to load... if you only need them to load one page of text, do that. if you need them to load the whole book, then do _that_.

you'll notice that the second link doesn't include the scans in-line in the file; you have to click a link to view each one. (the scans run to 30 megs, so it'd be suicide to load 'em all.)

so it depends on what you need. if you wanted to point to one page in each of 50 books, you wouldn't want to force your reader to load each of the 50 books in full just to see that one page. and this is often the essence of a scholarly reference section.

so it depends. this is why we need the flexibility to quickly and easily auto-generate whatever format is needed at the time...

> That way the whole content of the book is in one file
> and conversion to other formats would be easier.

in sum, i pointed to a page-based form because of this discussion... i can also create book-based forms when _that_ is more appropriate.

(such flexibility is one reason i invented my z.m.l. format, which is a sidetrack topic in that other thread from which this one came...)

> For example how do you recognize when a paragraph splits across
> two pages and how do you join them back together when converting?

good question. but easy answer.

in a "master" file which has pagebreaks marked, like this one:
> http://z-m-l.com/go/myant/myant.zml

the formula for generating a version _without_ the pagebreak info is to:

1. delete the _one_ blank line _above_ the [[doublebracketed]] pagenumber, and delete the _one_ blank line _below_ the {{doublebraced}} scan-filename...
2. if there were _two_ blank lines above and below, respectively, then that was a paragraph break, so you should insert a blank line in the output file.

if you follow that rule, you'll find that paragraphs which cross pagebreaks get joined together, while the ones that ended on the pagebreak still do...

for instance, in the .zml master, compare the breaks between these pages:
> http://z-m-l.com/go/myant/myantp040.html
> http://z-m-l.com/go/myant/myantp041.html

versus:
> http://z-m-l.com/go/myant/myantp061.html
> http://z-m-l.com/go/myant/myantp062.html

see how easy it was for me to point you to those pages specifically? and also the _usefulness_ of being able to see both text _and_ scan?

-bowerbird
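The two-step rule in the post above for dropping pagebreak info can be sketched in Python. This is only a rough illustration, not bowerbird's actual tool. It assumes that each pagebreak in the master is a `[[...]]` pagenumber line immediately followed by a `{{...}}` scan-filename line, and that "two blank lines above and below" means at least two on each side. The function name `strip_pagebreaks` and the sample filenames are made up for the example.

```python
import re

# Assumed marker shapes, based on the post's description:
# a [[doublebracketed]] pagenumber line, then a {{doublebraced}} scan-filename line.
PAGE_MARK = re.compile(r"^\[\[.*\]\]$")
SCAN_MARK = re.compile(r"^\{\{.*\}\}$")


def strip_pagebreaks(text: str) -> str:
    """Return the master text with pagebreak markers removed.

    Paragraphs that cross a pagebreak are rejoined; paragraphs that
    ended exactly on a pagebreak keep a single blank line between them.
    """
    lines = text.split("\n")
    out = []
    i = 0
    while i < len(lines):
        is_break = (
            PAGE_MARK.match(lines[i].strip())
            and i + 1 < len(lines)
            and SCAN_MARK.match(lines[i + 1].strip())
        )
        if not is_break:
            out.append(lines[i])
            i += 1
            continue

        # Count (and drop) the blank lines already emitted above the marker.
        above = 0
        while out and out[-1].strip() == "":
            out.pop()
            above += 1

        # Count (and skip) the blank lines below the scan-filename line.
        j = i + 2
        below = 0
        while j < len(lines) and lines[j].strip() == "":
            j += 1
            below += 1

        # Two blank lines on both sides marked a real paragraph break.
        if above >= 2 and below >= 2:
            out.append("")
        i = j
    return "\n".join(out)


if __name__ == "__main__":
    sample = (
        "end of page forty\n\n[[041]]\n{{p041.png}}\n\n"
        "and the paragraph continues.\n\n\n[[042]]\n{{p042.png}}\n\n\n"
        "A new paragraph starts here."
    )
    print(strip_pagebreaks(sample))
```

The original linebreaks inside each paragraph are left alone here. Unwrapping them is a separate step.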

11-07-2007, 05:18 PM	#71
sartori Connoisseur Posts: 54 Karma: 29 Join Date: Oct 2006

Bowerbird,

Your reasoning makes sense to me (in response to my question). So some more questions if you don't mind.

When you receive an error notification do you just update the master file then regenerate the paged version? or vice-versa? or for small updates do you just make the change in both versions?

On page 61 (http://z-m-l.com/go/myant/myantp061.html) I noticed that a few words are hyphenated across lines. On your master view the words are correctly joined (tea-kettle & followed). Were these manually corrected or automated? If automated did it correctly catch tea-kettle should keep its hyphen?

I'm not sure if z.m.l. is the way I want to go with my formatting but I'm still at the early stages of formatting so I'm just checking out options (googling for ebook markup languages is hopeless as you just get a ton of responses that are actual ebooks).

Thanks,
rob

11-07-2007, 05:57 PM	#72
bowerbird Banned Posts: 269 Karma: -273 Join Date: Sep 2006 Location: los angeles

sartori said:
> So some more questions if you don't mind

i don't mind a bit. that's why i'm here, to discuss...

> When you receive an error notification do you just
> update the master file then regenerate the paged version?
> or vice-versa? or for small updates
> do you just make the change in both versions?

if you go to the directory now, you'll see a bunch of files:
> http://z-m-l.com/go/myant/

including all of the .html files to which i've been linking...

the .html files were generated in a batch from the master.

but eventually, all the separate .html files will disappear. they'll be replaced by a script which intercepts links like this:
> http://z-m-l.com/go/myant/myantp061.html

and creates that .html file on-the-fly...

so yes, any correction will be made to the master, after which the script will include it when it builds the .html file next time.

> On page 61 (http://z-m-l.com/go/myant/myantp061.html)
> I noticed that a few words are hyphenated across lines.
> On your master view the words are correctly joined
> (tea-kettle & followed).

um, as far as i can tell, you're mistaken. here's the master:
> http://z-m-l.com/go/myant/myant.zml

what i see there, in the master, is this:
> Peter shuffled to his feet, caught up the tea-
> kettle and mixed him some hot water and
> whiskey. The sharp smell of spirits went
> through the room.
>
> Pavel snatched the cup and drank, then
> made Peter give him the bottle and slipped
> it under his pillow, grinning disagreeably,
> as if he had outwitted some one. His eyes fol-
> lowed Peter about the room with a contempt-
> uous, unfriendly expression. It seemed to
> me that he despised him for being so simple
> and docile.

do you really see something different? if so, that's a mystery...

> Were these manually corrected or automated?
> If automated did it correctly catch tea-kettle should keep its hyphen?

not all of the example-files that i have up are correct on this point yet, but they'll be marked as to whether an end-line hyphen is kept or not...

so, if "tea-kettle" -- with the dash -- is the form used in this book (when the word is mid-sentence), then the master will look like this:
> Peter shuffled to his feet, caught up the tea-@
> kettle and mixed him some hot water and

(i haven't decided if we'll use the at-sign, but you get the idea.)

on the other hand, if this book uses "teakettle", the master will say:
> Peter shuffled to his feet, caught up the tea-
> kettle and mixed him some hot water and

(for the record, this book does indeed use "tea-kettle" in the one other instance where the word occurs. in the cases where there is no other use of an end-line hyphenate, we consult the dictionary. when there is inconsistency within a book, we edit to consistency.)

> I'm not sure if z.m.l. is the way I want to go with my formatting but
> I'm still at the early stages of formatting so I'm just checking out options

i definitely suggest light-markup. "markdown" is the current favorite, if you want broad support. my tool-chain is approaching coherence, so you could get the job done, but markdown gives you more reliability.

google "showdown" and "markdown" for an interesting real-time demo:
> http://www.attacklab.net/showdown-gui.html

-bowerbird
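The end-line hyphen convention described in the post above can be sketched as a small unwrapping routine. This is a hedged illustration only. It assumes the `@` marker, which the post says is not yet settled: `-@` at the end of a line means the hyphen belongs to the word, and a bare `-` after a letter means the hyphen only exists because of the linebreak. The function name `unwrap_paragraph` is hypothetical.

```python
import re

# Assumed convention (the post says the at-sign is not final):
#   "tea-@" + "kettle"  -> "tea-kettle"  (hyphen is part of the word)
#   "fol-"  + "lowed"   -> "followed"    (hyphen came from the linebreak)
KEEP_HYPHEN = re.compile(r"[A-Za-z]-@$")
DROP_HYPHEN = re.compile(r"[A-Za-z]-$")


def unwrap_paragraph(lines):
    """Join the lines of one paragraph into a single string,
    resolving end-line hyphenates along the way."""
    text = ""
    for raw in lines:
        line = raw.strip()
        if KEEP_HYPHEN.search(text):
            text = text[:-1] + line   # drop only the "@", keep the hyphen
        elif DROP_HYPHEN.search(text):
            text = text[:-1] + line   # drop the linebreak hyphen
        elif text:
            text = text + " " + line  # ordinary line join
        else:
            text = line
    return text


if __name__ == "__main__":
    page = [
        "Peter shuffled to his feet, caught up the tea-@",
        "kettle and mixed him some hot water and",
        "whiskey.",
    ]
    print(unwrap_paragraph(page))
```

Requiring a letter before the hyphen keeps a line that ends in a spaced double dash from being treated as a hyphenate.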

11-07-2007, 06:09 PM	#73
sartori Connoisseur Posts: 54 Karma: 29 Join Date: Oct 2006	Bowerbird - sorry didn't mean the master I meant the html view that you listed. I like the showdown stuff - seems a little limited as far as layout but it looks really easy to use. Thanks.

11-07-2007, 06:29 PM	#74
bowerbird Banned Posts: 269 Karma: -273 Join Date: Sep 2006 Location: los angeles

sartori said:
> I like the showdown stuff - seems a little limited as far as layout

depends on what you want to do with a book, and what platforms you want to put it out to... you can often exercise tight control in _one_ setting, but then it blows up on you when you try to move it...

a good rule of thumb is that if you cannot do it with light-markup, then you shouldn't be doing it anyway, because it's not gonna convert well to other settings. so living with some "limitations" from the beginning can save you a truckload of heartburn down the road.

but, you know, your demo showed you've got chops... so i'd encourage you to let your mind experiment fully.

-bowerbird

11-07-2007, 06:32 PM	#75
bowerbird Banned Posts: 269 Karma: -273 Join Date: Sep 2006 Location: los angeles

sartori said:
> sorry didn't mean the master I meant the html view that you listed.

except i still don't follow.

the individual-page .html file shows end-line hyphenates just like the scan:
> http://z-m-l.com/go/myant/myantp061.html

-bowerbird
