MobileRead Forums - View Single Post - Page numbers in ebooks for scholarly research?

bowerbird · 11-08-2007, 02:48 PM

panurge said:
> Yes, I think that "resolution" is the problem.
> Paragraph numbers would probably work well
> for everything but poetry, though in some cases
> --such as the one you mention--larger units might be
> more practical.

ok, i'll try one more time.

_nothing_ will work if you don't have stable documents.
_nothing_. so having stable documents is a necessary condition.

fortunately, having stable documents is also a _sufficient_ condition.
once you have stable documents, just about _any_ pointer system will
work, and work just fine, so you don't need to worry much about which one you pick...

> Page numbers work if one can pinpoint the exact edition
> (publisher, place, date, in addition to title and author)
> being referenced; that was the contribution of printing.

assuming that you have an infrastructure of stable documents,
the u.r.l. to a document is the "pinpoint" to "the exact edition."

every document points to its own "official" u.r.l., so you can compare your copy
with the document that appears at that u.r.l., and if the two are identical,
your copy hasn't been tampered with, and you know it's a "legitimate" copy.
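here's a minimal sketch of that comparison, assuming you already have both your copy and the copy served at the official u.r.l. as raw bytes (the download step is left out). hashing with sha-256 is my choice, not something the system requires; it just means you can store or publish a short digest instead of keeping the whole official copy around. the function names are hypothetical.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest that identifies these exact bytes."""
    return hashlib.sha256(data).hexdigest()

def is_legitimate_copy(local_copy: bytes, official_copy: bytes) -> bool:
    """True only if the local copy is byte-for-byte identical to the official one."""
    return fingerprint(local_copy) == fingerprint(official_copy)

official = b"Chapter 1\n\nIt was a dark and stormy night."
tampered = official.replace(b"dark", b"bright")

print(is_legitimate_copy(official, official))  # True
print(is_legitimate_copy(tampered, official))  # False
```

note that this only works because the official document never changes; if the copy at the u.r.l. could be edited, a mismatch would tell you nothing about which one is "legitimate."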

as with everything else, a system with stable documents makes it
_easy_, whereas it's difficult -- often to the point of impossibility --
in a system without stable documents.

> For manuscript copies, logical divisions such as sections or
> paragraphs or line numbers (for verse) were the only alternative.

and we need to "update" all those archival pointers for the new system.
every pointer that one document used to point into another document
needs to be "converted" so the electronic version of the first document
points to the correct place in the electronic version of the second one...
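one way that conversion could be sketched, assuming someone has built a concordance for a given print edition that maps each printed page to the first paragraph of the electronic version on that page, and assuming paragraphs are addressed as `#p<N>` fragments. the concordance values, the base u.r.l., and the function name are all hypothetical.

```python
import re

# Hypothetical concordance for one print edition:
# printed page number -> number of the first electronic paragraph on that page.
CONCORDANCE = {1: 1, 2: 4, 3: 9, 4: 13}

def convert_citations(text: str, base_url: str, concordance: dict[int, int]) -> str:
    """Rewrite print citations such as 'p. 3' as links to paragraph anchors.

    Citations whose page is missing from the concordance are left untouched,
    so no pointer is silently lost.
    """
    def replace(match: re.Match) -> str:
        page = int(match.group(1))
        if page not in concordance:
            return match.group(0)
        return f"{base_url}#p{concordance[page]}"

    return re.sub(r"\bp\. (\d+)\b", replace, text)

converted = convert_citations("see p. 3 and p. 7.", "https://example.org/texts/some-edition", CONCORDANCE)
print(converted)
# see https://example.org/texts/some-edition#p9 and p. 7.
```

a page citation only narrows things down to a page, so pointing at the first paragraph on it is the most precise target the old pointer can honestly support.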

> But are such things needed for electronic documents that
> can be searched for exact phrases? Presumably not.

unless your infrastructure explicitly uses "search" as its methodology
(in which case following a pointer runs the search automatically), you don't
want to force users to do a search just to activate a pointer. they'll wanna be
able to click directly to a point, and that's a reasonable expectation, so it's a capability we should give 'em.
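a minimal sketch of "click directly to a point", assuming plain-text documents whose paragraphs are separated by blank lines and pointers of the hypothetical form `<url>#p<N>`. the dictionary stands in for the stable archive; the names are illustrative only.

```python
def number_paragraphs(text: str) -> dict[str, str]:
    """Split plain text on blank lines and give each paragraph an anchor id: p1, p2, ..."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return {f"p{i}": para for i, para in enumerate(paragraphs, start=1)}

def resolve_pointer(pointer: str, archive: dict[str, str]) -> str:
    """Follow a pointer of the form '<url>#p<N>' straight to the paragraph it names."""
    url, _, anchor = pointer.partition("#")
    return number_paragraphs(archive[url])[anchor]

archive = {"https://example.org/texts/essay": "First paragraph.\n\nSecond paragraph.\n\nThird."}
print(resolve_pointer("https://example.org/texts/essay#p2", archive))
# Second paragraph.
```

the anchors are computed from the text itself, so they stay valid only as long as the document never changes. with stable documents that's guaranteed; without them, every edit can silently shift every pointer.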

> So long as one can identify the electronic source one is referring to,
> searching would suffice.

again, the source is unequivocally identified by virtue of its u.r.l.
and even if searching would "suffice", it's not convenient enough.

> But there's the rub. There is no system of cataloguing material
> that is purely electronic in origin. The URL of a web site, for instance,
> is an unstable identifier, as we have learnt very quickly in the last decade.

the current system, one which permits unstable documents, won't work.

we need another system -- it could be built on top of the current one --
that has _only_ stable documents in it. this means we can still have the
unstable system -- there's no need to replace it, as it works fine for a
good many purposes -- it just means we have to create another system
that's fully intended to be a permanent archive for dependable reference.

as i said, this stable system could even be built on top of the current one.
if we incorporate a "datestamp" into the u.r.l., and then make sure that
we archive _everything_ that is _ever_ put on the web (which is not
as absurd as it sounds, since we're _almost_ doing it already), then we
will essentially _have_ the stable infrastructure that's required, at no cost.
(the internet archive's wayback machine is the best example of this now.)
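here's a rough sketch of the datestamp idea, modeled loosely on the wayback machine's `https://web.archive.org/web/<YYYYMMDDhhmmss>/<url>` layout. the archive host below is hypothetical, and the in-memory class just stands in for "archive everything": captures are append-only, and a datestamped u.r.l. returns the latest version captured at or before that stamp.

```python
import bisect
from datetime import datetime

ARCHIVE_HOST = "https://archive.example.org/web"  # hypothetical archive host

def datestamp(when: datetime) -> str:
    """14-digit YYYYMMDDhhmmss stamp, as in wayback-style URLs."""
    return when.strftime("%Y%m%d%H%M%S")

class StableArchive:
    """Append-only store: every captured version is kept and never overwritten."""

    def __init__(self) -> None:
        self._versions: dict[str, list[tuple[str, bytes]]] = {}

    def capture(self, url: str, when: datetime, content: bytes) -> str:
        stamp = datestamp(when)
        versions = self._versions.setdefault(url, [])
        if any(s == stamp for s, _ in versions):
            raise ValueError(f"{url} already captured at {stamp}; captures are never overwritten")
        bisect.insort(versions, (stamp, content))
        return f"{ARCHIVE_HOST}/{stamp}/{url}"

    def retrieve(self, stamped_url: str) -> bytes:
        prefix = ARCHIVE_HOST + "/"
        if not stamped_url.startswith(prefix):
            raise ValueError("not a URL in this archive")
        stamp, _, url = stamped_url[len(prefix):].partition("/")
        versions = self._versions[url]
        stamps = [s for s, _ in versions]
        i = bisect.bisect_right(stamps, stamp)  # latest capture at or before `stamp`
        if i == 0:
            raise KeyError(f"no capture of {url} at or before {stamp}")
        return versions[i - 1][1]

archive = StableArchive()
first = archive.capture("http://example.com/essay", datetime(2007, 11, 8, 14, 48), b"first draft")
archive.capture("http://example.com/essay", datetime(2008, 1, 1), b"revised")
print(first)                    # https://archive.example.org/web/20071108144800/http://example.com/essay
print(archive.retrieve(first))  # b'first draft'
```

the live page at `http://example.com/essay` can keep changing; the datestamped u.r.l. keeps pointing at exactly the version that was cited.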

> Printed books have that data, but what kind of unique identifier
> do electronic documents offer?

none. until, that is, we give them one. which isn't difficult to do at all...

> There's no central clearing house, no Library of Congress or OCLC
> (the online cataloguing authority for books) or ISBN number as of yet.

don't need that. wouldn't want that. this is an easy problem to solve.
it just requires always-getting-cheaper diskspace, and the commitment.

> Speaking of Google books (which BowerBird mentions above),
> shouldn't someone point out to them that the scanning is being
> rather carelessly executed?

oh, it's been pointed out. over and over and over and over and over.
even by some of its big supporters, like me. repeatedly. problem is,
it just doesn't seem to be sinking in, not quite as deeply as it should.
(they _have_ improved. but quality, and quality-control, is still awful.)

-bowerbird
