Page numbers in ebooks for scholarly research? - Page 6

Panurge · 11-08-2007, 12:09 AM

Kovidgoyal: [EDIT: An example from physics research articles. A resolution of sections is usually sufficient. i.e. people refer to section so-and-so of paper so and so.
I don't know if that is sufficient resolution in general though.]

Yes, I think that "resolution" is the problem. Paragraph numbers would probably work well for everything but poetry, though in some cases--such as the one you mention--larger units might be more practical. Page numbers work if one can pinpoint the exact edition (publisher, place, date, in addition to title and author) being referenced; that was the contribution of printing. For manuscript copies, logical divisions such as sections or paragraphs or line numbers (for verse) were the only alternative. But are such things needed for electronic documents that can be searched for exact phrases? Presumably not. So long as one can identify the electronic source one is referring to, searching would suffice. But there's the rub. There is no system of cataloguing material that is purely electronic in origin. The URL of a web site, for instance, is an unstable identifier, as we have learnt very quickly in the last decade or so. Printed books have that data, but what kind of unique identifier do electronic documents offer? There's no central clearing house, no Library of Congress or OCLC (the online cataloguing authority for books) or ISBN number as of yet.
When Michael Hart (an academic) started Project Gutenberg, he seems to have encouraged embedded page numbers in ASCII text for the reasons we've already discussed. So far electronic documents are a sort of free-floating, indistinct mass of various kinds of information. Without some standards of granularity or resolution, research will become too unwieldy; the Internet search engine demonstrates the problem all too well.
As a librarian (rather than as a programmer, who finds useful and efficient ways of designing specific solutions), I have to worry about this sort of thing increasingly. Page numbers are, in a manner of speaking, the tip of the iceberg.
Speaking of Google books (which BowerBird mentions above), shouldn't someone point out to them that the scanning is being rather carelessly executed? I keep running into instances of books that are so poorly positioned that part of the text is cut off, to say nothing of the page numbers.
--------------------------------------------------------------------------------

bowerbird · 11-08-2007, 05:17 AM

sartori said:
> those pages I added were time consuming
> but mainly because I was figuring out the layout.
> I do plan on working through the whole book
> but I haven't found a plain text version available
> so I am ocr'ing the pdf from archive.org.
> This is currently the slowest part as I am
> proofing and converting quotes and dashes over.

um, gee, you might be missing something very important.

if you got it from archive.org, then it was almost certainly
scanned by the o.c.a., which means that -- right alongside
the .pdf copy -- you should find the o.c.r. they did on it...

i couldn't find volume 1, but some volumes from this series
certainly have their text available. sometimes you need to
click on the "ftp" link to find _all_ of the files that they offer.
if you see nothing labeled as ".txt", seek the "djvu.text" file.

however, in a spectacular display of sheer incompetence,
sometimes the text files are burdened by severe problems,
some of which can even border on fatal. i won't bother to
go into the details here, but check the text _carefully_ first,
before going on to pour work into it, or you might regret it.

so you might well end up doing o.c.r. on the .pdf anyway.
but i'd still suggest you should check out their text first...

> For example, if you increase the display font size
> in your browser, the pages expand lengthwise
> to accommodate it. It just runs into problems with items
> that are specifically positioned, such as the table of contents.

another problem that you need to be aware of -- which might
or might not be something you consider serious -- is when a
paragraph is split across a pagebreak -- as they usually are --
because then the text won't fill out the bottom line of the page,
which is what people expect to see in that situation. the reflow
will (often) end in the middle of the line, so the impression is that
the paragraph has ended, which can be disconcerting to people.

> If so it wouldn't be too hard to create a library of books
> that display paged as in my example but then you could
> easily convert them to lrf and ignore page numbers, etc.

but if the page as displayed doesn't fit correctly on the screen,
then you'll have "pagebreaks" occurring mid-screen, correct?
which kind of defeats the whole purpose of a paged display...

***

jbenny said:
> They have apparently OCRed the text

um, well of course google does o.c.r. on the scans.
how else would they be able to do searches on it?

> as you can "view text" for each individual page.

they do that so as to provide access to the visually-impaired.

> Sadly, the downloadable PDF doesn't include the OCRed text.

that's because they don't really want you to have the text.
well, they probably don't care if _you_ have it, but they
don't want all of the _other_ search engines to have it...

***

sartori said:
> I just checked those out and they appear to be from
> a slightly different version than the ones on archive.org
> (and they have all 31 volumes). As my goal is to represent
> the printed version, the differences may become a problem
> with page numbers being different.

so strange. did this series with 31 volumes _really_ go through
several editions? i guess it's not impossible, but it'd surprise me.
are you sure that it's not just _flakiness_ in the p.g. digitization?

because one appealing aspect of the p.g. versions in general is
that they've been subjected to some proofreading, which means
-- if nothing less -- that you can compare them to your output,
because the differences between the two versions will point to
errors in one (or both) of them. indeed, this has proven to be
one of the _most_ effective ways of doing "proofing" on a text...
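
A minimal sketch of that kind of comparison, assuming both transcriptions are already on hand as plain-text strings. The helper name `differing_lines` and the sample lines are made up for illustration; the diffing itself uses Python's standard `difflib`. It lists only the lines where the two versions disagree, which are the places worth checking against the page scan:

```python
import difflib

def differing_lines(ocr_text, reference_text):
    """Return only the lines that differ between two transcriptions.

    Hypothetical helper: lines prefixed "- " appear only in ocr_text,
    lines prefixed "+ " appear only in reference_text.
    """
    diff = difflib.ndiff(ocr_text.splitlines(), reference_text.splitlines())
    # ndiff also emits unchanged lines ("  ") and hint lines ("? "); drop those.
    return [line for line in diff if line.startswith(("- ", "+ "))]

# Illustrative data: a typical OCR confusion of "m" with "rn".
my_ocr = "It was the best of tirnes,\nit was the worst of times,"
proofed = "It was the best of times,\nit was the worst of times,"

for line in differing_lines(my_ocr, proofed):
    print(line)
```

A disagreement only says that one of the two versions is wrong at that spot, not which one, so each reported line still has to be checked against the scan.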

-bowerbird

DaleDe · 11-08-2007, 01:35 PM

Quote:

Originally Posted by Panurge

Kovidgoyal:
Yes, I think that "resolution" is the problem. Paragraph numbers would probably work well for everything but poetry, though in some cases--such as the one you mention--larger units might be more practical. Page numbers work if one can pinpoint the exact edition (publisher, place, date, in addition to title and author) being referenced; that was the contribution of printing. For manuscript copies, logical divisions such as sections or paragraphs or line numbers (for verse) were the only alternative. But are such things needed for electronic documents that can be searched for exact phrases? Presumably not. So long as one can identify the electronic source one is referring to, searching would suffice. But there's the rub. There is no system of cataloguing material that is purely electronic in origin. The URL of a web site, for instance, is an unstable identifier, as we have learnt very quickly in the last decade or so. Printed books have that data, but what kind of unique identifier do electronic documents offer? There's no central clearing house, no Library of Congress or OCLC (the online cataloguing authority for books) or ISBN number as of yet.

I think paragraphs work for everything. After all a Stanza of poetry is really a paragraph in effect. The idea is that the logical unit of the author is the paragraph. It is a cohesive thought and can often be determined easily even when the editions change. Paperback vs. hardback makes page number useless again.

You are correct that text can be searched but sometimes there are duplicates if you fail to type enough and searches sometimes fail on word wraps. A specific reference is what is needed when referencing someone else's work.
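
Both failure modes are easy to show. A minimal sketch, assuming the document is available as a single plain-text string: it collapses every run of whitespace (including the line breaks left by word wrap) before matching, and it reports how many times the phrase occurs so that a too-short quotation can be spotted. The names `normalize` and `count_phrase` and the sample text are illustrative only.

```python
import re

def normalize(text):
    """Collapse runs of whitespace, including line breaks, into single spaces."""
    return re.sub(r"\s+", " ", text).strip()

def count_phrase(document, phrase):
    """Count non-overlapping occurrences of phrase, ignoring where lines wrap."""
    return normalize(document).count(normalize(phrase))

sample = "It was the best of\ntimes, it was the worst\nof times."

print(count_phrase(sample, "of times"))            # too short: matches twice
print(count_phrase(sample, "the worst of times"))  # long enough: matches once
```

This sketch does not rejoin words hyphenated across a line break, so a search can still fail there.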

You also raise a good point about electronic text not being able to be identified. This is particularly interesting when web pages are inherently copyrighted. It would seem that copyrights are not enforceable if you can't produce the original.

Dale

nekokami · 11-08-2007, 01:39 PM

Quote:

Originally Posted by DaleDe

You also raise a good point about electronic text not being able to be identified. This is particularly interesting when web pages are inherently copyrighted. It would seem that copyrights are not enforceable if you can't produce the original.

What about the Internet Archive?

DaleDe · 11-08-2007, 01:46 PM

Quote:

Originally Posted by nekokami

What about the Internet Archive?

Are you talking about google? I think it is not guaranteed over the long haul.

Dale

bowerbird · 11-08-2007, 02:48 PM

panurge said:
> Yes, I think that "resolution" is the problem.
> Paragraph numbers would probably work well
> for everything but poetry, though in some cases
> --such as the one you mention--larger units might be
> more practical.

ok, i'll try one more time.

_nothing_ will work if you don't have stable documents.
_nothing_. so stable documents is a necessary condition.

fortunately, stable documents is also a _sufficient_ condition.
once you have stable documents, just about _any_ system will
work, and work just fine, so you don't need to worry about it...

> Page numbers work if one can pinpoint the exact edition
> (publisher, place, date, in addition to title and author)
> being referenced; that was the contribution of printing.

assuming that you have an infrastructure of stable documents,
the u.r.l. to a document is the "pinpoint" to "the exact edition."

every document points to its "official" u.r.l., so you can compare it
with the document that appears at that u.r.l., and if it is the same,
it hasn't been tampered with, and you know it's a "legitimate" copy.
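
One way such a comparison could be sketched, assuming you already hold the bytes of your own copy and the bytes fetched from the "official" u.r.l. Comparing SHA-256 digests is a swapped-in convenience here, not something the post specifies, and the function names are hypothetical:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of a document's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()

def is_legitimate_copy(local_copy: bytes, official_copy: bytes) -> bool:
    """True only if the two copies are byte-for-byte identical (assumed criterion)."""
    return fingerprint(local_copy) == fingerprint(official_copy)

# Illustrative data: a single changed character is enough to fail the check.
official = b"Call me Ishmael."
print(is_legitimate_copy(b"Call me Ishmael.", official))
print(is_legitimate_copy(b"Call me Ishmael!", official))
```

A digest can also be published next to the u.r.l., so a copy can be checked without downloading the whole document again. Note that the check is strict: even a change in line endings counts as a difference.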

as with everything else, a system with stable documents makes it
_easy_, whereas it's difficult -- often to the point of impossibility --
in a system without stable documents.

> For manuscript copies, logical divisions such as sections or
> paragraphs or line numbers (for verse) were the only alternative.

and we need to "update" all those archival pointers for the new system.
whatever pointer that one document used to point to another document
needs to be "converted" so the electronic version of the first document
points to the correct place in the electronic version of the second one...

> But are such things needed for electronic documents that
> can be searched for exact phrases? Presumably not.

unless your infrastructure is explicitly using "search" as its methodology,
in which case it's automatic, you don't want to force users to do search
just to activate a pointer. they'll wanna be able to click directly to a point,
and that's a reasonable expectation about a capability we should give 'em.

> So long as one can identify the electronic source one is referring to,
> searching would suffice.

again, the source is unequivocally identified by virtue of its u.r.l.
and even if searching would "suffice", it's not convenient enough.

> But there's the rub. There is no system of cataloguing material
> that is purely electronic in origin. The URL of a web site, for instance,
> is an unstable identifier, as we have learnt very quickly in the last decade

the current system, one which permits unstable documents, won't work.

we need another system -- it could be built on top of the current one --
that has _only_ stable documents in it. this means we can still have the
unstable system -- there's no need to replace it, as it works fine for a
good many purposes -- it just means we have to create another system
that's fully intended to be a permanent archive for dependable reference.

as i said, this stable system could even be built on top of the current one.
if we incorporated a "datestamp" into the u.r.l., and then made sure that
we archived _everything_ that was _ever_ put on the web (which is not
as absurd as it sounds, since we're _almost_ doing it already), then we
will essentially _have_ the stable infrastructure that's required, at no cost.
(the wayback machine at internet archive is the best example of this now.)
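
As a rough illustration of a "datestamp" in the u.r.l., the wayback machine addresses a snapshot by putting a 14-digit UTC timestamp (YYYYMMDDhhmmss) in front of the original address. The sketch below only builds such an address; the function name `datestamped_url` is made up, and whether a capture actually exists for a given moment is not something the code can know (the archive generally serves the capture nearest to the requested time).

```python
from datetime import datetime, timezone

def datestamped_url(original_url, when):
    """Build a wayback-style address for original_url as of the moment `when`.

    Hypothetical helper; `when` must be a timezone-aware datetime.
    """
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"https://web.archive.org/web/{stamp}/{original_url}"

# Illustrative moment only.
moment = datetime(2007, 11, 8, 14, 48, 0, tzinfo=timezone.utc)
print(datestamped_url("http://www.archive.org", moment))
```

The point of the design is that the pair (address, datestamp) names one fixed version of a page, which is what a citation needs, while the plain address keeps pointing at whatever is there today.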

> Printed books have that data, but what kind of unique identifier
> do electronic documents offer?

none. until, that is, we give them one. which isn't difficult to do at all...

> There's no central clearing house, no Library of Congress or OCLC
> (the online cataloguing authority for books) or ISBN number as of yet.

don't need that. wouldn't want that. this is an easy problem to solve.
it just requires always-getting-cheaper diskspace, and the commitment.

> Speaking of Google books (which BowerBird mentions above),
> shouldn't someone point out to them that the scanning is being
> rather carelessly executed?

oh, it's been pointed out. over and over and over and over and over.
even by some of its big supporters, like me. repeatedly. problem is,
it just doesn't seem to be sinking in, not quite as deeply as it should.
(they _have_ improved. but quality, and quality-control, is still awful.)

-bowerbird

bowerbird · 11-08-2007, 03:00 PM

dalede said:
> I think paragraphs work for everything.

well, almost _anything_ will "work for everything"
if everyone agrees on how it will be implemented.

> After all a Stanza of poetry is really a paragraph in effect.

i know some people who would argue with you about that.
for a long time. they'd call you bad names for saying that.

> The idea is that the logical unit of the author is the paragraph.

maybe in your mind. but other authors could be very different.

> It is a cohesive thought and can often be determined
> easily even when the editions change.

i can show you edition-changes with changed paragraphs.

(but that's really neither here nor there, because any system
has to consider different editions to be different documents.
every pointer has to be relative to a specific edition, or else
you start getting into all kinds of very confusing messiness.)

> Paperback vs. hardback makes page number useless again.

not really. even if the pagination is different between the two
-- and sometimes it's not, but that's beside the point here --
when you're making a link, you simply link to one or the other...

> you are correct that text can be searched but sometimes
> there are duplicates if you fail to type enough and
> searches sometimes fail on word wraps. A specific reference
> is what is needed when referencing someone else's work.

there are some people who say that, because this issue is so
thorny right now, in our world of unstable documents, that
any text that you want to quote should just be included in
your own document. it's easy enough with copy-and-paste.

("what if you want to cite a whole article or book?", you ask.
then a system based on _search_ won't work for you anyway,
which is part of the problem with specifying such a system...)

-bowerbird

nekokami · 11-08-2007, 05:49 PM

Quote:

Originally Posted by DaleDe

Are you talking about google? I think it is not guaranteed over the long haul.

Dale

No: http://www.archive.org

DaleDe · 11-08-2007, 06:12 PM

Quote:

Originally Posted by nekokami

No: http://www.archive.org

Interesting. I didn't know that it existed. I searched it for my name and got zero hits, but if I search Google I get almost 10,000 hits, so I think their search engine isn't too good. Thanks for the information.

Dale

nekokami · 11-08-2007, 08:29 PM

I think it works best if you have an actual website to search for. I've managed to use it to dig up all kinds of pages that have disappeared over the years.

DaleDe · 11-08-2007, 09:09 PM

Quote:

Originally Posted by nekokami

I think it works best if you have an actual website to search for. I've managed to use it to dig up all kinds of pages that have disappeared over the years.

Thanks, that works for me. I can see this is valuable. Looks like my site has about 8 years of history stored there.

Dale

Panurge · 11-09-2007, 12:03 AM

DaleDe: "I think paragraphs work for everything. After all a Stanza of poetry is really a paragraph in effect. The idea is that the logical unit of the author is the paragraph. It is a cohesive thought and can often be determined easily even when the editions change. Paperback vs. hardback makes page number useless again."

Unfortunately, much poetry is not in stanzas, especially when it is written in blank verse.

kovidgoyal · 11-09-2007, 12:10 AM

Well, there's no real reason why you can't have paragraphs that are a single line long for blank verse.

Panurge · 11-09-2007, 12:14 AM

BowerBird: 'it is _not_ our job to make "a faithful representation of the print copy". we don't even _want_ to do that -- even if we could -- and we _cannot_, because any time you move a document from one medium to a completely different one, you're creating a new edition.'

I don't think a representation of the print copy in its visual layout is necessary, just an exact transcription of the content. I want to know that no words (or other linguistic elements, such as punctuation and paragraphing, for instance) have been added or subtracted without some sort of indication that something was changed from the original source. That has been solid scholarly practice from the beginning, which is not the same as a facsimile. If the electronic version is guaranteed identical in that sense, then I would rarely need page numbers.

By the way, I find BowerBird's examples very interesting.

sartori · 11-09-2007, 02:10 AM

Quote:

Originally Posted by Panurge

I don't think a representation of the print copy in its visual layout is necessary, just an exact transcription of the content.

Panurge,

While I agree with the above statement when applied for research purposes, I feel that a decent representation of the source layout can add to the enjoyment when reading for pleasure. The attached image from Alice In Wonderland is one example where the layout of the text adds something to the book. I think if you start out with a solid facsimile of the original it's pretty easy to convert that to plain text or any other format for research purposes.

Of course, I agree with you that laying this out can be done pretty easily for a web page, but as soon as you try to convert it for different devices like the Sony Reader it's pretty much a lost cause.

Ultimately I would love it if you could start with a master document that is a facsimile of the original print version for viewing online and then export it for individual devices and have the server 'automatically' remove any markup that is not supported for that device.

Just my 2 cents.

Rob


