MobileRead Forums - View Single Post - Page numbers in ebooks for scholarly research?

bowerbird · 11-07-2007, 04:23 PM

sartori said:
> Bowerbird, can i ask the reasoning behind
> splitting the document into individual pages?

first of all, my e-books _can_ exist in several forms.
the individual-pages form is just one of those forms.
but i can (and do) spin out "whole-book" forms too...
(plus chapter-by-chapter forms, for some purposes.)

i pointed to the page-by-page form because this topic
-- page-based referencing within scholarly situations --
is one whose basic requirements call out for that form...

to see the "master file" for "my antonia", look here:
> http://z-m-l.com/go/myant/myant.zml
(as you see, the master itself is in whole-book form.)

that "master" _generated_ the page-by-page form...

the page-by-page form has many intended purposes.

its first major purpose is to facilitate _proofreading_...
you want to do proofreading on a page-by-page basis;
you want the page-scan to be shown alongside the text;
and you want the text to contain the original linebreaks.
this format is geared toward those proofreading needs...
(this is a "final-stage" proofing interface, where errors are
"reported", because there are very few. for earlier stages
of proofreading, where there might be many more errors,
we'll want an interface that lets us fix them more directly.)

the next major purpose of it is for _confirming_accuracy_.
we want to give people an ability to confirm our digitization,
to satisfy themselves we did that conversion job correctly...
to do so, we show them our text and the original page-scan,
so they can do a direct comparison and see for themselves...

the third major purpose is the one we're discussing here --
the ability for people to make a pointer to a specific page...
and -- as i have said -- the reason we need to facilitate that
is because our cultural heritage is full of page-based pointers.
and again, we _could_ point them to a place with just the text,
but everyone knows that text can be easily "edited", so we also
put the original page-scan up so as to increase the trust factor.
(of course, scans could _also_ be doctored, but at some point,
there's only so much you can do.)

> Couldn't you point to the page content using
> http://z-m-l.com/go/myant/myantp.html#189
> as opposed to
> http://z-m-l.com/go/myant/myantp189.html.

sure.

and sometimes that's what you'll want to do instead.

but let me show you something. stopwatch this link:
> http://z-m-l.com/go/myant/myantp189.html

now check the length of time it takes to go to this one:
> http://www.openreader.org/myantonia/...a.html#page189

unless that second page was already in your cache or
you have a superfast connection, it took _lots_ longer
to load, because you're loading in some 500k of text
-- the whole book -- instead of 1k of text and a scan.
(for the dialup users, the second file will be _painful_.)

so it depends on what you need your readers to load...
if you only need them to load one page of text, do that.
if you need them to load the whole book, then do _that_.

you'll notice that the second link doesn't include the scans
in-line in the file; you have to click a link to view each one.
(the scans run to 30 megs, so it'd be suicide to load 'em all.)

so it depends on what you need.

if you wanted to point to one page in each of 50 books,
you wouldn't want to force your reader to load each of the
50 books in full just to see that one page. and this is often
the essence of a scholarly reference section. so it depends.

this is why we need the flexibility to quickly and easily
auto-generate whatever format is needed at the time...

> That way the whole content of the book is in one file
> and conversion to other formats would be easier.

in sum, i pointed to a page-based form because of this discussion...
i can also create book-based forms when _that_ is more appropriate.

(such flexibility is one reason i invented my z.m.l. format, which is
a sidetrack topic in that other thread from which this one came...)

> For example how do you recognize when a paragraph splits across
> two pages and how do you join them back together when converting?

good question. but easy answer.

in a "master" file which has pagebreaks marked, like this one:
> http://z-m-l.com/go/myant/myant.zml
the formula for generating a version _without_ the pagebreak info is to:
1. delete the _one_ blank line _above_ the [[doublebracketed]] pagenumber,
and delete the _one_ blank line _below_ the {{doublebraced}} scan-filename...
2. if there were _two_ blank lines above and below, respectively, then that
was a paragraph break, so you should insert a blank line in the output file.
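
a minimal python sketch of the rule above, not the actual z.m.l. tooling. it assumes that the [[pagenumber]] and the {{scan-filename}} each sit alone on their own line, one directly after the other, and it reads "two blank lines above and below" strictly (both sides must have two). the function name, the regexes, and the sample scan filenames are all hypothetical.

```python
import re

# assumed marker shapes: each marker is alone on its own line
PAGE_RE = re.compile(r"^\[\[.*\]\]$")   # e.g. [[041]]
SCAN_RE = re.compile(r"^\{\{.*\}\}$")   # e.g. {{some-scan-file.png}} (hypothetical name)


def strip_pagebreaks(text: str) -> str:
    """Remove pagebreak markers, rejoining paragraphs that cross a page."""
    lines = text.split("\n")
    out = []
    i = 0
    while i < len(lines):
        if PAGE_RE.match(lines[i].strip()):
            # drop the blank lines already copied just above the marker
            above = 0
            while out and out[-1].strip() == "":
                out.pop()
                above += 1
            # skip the page-number line and any scan-filename line after it
            i += 1
            while i < len(lines) and SCAN_RE.match(lines[i].strip()):
                i += 1
            # drop the blank lines just below the marker
            below = 0
            while i < len(lines) and lines[i].strip() == "":
                i += 1
                below += 1
            # two blank lines on both sides meant a real paragraph break,
            # so keep exactly one blank line in the output
            if above >= 2 and below >= 2 and out:
                out.append("")
            continue
        out.append(lines[i])
        i += 1
    return "\n".join(out)


sample = (
    "end of page forty,\n"
    "\n"
    "[[041]]\n"
    "{{scan041.png}}\n"
    "\n"
    "which continues here.\n"
    "\n"
    "\n"
    "[[042]]\n"
    "{{scan042.png}}\n"
    "\n"
    "\n"
    "a new paragraph."
)
result = strip_pagebreaks(sample)
```

in the sample, the first pagebreak falls mid-paragraph, so its two halves end up on adjacent lines; the second pagebreak coincides with a paragraph break, so one blank line survives.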

if you follow that rule, you'll find that paragraphs which cross pagebreaks
get joined together, while the ones that ended at the pagebreak still end there...
for instance, in the .zml master, compare the breaks between these pages:
> http://z-m-l.com/go/myant/myantp040.html
> http://z-m-l.com/go/myant/myantp041.html
versus:
> http://z-m-l.com/go/myant/myantp061.html
> http://z-m-l.com/go/myant/myantp062.html

see how easy it was for me to point you to those pages specifically?
and also the _usefulness_ of being able to see both text _and_ scan?
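
for anyone building a list of such pointers, here is a small python sketch. it assumes the url pattern visible in the links in this post (book name, then "p", then a three-digit zero-padded page number); the helper name and its parameters are hypothetical, and the pattern may not hold for every book.

```python
def page_url(book: str, page: int, base: str = "http://z-m-l.com/go") -> str:
    """Build a per-page url, assuming the pattern seen in the links above."""
    # :03d zero-pads the page number to three digits, as in myantp040.html
    return f"{base}/{book}/{book}p{page:03d}.html"


# the two pairs of pages compared above
pairs = [(40, 41), (61, 62)]
urls = [[page_url("myant", p) for p in pair] for pair in pairs]
```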

-bowerbird
