Old 11-05-2007, 02:04 AM   #76
bowerbird
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
concrete points? i guess i missed them. at any rate, the proof is in the pudding.
if you're right, my library won't work. so there's no point to any discussion here.

so, as they say, have a nice day... :+)

-bowerbird
Old 11-05-2007, 02:30 AM   #77
bowerbird
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
gee, it doesn't appear i've posted all the messages
that i've written. nonetheless, i'm sure it will seem
like i didn't address the "concrete points" anyway.

still, i'll send those messages some time.
maybe tomorrow. maybe the day after...
but we did enough back-and-forth today.

-bowerbird
Old 11-05-2007, 02:32 AM   #78
kovidgoyal
creator of calibre
Posts: 45,251
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
It feels nice to win an argument. You do bring out the child in me :-)
Old 11-05-2007, 02:36 AM   #79
bowerbird
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
i'm glad you feel that you won.

maybe it'll mean you back off...

-bowerbird
Old 11-05-2007, 03:26 AM   #80
kovidgoyal
creator of calibre
Posts: 45,251
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I actually meant that as an explanation for why I was being so insistent, not a declaration of victory. I'm still looking forward to what you have to say in response to my last post.
Old 11-05-2007, 11:48 PM   #81
Panurge
Enthusiast
Posts: 34
Karma: 336
Join Date: Dec 2006
Location: Texas
Device: Sony Reader
[For XHTML markup, one thing that comes to mind (just off the top of my head) would be to enclose all the text that makes up an original page with a surrounding tag that uses the "id" attribute to hold the page number. This would not display, but could be accessed if needed. Also, by using "id", you could construct a special hyperlinked table of pages that would allow you to jump to specific pages in the ebook. I'll have to try this and see how it works.]
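A minimal sketch of that idea, written in Python purely for illustration. The function names and the "page-N" id scheme are assumptions of this sketch, not anything proposed in the thread (XHTML ids cannot begin with a digit, so a bare page number would not validate as an id). It also assumes each page holds whole paragraphs; a page break that falls mid-paragraph would need different handling.

```python
from html import escape


def wrap_pages(pages):
    """Wrap each page's text in a <div> whose id encodes the page number.

    `pages` maps a printed page number to the text of that page.
    The "page-N" id form is a hypothetical naming scheme.
    """
    parts = []
    for number, text in sorted(pages.items()):
        parts.append(f'<div id="page-{number}"><p>{escape(text)}</p></div>')
    return "\n".join(parts)


def page_table(pages):
    """Build a hyperlinked list of pages that jumps to each wrapper."""
    links = [f'<li><a href="#page-{n}">Page {n}</a></li>' for n in sorted(pages)]
    return "<ul>\n" + "\n".join(links) + "\n</ul>"


pages = {1: "First page text.", 2: "Second page text & more."}
body = wrap_pages(pages)
toc = page_table(pages)
print(toc)
print(body)
```

Nothing here displays a page number inside the text itself; the number lives only in the id attribute and in the separate table of pages.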

Some such solution might satisfy everyone. Current scholarly journal databases such as Project Muse give the page numbers in square brackets within the text--an "ugly" solution, I suppose, but a simple one. JSTOR, the dominant archive of scholarly journals, takes a different tack. It uses searchable PDF files and presents a scanned graphic representation of the original journal page, so the pagination problem is not an issue. However, the downloaded PDFs don't look all that great on the Sony Reader, though they are usable.
Sorry to have caught up with the conversation so late; I don't get a chance to log on to the forums every day.
Old 11-06-2007, 12:41 PM   #82
jbenny
Addict
Posts: 323
Karma: 358
Join Date: May 2007
Device: Tablet PC and Nokia N800
Quote:
Originally Posted by Panurge
Current scholarly journal databases such as Project Muse give the page numbers in square brackets within the text--an "ugly" solution, I suppose, but a simple one. JSTOR, the dominant archive of scholarly journals, takes a different tack. It uses searchable PDF files and presents a scanned graphic representation of the original journal page, so the pagination problem is not an issue. However, the downloaded PDFs don't look all that great on the Sony Reader, though they are usable.
Although neither is ideal, both methods could easily be done in an epub ebook. The first would be very simple, but "ugly" as you say. Including a scanned image of each page (PDF, PNG, JPG, etc.) that is linked from the XHTML text is also possible. This would of course make the epub much larger and more work to construct.

I haven't had the time to think about other ways to do this, but there is probably a good way to do this strictly in XHTML, without having to include scans or put visible page numbers in the text. Perhaps someone else can suggest something?

BTW, this may be a good topic to split out into its own thread.

Edit: Never mind. I'll create a new topic for it myself.

Last edited by jbenny; 11-06-2007 at 02:13 PM.
Old 11-06-2007, 12:44 PM   #83
bowerbird
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
panurge, great to have you back. i was worried that
the temperature in here had driven you away... :+)

at any rate, i wrote another message on pagenumbers,
and will go dig it up to post it shortly...

in the meantime, here is a quick summary of various
projects of mine -- in various states of polish -- which
are available in some form online or by-request...

perhaps this will give people an idea of my scope...

i invite the skeptics to go find the flaws in my work,
and report them in great detail... ;+)

-bowerbird

======================================================
the proof is in the pudding.
======================================================

for the latest version of this pudding sampler at any time, please visit:
> http://z-m-l.com/go/pudding_sampler.html

======================================================
the z.m.l. tool-chain is now starting to cohere across the workflow,
so here's a reminder about the pudding samples available currently.
all of these are in-progress, so constructive criticism is welcomed...
======================================================

babelfish -- prototype web-app viewer-program for z.m.l.
> http://z-m-l.com/go/babelfish19.pl

verylovely -- canned online zml-to-html conversion demo
> http://www.z-m-l.com/go/vl3.pl

zmldingus -- live online zml-to-html conversion app
> http://www.z-m-l.com/go/zmldingus093.pl

"continuous proofreading" mode: various sample books
> http://z-m-l.com/go/myant/myantp001.html
> http://z-m-l.com/go/mabie/mabiep001.html
> http://z-m-l.com/go/tolbk/tolbkp001.html
> http://z-m-l.com/go/sgfhb/sgfhbp001.html
> http://z-m-l.com/go/ahmmw/ahmmwp001.html
> http://z-m-l.com/go/goann/goannc001.html

.pdf samples -- sample of the zml-to-pdf conversion process
> http://z-m-l.com/oyayr/oyayr.zml
> http://z-m-l.com/oyayr/oya-sunday.pdf
> http://snowy.arsc.alaska.edu/bowerbi...01/alice01.zml
> http://snowy.arsc.alaska.edu/bowerbi...1/alice01b.pdf

.html samples -- sample of the zml-to-html conversion process
> http://snowy.arsc.alaska.edu/bowerbi...01/alice01.zml
> http://snowy.arsc.alaska.edu/bowerbi...1/alice01.html

show_scan-set -- web-viewer modified specifically for viewing otherwise-raw scan-sets
> http://z-m-l.com/go/sss.pl

iphone -- web-viewer modified specifically for the iphone
> http://z-m-l.com/go/babelfishi20.pl

iphone -- reading a scan-set (e.g., page images) on the iphone
> http://z-m-l.com/go/babelfishi20.pl

give -- cross-platform offline viewer-program for z.m.l. (dated now, but...)
> download from the "zml-talk" group at yahoogroups

zandbox -- cross-platform offline z.m.l. authoring-tool
> e-mail me for a copy

banana cream -- cross-platform offline proofreading engine
> e-mail me for a copy

scrape/clean -- cross-platform offline proofreading engine
> e-mail me for a copy

-bowerbird
======================================================
the proof is in the pudding.
======================================================
Old 11-06-2007, 01:05 PM   #84
JSWolf
Resident Curmudgeon
Posts: 79,409
Karma: 145491800
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
How does ZML help us get ZML-marked-up text into LRF and PRC formats so we can read it on our 505s and Gen3s/iLiads?
Old 11-06-2007, 01:42 PM   #85
bowerbird
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
jon, right now, it's not. very shortly, however, the .html conversion will be
solid enough for you to use as the rosetta-stone to leapfrog to other formats.

-bowerbird
Old 11-06-2007, 02:17 PM   #86
bowerbird
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
jbenny said:
> You bring up a very valid point that most of us don't think of
> (me included). Can you suggest a way to handle this
> without having the page numbers in-line with the text?
> Most of us would find the visible page numbers too obnoxious.
> For XHTML markup, one thing that comes to mind
> (just off the top of my head) would be to enclose
> all the text that makes up an original page with
> a surrounding tag that uses the "id" attribute
> to hold the page number

i admire the initiative that makes you jump in on this
problem that you haven't really thought about before.

a 3.2k lorem ipsum example isn't really needed, though.

many other people _have_ thought about it, for a while,
so a little exploratory research can go a long way here...
as they've already made a pass at providing solutions...

i've described mine -- and will repeat the links here --
> http://z-m-l.com/go/myant/myantp001.html
> http://z-m-l.com/go/mabie/mabiep001.html
> http://z-m-l.com/go/sgfhb/sgfhbp001.html
> http://z-m-l.com/go/tolbk/tolbkp001.html
> http://z-m-l.com/go/goann/goannp001.html
these demo e-books let you link directly to _one_page_,
where the text is available in easily-copied digital form,
and the page-scan is presented for reference as well...
a comment-form at the bottom lets people report errors,
or even make annotations to the page for others to see...

and again, these are all being done with my .zml format.
you can view the .zml files underlying the above books:
> http://z-m-l.com/go/myant/myant.zml
> http://z-m-l.com/go/mabie/mabie.zml
> http://z-m-l.com/go/sgfhb/sgfhb.zml
> http://z-m-l.com/go/tolbk/tolbk.zml
> http://z-m-l.com/go/goann/goann.zml

so, in spite of the people who would like to convince you
otherwise, here's some pudding as proof that light-markup
is quite capable of generating an e-book that works well...

so that's _my_ particular take on pagenumber retention...

***

i can point to other work too, and i am happy to do so...

i might as well start at the top, with la creme de la creme.

jose menendez has created "digital reprints" which _rock_.

you can download one here:
> http://www.ibiblio.org/ebooks/Einste...Relativity.pdf

that .pdf might _look_ unremarkable, upon first viewing,
but you'll find that the pagenumbers are actually _links_
that will open up the _page-scan_ for that specific page.

originally they opened up the exact page in the scan-set
at google, but it seems google changed their interface,
and now jose's nice links merely go to the first page.
there's a lesson there against depending on other sites...

so, as a more convenient option, you can use my scans.
using the number actually printed on the original page,
plug it into the following u.r.l. template to see the scan:
> http://z-m-l.com/go/einst/einstp001.jpg
in place of the "001", put the page you want. for example:
> http://z-m-l.com/go/einst/einstp089.jpg
will pull up the page-scan for page 89 from the p-book...
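
as a rough sketch of that template in python (not part of any z.m.l. tool; the function name `scan_url` and the three-digit zero-padding are assumptions read off the two example addresses above):

```python
def scan_url(book, page):
    """Build the page-scan address for a printed page number.

    Follows the template shown above: the book code appears twice and
    the page number is zero-padded to three digits. Whether every book
    on the site follows this exact pattern is an assumption.
    """
    return f"http://z-m-l.com/go/{book}/{book}p{page:03d}.jpg"


# page 89 of the einstein book, as in the example above
print(scan_url("einst", 89))
```

the book code ("einst" here) is simply whatever appears in the address for that book.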

if you closely examine any page-scan, you'll observe that
jose's .pdf page is a very accurate replica of that page-scan.
the linebreaks are retained, down to end-line hyphenates.
the leading is almost exactly the same. so are the margins.
jose is an obsessive-compulsive guy; he gets the details right.

here's another digital reprint, this time geronimo's life story:
> http://www.ibiblio.org/ebooks/Geronimo/GerStory.pdf
compare any .pdf page with its scan by using this template:
> http://z-m-l.com/go/geron/geronp001.jpg
(as before, replace "001" with the page-number you want.)
by the way, google's scan-set from this book is the _worst_
job of scanning a book that i have ever seen from them...
it's worth downloading just for its humor as a bad example.

and finally, here's a third from jose, willa cather's "my antonia":
> http://www.ibiblio.org/ebooks/Cather...ia/Antonia.pdf
again, you can see the pagescan for any page on my site:
> http://z-m-l.com/go/myant/myantp001.jpg
(as before, replace "001" with the page-number you want.)

for the first two digital reprints, you can step through the
scan-sets more easily using my "show scan-set" viewer:
> http://z-m-l.com/go/sss.pl
"geronimo's story" is the one that comes up by default,
but you can choose the einstein book or the cather book
with the book-selection menu you will find on the page...
(and "my antonia" was also listed above in my examples.)

the quality of each of jose's "digital reprints", as a reprint,
is fantastic. you immediately see the pages are immensely
cleaner than the scans of those old library books, some of
which were subjected to careless markings by borrowers
who evidently were never taught to respect library books.
(then again, i guess that, over the course of 100 years,
there's gonna be _one_ borrower who simply _forgets_
that this was a library book, and not one of his own books.)

jose's tremendous quality gets _more_ remarkable as we
realize the digital reprint -- as opposed to the scan-set --
is _digital_text_, and thus can be _searched_ and _copied_,
meaning that it's infinitely more flexible than the scan-set.

and this all becomes truly mind-boggling when you further
realize the .pdf is 10-30 times _smaller_ than the scan-set,
which means it will run faster and use far fewer resources...

and yes, it takes some work to convert a scan-set into
digital text -- o.c.r. and proofing and formatting -- but
considering the huge benefits that result, it's worth it.

this, truly, is the direction our digital library should follow...

store a copy of the scans online, so people can refer to 'em,
to confirm for themselves that the digitization was accurate.
but give them, for their actual use, a file that's _digital_text_
-- for maximal convenience in our 21st-century cyberspace --
yet is capable of _replicating_ the original p-book _exactly_,
for the scholar-valued touchstone with previous centuries...

(that doesn't mean we have to _leave_ it in that form; we can
always remix it to our customization if we want to, since that
_remixing_ is part of the magic of a _digital_text_... but still,
we know if we want to replicate the p-book exactly, we can.
and there are times when we _do_ want exact replications...
it makes it much easier to know we're all on the same page.
sorry, but i can't ever resist throwing in that good old cliche.)

indeed, the biggest thing wrong with jose's digital reprints
is the reliance on .pdf, which is the "roach motel" of formats.
(that is, documents can go in, but they cannot come out...)

another problem is that jose builds his files using ms-word,
and doesn't make that original file available for us to remix.

in spite of these faults, though, jose's work is outstanding...

(and, just to connect the dots for you, my z.m.l. work is
designed to give the benefits while overcoming the faults.)

***

there's been other work done on retaining pagenumbers too.

here's yet another version of our good old standby, "my antonia",
which uses an x.m.l. approach to store pagenumber information:
> http://www.openreader.org/myantonia/...myantonia.html

by the way, this is the strategy that led me to make point #14
about not putting pagenumbers in-line inside the body-text...

but, on the _positive_ side, note that this document also
allows a person to click out to view each scan for reference.

also of interest, although i'd hope this degree of markup
becomes unnecessary in the future, with better browsers,
observe that each paragraph has its own "i.d." reference,
thus allowing a link to be made to a specific _paragraph_...

(should we next expect an i.d. reference on every _word_?)

***

and last but not least, because they've actually done _the_most_
work on retaining pagenumber information, you need to look at
the .html versions of the books _distributed_proofreaders_ does
for project gutenberg. over the course of the last couple of years,
most of the postprocessors there have moved to the position that
they believe pagenumbers _should_ be both saved and displayed,
so nearly all of the .html versions posted to p.g. lately have them...

unfortunately, the p.g. version of "my antonia" does not have an
.html version -- sad, the absence of automatic conversion, eh?,
perhaps someone could use gutenmark to make one for them --
so we can't compare their version of it straight across the board...

so let's take p.g. e-text #22222, as a demo, to pick a fun number:
> http://www.gutenberg.org/files/22222...-h/22222-h.htm

you'll see that, yes indeed, they've retained the pagenumber info.
and, unlike the x.m.l. example above, they have used their c.s.s.
to move the pagenumber out into the margin, and turned it gray,
so it's less conspicuous and distracting. so those are good moves.

moreover, if you really want a very good idea of exactly where the
pagebreak occurred, you can drag your cursor across the line and
observe exactly where in the line the pagenumber gets highlighted.
for example, if you scroll down to page 20, and do this little trick,
you'll find the pagebreak occurs between "practitioners" and "is".

(you could "view source" if you want, of course, but that's clumsy.)

what that _means_ is that -- in spite of where it is being displayed --
the pagenumber actually exists in-line, right in the body of the text.

unfortunately, what _that_ means is that, when you _copy_ the text,
the pagenumbers are mixed in, which we already said is a bad thing.

for instance, if you copy out the text around pagebreak 20, you get:
> and although applied to all graduate medical practitioners [20]is,
> in all other realms of learning, a degree awarded for graduate work
eewh! see that pagenumber in the middle? that's not what we want!

however, the problem isn't limited to a hassle when doing remixing.
these pagenumbers intermingled in the actual body-text can _also_
cause problems when the end-user performs a _search_ on the text.

so, for instance, if you do a search for "practitioners is", you will _not_
get a hit on that sentence that straddles page 20, because there is a
pagenumber between those two words.

(ironically, if you search for "practitioners [20]is", you _do_ get a hit;
but of course if you knew that that text is at pagebreak 20, then you
didn't need to search for it, did you? you'd just go right to page 20.)
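
to make the search problem concrete, here is a minimal python sketch. it assumes the pagenumbers show up in the copied text as a bracketed number like "[20]", as in the excerpt above; the function name and the idea of keeping the pagebreaks on the side as (page, offset) pairs are illustrative assumptions, not how d.p. or p.g. actually store anything:

```python
import re

# assumed marker form: a number in square brackets, e.g. "[20]"
PAGE_MARK = re.compile(r"\[(\d+)\]")


def split_page_marks(text):
    """Return (clean_text, breaks).

    clean_text is the text with the bracketed pagenumbers removed;
    breaks is a list of (page, offset) pairs, where offset is the
    position in clean_text at which that page begins.
    """
    pieces = []
    breaks = []
    pos = 0
    length = 0
    for match in PAGE_MARK.finditer(text):
        chunk = text[pos:match.start()]
        pieces.append(chunk)
        length += len(chunk)
        breaks.append((int(match.group(1)), length))
        pos = match.end()
    pieces.append(text[pos:])
    return "".join(pieces), breaks


sample = "and although applied to all graduate medical practitioners [20]is,"
clean, breaks = split_page_marks(sample)

print("practitioners is" in sample)  # search against the copied text
print("practitioners is" in clean)   # search against the cleaned text
print(breaks)
```

with this sample, the phrase search misses in the copied text but hits in the cleaned text, and the pagebreak position is still kept separately. note that a blunt pattern like this would also strip any bracketed number that is real content, such as a footnote reference.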

i googled to see if a search on "practitioners is" would
bring up the .html version of e-text #22222. it didn't.
but more experimentation revealed that i couldn't do
_anything_ to fetch the .html version. the .txt version
came up just fine. but no search would find the .html...
so that's a mystery to me...

these twin usability problems aren't _showstoppers_, but they _are_
"glitches" that should be cleared up, if someone has an idea _how_...
if you are that someone, hustle over to d.p. and help them out, ok?

***

at any rate, here we have some ways to give scholars pagenumbers...

if you have any feedback on any of these systems, i'd love to hear it...

-bowerbird
Old 11-06-2007, 02:28 PM   #87
bowerbird
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
in that x.m.l.-based version of "my antonia" i discussed above,
i forgot to provide an example of a link direct to a paragraph.

here's one:
> http://www.openreader.org/myantonia/...nia.html#p0251
you should read the paragraph directly after that one as well...

-bowerbird
Old 11-06-2007, 02:32 PM   #88
kovidgoyal
creator of calibre
Posts: 45,251
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by bowerbird
jbenny said:
so, in spite of the people who would like to convince you
otherwise, here's some pudding as proof that light-markup
is quite capable of generating an e-book that works well...
Nobody says that lightweight markup cannot generate *an* ebook that "works well". The question is whether lightweight markup is suitable for *all* ebooks. A question you have still failed to address.
Old 11-06-2007, 02:40 PM   #89
bowerbird
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
i expect to handle 99% of the books in the p.g. library.

and handle them well. indeed, i expect my viewer-app
will give performance that is surpassed by _no_ others,
and which is _far_superior_ to most... of course, i also
hope those other viewers improve, to the point where
they are no longer surpassed by my app, or any other.
the world of e-books only suffers when viewers are bad...

-bowerbird
Old 11-06-2007, 02:44 PM   #90
bowerbird
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
kovidgoyal, i have substantial replies to your previous posts,
which i would like to post, but i don't want to _monopolize_
the conversation here. i'd like to give other people a chance.
when two people overtake a thread, it can get boring fast...

so if you resist the urge to address every point right away,
it would be good. i promise you'll have lots of chances later.

-bowerbird