#91
creator of calibre
Posts: 45,233
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
The problem is that you simply cannot reproduce a static layout faithfully in a reflowable format that will work at different sizes. They are two different things. I think we just have to accept that.
#92
Connoisseur
Posts: 54
Karma: 29
Join Date: Oct 2006
I've been thinking about the issue of identifying which version of a document you may be looking at while researching. For example:
Say I quote chapter 3, paragraph 11 from a book listed on site 1 as Alice In Wonderland.epub. Somebody checking my work decides to look up the quote in a document called Alice In Wonderland.epub on site 2. The only problem is that site 2 has marked the paragraph starting points incorrectly, so my reference makes no sense.

I have read that each epub document (and probably most other formats) requires an ID number. Could this ID number be a 10-digit checksum generated from the actual content of the HTML source? That way, if even one character in the source changed, the checksum would change. Then when I reference my quote it could be something like Chapter 3, Paragraph 11 - Alice In Wonderland.epub [5684937643]. It should be pretty easy to create a tool that would verify the checksum I typed. Now I could verify any document as being the same one originally referenced, no matter where the file was obtained.

Edit: Of course, this does nothing to verify that the document I quoted from was correct in the first place.

Rob

Last edited by sartori; 11-09-2007 at 01:29 AM.
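A minimal sketch of the verification tool Rob describes, in Python. It assumes that "the actual content of the HTML source" means the (X)HTML files inside the EPUB's zip container, and it folds a SHA-256 digest down to the 10 decimal digits used in his example citation. The names `content_checksum` and `verify` are hypothetical. Ten digits allow only 10^10 distinct values, so the folded number is convenient to cite but weaker than quoting the full hash.

```python
import hashlib
import zipfile

# Assumption: only the (X)HTML content documents are fingerprinted,
# not the OPF metadata, stylesheets, or images.
CONTENT_SUFFIXES = (".html", ".htm", ".xhtml")


def content_checksum(epub, digits=10):
    """Return a `digits`-long decimal fingerprint of an EPUB's HTML content.

    `epub` may be a filesystem path or a binary file-like object.
    """
    sha = hashlib.sha256()
    with zipfile.ZipFile(epub) as zf:
        # Sort member names so the result does not depend on zip ordering.
        names = sorted(n for n in zf.namelist()
                       if n.lower().endswith(CONTENT_SUFFIXES))
        for name in names:
            for part in (name.encode("utf-8"), zf.read(name)):
                # Length-prefix each part so name/content boundaries are unambiguous.
                sha.update(len(part).to_bytes(8, "big"))
                sha.update(part)
    # Fold the 256-bit digest into a fixed number of decimal digits.
    value = int(sha.hexdigest(), 16) % (10 ** digits)
    return str(value).zfill(digits)


def verify(epub, claimed):
    """True if the EPUB's content matches the checksum quoted in a citation."""
    return content_checksum(epub, digits=len(claimed)) == claimed
```

Calling `verify("Alice In Wonderland.epub", "5684937643")` (the number is just the illustrative value from the post) would be expected to succeed only for a copy whose HTML is byte-for-byte identical to the one cited. Because stylesheets are excluded, a copy that differs only in styling still verifies, which seems closer to what a citation cares about.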
#93
Addict
Posts: 323
Karma: 358
Join Date: May 2007
Device: Tablet PC and Nokia N800
Quote:
<dc:identifier id="BookID">urn:uuid:xxxxxxxxxxxxxxxxxxx</dc:identifier>

Note that the identifier is required to be unique, so that no other epub should have the same ID. For a commercial ebook, the identifier would be the ISBN. For ebooks without an assigned ISBN, some other means of identifying the ebook is needed. Unless I missed it in the OPS specification, it doesn't recommend any particular method. However, a UUID (GUID) seems to be the most logical solution, as discussed elsewhere on this forum (and the format of the statement above even implies the use of a UUID). Feedbooks is using a UUID for epubs, according to Hadrien.

Assuming that a new ISBN or UUID is assigned whenever an edited or updated version of the original epub is created, this would take care of identifying a particular edition.

The need for uniqueness would seem to preclude using the identifier itself as a checksum. However, one of the optional metadata fields may be usable for that purpose. In fact, I don't see anything that says you can't use your own unique metadata element for it. Of course, getting everyone to use such a method is another issue.

Adding a checksum (or better, a hash) would be a useful addition to the epub specification. You could certainly use it to verify that the contents haven't changed, as you suggested. Again, this may not be important for the casual reader, but people need to think about, and find ways to accommodate, ebook use by the academic community as well.
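The suggestion above keeps two jobs separate: a unique `dc:identifier` (a UUID when there is no ISBN) names the edition, while a content hash in an extra metadata element lets a reader check that the text hasn't changed. A minimal Python sketch that emits both elements for an OPF package; the `x-content-sha256` meta name is a hypothetical label, not part of the epub specification, and the caller is assumed to pass in the bytes of the book's content documents.

```python
import hashlib
import uuid


def opf_identity_metadata(content, book_id=None):
    """Return OPF <metadata> children: a unique identifier plus a content hash.

    `content` is the concatenated bytes of the book's content documents.
    A fresh random UUID is generated unless `book_id` is given, so every
    new edition gets its own identifier.
    """
    if book_id is None:
        book_id = uuid.uuid4()
    digest = hashlib.sha256(content).hexdigest()
    return (
        # book_id.urn renders as "urn:uuid:" followed by the 36-character UUID.
        f'<dc:identifier id="BookID">{book_id.urn}</dc:identifier>\n'
        # Hypothetical custom element carrying the integrity check.
        f'<meta name="x-content-sha256" content="{digest}"/>'
    )
```

Calling it twice on the same text yields two different identifiers but the same hash, which mirrors the split described above: the identifier distinguishes editions, the hash verifies content.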
|
#94
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
panurge said:
> I don't think a representation of
> the print copy in its visual layout is necessary,
> just an exact transcription of the content.

well, i can understand a desire for that kind of product. but i have absolutely no interest in making such a thing.

_my_ target is the human reader in the 21st century, so i see my job as bringing that old p-book into cyberspace. which can mean making a million little changes, and it's not a good use of my time to keep track of all of them...

i mean, if you wanted to _pay_ me to do that job for you, then i might consider it. (or not, since it'd be too boring.) but i'm certainly not gonna use my volunteer time to do it, because none of my target (readers) care about that stuff. so you're just adding (immensely) to the _cost_ of the thing, without providing any _benefit_ to my audience. so no deal.

-bowerbird
#95
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
panurge said:
> Now I could verify any document as being the same one
> originally referenced no matter where the file was obtained.

why not just point to a stable u.r.l. with the document you referenced, so there is no need to jump through all of these verification hoops?

consider that you have to give such a u.r.l. to people _anyway_, for those people who don't have a copy of the document to begin with...

-bowerbird
#96
fruminous edugeek
Posts: 6,745
Karma: 551260
Join Date: Oct 2006
Location: Northeast US
Device: iPad, eBw 1150
I agree with sartori and jbenny: adding a checksum/hash to the epub standard would be helpful. How would we formally suggest that to the committee?
#97
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
nekokami said:
> How would we formally suggest that to the committee?

i'm sure they have an e-mail address.

but you might want to spend just a _little_ bit of time finding out how tenable your constructions really are, and what that committee has already done, and the research that's been performed in this field up to now -- you know, just _educating_yourself_ on the topic -- before you consider making any "formal suggestions".

or maybe you won't want to do that, i don't know... i'm not the boss of you, so i don't tell you what to do.

-bowerbird
#98
Addict
Posts: 323
Karma: 358
Join Date: May 2007
Device: Tablet PC and Nokia N800
#99
Fully Converged
Posts: 18,171
Karma: 14021202
Join Date: Oct 2002
Location: Switzerland
Device: Too many to count here.
Quote:
This is the only time I am going to say it, and at this point it's not a subject for debate: We do not tolerate personal attacks, flaming, disruptive behavior, or even insensitive remarks. You may have a different sense of what should be regarded as insensitive, and that's your personal right, but if you want to continue participating in our forums, I ask you to change your attitude and maintain the spirit of mutual respect within our welcoming and cordial community. This has never been an issue at MobileRead before. I am tired of receiving messages from users who feel irritated and offended by your constantly denigrating behavior. It's up to you. |
#100
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
alexander-
i have one question. it's a serious question. i would genuinely appreciate an answer from you.

are there really people here who consider a suggestion to "educate yourself" to be a "personal attack" or "flaming" or "disruptive behavior" or even "insensitive", people who send you messages saying they "feel irritated and offended" by that?

because, you know, maybe i'm in the wrong place.

-bowerbird
#101
Fully Converged
Posts: 18,171
Karma: 14021202
Join Date: Oct 2002
Location: Switzerland
Device: Too many to count here.
There is a distinct difference between what you say and how you say it. If you don't understand this, then you may be right with your assessment that MobileRead isn't the right place.
#102
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
alexander-
since only the words themselves -- the "what" -- are there, how can you -- or anyone -- tell me "how" i am saying them?

and when i inform you -- quite directly, with no uncertainty -- that i am _not_ "attacking" or "flaming" or "disrupting" or even "being insensitive", do you mean to tell me that you understand my internal motivations better than i do, and that i'm _wrong_, because i really do intend to be doing all those negative things? do you really believe you can honestly say you know me so well?

my messages are rorschach blots. each person needs to take responsibility for the way that _you_ interpret them.

i am a gentle soul who writes posts from a good heart, with a soft sweet voice, and the very best of intentions. i speak the truth as i see it, because people deserve it. i always stay as cool as a cucumber, never get heated, and am willing -- indeed _proud_ -- to "own my words" -- both here and now, and for the many decades to come. i've never written a single post, anywhere, i'd take back...

so i'll repeat that, because i want it to sink in: i am a gentle soul who writes posts from a good heart, with a soft sweet voice, and the very best of intentions.

although i cannot force people to interpret them that way, i can -- and _do_ -- take umbrage when they attempt to force their own interpretation as being reflective of _me_. because -- from my perspective -- that's extremely rude. it's dishonest, and disrespectful of me, as a human being.

so yeah, maybe i _am_ in the wrong place...

-bowerbird
#103
Fully Converged
Posts: 18,171
Karma: 14021202
Join Date: Oct 2002
Location: Switzerland
Device: Too many to count here.
That's it. Already too many threads got side-tracked by talking about you rather than about the subject of the thread. Either stay and change your attitude or find another place where people have a better understanding of what you'd like to achieve.
#104
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
alexander said:
> Already too many threads got side-tracked by
> talking about you rather than about the subject of the thread.

i agree with you.

-bowerbird
#105
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
panurge said:
> Page numbers are simply a way of keeping track of pages.
> The earliest printed books don't have them. For incunabulae,
> the books published in the second half of the 15th century,
> there were numbers, not of pages but of groups of pages,
> so that when the book was put together for binding the sections
> would not be out of order. Manuscripts may or may not have
> page numbers. Sometimes the first word of the following page
> was printed (or written) at the bottom of the preceding page
> to establish sequence.

i responded to this earlier by saying that my guess would be that 99.98% of the 7 million volumes that google scans at umichigan would have page numbers, and that's where we needed to focus...

however, i wanted to get back to this, and additionally note that the cyberlibrary of the future will (hopefully!) consist of more than the books that were sitting on library shelves in our universities... there is _so_ much more content out there than just those books...

there are pictures, and maps, and genealogical charts, and books on local history that were published in very small runs and didn't spread much farther than a few city libraries down the road, and city council minutes, and local newspapers, and school calendars, and blueprints of buildings, and diagrams of public sewer systems, and aerial photos of the coast and roads and farms and villages, and books of poetry, and correspondence (both public and private), diaries and dirty magazines, and more and more and so much more.

in today's e-mail is a notice that the archives of ernest hemingway have been made available, having been donated to the j.f.k. library:
> http://www.jfklibrary.org/Historical...ngway+Archive/

also today, a very interesting notice about the scrapbooks of suffragettes:
> Miller NAWSA Suffrage Scrapbooks, 1897-1911
> http://memory.loc.gov/ammem/collecti...lerscrapbooks/

just two examples of some fascinating aspects of our cultural heritage that are _not_ bound within the pages of the books in our universities.

or here's a story about the new library director at harvard university:
> http://www.thecrimson.com/article.aspx?ref=520414

he intends to write a book on book-smuggling across the french border during the 18th century, a book that he says was inspired by an archive of some 50,000 unpublished letters that he found in an old swiss town. putting those letters online, so each one of them could be accessed by a person reading this book, is one of the things that becomes possible when we've made the commitment to put our cultural heritage online.

it's very unclear whether society will have the intelligence to _fund_ the digitization of all these other elements of our cultural heritage. i can't even lie to you and say that i think we will. but nonetheless, we need to make a referencing system that can point to these _other_ things just as efficiently as it points to the _pages_ in library books...

however, the system that i've built for those pages in library books also proves to be well-suited for those other purposes, fortunately. if you look closely, you'll see that i built a very tight binding between the _page_ of each book and the _u.r.l._ which "houses" that page...

for instance, here's the u.r.l. for page 123 of "my antonia":
> http://z-m-l.com/go/myant/myantp123.html

ignore the first part -- http://z-m-l.com/go/ -- referring to my site. and the last part -- p123.html -- that's pointing to one specific page. the secret-sauce here lies in the _middle_part_ -- "myant/myant"...
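The page-to-URL binding can be sketched in Python as a pair of functions: one builds a page's URL from a book code and a page number, the other parses it back. This is inferred from the single example URL above; the assumptions that book codes are lowercase letters and digits and that the page number follows a literal `p` are ours, and the function names are hypothetical.

```python
import re

BASE = "http://z-m-l.com/go/"

# Assumed layout: BASE + <book>/<book>p<page>.html, where the book code
# is repeated in both the folder and the file name.
PAGE_URL = re.compile(
    r"^http://z-m-l\.com/go/(?P<book>[a-z0-9]+)/(?P=book)p(?P<page>\d+)\.html$"
)


def page_url(book, page):
    """Build the URL that houses one page of one book."""
    return f"{BASE}{book}/{book}p{page}.html"


def parse_page_url(url):
    """Return (book, page) for a page URL, or raise ValueError."""
    m = PAGE_URL.match(url)
    if m is None:
        raise ValueError(f"not a page URL: {url}")
    return m["book"], int(m["page"])
```

Because the book code appears twice, `parse_page_url` rejects a URL whose folder and file name disagree, which is one cheap way to catch a page filed under the wrong book. Front-matter pages with roman numerals would need a wider pattern than `\d+`.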
that secret-sauce middle-part helps create a unique website address, and that tells us _exactly_ what book we are in, with no uncertainty, since one -- and only one! -- file can be present at this exact u.r.l., so there's no question about "which" version of "my antonia" this is -- it's the version which is located at this webpage. by definition...

we will have _other_ versions of "my antonia" -- the second printing of this same edition, or a completely different edition, or a version to which we have added a complex set of annotations, or whatever -- but each of those _other_ versions will have to be at _another_ u.r.l., because the version at _this_ particular u.r.l. is this particular version.

so, by pointing to this u.r.l., we're indicating one-and-one-only version, and there's absolutely no ambiguity about which version we're referencing. and since you'll remember that i've stipulated that we have stable u.r.l.'s, there's also no uncertainty that a pointer to this u.r.l. will _always_ point to this specific version, and _only_ it, now and _forever_ into the future...

so anyone who wants to know, exactly and explicitly, what you reference can simply go there and see it, immediately. so we're on the same page. (sorry, i can never resist using that one.) ;+)

the same logic applies to the other items we have in our stable archive... each one of those 50,000 letters mentioned above would be located at its own page, so we know we can identify and reference every single one, unequivocally and unmistakably.

in the example u.r.l. given above -- a page from a book -- it was _vital_ that the u.r.l. give some kind of clue about the contents of what it held...

amazingly, a lot of cyberspace libraries get this wrong, wrong, all wrong. not only is there an absence of a bond between the content and its u.r.l., sometimes there is an _inability_ to link a filename to _specific_ content, because different files have actually been given the very same filename!
trot on over to "mirlyn", the online library for the university of michigan, for example, and poke around in their electronic holdings. try here:
> http://mdp.lib.umich.edu/cgi/m/mdp/p...659078&seq=123

now, save that image to your hard-drive, and you'll find its name was:
> 00000147.tif.100.0.png

the "100" and the "0" and the stuff at the end are viewing options. the _meat_ of this filename is this:
> 00000147.tif

first of all, notice that this page was _actually_ page _123_ in the book. so giving it a filename of "00000147.tif" is... well, it's kind of ridiculous.

but it gets worse. much, much worse...

explore other books, and you'll see many have a file named "00000147.tif". for instance, here's one, this time from "books and culture", by mabie:
> http://mdp.lib.umich.edu/cgi/m/mdp/p...881628&seq=147

(this is actually page 141, so its filename isn't bonded to its content either.)

indeed, _every_ book that has more than 147 pages (counting frontmatter) will have a file named "00000147.tif". if you've got lots of books -- and umichigan has, quite literally, _millions_ -- giving a file from each one the name of "00000147.tif" is outright stupid... that means they have to depend on the _folders_ (or _subdirectories_) to tell them apart, which is a very bad accident just waiting to happen.

if files get written to the wrong directory, how would you ever know? if a foldername accidentally gets changed, how would you ever know? only when some user comes and says, "hey, i was supposed to get "my antonia" by willa cather, but i got "books and culture" by mabie, so what's up with that?"

cyberspace libraries need to follow some common-sense rules:
1. a file's name _must_ identify the content it contains, unequivocally.
2. the same file _should_ have the same name, no matter where it is.
3. different files _must_ have different names (cannot have the same name).

there are more rules than that, but let's just stick with those for now.
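Rule 3 is easy to check mechanically. A minimal Python sketch, assuming the archive can be listed as relative paths: it reports every file name that appears in more than one folder, the situation where only the folder keeps two files apart. The actual umichigan directory layout isn't shown in the post, so the folder names in the usage are invented.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def shared_filenames(paths):
    """Map each file name found in more than one folder to those folders.

    A non-empty result flags possible violations of rule 3: only the
    folder is keeping those files apart. Identical copies (allowed by
    rule 2) are also reported, since no file contents are compared.
    """
    folders_by_name = defaultdict(set)
    for p in paths:
        path = PurePosixPath(p)
        folders_by_name[path.name].add(str(path.parent))
    return {name: sorted(folders)
            for name, folders in folders_by_name.items()
            if len(folders) > 1}
```

For example, `shared_filenames(["book_a/00000147.tif", "book_b/00000147.tif", "myant/myantp123.html"])` flags `00000147.tif` in both `book_a` and `book_b`, while names that carry their book code never collide.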
but, as we've seen above, even a huge and sophisticated library like the university of michigan cannot get even this simple thing right... it's sad, i tell you, it's really sad...

on the bright side, however, once you follow this simple naming convention, all of a sudden every document in your library has a unique name that can be linked to a matching unique u.r.l., and referencing becomes very simple...

for example, in the case of those 50,000 letters, i might give them names that would correspond to their dates, or maybe their sender or recipient, or some combination. (it's hard to say without first perusing their content.) but whatever the names i gave them, they would then bond to their web u.r.l.

and this, ultimately, is the way that you can deal with "unnumbered pages". you _give_ them a number, or a name, and that name becomes their u.r.l.

-bowerbird
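For the unnumbered-letters case, one way to "give them a name" is to build it from the date and the correspondents, adding a counter when two letters would otherwise collide so that different files still get different names. A minimal Python sketch; the name format, the `-2` collision suffix, the base URL `http://example.org/letters/`, and the correspondent names below are all hypothetical, not part of the system described in the post.

```python
import re
import unicodedata

BASE_URL = "http://example.org/letters/"  # hypothetical archive location


def _slug(text):
    """Lowercase ASCII slug, e.g. 'Marie Lefèvre' -> 'marie-lefevre'."""
    ascii_text = (unicodedata.normalize("NFKD", text)
                  .encode("ascii", "ignore").decode("ascii"))
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def letter_name(date, sender, recipient, taken):
    """Return a unique name built from a datetime.date and two correspondents.

    `taken` is the set of names already assigned; the new name is added to it.
    """
    base = f"{date.isoformat()}_{_slug(sender)}-to-{_slug(recipient)}"
    name, n = base, 1
    while name in taken:
        n += 1
        name = f"{base}-{n}"
    taken.add(name)
    return name


def letter_url(name):
    """Bond the name to its URL, as with the book pages."""
    return f"{BASE_URL}{name}.html"
```

With `datetime.date(1771, 5, 3)`, "Jean Dupont" and "Marie Lefèvre", the first letter is named `1771-05-03_jean-dupont-to-marie-lefevre` and a second letter between them on the same day gets the `-2` suffix. Deciding which physical letter is which still falls to whoever catalogues them.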
Thread | Thread Starter | Forum | Replies | Last Post |
Page numbers | Fincary | Astak EZReader | 4 | 02-18-2010 03:06 PM |
page numbers | nenad | Amazon Kindle | 2 | 12-19-2009 09:01 AM |
Professional and scholarly ebooks account for 75% of ebook market? | anurag | News | 1 | 11-26-2009 12:40 PM |
Page numbers, AGAIN | orlincho | Bookeen | 92 | 08-19-2008 07:15 AM |
Page numbers (again) | Prospect | Workshop | 50 | 04-10-2008 02:19 AM |