Old 03-27-2009, 10:33 AM   #16
Hadrien
Feedbooks.com Co-Founder
Posts: 2,263
Karma: 145123
Join Date: Nov 2006
Location: Paris, France
Device: Sony PRS-t-1/350/300/500/505/600/700, Nexus S, iPad
Quote:
Originally Posted by DaleDe View Post
One good fix would be to adopt Calibre as your conversion tool. It does a good job of producing both ePUB and now mobi as well as several others.

Dale
I disagree: the current metadata in the PG EPUB files are much better than what Calibre produces. Switching would replace one problem with another.
Old 03-27-2009, 10:39 AM   #17
Hadrien
Feedbooks.com Co-Founder
Posts: 2,263
Karma: 145123
Join Date: Nov 2006
Location: Paris, France
Device: Sony PRS-t-1/350/300/500/505/600/700, Nexus S, iPad
For example, take the following book uploaded by mtravellerh: https://www.mobileread.com/forums/showthread.php?t=43041
Quote:
<dc:identifier opf:scheme="calibre" id="calibre_id">736</dc:identifier>
This is NOT a unique identifier, and while it may be OK for files that you convert for yourself, it is pretty catastrophic to distribute files with such an identifier. You might have a lot of books uploaded here on MobileRead all sharing the same identifier because of this problem.
My advice would be to assign a UUID or another unique identifier to every book available in the ePub section to fix this problem.
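
To make the suggestion concrete, here is a minimal Python sketch of building a dc:identifier element that carries a freshly generated random UUID. The element id "uuid_id", the scheme value "uuid", and the function name are assumptions chosen for illustration; this is not how calibre or MobileRead actually does it.

```python
import uuid
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"


def make_identifier_element(element_id="uuid_id"):
    """Build a dc:identifier element holding a new random (version 4) UUID.

    The id and scheme values are illustrative assumptions, not a fixed convention.
    """
    el = ET.Element(f"{{{DC}}}identifier")
    el.set("id", element_id)
    el.set(f"{{{OPF}}}scheme", "uuid")
    el.text = f"urn:uuid:{uuid.uuid4()}"
    return el


first = make_identifier_element()
second = make_identifier_element()
print(first.text)
print(first.text != second.text)
```

The package element's unique-identifier attribute would then need to point at the same id ("uuid_id" in this sketch).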
Old 03-27-2009, 11:15 AM   #18
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Quote:
Originally Posted by Hadrien View Post
For example, take the following book uploaded by mtravellerh: https://www.mobileread.com/forums/showthread.php?t=43041

This is NOT a unique identifier, and while it may be OK for files that you convert for yourself, it is pretty catastrophic to distribute files with such an identifier. You might have a lot of books uploaded here on MobileRead all sharing the same identifier because of this problem.
My advice would be to assign a UUID or another unique identifier to every book available in the ePub section to fix this problem.
Yeah, I had forgotten about the unique id problem. Calibre needs to fix that. I was talking about the format, but the id is something to be aware of, and it needs to be unique. Of course, you can manually edit the file, but that is a pain.

Dale
Old 03-27-2009, 11:57 AM   #19
Hadrien
Feedbooks.com Co-Founder
Posts: 2,263
Karma: 145123
Join Date: Nov 2006
Location: Paris, France
Device: Sony PRS-t-1/350/300/500/505/600/700, Nexus S, iPad
Quote:
Originally Posted by DaleDe View Post
Yeah, I had forgotten about the unique id problem. Calibre needs to fix that. I was talking about the format, but the id is something to be aware of, and it needs to be unique. Of course, you can manually edit the file, but that is a pain.

Dale
It's pretty easy to fix in the next build of Calibre; the real problem is with all those files here on MobileRead. That's why working with end formats only is such a bad choice: you're stuck with these kinds of errors.

There are various other errors that I've noticed, for example in another recent upload:
Quote:
<dc:language>UND</dc:language>
<dc:creator opf:role="aut" file-as="Georg, Ebers,">Ebers, Georg</dc:creator>
UND probably stands for undefined. Once again, this can be very problematic, for example if the reading system relies on this value to set hyphenation rules.
In the dc:creator field, file-as should be "Ebers, Georg" and the dc:creator value should be Georg Ebers instead of this mess.
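
As an illustration, both problems can be caught with a small check before a file is distributed. The Python sketch below is only that, a sketch: the function name, the warning texts, and the heuristics (a comma in the display name, a trailing comma in file-as) are assumptions. "und" is the ISO 639-2 code for "undetermined", and the "en" in the corrected sample is a placeholder, since the thread does not say what language the book is in.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"

# The metadata quoted above, wrapped so that it parses on its own.
SAMPLE = f'''<metadata xmlns:dc="{DC}" xmlns:opf="{OPF}">
  <dc:language>UND</dc:language>
  <dc:creator opf:role="aut" file-as="Georg, Ebers,">Ebers, Georg</dc:creator>
</metadata>'''

# Corrected version; "en" is a placeholder language code.
FIXED = f'''<metadata xmlns:dc="{DC}" xmlns:opf="{OPF}">
  <dc:language>en</dc:language>
  <dc:creator opf:role="aut" opf:file-as="Ebers, Georg">Georg Ebers</dc:creator>
</metadata>'''


def lint_metadata(xml_text):
    """Return a list of warnings for the two problems discussed above."""
    root = ET.fromstring(xml_text)
    warnings = []
    for lang in root.iter(f"{{{DC}}}language"):
        code = (lang.text or "").strip().lower()
        if code in ("", "und"):
            warnings.append("dc:language is missing or undetermined")
    for creator in root.iter(f"{{{DC}}}creator"):
        name = (creator.text or "").strip()
        # Accept both the namespaced and the bare attribute spelling.
        file_as = creator.get(f"{{{OPF}}}file-as") or creator.get("file-as") or ""
        if "," in name:
            warnings.append(f"dc:creator looks like a sort name: {name!r}")
        if file_as.strip().endswith(","):
            warnings.append(f"file-as has a trailing comma: {file_as!r}")
    return warnings


for line in lint_metadata(SAMPLE):
    print(line)
print(lint_metadata(FIXED))
```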
Old 03-27-2009, 11:57 AM   #20
=X=
Wizard
Posts: 3,671
Karma: 12205348
Join Date: Mar 2008
Device: Galaxy S, Nook w/CM7
Quote:
Originally Posted by DaleDe View Post
One good fix would be to adopt Calibre as your conversion tool. It does a good job of producing both ePUB and now mobi as well as several others.

Dale
I have great respect for Kovid and find calibre an essential tool, but I strongly disagree on the ePUB side. The ePUB files calibre produces are terrible: they're bloated, terribly slow, and often cause the reader to crash.

I always thought this was an issue with ePUB itself, so I steered clear of the format. But after downloading some very large ePUB files from Google and Feedbooks, I realized it was calibre all along.

The LIT, MOBI, and LRF files that calibre produces, on the other hand, are excellent.
=X=
Old 03-27-2009, 01:54 PM   #21
kovidgoyal
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by Hadrien View Post
I disagree: the current metadata in the PG EPUB files are much better than what Calibre produces. Switching would replace one problem with another.
What are you talking about?
Old 03-27-2009, 01:58 PM   #22
kovidgoyal
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by Hadrien View Post
For example, take the following book uploaded by mtravellerh: https://www.mobileread.com/forums/showthread.php?t=43041

This is NOT a unique identifier, and while it may be OK for files that you convert for yourself, it is pretty catastrophic to distribute files with such an identifier. You might have a lot of books uploaded here on MobileRead all sharing the same identifier because of this problem.
My advice would be to assign a UUID or another unique identifier to every book available in the ePub section to fix this problem.
1. Why is it catastrophic to distribute EPUB books with the same id?
2. If you specify an ISBN in the metadata, calibre will put that in the EPUB file. Having an additional calibre-specific identifier is no problem at all.
Old 03-27-2009, 02:01 PM   #23
kovidgoyal
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by =X= View Post
I have great respect for Kovid and find calibre an essential tool, but I strongly disagree on the ePUB side. The ePUB files calibre produces are terrible: they're bloated, terribly slow, and often cause the reader to crash.

I always thought this was an issue with ePUB itself, so I steered clear of the format. But after downloading some very large ePUB files from Google and Feedbooks, I realized it was calibre all along.

The LIT, MOBI, and LRF files that calibre produces, on the other hand, are excellent.
=X=
The quality of the EPUB that calibre produces depends on the quality of the input you give it. There is nothing "bloated" or "slow" about EPUB files created by calibre. Basically, calibre's philosophy is to make as few changes to your input HTML as possible, unlike, say, Feedbooks, which insists on allowing only a very small and well-defined subset of features in your input HTML.

Last edited by kovidgoyal; 03-27-2009 at 02:08 PM.
Old 03-27-2009, 02:02 PM   #24
Hadrien
Feedbooks.com Co-Founder
Posts: 2,263
Karma: 145123
Join Date: Nov 2006
Location: Paris, France
Device: Sony PRS-t-1/350/300/500/505/600/700, Nexus S, iPad
Quote:
Originally Posted by kovidgoyal View Post
1. Why is it catastrophic to distribute EPUB books with the same id?
2. If you specify an ISBN in the metadata, calibre will put that in the EPUB file. Having an additional calibre-specific identifier is no problem at all.
The main identifier should be unique since it can be used internally by a reading system to identify books, or for things such as annotations/bookmarks to globally identify books.

Of course multiple identifiers are fine, but:
1. You don't always have an ISBN
2. You're using the calibre id as the package identifier
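
To illustrate why this matters, here is a toy Python sketch of a bookmark store keyed by the package identifier. The AnnotationStore class is entirely hypothetical and does not describe any real reading system; it only shows how two different books that both ship with the identifier 736 end up sharing annotations.

```python
class AnnotationStore:
    """Toy bookmark store keyed by package identifier (hypothetical)."""

    def __init__(self):
        self._notes = {}

    def add(self, package_id, note):
        # All notes for the same identifier land in the same bucket.
        self._notes.setdefault(package_id, []).append(note)

    def notes_for(self, package_id):
        return list(self._notes.get(package_id, []))


store = AnnotationStore()
# Two different books, both distributed with the same identifier.
store.add("736", "Book A: bookmark at chapter 3")
store.add("736", "Book B: highlight in chapter 1")

# Opening either book now shows the other book's annotations as well.
print(store.notes_for("736"))
```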
Old 03-27-2009, 02:07 PM   #25
kovidgoyal
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by Hadrien View Post
The main identifier should be unique since it can be used internally by a reading system to identify books, or for things such as annotations/bookmarks to globally identify books.
The problem with electronic documents is that they are infinitely copyable. Any reading system that labors under the delusion that every file it ever encounters will have a unique id is not going to get very far. For example, a user may change the file name of an EPUB file and copy two copies onto his reader. The whole idea of unique ids is a holdover from print publishing that needs to go away. A unique id belongs to a *book*, not to a file.


Quote:
Of course multiple identifiers are fine, but:
1. You don't always have an ISBN
If the user specifies an ISBN calibre will always produce it. Or are you suggesting calibre manufacture random ISBNs?

Quote:
2. You're using the calibre id as the package identifier
Again, so what?
Old 03-27-2009, 02:14 PM   #26
Hadrien
Feedbooks.com Co-Founder
Posts: 2,263
Karma: 145123
Join Date: Nov 2006
Location: Paris, France
Device: Sony PRS-t-1/350/300/500/505/600/700, Nexus S, iPad
Quote:
Originally Posted by kovidgoyal View Post
The problem with electronic documents is that they are infinitely copyable. Any reading system that labors under the delusion that every file it ever encounters will have a unique id is not going to get very far. For example, a user may change the file name of an EPUB file and copy two copies onto his reader. The whole idea of unique ids is a holdover from print publishing that needs to go away. A unique id belongs to a *book*, not to a file.

If the user specifies an ISBN calibre will always produce it. Or are you suggesting calibre manufacture random ISBNs?

Again, so what?
You're mixing things up. First of all, unique identifiers are not something from the "print world": we use URIs every day to identify resources.
Basically, from what I can guess, with Calibre you're using an identifier that you increment every time a file is generated (1, 2, 3, etc.). Instead you could generate a UUID, which would still work internally with the way you handle books but wouldn't be as likely to confuse a reading system.

Of course you don't have to generate an ISBN; that wouldn't make any sense.
Old 03-27-2009, 02:24 PM   #27
kovidgoyal
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by Hadrien View Post
You're mixing things up. First of all, unique identifiers are not something from the "print world": we use URIs every day to identify resources.
Uniquely and permanently? For content that isn't actually on a website? I don't think so. And UUIDs are not unique, they just have a low probability of collision. The whole concept of assigning permanently unique numbers to content that can be infinitely duplicated and modified is deeply flawed.

Quote:
Basically from what I can guess, with Calibre you're using an identifier that you increment every time that a file is generated (1, 2, 3 etc.). Instead you could generate a UUID, which would still work internally with the way you handle books but wouldn't be as likely to confuse a reading system.
Actually, calibre does generate a UUID if you convert from the command line, which is what any large-scale conversion service would do anyway.

And my point was that a reading system that relies on every file that it comes across having a unique id is going to look rather silly.
Old 03-27-2009, 02:29 PM   #28
Hadrien
Feedbooks.com Co-Founder
Posts: 2,263
Karma: 145123
Join Date: Nov 2006
Location: Paris, France
Device: Sony PRS-t-1/350/300/500/505/600/700, Nexus S, iPad
Quote:
Originally Posted by kovidgoyal View Post
Uniquely and permanently? For content that isn't actually on a website? I don't think so. And UUIDs are not unique, they just have a low probability of collision. The whole concept of assigning permanently unique numbers to content that can be infinitely duplicated and modified is deeply flawed.

Actually, calibre does generate a UUID if you convert from the command line, which is what any large scale conversion service would do anyway.

And my point was that a reading system that relies on every file that it comes across having a unique id is going to look rather silly.
Low probability of collision is actually much better than incrementing an id starting with 1 anyway...
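
To put rough numbers on that, here is a small Python sketch comparing the two schemes. It assumes the ids are random version 4 UUIDs (122 random bits) and, for the sequential case, that every library numbers its books 1, 2, 3, ... as guessed earlier in the thread. The function names are made up for this example, and the UUID figure is the usual birthday-bound approximation, not an exact value.

```python
import math

RANDOM_BITS = 122  # a version 4 UUID carries 122 random bits


def uuid4_collision_probability(n):
    """Approximate chance that any two of n random UUIDs collide.

    Uses the birthday bound: 1 - exp(-n(n-1) / (2 * 2**122)).
    """
    return -math.expm1(-n * (n - 1) / (2 * 2**RANDOM_BITS))


def shared_sequential_ids(size_a, size_b):
    """Count ids shared by two libraries that each number books from 1."""
    ids_a = set(range(1, size_a + 1))
    ids_b = set(range(1, size_b + 1))
    return len(ids_a & ids_b)


# Two personal libraries with sequential ids: every id of the smaller one clashes.
print(shared_sequential_ids(1000, 736))

# A billion random UUIDs: the collision probability is still vanishingly small.
print(uuid4_collision_probability(10**9))
```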

You can use such an identifier if you like, but don't assign it to the package identifier.

On PG they're using the URI for the book's page, which is a good choice too.
Old 03-27-2009, 02:39 PM   #29
kovidgoyal
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by Hadrien View Post
Low probability of collision is actually much better than incrementing an id starting with 1 anyway...
I think we're talking past each other here. There are two issues:

1) The philosophical issue of whether trying to assign a unique id to electronic documents is meaningful. On this I strongly believe it is not. A consequence of that is that I believe a statement like "producing books with non-unique ids is catastrophic" is just wrong.

2) The practical issue of what id to assign to an EPUB book. In calibre there are two approaches used:

a) If you convert using the GUI, calibre assigns the id of the book in the database. The idea is that these books are meant to be part of your personal collection. In such a context a uniquely incremented number is actually *more* unique than a UUID.
b) If you convert via the command line it uses a UUID.

Now where a bulk conversion service like PG is concerned, it is behavior b) that is important, and there calibre produces a UUID.
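
The two behaviors described above can be sketched in Python as follows. This is not calibre's actual code: the function name, its parameters, and the returned (scheme, value) pair are assumptions made purely to illustrate the policy.

```python
import uuid


def choose_package_identifier(via_gui, database_id=None):
    """Return a (scheme, value) pair following the two approaches above.

    Hypothetical helper: GUI conversions reuse the library's database id,
    command-line conversions get a fresh random UUID.
    """
    if via_gui:
        if database_id is None:
            raise ValueError("a GUI conversion needs the book's database id")
        return ("calibre", str(database_id))
    return ("uuid", str(uuid.uuid4()))


print(choose_package_identifier(via_gui=True, database_id=736))
print(choose_package_identifier(via_gui=False))
```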



Quote:
You can use such an identifier if you like, but don't assign it to the package identifier.
Again, when converted via the command line, the package identifier produced by calibre is a UUID.
Old 03-27-2009, 02:57 PM   #30
Peter Sorotokin
speaking for myself
Posts: 139
Karma: 2166
Join Date: Feb 2008
Location: San Francisco Bay Area
Device: PRS-505
Quote:
Originally Posted by kovidgoyal View Post
1) The philosophical issue of whether trying to assign a unique id to electronic documents is meaningful. On this I strongly believe it is not. A consequence of that is that I believe a statement like "producing books with non-unique ids is catastrophic" is just wrong.
Kovid,

While I understand and to some extent share your position on the uniqueness of the identifier, the standard clearly requires a globally unique id:

Quote:
2.1: Package Identity
The package element is the root element in an OPF Package Document; all other elements are nested within it.

The package element must specify a value for its unique-identifier attribute. The unique-identifier attribute's value specifies which Dublin Core identifier element, described in Section 2.2.10, provides the package's preferred, or primary, identifier. The OPF Package Document's author is responsible for choosing a primary identifier that is unique to one and only one particular package (i.e., the set of files referenced from the package document's manifest).

Notwithstanding the requirement for uniqueness, Reading Systems must not fail catastrophically if they encounter two distinct packages with the same purportedly unique primary identifier.
There are a lot of silly things in the standards, but if we all start to ignore them, the world entropy is only going to increase...
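
For reference, the lookup described in the quoted section can be sketched in a few lines of Python: read the package element's unique-identifier attribute, then find the dc:identifier element whose id matches it. The sample document and the function name are assumptions for illustration; the sample reuses the calibre identifier quoted earlier in the thread and adds a second identifier with a placeholder value.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"

# Minimal package document; the second identifier's value is a placeholder.
SAMPLE_OPF = f'''<package xmlns="{OPF}" version="2.0" unique-identifier="calibre_id">
  <metadata xmlns:dc="{DC}" xmlns:opf="{OPF}">
    <dc:identifier opf:scheme="calibre" id="calibre_id">736</dc:identifier>
    <dc:identifier opf:scheme="ISBN" id="isbn_id">placeholder-isbn</dc:identifier>
  </metadata>
</package>'''


def primary_identifier(opf_text):
    """Return the text of the dc:identifier named by unique-identifier."""
    root = ET.fromstring(opf_text)
    wanted = root.get("unique-identifier")
    if wanted is None:
        raise ValueError("package element has no unique-identifier attribute")
    for ident in root.iter(f"{{{DC}}}identifier"):
        if ident.get("id") == wanted:
            return (ident.text or "").strip()
    raise ValueError(f"no dc:identifier with id={wanted!r}")


print(primary_identifier(SAMPLE_OPF))
```

In this sample the primary identifier resolves to the calibre database id rather than the second identifier, which is the situation being debated above.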