#46 |
creator of calibre
Posts: 45,149
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You're making all this way too complex, and you're putting the cart before the horse. Before worrying about how you're going to convert from your "base" format into other formats, worry about how you're going to convert all existing ebooks in various formats into your base format. Once you have all your books in one format, provided it is reasonably well designed, writing a converter into any other format is the work of perhaps a couple of days.
|
#47 |
Connoisseur
Posts: 88
Karma: 15
Join Date: Nov 2007
Device: still looking for an ebook reader device
|
I disagree.
Back to my imaginary file format analogy. Let's say I come up with a "standard" that's either a txt file, or an rtf, or html, or xml file, or doc file, and I put it in a zip file and change the file name. This would be a very easy format to convert to (put anything in a zip file and change the name), but it would be a pain in the ass to convert from, and would essentially be useless. You'd never know for sure whether or not you were getting the correct semantic data (which might or might not be stored in an xml version, or might be in html meta data, or might just be in the first few lines of a text file), and your conversion would be riddled with if statements, essentially just being the sum total of all the separate conversion programs that convert from any other file format.
There's no sense in bothering to convert files into ONE file format if it's not actually ONE file format. Likewise, it's inherently not possible for me to write a good converter that would convert from a file type that potentially stored the same semantic data in multiple places. I'd have no way to programmatically know which one was correct when data was found in more than one spot. A file type with such a nature, as far as I'm concerned, throws away information. Maybe epub isn't as bad as my example, but as far as I can tell it does "support" dtbook, xhtml, and xml content in varying confusing ways, as well as having the ability to store the same semantic data in multiple places.
So I'm not just going to choose whatever's easiest to convert to. My priority is to store all the information in the way that's best for getting the information out, not in. If it's difficult to get information in, then that inherently demonstrates the faulty nature of the format I'm trying to convert from, not a fault of the format I'm converting to. |
#48 | |
Enthusiast
Posts: 44
Karma: 542
Join Date: Dec 2007
Device: Sony PRS-505
|
nairbv,
Here is the situation, as far as I can tell by reading the specifications, regarding epub: epub is a container format. Within that container the text content is encoded as XML, and that XML can be in one of two XML dialects, XHTML and DTBook. Normally you would use one or the other to hold the text, not both. CSS is used to describe how the XML is formatted for display. For a reading system to be conformant with the spec it must support XHTML, DTBook and CSS. See this section of the specification: Quote:
If simply supporting XHTML and CSS constitutes an epub reader then the bar is set pretty low. Indeed, if we use that definition I imagine most of us already have an epub reader installed (Firefox can display XHTML and CSS, for example). So XHTML, DTBook and CSS support is required.
Epub also allows other XML dialects to be used, but a reading system is not required to support these additional dialects. Instead, epub provides a fallback method so the publisher can effectively say: if this reading device supports TEI then display this page (which is tagged as TEI), otherwise display this other page (which is tagged as XHTML). For a document to be epub compliant it must provide a fallback (i.e. a page in a format that a compliant reader can parse, XHTML or DTBook) for every piece of non-standard XML used in the document. The idea is that if Sony releases a reader that supports epub + MathML then you can write an epub document to take advantage of that expanded capability, but which will also display fine on a different reading system that doesn't understand MathML.
Regarding DTBook borrowing from the HTML spec: it does, but only when it makes sense to do so. For instance, they both use the paragraph tag, because a paragraph is a useful semantic unit when marking up books. There's no point re-inventing the wheel, after all.
To attempt to answer one of your earlier questions: yes, it is possible to represent the content of a book entirely in DTBook XML. If you want it to be useful to most people then you should probably create a CSS file to go with it so it can be displayed in a visually attractive form. Once you have your DTBook and CSS files it should be a very easy step to package them together into an epub document for viewing on an epub compliant reading system. |
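The fallback rule described above can be pictured as a small lookup. This is only a rough sketch, not anything taken from the spec text or from a real reading system: the manifest layout, the item ids, the file names and the `resolve` function are all made up for illustration, and the media type strings are assumed to be the ones a package manifest would use for XHTML, DTBook and TEI.

```python
# Hypothetical manifest: item id -> (href, media type, fallback id or None).
# Ids, hrefs and the tuple layout are invented for this sketch.
CORE_TYPES = {"application/xhtml+xml", "application/x-dtbook+xml"}


def resolve(manifest, item_id, supported=CORE_TYPES):
    """Follow the fallback chain until an item the reader can display is found."""
    seen = set()
    while item_id is not None and item_id not in seen:
        seen.add(item_id)  # guard against a fallback loop
        href, media_type, fallback = manifest[item_id]
        if media_type in supported:
            return href
        item_id = fallback
    return None  # no displayable fallback exists for this item


manifest = {
    "ch1-tei": ("ch1.tei.xml", "application/tei+xml", "ch1-xhtml"),
    "ch1-xhtml": ("ch1.xhtml", "application/xhtml+xml", None),
}

# A reader that only knows the required dialects gets the XHTML page;
# a reader that also understands TEI gets the TEI page.
basic_choice = resolve(manifest, "ch1-tei")
tei_choice = resolve(manifest, "ch1-tei", CORE_TYPES | {"application/tei+xml"})
```

The point is that a from-epub converter can treat itself as a reader that supports only the two required dialects, and it will always land on XHTML or DTBook if the document is compliant.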
|
#49 |
creator of calibre
Posts: 45,149
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
nairbv, as long as you choose one file format all the metadata will be stored consistently, because you will ensure that while converting to that format. Various formats zipped up is not a single file format, it's various file formats zipped up.
So to re-iterate, you will need to write converters from various file formats to your single file format, and these converters are going to have to be able to read metadata from all these various file formats as well as converting the content itself. That is the hard part. Writing a converter from some format (even epub) to any other format is trivial by comparison. Now, I've already substantially solved this problem in libprs (the only major ebook format I don't read metadata from or convert is .mobi), and I've chosen to store the metadata not in a file format but in a database. When you export files from the database, the metadata is written to an OPF file.
Last edited by kovidgoyal; 01-01-2008 at 12:24 PM. |
#50 |
Connoisseur
Posts: 98
Karma: 140
Join Date: Jun 2007
Device: sony reader prs-500
|
I have a main library that is in order (author, genre) but has loads of different formats (lit, doc, pdf, etc.), and I have a Sony Reader library that's in order; I add books to it as I convert them to lrf. I would do it that way: it's a lot of time to waste on a book if you're not ready to read it.
|
#51 | |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
Quote:
Dale |
|
#52 |
Connoisseur
Posts: 88
Karma: 15
Join Date: Nov 2007
Device: still looking for an ebook reader device
|
lexicon:
Conversion software is essentially reader software, so what you're saying confirms my point. To write a valid, conforming from-epub converter, I have to be able to parse at least two different kinds of content files.
kovidgoyal:
Sure, if I write the converter that converts to epub, I can control where I put my metadata, but if I'm storing all my files as epub and I get one that's already epub, I'm not going to have any idea whether someone else consistently put the same metadata in every place that metadata could possibly be stored. I might also be inclined to use someone else's conversion software when/if it exists. If I find conflicting metadata in some file, I might not even have a programmatic way to guess things like what the correct title is. I'd have to solve this problem in an epub->dtbook converter anyway, but why add to the problem?
Likewise, if I turn a DTBook into an epub book by essentially putting it in a zip and renaming it, I know *less*, not more, about what I have in that zip file, since henceforth it might be a DTBook and might be XHTML. That seems to only add confusion, so I think I'd rather just keep books in DTBook format if I can get them there. If I decide later that epub is a better format than DTBook, conversion from DTBook should be as trivial as a few lines of shell script.
Since DTBook practically is epub, it shouldn't be significantly more difficult to convert to DTBook than to epub. The only added difficulty is that I'm constrained to converting to one format, instead of sporadically converting to either of two formats. I'd rather convert consistently to one format than to either of two. If I have two formats of files and for consistency convert them into something that might be one or might be the other of two other file formats, that just sounds like an exercise in futility. |
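For what it's worth, the "put it in a zip and rename it" step could look roughly like the sketch below. It only covers the container layout as I understand it (a `mimetype` entry first and stored uncompressed, plus `META-INF/container.xml` pointing at the package file). The function name `pack_epub`, the file names `content.opf` and `book.xml`, and the placeholder strings in the usage lines are all assumptions for illustration; the caller would still have to supply a real OPF package file and whatever else the spec requires, so this is not a validating epub writer.

```python
import io
import zipfile

# Points the reading system at the package (OPF) file inside the zip.
CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def pack_epub(out, opf_text, dtbook_text, dtbook_name="book.xml"):
    """Write the container layout around an existing DTBook file.

    `out` may be a path or a writable binary file object.
    """
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as z:
        # The mimetype entry goes first and is stored without compression.
        z.writestr("mimetype", "application/epub+zip",
                   compress_type=zipfile.ZIP_STORED)
        z.writestr("META-INF/container.xml", CONTAINER_XML)
        z.writestr("content.opf", opf_text)
        z.writestr(dtbook_name, dtbook_text)


# Usage with an in-memory buffer; both strings are placeholders, not valid files.
buf = io.BytesIO()
pack_epub(buf, "<package/>", "<dtbook/>")
```

So the packaging itself is short, but the OPF file is where the "one place for metadata" has to be written, which is the part a plain zip-and-rename skips.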
#53 |
creator of calibre
Posts: 45,149
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Umm, from what I remember of the epub spec, an epub zip file stores metadata in only one place (an opf file), and a well-formed epub document should specify whether it contains dtbook or xhtml. In any case, that is really easy to detect by just looking at the XML headers.
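As a rough illustration of how cheap that detection can be, here is a minimal sketch that looks at the root element of a content file. The function name `content_dialect` is made up, and it assumes the root element is named `dtbook` for DTBook and `html` for XHTML; it ignores namespaces and DOCTYPEs entirely, so treat it as a first guess and not a conformance check.

```python
import xml.etree.ElementTree as ET


def content_dialect(xml_text):
    """Return 'dtbook', 'xhtml', or 'unknown' based on the root element name."""
    root = ET.fromstring(xml_text)
    # ElementTree writes namespaced tags as '{namespace}name'; keep only the name.
    local_name = root.tag.rsplit("}", 1)[-1]
    return {"dtbook": "dtbook", "html": "xhtml"}.get(local_name, "unknown")


xhtml_sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Example</title></head><body/></html>"
)
dtbook_sample = (
    '<dtbook xmlns="http://www.daisy.org/z3986/2005/dtbook/"><book/></dtbook>'
)

xhtml_kind = content_dialect(xhtml_sample)
dtbook_kind = content_dialect(dtbook_sample)
```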
|
#54 | |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
Quote:
Dale |
|
#55 | |
Groupie
Posts: 152
Karma: 854
Join Date: Dec 2007
Device: Lifebook T5010
|
I'm a genius.
Quote:
"Office 2003 Service Pack Disables Older File Formats"
Basically, older file formats are removed automatically from Office 2003, unless you go through an unworkable work-around. That's why it is critical to standardize on file formats, and keep your files in only those formats.
Andy |
|
#56 |
Connoisseur
Posts: 88
Karma: 15
Join Date: Nov 2007
Device: still looking for an ebook reader device
|
@recycledelectron:
So what's your opinion on epub? A file format that is sometimes a zip file containing XHTML, and is sometimes a zip file containing a DTBook? And then maybe in addition an "it's preferred if you use this xml document if you know how to parse it" other option? A file format whose rendering will be handled by CSS when displayed in a web browser, but by an Adobe-specific file called page-template.xpgt when displayed by the primary currently existing "epub compliant" software.
@kovidgoyal:
Sure, it *should* go in the opf file, but if converted from html, the metadata will probably also be in the html file. If converted from a dtbook, it will probably also be in the dtbook. If converted lazily, which will often enough be the case, it might not have been copied into the opf file. When converting from epub to html, most people will just pull out the html file and think "I'm done," and thus ideal epub authoring software would put the data in both places when creating the epub file initially. Often enough, buggy software cuts off some string somewhere at some number of characters, and so even minor things like sporadic poorly written software will mean that two versions of the metadata won't match. *Good* reader software would probably check html and/or dtbook metadata when it fails to find all metadata in the opf file, since, after all, it might be there, and why miss data?
I see these as unnecessary complications introduced by a poorly thought out design. I'm just saying that I would prefer a solution that only stores semantic data once. For me, much of the point of moving to a single format is to reduce redundancy. If the file I'm converting to maintains redundancy, then there's no reason for me to bother. |
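To make the two-copies problem concrete, here is a minimal sketch of the check a careful converter would end up doing for just one field, the title. The function names and the sample documents are invented for illustration, and preferring the OPF copy on a conflict is only an assumed policy (following the "metadata lives in the opf file" reading of the spec); nothing in the files themselves says which copy is right.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"
XHTML = "{http://www.w3.org/1999/xhtml}"


def _text(node):
    """Return stripped element text, or None when the element or text is absent."""
    return node.text.strip() if node is not None and node.text else None


def opf_title(opf_text):
    return _text(ET.fromstring(opf_text).find(f".//{DC}title"))


def xhtml_title(xhtml_text):
    return _text(ET.fromstring(xhtml_text).find(f".//{XHTML}title"))


def reconcile_title(opf_text, xhtml_text):
    """Return (title, status): 'agree', 'conflict', 'opf-only', 'content-only' or 'missing'."""
    a, b = opf_title(opf_text), xhtml_title(xhtml_text)
    if a and b:
        # Assumed policy: on a mismatch keep the OPF copy, but flag it.
        return (a, "agree") if a == b else (a, "conflict")
    if a:
        return a, "opf-only"
    if b:
        return b, "content-only"
    return None, "missing"


# Invented sample data: the content file carries a truncated copy of the title.
opf = (
    '<package xmlns="http://www.idpf.org/2007/opf">'
    '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Example Book</dc:title></metadata></package>"
)
page = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Example Boo</title></head><body/></html>"
)
result = reconcile_title(opf, page)
```

Every additional field and every additional place it can live multiplies these branches, which is the redundancy cost being argued about here.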
#57 |
Groupie
Posts: 152
Karma: 854
Join Date: Dec 2007
Device: Lifebook T5010
|
I'd never heard of it until now, and certainly will not be looking into it until I see thousands of my documents showing up in that format. It's kinda like if you ask me what I think of the claim that Larry Niven is God. I've never considered it, and will not be going on an extended spiritual quest unless I see lots of believers (not a few guys on MobileRead.com) and lots of evidence. No offense.
Y'all have pointed out a few other great points on a library.
1. Redundancy, unless your files are expendable.
2. Organize them by some method. Title, Author, Subject (my pick), etc.
3. Fill in the meta data, if you can. I have neglected this, and sincerely regret it now that I have almost 1TB and a Sony PRS-505 that only reads the meta-data title. Author, Title, etc. are critical.
4. Keep your collection off line. There is always someone who will claim a copyright on a 75 year old book that's been out of print for decades. They will refuse to print it, it will be unavailable from rare book dealers, and you will find yourself sued and your hard disk wiped if you have a copy in your cache somewhere.
Andy |