View Full Version : Suppose I wanted to make an epub???


ProDigit
04-11-2012, 02:22 AM
So say I have my document all prepared in a single, basic HTML file, with working within-document hyperlinks, titles, and text formatting.

Could I just ZIP the file, and save it as an epub instead of a zip file?

What's the easiest program to convert it, without modifying any of the layout, formatting, etc.?
I'm really not fond of Calibre, so forget that program!
I also think Book Designer is rather difficult to operate. I used it a long time ago, but I don't want a 'created by Book Designer' notice at the end of my book!


Is there a program that can directly and efficiently convert HTML to epub?

I don't care if the text ends up in a different font or size, but I don't want the program to reconvert pictures, redirect hyperlinks, or start deciding what is a chapter and what isn't.

jhempel24
04-11-2012, 02:29 AM
Sigil can do it, and it's free.

ATDrake
04-11-2012, 02:51 AM
You can actually do it manually, as long as you put the "mimetype" file as the very first entry in the zip archive (uncompressed) and add the rest of the files after it (those can be compressed).

See this thread for an example (http://www.mobileread.com/forums/showthread.php?t=55681) of how the two commands to do this look on a command-line interface; any decent GUI zip creation app which lets you selectively add and delete files should be able to do it as well.

ETA: For the manual approach you'd also have to create an OPF and preferably an NCX file yourself, so you might prefer Sigil after all.
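For reference, with Info-ZIP's command-line zip the standard two-step recipe looks something like this (a sketch; it assumes the book's files sit in META-INF and OEBPS folders next to the mimetype file):

zip -X0 mybook.epub mimetype
zip -rX9 mybook.epub META-INF OEBPS

-0 stores the mimetype uncompressed, -9 applies maximum deflate to everything else, -X leaves out extra file attributes, and -r recurses into the folders.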

HarryT
04-11-2012, 02:52 AM
Please post in the correct forum section. You've been here long enough to know that ePub questions belong in the ePub forum!

Toxaris
04-11-2012, 03:19 AM
The easiest way would be to fire up Sigil, import the HTML, and enter some basic metadata like title and author. Generate a ToC and save.
Oh, and don't forget to push the validate button.

Do remember though that one single HTML file may cause problems on various readers. If the HTML file is larger than 280 KB, it will cause problems.

ProDigit
04-11-2012, 03:58 AM
Does only a ZIP program work, or can I also use RAR or 7Z compression?

Can I open an epub file with e.g. WinZip (or another free program) and replace the files containing the book's text with my own text files?

I've opened several epub files and see a lot of useless data in there.
Most show they were converted with Calibre, which I see as the new Microsoft, adding unnecessary lines of code which only enlarge the book's size.

Some of them define font types, but I don't really need special fonts; the standard ones will do just fine.

Concerning a book's size, would you recommend keeping it at 128 kB or smaller, and dividing large books into chapters instead (e.g. separate chapter files)?

I'm kind of reverse engineering an epub, to see which data can be eliminated while still giving the same results.

ATDrake
04-11-2012, 04:13 AM
Zip compression only. If your RAR/7z program also happens to be able to create and modify zip files, then you can use it, but ePubs are really just glorified zip archives with the internals arranged in a particular manner.

You'd probably have to tweak the existing OPF in another ePub so much for metadata like author, title, etc. that it'd just be easier to create your own OPF. See the Wikipedia example (http://en.wikipedia.org/wiki/EPUB#Open_Packaging_Format_2.0.1) for a version you can quickly copy-paste and substitute your own values into.
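For reference, a bare-bones OPF for a one-chapter book looks roughly like this (the file names and the UUID are placeholders to substitute with your own values):

<?xml version="1.0" encoding="UTF-8"?>
<package version="2.0" xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookId">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Book Title</dc:title>
    <dc:creator>Author Name</dc:creator>
    <dc:language>en</dc:language>
    <dc:identifier id="BookId">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="chapter1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="chapter1"/>
  </spine>
</package>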

Making a simple ePub by hand from scratch with just a text editor and a zip program is actually very easy. Here's the website of someone who has a very easy tutorial (http://www.katiebooks.ca/index.html?ifrm_1=tutorialpage.html) which shows you how to do it in plain, simple language. You can skip the part which mentions ePub Check if you're just starting out and seeing how things work before committing to distributing or whatever you plan to do with your book.

ETA: Divide large books into chapters. Some older readers can handle reasonably big files, but only if the internal HTML is split into what it considers manageable "chunks".

ProDigit
04-11-2012, 04:37 AM
Thank you for the links!
I'll get right to reading and working on converting a book I previously published into the epub format!

(makes me feel like I'm contributing to the world).

It seems that Katiebooks recommends this text to be inside the 'container.xml' file:
<?xml version="1.0" encoding="UTF-8" ?>

<container version="1.0"

It seems to me that I can remove these lines of code and still read the book. Should I remove them, then, or do you think there would be compatibility issues with some devices if these lines are not present?

ATDrake
04-11-2012, 11:56 AM
You absolutely cannot remove the xml declaration or the container top-level tag. It is a vital part of the file which tells the software what kind of file it is so that it can read it properly.

Some software might admittedly have some error correction built in to compensate automatically, but it really is part of the required bits specified in the spec. Books that omit it are incorrect and may well be ignored by some readers, which won't regard them as working ePubs at all.
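For reference, a complete container.xml is only a few lines. This sketch assumes the OPF sits at OEBPS/content.opf; adjust the full-path attribute to wherever yours actually is:

<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>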

Toxaris
04-11-2012, 03:43 PM
You cannot randomly replace files in an ePUB. That will definitely break it. You could try the approach in my earlier post, which will do what you need.
If you want to know more about the format, read the Wiki or the Jedisaber site. The format is described there, including the required files.

Remember, it is not a regular zip. There are some specifics making it an ePUB.

ProDigit
04-11-2012, 06:20 PM
I'm studying up on CSS and ePub for now.

Just one more question,
Suppose I put a link in the header and/or footer of every chapter, linking to the next and previous chapters. Is there a way to do that with a single rule in CSS, or should I manually create the link in every chapter, in the HTML code?

CSS seems to me to be some kind of code where you specify the overall text formatting, so you won't have to re-specify the same information within every HTML chapter of the epub.
Now, most ebook readers I have come with internal fonts and regulate font size internally; nor do I find it necessary to reference external websites within an epub, as the devices don't connect to the web anyway. So I don't need to specify all those parameters in CSS, and can just get rid of them.
In fact, leaving it as clean, stripped-to-the-core HTML might even be better.
For that reason I was thinking of not using CSS at all, and removing any line of code that is not vital to a reading device.

Previously I even removed lines like the one I mentioned above, to save on HTML complexity, but I then recoded my files with Book Designer, and I don't know how BD modified the source HTML (perhaps it added that info back when encoding to LRF files, without my knowledge).

So, unless I can use CSS to create some kind of code that saves me time in the HTML and keeps the overall ePub file clean, I am thinking of getting rid of CSS altogether!

As far as removing the lines of code:
<?xml version="1.0" encoding="UTF-8" ?>

<container version="1.0"

Tell me, is there any other container version out there?
Any reading device I know automatically selects UTF-8 when it isn't specified (even in HTML), and since there is only one container version, I presume each reading device assumes you're using that container version. No reading device is equipped to read a version 2.0, so to me it's a useless line of code: the device will probably assume it's container version 1.0 automatically, and even if not, it will handle it as one. But I'll test that further soon, once I've been able to create a basic ePub file.

I still have an Astak Reader Pro and an Ectaco jetBook Color to test my epub files on (unfortunately I traded my regular jetBook for the jetBook Color, so I no longer have that device for testing).



As for replacing files in an existing epub: so far it has worked, as long as I manually update the internal files to reflect any changes in chapters, titles, file names and so on...

ProDigit
04-11-2012, 06:40 PM
Also, in my study I've come across conflicting data.
Some sites say epub accepts only SVG files, others say PNG.
Both are low-compression file formats, and in my opinion not the best for mobile books.
JPG files should be a lot better for reading devices, JPEG 2000 for photo-realistic pictures, and GIF for simple, anime-like graphics.

I'm perplexed that ePub originally wasn't made to support these major formats, even though (almost) all devices support them at the OS level.

ATDrake
04-11-2012, 08:25 PM
CSS is actually for specifying formatting without cluttering up the source HTML with extraneous markup. You can autogenerate some extra text with it, but it's poorly supported even in web browsers for that and not recommended. Consider doing a navigable table of contents and NCX chapter marks instead.

There is currently no container version 2.0, but the spec is built for forward compatibility and backward readability, and thus assumes there someday might be. You may be able to get away with omitting it in some software, but there will probably be error messages or outright refusals to open in others, such as KindleGen when converting to mobi.

ePub potentially accepts a very wide range of audio/video/image formats, just the same as HTML. Again, the limitation is the software, and modern readers support JPG and GIF. You can actually get better compression/smaller filesizes with PNG than GIF in many cases, if you use palettes and get the settings right. SVG is just glorified text and compresses very well; the problem is support.

Hope this helps.

Toxaris
04-12-2012, 04:31 AM
Using headers and footers is not really supported in ePUB. It is in the specs, but all readers ignore it to my knowledge. Also, a link is not required to go to the next chapter or the previous. Just turn the page and you will go to the next chapter. Just make sure that the sequence is correct in the opf file.

You must realize the difference between layout and structure. Not HTML, but XHTML is used in ePUB (hence the <?xml reference). They're quite similar, but there are differences. You should limit your XHTML files to structure only: they should identify what is a paragraph, what is a header, and so on. In your stylesheet (CSS) you define the layout: what should a paragraph look like? Font size, italics, indents, all kinds of layout stuff. By doing it in a stylesheet, you can easily reuse it for all your XHTML files.
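As a small sketch of that split (the class name and text are just examples): the XHTML carries only structure,

<h2>Chapter One</h2>
<p class="first">It was a dark and stormy night.</p>
<p>The rain fell in torrents.</p>

while the stylesheet carries the layout, once, for all chapters:

h2 { font-size: 1.3em; font-weight: bold; text-align: center; }
p { text-indent: 1.2em; margin: 0; }
p.first { text-indent: 0; }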

According to the ePUB specs, JPG, PNG and SVG must be supported. Some have claimed problems with SVG, but so far I haven't had any. Not all SVG functionality (like animation) is supported, but the specs specify that as well.
They all have their uses. For photos, JPG is usually best. For line images, PNG is much better: smaller and with fewer artifacts. Also, if you have text in your image, PNG is usually best.
SVG is a strange beast, but very powerful. You can use an image converted to (or directly created in) SVG and it will work. I personally use it for things like formulas (scalable!) and captions. There are various ways to incorporate an SVG in an ePUB: directly as text between <div></div> tags is one option; another is adding it like a picture and referencing it in an <img> tag. From within an SVG it is also possible to display another picture. The resizing quality is rather good that way, and it gives the possibility to create a caption for the picture and keep them together on a page, as in the sketch below.
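A sketch of that last trick, an SVG wrapping a picture plus its caption (the dimensions and file name are examples only):

<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0 0 600 450" width="100%" preserveAspectRatio="xMidYMid meet">
  <image xlink:href="../Images/figure1.jpg" x="0" y="0" width="600" height="400"/>
  <text x="300" y="435" text-anchor="middle" font-size="18">Figure 1: the caption stays with the picture</text>
</svg>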

ProDigit
04-12-2012, 05:27 AM
Thanks for explaining so far!
I can't help but think that epub code is far from optimized, much like how MS Office creates HTML pages, with lots of things that aren't really necessary.

Suppose I just want to display my book, with titles in bold and a larger size than the regular text, but I don't really care which font or font size it ends up as.
Can I get rid of CSS and just compress plain HTML files into an epub?

I mean, I've read many manuals on creating an epub, but I've never seen one that sticks to the basics of basics.
Suppose I don't want an ID reference number, and I don't want external sites within my ebook.
What would the most minimalistic epub look like (e.g. one with no extra lines of code, that just displays a cover photo, a ToC, and one or two chapters)?

Most of the manuals online add too much data to their epub, data that I find is not really necessary. For people who load their books from a file browser, stating the title within an epub more than once is not really needed.
In fact, I see no reason to write a book title inside an epub at all if it's already in the filename.

From the standpoint of writing a very minimalistic epub, I find that the epub format has way too much garbage in it that is unnecessary for any user to read.
It may be good for databases, organizing things automatically, or converting books, but to read, all you need is a basic HTML book, I'd presume?

AlexBell
04-12-2012, 05:28 AM
I'm studying up on CSS and ePub for now.


You might like to have a look at EPUB Straight to the Point by Elizabeth Castro; I think it's an excellent resource, especially if you're interested in InDesign and iBooks. She also wrote the excellent HTML, XHTML & CSS, and the books fit very well together.

Toxaris
04-12-2012, 09:04 AM
Thanks for explaining so far!
I can't help but think that epub code is far from optimized, much like how MS Office creates HTML pages, with lots of things that aren't really necessary.
No, not really. All those things are necessary. In essence you need the file structure, the mimetype, the container.xml, the OPF and NCX, and one or more XHTML files (apart from META-INF, folders are not required).
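As a sketch, a minimal layout could look like this (only the mimetype at the root and META-INF/container.xml have fixed names and locations; the OEBPS folder and the names inside it are free to choose, as long as the OPF references them):

mimetype
META-INF/container.xml
OEBPS/content.opf
OEBPS/toc.ncx
OEBPS/cover.xhtml
OEBPS/chapter1.xhtml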

Suppose I just want to display my book, with titles in bold and a larger size than the regular text, but I don't really care which font or font size it ends up as.
Can I get rid of CSS and just compress plain HTML files into an epub?
You don't NEED CSS; it only makes things easier, better structured and more maintainable. If you are fine with the reader's standard layout interpretation, you don't need it. Also, don't put stuff in your stylesheet that you don't need. Keep it simple.

Suppose I don't want an ID reference number, and I don't want external sites within my ebook.
What would the most minimalistic epub look like (e.g. one with no extra lines of code, that just displays a cover photo, a ToC, and one or two chapters)?
Certain things are required: a unique ID, the files mentioned earlier, and some metadata. The title, author and language are mandatory. I would say you need one HTML file for the cover and one HTML file for each chapter; a ToC comes along automatically if you maintain your toc.ncx file correctly.

Most of the manuals online add too much data to their epub, data that I find is not really necessary. For people who load their books from a file browser, stating the title within an epub more than once is not really needed.
In fact, I see no reason to write a book title inside an epub at all if it's already in the filename.
The title tag within the HTML is not used. The tag must be there (it's part of the XHTML spec), but it does not need to be filled in. The title metadata in the OPF, however, is required by the ePUB specs.
Frankly, whatever you may think of a requirement, if it is part of the specs, you need it anyway.

From the standpoint of writing a very minimalistic epub, I find that the epub format has way too much garbage in it that is unnecessary for any user to read.
It may be good for databases, organizing things automatically, or converting books, but to read, all you need is a basic HTML book, I'd presume?

What do you consider to be garbage? Can you give examples? Metadata is hardly garbage, as it is helpful to identify a book. Remember, readers don't use the filename but the metadata. If you don't want all that, don't create an ePUB, but an HTML book.
Again, read about the format on the Jedisaber site. It describes all the really required files.

DaleDe
04-12-2012, 01:07 PM
It is not necessary to deal directly with ePub internals if you don't want to. For a simple ePub you can use OpenOffice Writer with the Writer2ePub extension, available in a forum here at MobileRead, and create your epub practically automatically. If you want to buy a program, Atlantis can also save directly to ePub format.

Dale

Freeshadow
04-12-2012, 02:32 PM
I'm not sure you can use 7zip, because it (AFAIR) doesn't support mixing stored and compressed files in a zip.
According to the specs, the mimetype file inside an epub has to remain uncompressed.

DaleDe
04-12-2012, 03:46 PM
I'm not sure you can use 7zip, because it (AFAIR) doesn't support mixing stored and compressed files in a zip.
According to the specs, the mimetype file inside an epub has to remain uncompressed.

It will mix them, but not in the same step. I use 7zip all the time to open ePub files and it does a fine job, as it doesn't care what the extension is. Even with zip you need two steps: first add the mimetype, to ensure it is uncompressed and is the first entry in the archive (both are important), and then add the other files.
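From memory, the two steps with 7-Zip's command-line tool would look something like this (a sketch, untested; verify afterwards with a validator that the mimetype really ended up stored and first):

7z a -tzip -mx=0 mybook.epub mimetype
7z a -tzip -mx=9 mybook.epub META-INF OEBPS

-tzip forces the zip format, -mx=0 stores, and -mx=9 applies maximum compression to the rest.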

Dale

Freeshadow
04-12-2012, 04:49 PM
If so, shouldn't 7zip be the recommended tool to use?
I don't remember where, but I'm sure I read that 7zip's zip implementation produces smaller archives than other zip-making archivers (zlib-based ones and WinZip were mentioned) while still having no compatibility issues with decompression (yes, I mean zip, not 7z archive files).

DaleDe
04-12-2012, 05:00 PM
If so, shouldn't 7zip be the recommended tool to use?
I don't remember where, but I'm sure I read that 7zip's zip implementation produces smaller archives than other zip-making archivers (zlib-based ones and WinZip were mentioned) while still having no compatibility issues with decompression (yes, I mean zip, not 7z archive files).

I use 7zip a lot. I like that it installs as a right-click on my menu, which makes it handy. I also use Info-ZIP from the command line. However, I seldom if ever make an ePub that way, although I do use both to inspect and modify ePubs. For making an ePub I typically use Sigil, but there are lots of tools that can make an ePub easily; our wiki has a list.

Dale

mod186k1
04-12-2012, 05:22 PM
In the Open Container Format (OCF) 3.0 (http://idpf.org/epub/30/spec/epub30-ocf.html) they talk specifically of ZIP: no other format is supported (neither 7z nor rar):

...
The OCF specification defines the rules for structuring the file collection in the abstract: the "abstract container". It also defines the rules for the representation of this abstract container within a ZIP archive: the "physical container". The rules for ZIP physical containers build upon the ZIP technologies used by ODF....


I suppose they chose zip because of how widespread it is (it's supported by most libraries and operating systems).

OCF is one of the standards governing the ePub format.

Freeshadow
04-12-2012, 05:44 PM
It's even on its Wikipedia page:
When compressing ZIP or gzip files, 7-Zip uses its own DEFLATE encoder, which is often able to achieve higher compression levels, but at lower speed, than the more common DEFLATE implementation of zlib. The 7-Zip deflate encoder implementation is available separately as part of the AdvanceCOMP suite of tools.

JSWolf
04-12-2012, 05:57 PM
Consider doing a navigable table of contents and NCX chapter marks instead.

Actually, the navigable ToC is a waste of time. Just make the ToC using toc.ncx. It's a lot easier and a lot better for ePub.

ProDigit
04-12-2012, 06:10 PM
7-Zip can compress in several ways.
You can set the compression level from 'store', gradually in 5 steps, up to 'ultra'.
It still uses a standard zip encoder when writing zip files.
You can set word size, etc., but I don't know if that affects epubs.

So far I've opened existing epubs with 7-Zip, and added/updated some existing files within these epubs.
I don't think 7-Zip supports storing one file while compressing the others; it will probably want to compress all files (including the mimetype) or compress none, depending on how you set the compression level.
For that reason I open and merely update an existing epub, rather than create a new one.

ProDigit
04-12-2012, 06:12 PM
Actually, the navigable ToC is a waste of time. Just make the ToC using toc.ncx. It's a lot easier and a lot better for ePub.

I was thinking about using a ToC like this.
However, if I create a ToC based on HTML code (just display it as a page at the beginning, like a chapter), I could save a lot of code!
HTML ToCs only take up 2 lines per hyperlink; in the toc.ncx file, a hyperlink takes up 3 lines.

I see no reason why I should define an entry in a separate file and then link to that definition, instead of just linking to the HTML file directly.

Elfwreck
04-12-2012, 06:35 PM
Actually, the navigable ToC is a waste of time. Just make the ToC using toc.ncx. It's a lot easier and a lot better for ePub.

Whether a navigable TOC is useful depends on your audience. You know the navigable TOC is going to be find-able in the book; whether the toc.ncx settings get used depends on the reader's awareness of the device's option. Not everyone wants to punish readers who are fairly clueless about the tech they're using, or who are borrowing a friend's device and have no idea where the TOC settings are.

However, I grant that for most linear texts, TOCs are pretty much a distraction.

Freeshadow
04-12-2012, 06:42 PM
ProDigit, you're wrong.
When making a 7z archive you don't use the same method as is applied to zip archives (the DEFLATE algorithm).

What you want to do is try the AdvanceCOMP package.
That way you repack the deflate streams (of any zip file) using 7zip's deflate implementation.

DaleDe
04-12-2012, 06:45 PM
I was thinking about using a ToC like this.
However, if I create a ToC based on HTML code (just display it as a page at the beginning, like a chapter), I could save a lot of code!
HTML ToCs only take up 2 lines per hyperlink; in the toc.ncx file, a hyperlink takes up 3 lines.

I see no reason why I should define an entry in a separate file and then link to that definition, instead of just linking to the HTML file directly.

There is a reason. While I disagree that an inline TOC is not useful, the toc.ncx file is a must IMHO. I often include both, particularly if the inline TOC adds value, such as providing additional text that was in the original paper version, or an overview that cannot readily be seen in the separate file.

Saving a few bytes is a poor reason to leave out the toc.ncx file. A reader uses it to bring up the TOC at any time while reading a document, giving a ready ability to traverse the document as needed. That is the dominant reason for adding it. It is not, as you presume, just a link to a definition; it is the only method defined by the standard for traversing the document by chapters. I believe the file is required even if it has only a single entry.
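For reference, a minimal toc.ncx with a single entry looks roughly like this (the dtb:uid content must match the identifier in your OPF; the other names are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head>
    <meta name="dtb:uid" content="urn:uuid:00000000-0000-0000-0000-000000000000"/>
    <meta name="dtb:depth" content="1"/>
    <meta name="dtb:totalPageCount" content="0"/>
    <meta name="dtb:maxPageNumber" content="0"/>
  </head>
  <docTitle><text>Book Title</text></docTitle>
  <navMap>
    <navPoint id="navPoint-1" playOrder="1">
      <navLabel><text>Chapter One</text></navLabel>
      <content src="chapter1.xhtml"/>
    </navPoint>
  </navMap>
</ncx>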

Dale

ProDigit
04-12-2012, 10:18 PM
Saving a few bytes is a poor reason to leave out the toc.ncx file. A reader uses it to bring up the TOC at any time while reading a document, giving a ready ability to traverse the document as needed. That is the dominant reason for adding it. It is not, as you presume, just a link to a definition; it is the only method defined by the standard for traversing the document by chapters.
Dale

I beg to differ. Although I don't have much knowledge of epub yet, it seems to me that an in-document HTML file could also contain a ToC; and though most readers are able to access the toc.ncx from anywhere within a book, I don't know whether ALL readers support this (they may be compatible with it, but not offer any special ToC menu extracted from the toc.ncx file).


In actuality, I'm planning yet another big project, for which saving every bit of memory is going to be needed. Unnecessary link references get loaded and often remain in the device's memory.
In a 250 kB book this does not make much difference, but in a bible with structure it probably could...

I'm planning on creating a bible. If you look up my "bible framework" on MobileRead, you'll notice it has well over 3,000 links in it.

I made a KJV bible a few years ago for the Sony Reader (in LRF format).
I made the bible entirely out of a single HTML file. Every book contains links to its chapters, and each chapter contains a link to the ToC and to the previous and next chapters.
The bible I created for the Sony would not function properly on the Sony (although it would on a computer), probably because of lack of memory due to too many hyperlinks.

For that reason I'm keen on trimming as much as possible.
If things go the way I suppose they will, I could define just the books in toc.ncx and use in-document hyperlinks for the chapters.
If I created every chapter link in the toc.ncx file, that file would easily exceed 200 kB, and I don't know how any reader would handle that...

ProDigit
04-12-2012, 10:22 PM
ProDigit, you're wrong.
When making a 7z archive you don't use the same method as is applied to zip archives (the DEFLATE algorithm).

What you want to do is try the AdvanceCOMP package.
That way you repack the deflate streams (of any zip file) using 7zip's deflate implementation.

I don't have the AdvanceCOMP package, and yet the ebooks I tested worked flawlessly. I repacked an existing epub: I left the 20-byte mimetype file and the directories untouched, but changed the book's text and cover with 7-Zip by just adding or updating those files in the archive.
So far no problems yet...

I don't know if 7-Zip compressed any better than the original ebook. It may have used the same algorithm as the original archive...

pholy
04-12-2012, 10:29 PM
Saving a few bytes is a poor reason to leave out the toc.ncx file.

I just went back to the OPF spec (http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm):

1.4.1.2: Publication Conformance
...
vii. the required NCX, encoded in either UTF-8 or UTF-16, is included; and
...


bolding is in the original

So if you leave it out, you don't have an epub document. It's that simple.

ProDigit
04-12-2012, 10:44 PM
Still, could you include the ncx file but leave it virtually empty?

JSWolf
04-12-2012, 11:17 PM
I was thinking about using a ToC like this.
However, if I create a ToC based on HTML code (just display it as a page at the beginning, like a chapter), I could save a lot of code!
HTML ToCs only take up 2 lines per hyperlink; in the toc.ncx file, a hyperlink takes up 3 lines.

I see no reason why I should define an entry in a separate file and then link to that definition, instead of just linking to the HTML file directly.

It's not 2 lines vs 3 lines that should be the deciding factor. In all readers that use ADE, there is a very easy way to get to the ToC that uses toc.ncx. There is no easy way to get to a ToC that's just an HTML file made up of links. It's akin to using the ToC link in a properly made Mobi file.

The way to get to the ToC in an ePub uses toc.ncx.

JSWolf
04-12-2012, 11:24 PM
There is a program you can use with Windows to properly compress an ePub. It's called ePubPack. The link below is to the thread that contains the link to download the program.

http://www.mobileread.com/forums/showthread.php?t=159724

JSWolf
04-12-2012, 11:25 PM
Still, could you include the ncx file but leave it virtually empty?

You could, but it's poor form. You then take away the reader's way of easily getting to the ToC.

pholy
04-12-2012, 11:48 PM
Still, could you include the ncx file but leave it virtually empty?

If you are charging money for this file, I want mine back. :rolleyes:

ATDrake
04-12-2012, 11:58 PM
The bible I created for the Sony would not function properly on the Sony (although it would on a computer), probably because of lack of memory due to too many hyperlinks.

It's not the number of hyperlinks which is the problem with the Sony, but having all the book text in a single file, which is what gets loaded into memory all at once.

This is a known problem with some Sonys (and probably other models of reader), which is why the recommended course of action is to split the HTML into separate files, so the reader can load things a chunk at a time, usually based on chapters (and this may have to do with why Calibre often auto-splits stuff as well).

If I created every chapter link in the toc.ncx file, that file would easily exceed 200 kB, and I don't know how any reader would handle that...

Well, it seemed to work well enough for this guy (http://www.mobileread.com/forums/showthread.php?t=103528), who got up to 4080 NCX entries for the TOC for his bible commentary thing until KindleGen choked on converting it to Mobi. Apparently the newer version fixes that limitation.

Toxaris
04-13-2012, 01:35 AM
The limitation on Sony is for HTML files only, AFAIK. You must make sure that each HTML file is not larger than 280 KB uncompressed.

ProDigit
04-13-2012, 07:41 AM
Well, it seemed to work well enough for this guy (http://www.mobileread.com/forums/showthread.php?t=103528), who got up to 4080 NCX entries for the TOC for his bible commentary thing until KindleGen choked on converting it to Mobi. Apparently the newer version fixes that limitation.


I would think it's a hardware limitation rather than a software one, especially for reader devices, which often have very little RAM...

ProDigit
04-13-2012, 07:44 AM
The limitation on Sony is for HTML files only, AFAIK. You must make sure that each HTML file is not larger than 280 KB uncompressed.

The NT of my bible version is 1.4 MB and works fine on my PRS-505.
The OT is around 4 MB and only has some HTML hiccups.

HarryT
04-13-2012, 07:48 AM
The NT of my bible version is 1.4 MB and works fine on my PRS-505.
The OT is around 4 MB and only has some HTML hiccups.

Then they're split into separate "flows" (i.e. files) within the ePub. As has been said, the maximum flow size that the 505 will work with is around 280 kB.

ProDigit
04-13-2012, 09:49 AM
Oh, is there a difference between ePub and LRF in this filesize limit?
The book is in LRF format...

HarryT
04-13-2012, 09:53 AM
Oh, is there a difference between ePub and LRF in this filesize limit?
The book is in LRF format...

The limit is for ePub only. Not LRF. It's not a limit on the overall size of the book, but on the size of the individual HTML files within the ePub.

ProDigit
04-13-2012, 02:57 PM
I see.
Does that include pictures, or not?

While the HTML files may be software-limited to this size, I'm sure there are hardware limits too. Some reading devices may not have enough memory to load certain files, while future ebook readers may be equipped with enough memory to load 10 MB HTML files within a book.

I actually fail to understand the filesize limit for HTML, as 280 kB is quite a bit for text but nothing for text plus pictures and hyperlinks; and most reading devices have at least double, if not over 32 times, the memory needed to load such a file into their direct memory (RAM or flash)!

DaleDe
04-13-2012, 03:30 PM
I see.
Does that include pictures, or not?

While the HTML files may be software-limited to this size, I'm sure there are hardware limits too. Some reading devices may not have enough memory to load certain files, while future ebook readers may be equipped with enough memory to load 10 MB HTML files within a book.

I actually fail to understand the filesize limit for HTML, as 280 kB is quite a bit for text but nothing for text plus pictures and hyperlinks; and most reading devices have at least double, if not over 32 times, the memory needed to load such a file into their direct memory (RAM or flash)!

This does not include pictures. The limit is due to the file being loaded into RAM all at once. Pictures are in separate files and only loaded when they are referenced from the text.

Dale

Elfwreck
04-13-2012, 04:22 PM
The limit is about 40,000 words per HTML file, if there's no complex coding involved. (Some fanfic has chapters of more than 40,000 words; while AO3 has recently fixed their ebook exports for extra-long chapters, at first they put 1 chapter = 1 HTML file, which froze up my Sony. A bit of trial and error showed where the breakpoint needed to be.)

JSWolf
04-13-2012, 10:36 PM
One of the reasons for this limitation is that Mobile ADE was first installed on the PRS-505. The limit was chosen so that ADE could run on lesser hardware, and it does: Sony was even able to put a version of ADE on the 500.

ProDigit
04-14-2012, 04:20 AM
Thank you for your valuable input guys!

I'm a little taken aback with the filesize limitation.

For the New Testament, I managed to encode books of up to 137 kB per HTML file, one per bible book (that is, without hyperlinks); but I suppose I won't have any direct issues with that for now.

For the Old Testament, I surely will have problems with Psalms, as it alone has 150 chapters.
In fact, I presume the Pentateuch (the first five books) will also surpass the 280 kB size limit per book, which means splitting...
I don't mind splitting, as long as it is invisible to the end user (reading on his device).

Is the 280 kB file limit (or 40k words) you mentioned a software or a hardware limitation?

Thank you.

Elfwreck
04-14-2012, 04:27 AM
Firmware limitation for most of the ereaders. (Software, but not easily tweakable by the end user.) Sony readers can't read ePubs with HTML files larger than ~280 kB; my PEZ froze up on some of them as well. No idea if Nooks can deal with them or not. Also no idea about phones, but for those, each ePub reading app would need to be checked.

The splitting is invisible to the end user, but most people who hand-code them put the splits where you'd put page breaks--start of a chapter or other major section.

ProDigit
04-14-2012, 05:11 AM
Thank you for clarifying this!


Just an update on my progress:
I've run into issues while making epubs; apparently 7-Zip does not work for updating every epub.
I'll take the advice given to compress/make epubs with a proper packing tool, since some readers can read 7-Zip-compressed epubs and others can't. So I'm reading up on that now...
So far the book I'm making is ready in HTML. Hyperlinks still need to be added, but I wanted to do that after I know more about epub itself, and about what makes sense and what doesn't to add as code within the HTML.

ProDigit
04-14-2012, 05:22 AM
Hmm, yes...

What program should I use to compress a completely new epub?
I'm not too fond of command-line programs... Is there anything out there, free, that looks like 7-Zip or WinRAR, but compresses the files as an epub (without compressing the mimetype)?

Also, concerning the compression method, I am sure it'll compress less well than WinRAR or 7z, as those can use the LZMA compression method, while epub seems to be limited to the aged zip DEFLATE compressor.

I looked for the AdvanceCOMP package, but all I could find is a Debian package.
I'm running Windows XP.

ProDigit
04-14-2012, 08:20 AM
OK, one of you guys is incorrect:
280 kB equals 280,000 characters, and 40k characters would mean a max filesize of 40 kB, in which case I'm in trouble...

HarryT
04-14-2012, 08:21 AM
Please read posts more carefully before criticising. It clearly says "40,000 words", not characters.

ProDigit
04-14-2012, 08:51 AM
'Words' is very flexible. 40k two-letter words take a lot less space than 40k long ones.

Well, I'm testing some things on my reader, but I still can't figure out why the title page works perfectly, yet when I replace the HTML file of the first chapter within the epub with another one (which I give the same name and exactly the same <head>, just different text in the body), the page does not display.
It shows the number of pages, but the screen is blank, and when I press next page, it says it's an illegal action...

I'm just trying to find out where the error in my epub is; no offense meant...

Toxaris
04-14-2012, 09:36 AM
If you tinker with ePUBs like you do, stuff is bound to get broken. In any case, keep the validation tools close. Use FlightCrew and EpubCheck to see if the ePUB is still structurally valid; they will tell you what is wrong if they think there is a problem.

Elfwreck
04-14-2012, 10:40 AM
'Words' is very flexible. 40k two-letter words take a lot less space than 40k long ones.

40k average words. If you have lots and lots of very short words, the number of words that fit will be higher. (It's often easier to get a word count than a character count for a given text.) The exact character count can be calculated--probably should be, if you want to put in lots of links & maybe other coding--but I haven't done that.

A quick check turns up that ~40,000 words is ~215,000 characters, including spaces. You'll have to experiment to sort out a more exact number.

HarryT
04-14-2012, 11:23 AM
40k average words. If you have lots and lots of very short words, the number of words that fit will be higher. (It's often easier to get a word count than a character count for a given text.) The exact character count can be calculated--probably should be, if you want to put in lots of links & maybe other coding--but I haven't done that.

A quick check turns up that ~40,000 words is ~215,000 characters, including spaces. You'll have to experiment to sort out a more exact number.

The actual limit is the amount of memory available to store the XML "parse tree", so file size limits are only going to be approximate anyway. Really the best we can do is to say that if you restrict the size of a flow to a 280k file, or to 40k words, you should be safe enough.

Elfwreck
04-14-2012, 11:50 AM
The actual limit is the amount of memory available to store the XML "parse tree", so file size limits are only going to be approximate anyway. Really the best we can do is to say that if you restrict the size of a flow to a 280k file, or to 40k words, you should be safe enough.

Character count will be relevant if there are extensive links or other coding. If every paragraph has a link, it'll be a lot less than 40k words; most word-count systems will assume <a href="../Text/Lamentations05.xhtml#Verse15"> is two words, and the anchor point is one word--unless the program reads it as attached to the word itself.

An anchor point & potential links to every verse add up to a lot of characters that don't count as extra words to most programs. But an estimate like "200k characters and then wrap it up" would still work.

ProDigit
04-14-2012, 03:15 PM
Concerning ePubPack, are there ways to increase compression on the ebook, or is there only one standardized compression level?

DaleDe
04-14-2012, 03:52 PM
Concerning ePubPack, are there ways to increase compression on the ebook, or is there only one standardized compression level?

You can use whatever level of compression zip can give you. You cannot use anything other than zip compression for ePub. You can also vary the images as needed, either by making them smaller, reducing color content, converting to monochrome, or reducing the quality of the JPG. All of these things will change the final size of the file.

Dale

JSWolf
04-15-2012, 12:04 PM
But even if you compress the ePub more, you still won't be able to get more per flow.

ProDigit
04-15-2012, 12:55 PM
I understand.
I just wanted to see if it would have any effect on hardware ebook reading devices (perhaps incompatibility?), or if one can make a book even smaller in overall size without a downside.

Unfortunately, the open community concerned with the engineering of these devices is very small. It would be nice if someone had lab tools to test whether device battery life is affected by a highly compressed vs. a lightly compressed ebook.
There must be some sort of 'sweet spot' (for lack of a better word in English) where one can have the best battery life and the best compression in one package.

The unfortunate part is that power consumption is already so low that it's very hard to actually measure it during ebook reading.

JSWolf
04-15-2012, 02:34 PM
Why not test it on your computer? Does it take longer to uncompress a ZIP file that has the best compression than a ZIP file that has normal compression? If the answer is yes, then it will also take more processor time to do the uncompression, and that equates to using more battery.

ProDigit
04-15-2012, 02:42 PM
Then again, if you reduce the compression to 'store', the device needs to activate the external SD card more to read data (= lower battery life).

Do you think epubs are extracted on the fly (parts of them are extracted while you read/flip pages), or are they extracted completely as you open the book?

Jellby
04-15-2012, 03:00 PM
I believe ZIP compression, unlike MOBI, is not random-access; that means the contents must be uncompressed before processing. It doesn't mean, however, that the full ZIP must be uncompressed, just the particular file (flow, chapter) that is being used; that's why some readers have size restrictions on this. Depending on the available memory, a given device might be able to store the whole uncompressed book in memory, so it wouldn't need to access the original epub file anymore, but other devices (or larger books) may need to access the compressed file again every time you change flow (jump chapters, follow hyperlinks, etc.).

theducks
04-15-2012, 09:22 PM
I understand.
I just wanted to see if it would have any effect on hardware ebook reading devices (perhaps incompatibility?), or if one can make a book even smaller in overall size without a downside.

Unfortunately, the open community concerned with the engineering of these devices is very small. It would be nice if someone had lab tools to test whether device battery life is affected by a highly compressed vs. a lightly compressed ebook.
There must be some sort of 'sweet spot' (for lack of a better word in English) where one can have the best battery life and the best compression in one package.

The unfortunate part is that power consumption is already so low that it's very hard to actually measure it during ebook reading.

Decompressing 'higher compression' takes more CPU crunch power (which many devices don't have) and leads to increased battery consumption :eek: and slowness :eek::eek:

ProDigit
04-16-2012, 01:36 AM
If what Jellby says is true, then it's actually beneficial to compress HTML books in chunks closer to a single 280 kB file, instead of by chapter (having e.g. 14-40 chunks of 7-20 kB each).

mod186k1
04-16-2012, 05:39 AM
For that reason Calibre has an option (when converting to ePub format) to split the output XHTML files so they stay smaller than a specified value (default 260 KB).


Jellby
04-16-2012, 07:10 AM
I think ProDigit suggests the opposite: "compressing" several chapters together.

Toxaris
04-16-2012, 07:57 AM
It would not be beneficial for speed. Turning pages or going to the next chapter will actually be a lot faster with small chapters than with large ones.

ProDigit
04-16-2012, 08:40 AM
It would not be beneficial for speed. Turning pages or going to the next chapter will actually be a lot faster with small chapters than with large ones.

I don't know about that...
If I understand it right, it may take a little longer for a device to find a chapter among hundreds of chapters compared to tens. And though the loading time of that chapter may be a bit longer, like you say, the loading time of consecutive chapters within the same chunk should be less.

Meaning: for a large book read linearly, one chapter after the other until the end, it is better to have larger files containing more chapters within one (~280 kB) HTML file, using in-document reference points for the chapters.

For a bible, concordance, or dictionary, small chapters should be preferred for faster loading and quicker chapter lookup, using the beginning of an HTML file as the reference point.

I'm torn between using the toc.ncx or a self-made HTML ToC.
The toc.ncx is fast and easy, but adds some code to the book. The HTML ToC might be a little slower to browse, but can be made to look nicer (e.g. in 2 columns, or at least away from the stock TOC layout).
But my biggest concern is how an ebook will behave once the toc.ncx becomes very large (in a bible there are over 1,100 chapters; in a concordance even more, and a dictionary could have as many as 500,000 links, if you want to reference each word).
In these cases it does make sense to start trimming the ToC file, or to have very efficient code!

Jellby
04-16-2012, 08:50 AM
Meaning: for a large book read linearly, one chapter after the other until the end, it is better to have larger files containing more chapters within one (~280 kB) HTML file, using in-document reference points for the chapters.

It seems some readers actually "typeset" the whole file before displaying it. That means there would be less reading and uncompressing, but more processing time to get the text available.

ProDigit
04-16-2012, 09:19 AM
It seems some readers actually "typeset" the whole file before displaying it. That means there would be less reading and uncompressing, but more processing time to get the text available.

Yes, I've seen that behavior, but mainly with FB2 books, not epub yet...

Jellby
04-16-2012, 10:12 AM
I'd say ADE does it. At least, when you follow links, they are always in the same location on the page; that is, as long as you don't change the font size, the "pages" are fixed, so they must have been typeset in advance.

huebi
04-16-2012, 10:22 AM
Are you talking about page numbering in ADE? That's a quite simple algorithm: the size of the packed(!) (X)HTML file is divided into 1024-byte chunks, and each chunk is counted as a page.

Elfwreck
04-16-2012, 12:14 PM
Meaning: for a large book read linearly, one chapter after the other until the end, it is better to have larger files containing more chapters within one (~280 kB) HTML file, using in-document reference points for the chapters.

In a large book, it's better to have short chapters. Devices can get slow turning pages near the end of the ~280 kB limit and don't speed up again until the invisible shift to a new internal HTML file. Navigating to the top of a new HTML file is just as fast as, and possibly faster than, navigating to the middle of a large HTML file.

The whole point of multiple chapters is that the extra ones don't get in the way of what you're doing *now*. If there are 150 chapters, or 1500, it doesn't matter... all that matters is the one that's the target of the current link, whether that's in the TOC or the result of hitting the "next page" button.

I'm torn between using the toc.ncx or a self-made HTML ToC.
The toc.ncx is fast and easy, but adds some code to the book. The HTML ToC might be a little slower to browse, but can be made to look nicer (e.g. in 2 columns, or at least away from the stock TOC layout).

Two COLUMNS? What device do you expect people to read this on?
The toc.ncx file is not arranged in any layout format; how it shows up is built into the software. In epubreader for Firefox, the toc.ncx listings show up as links in a column on the left-hand side of the page. In the Sony reader, it's under the "Table of Contents" internal menu--and jumping to it doesn't lose your page.

An HTML TOC means losing your last-page position to jump to the TOC. (Which might be irrelevant, since if you're going to the TOC it's pretty much to change your page--but if you can't figure out which page to visit and want to go back, you can't if you've navigated to an inline HTML TOC.)

But my biggest concern is how an ebook will behave once the toc.ncx becomes very large (in a bible there are over 1,100 chapters; in a concordance even more, and a dictionary could have as many as 500,000 links, if you want to reference each word).
In these cases it does make sense to start trimming the ToC file, or to have very efficient code!

It's a good question; I've no idea how the toc.ncx works if it gets too large, nor what "too large" means. It could be tested by auto-generating a couple thousand XHTML files of individual bible verses, with <h1>Book Chap#: Verse#</h1> followed by <p>Text of verse</p> as their only visible content (they'd need a bit more than that in the XHTML files), then importing them all into Sigil, clicking auto-generate TOC, and finding out what happens.
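A quick way to generate such test files would be a small shell loop along these lines (a sketch; the file names and verse text are placeholders):

for i in $(seq 1 2000); do
  cat > "verse$i.xhtml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Verse $i</title></head>
<body><h1>Book Chap#: Verse $i</h1><p>Text of verse $i</p></body>
</html>
EOF
done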

HarryT
04-16-2012, 12:39 PM
I'd say ADE does it. At least, when you follow links, they are always in the same location on the page; that is, as long as you don't change the font size, the "pages" are fixed, so they must have been typeset in advance.

As you rightly say, this is one of the visible differences between ePub and Mobi. When you follow a link in a Mobi book, the jump destination always ends up at the top of the page. In an ePub book under ADE, the jump destination is "preserved" across jumps.

ProDigit
04-16-2012, 12:49 PM
It's a good question; I've no idea how the toc.ncx works if it gets too large, nor what "too large" means. It could be tested by auto-generating a couple thousand XHTML files of individual bible verses, with <h1>Book Chap#: Verse#</h1> followed by <p>Text of verse</p> as their only visible content (they'd need a bit more than that in the XHTML files), then importing them all into Sigil, clicking auto-generate TOC, and finding out what happens.

Something like this?:
http://www.mobileread.com/forums/showpost.php?p=313760&postcount=1

Jellby
04-16-2012, 01:51 PM
Are you talking about page numbering in ADE? That's a quite simple algorithm: the size of the packed(!) (X)HTML file is divided into 1024-byte chunks, and each chunk is counted as a page.

No, I'm talking about screens.

Say you have a long chapter with two anchors inside. You read the chapter page by page, normally, and see the first anchor at the top of one page (screen); you continue reading and see the second anchor in the middle of another page (screen).

Now you follow some hyperlinks that lead you to the anchors, and you get the first anchor on top of the page, and the second anchor in the middle of the page, exactly like the first time. This means that the software already "knows" about the whole chapter, and has done some kind of pagination on the whole chapter.

ProDigit
04-16-2012, 02:11 PM
Sometimes you flip a page and see no change in the page counter; when you flip the next page, the counter jumps by two.
I believe those page counters are an approximation.

After all, if you fill a whole page with the letter 'i' you'll be able to fit more characters on it than when filling it with capital W or M.

Jellby
04-16-2012, 02:31 PM
Those are the page numbers huebi referred to. They're completely unrelated to the actual number of times you have to flip a page; they're only (marginally) useful for defining a location in a book.

When I talk about pages I refer to a screenful of text, to what would be a page if the reader were a book.

Freeshadow
04-16-2012, 10:18 PM
You can use whatever level of compression zip can give you. You cannot use anything other than zip compression for ePub. You can also vary the images as needed, either by making them smaller, reducing color content, converting to monochrome, or reducing the quality of the JPG. All of these things will change the final size of the file.


Dale

Exactly. But what CAN be done is creating the zip file with a more efficient DEFLATE implementation, making the zip smaller.
The tools I mentioned before do that.

DaleDe
04-17-2012, 02:13 AM
Yes, I've seen that behavior, but mainly with FB2 books, not epub yet...

In ePub, a file is one of the HTML files inside. It likely typesets one file all at once, but not the whole book; this is why endnotes at the end of a chapter (one file) are much faster than at the end of the book. The text is likely typeset, but images are pulled in later, so screen pagination is not fixed, although ADE has its page numbers set.

Dale

huebi
04-17-2012, 04:51 AM
After all, if you fill a whole page with the letter 'i' you'll be able to fit more characters on it than when filling it with capital W or M.

Well, that depends on the definition of the character. It's no problem to make an 'i' broader than a 'W', and in fact, in monospaced fonts all characters have the same width.

ProDigit
04-17-2012, 09:25 AM
In ePub, a file is one of the HTML files inside. It likely typesets one file all at once, but not the whole book; this is why endnotes at the end of a chapter (one file) are much faster than at the end of the book. The text is likely typeset, but images are pulled in later, so screen pagination is not fixed, although ADE has its page numbers set.

Dale

Very interesting.
I figured out that in ePub or zip there is no such thing as a solid archive, like in WinRAR and 7z.

If there were, it would be impossible to extract just one file or one HTML.

Does anyone have a modern version of WinZip who can test whether it has solid archive capabilities?
I also wonder why they didn't choose rar or 7z.
Rar has consistently compressed better than zip since the late '90s, and I've been using 7z since 2005, I believe, as it surpassed rar.

It's like creating an mp3 player that only accepts WAV format. Even today, mp3 itself is already outdated.

Elfwreck
04-17-2012, 11:27 AM
I also wonder why they didn't choose rar or 7z.

WinRAR is intended to be Windows-only. No idea why not 7z, except that it's a less universal format. Many programs can do zip; only 7-Zip can do 7z files. They went with the most common and accessible type of compression.

It's like creating an mp3 player that only accepts WAV format. Even today, mp3 itself is already outdated.

No, it's like an mp3 player that doesn't play WAV files, no matter how popular they are, even if they make smaller files with the same sound quality.

Toxaris
04-17-2012, 01:24 PM
Even if a modern version of WinZip supported solid archives, I don't think it would work. And why would you want it? To save a measly few bytes? By only having to extract what is needed, readers can use less powerful hardware and thereby save battery power.

Jellby
04-17-2012, 03:43 PM
I also wonder why they didn't choose rar or 7z.

They needed something that is well standardized, well defined, public, with freely available implementations (both for compressing and decompressing), and which can be conveniently used in a handheld device. I don't know if rar or 7z qualify for those, but apparently zip did.

ProDigit
04-17-2012, 06:14 PM
WinRAR is intended to be Windows-only. No idea why not 7z, except that it's a less universal format. Many programs can do zip; only 7-Zip can do 7z files. They went with the most common and accessible type of compression.

First, I think it's been said that the jetBook Color has some form of mobile Windows installed on it.

Second, 7-Zip is only for Windows, while WinRAR is ported to all OSes, including Mac and Linux.

No, it's like an mp3 player that doesn't play WAV files, no matter how popular they are, even if they make smaller files with the same sound quality.
Most readers can read uncompressed HTML files, which in audio terms are like WAV files.
ePub is like zip, or mp3; I would have chosen rar or 7z, which in audio would translate to WMA or OGG.
Audiophiles would confirm WMA is best for low bitrate with high compression, and OGG for high bitrate with high compression.

ATDrake
04-17-2012, 06:30 PM
7z is available in a command-line port that works on Unix-like OSes, including the Mac. Several front-ends are available, mostly just for unarchiving.

RAR is a proprietary commercial format and last I bothered to check, the developer wanted $30 per install of the official RAR-creation software (discount bundle pricing available) and had a bit in the license of the unrar program saying that its source was not to be reverse-engineered to create a RAR-encoder.

Zip is common, understood, at a reasonably stable point in its development, and readily available or portable to just about any OS, and there are plenty of tools from multiple sources to deal with it, which is a decided advantage over the other formats, even once one factors out the commercial proprietary aspects of stuff.

Freeshadow
04-17-2012, 07:29 PM
Zip not only qualified, it's nearly an ISO standard.
Check RFC 1951 (the DEFLATE specification).
In fact, when discussing compression methods it is better to refer to the methods' own names instead of the names of the archive containers using them; as we have seen, it helps avoid confusion.
So let's have a clean start from here on:
The method used in zip files (DEFLATE) is used in several cases, which can be roughly sorted into two groups:
The 1st uses not only the method but the file structure too, in which case the file can be renamed to zip, altered, and renamed back. This is the case for e.g. the pkg files of some id shooter games, and partly for epubs.
The 2nd uses the method to compress its data streams but doesn't necessarily follow the zip file structure. PNGs are deflate-compressed too.
7zip, or: here the mixing-up began.
7z archives don't use the deflate method. They use LZMA and are therefore out of the discussion for epubs, period.
The zips(!) produced by 7zip use a more efficient DEFLATE implementation and are an option to consider.
This is why I pointed to: advancemame.sourceforge.net/comp-readme.html
It uses the same deflate implementation as 7zip. There you go: smaller yet epub-spec-compliant zip files.
Doing a search on the topic I came across KZIP, which is said to have an even higher deflate-based compression than the above-mentioned: advsys.net/ken/utils.htm

Whether it's still better than 7zip's deflate, and whether it will choke on the uncompressed mimetype file required by the epub specifications, I don't know, but it sounds worth trying.
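If anyone wants to try it, the AdvanceCOMP invocation would be something like this (untested on ePubs; check afterwards with epubcheck that the stored mimetype survived the recompression):

advzip -z -3 mybook.epub
java -jar epubcheck.jar mybook.epub

-z recompresses the existing zip in place, and -3 selects the strongest ("extra") deflate level.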

As for your question why not to use solid archives like 7z: no long answer, just some keywords: memory usage, decompression speed, dictionary size. The same goes for your mp3 comparison: libvorbis, or its smaller kin libtremor (a lot less stable), needs far more power than small audio players offer. Have you never wondered why only the bigger ones do OGG?
Bringing your comparison back to ebooks: how long would the file need to open, and how much RAM would be needed for it?!

That's all, dear Digit, and I shall no longer distract you from reading the epub specifications that people here, far wiser on the subject than me, have suggested to you.

IMHO it's the best thing you can do, because randomly deleting stuff while hoping things don't get hopelessly f*d up is as far from optimisation as amputating private parts is from weight-loss therapy.

Elfwreck
04-17-2012, 07:30 PM
Most readers can read uncompressed HTML files, which in audio terms are like WAV files.

They can?

My Sony can't. Kindles can't. Nooks can't. I don't know about Kobo devices. My PocketEZ says it reads HTML, and the files do open, but the formatting is messed up and it adds random hyphenation and breaks in the middle of words.

Most dedicated ereaders can't display plain HTML in a way that's readable.

----
As was mentioned: zip was chosen for various reasons, including widespread availability. Part of the reasoning was probably that the exact compression level isn't that important for over 90% of ebooks; they are *tiny* files. The ones that aren't tiny tend to be picture-heavy, and compression levels don't matter much for those.

Current ebook tech doesn't have good support for *any* non-linear text, and the more complex the work, the less support there is. Ebook readers don't have great navigation options for hundreds of chapters--and the compression involved in .zip versus .7z is not a major part of that.

I do understand it's frustrating to see a standard that looks inefficient, but this one was established based on a lot of factors that no one publisher is going to care about.

ProDigit
04-17-2012, 08:10 PM
Have you never wondered why only the bigger ones do OGG?


There are virtually no differences anymore between OGG and mp3 decoding. Encoding is so optimized that OGG can improve compression by 2.5x (that is, 250%) at 80% of the encoding speed compared to mp3.
Tests of battery life on mobile devices have shown that decoding OGG takes only marginally more power than mp3 (you lose less than 18% playing back an OGG file, and about equal when playing back an OGG of similar quality).

On the hardware side, no noticeable increase is needed to decode OGG.

Concerning ebooks, a solid archive would make less sense when the reader extracts each HTML file separately, but even if you take a large ebook, say 70 MB compressed to 7 MB (say it has a lot of pictures in it), decoding needs less than 5 MB of RAM.
Most ebooks are less than 1 MB (most of my epub books are about 750 kB in size); as solid archives they would need less than 750 kB of RAM for decoding, and I read in the 7z docs that zip needs at least 2 MB even for a 750 kB file (I don't know if that's true though).

Freeshadow
04-17-2012, 08:48 PM
On the hardware side, no noticeable increase is needed to decode OGG.

'Noticeable' is a comfortably foggy term which I simply won't accept as an argument, because what might be a minor number-crunching task for a smartphone or similar device may lie beyond the hardware limits of a small dedicated player.
On archives: well, I've used compression tools for as long as I've used a computer. Floppies weren't cheap back then.
I know from daily experience that solid compression needs a fair bit more power and time to decode.
It's meant for archiving purposes, not for active use. If ePub used it, there would be plenty of small devices choking on books purely because of their decompression requirements: space saving backfired. These are all aspects to be taken into consideration when developing a standard. Just think about it.

ProDigit
04-17-2012, 09:07 PM
You need only a pinhead-sized processor to decode OGG, consuming mere microwatts of power and running at a speed of a few megahertz. You think that's much?
OGG gets decoded at over 300x realtime on a mere 1.66 GHz dual-core CPU.
Currently ANY device can decode OGG; if there were an adventurous soul out there willing to write a decoder for a Game Boy, even it would be able to do that.

I know from daily experience that solid compression needs a fair bit more power and time to decode.
It's meant for archiving purposes, not for active use. If ePub used it, there would be plenty of small devices choking on books purely because of their decompression requirements.

You mean 'encoding'?
Decoding requires virtually nothing, ~1 MB of RAM; but I understand that solid archiving was not chosen because of the benefits the opposite brings.

JSWolf
04-17-2012, 11:10 PM
The advantage of OGG is that at a similar size to a given MP3, the sound quality will be better; at a similar sound quality, the OGG file will be smaller. OGG requires almost no processor power to decompress. So if you have a DAP that handles OGG, it's better to encode in OGG than MP3.