02-05-2013, 10:24 PM | #1 |
Blacklisted
Posts: 13
Karma: 10
Join Date: Feb 2013
Device: palm pixi
|
Most efficient ereader format & how to remove images
seems to be
tcr. In real life, this is because I used calibre, which cannot remove images and probably doesn't do tcr well. If I wanted to, I could have
1. converted all books to zip
2. run this from DOS:
c:
cd \library
for /r %x in (*.zip) do "C:\Program Files\7-Zip\7z.exe" d -r "%x" *.jpg *.png *.jpeg *.gif
3. converted all books from zip to whatever
before doing the tests. In that case, epub is about tied for first, but pdf really sucks.
Before you ask, why did I look at gzipped books? Because since Stacker you could compress files using software. [Personal link deleted - Please see our rules regarding signatures for new members - MODERATOR]
P.S. My reason for posting is in case someone seeing this knows a less insane way of removing images than the above method. If so, please inform me; I'm rather ignorant about ebooks. Last edited by Dr. Drib; 02-06-2013 at 03:48 PM. |
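For anyone after a scripted alternative to the 7-Zip loop above, here is a minimal Python sketch using only the standard library. It copies a zip-based book to a new archive and skips anything whose name ends in an image extension, at any folder depth. The function name `strip_images` and the extension list are illustrative assumptions, not part of calibre or any other tool, and the book's HTML will still reference the images that were left out.

```python
import io
import zipfile

# Assumed list of image extensions; extend it if your books use others.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def strip_images(src, dst):
    """Copy the zip archive src to dst, skipping entries that look like images.

    src and dst may be file paths or file-like objects.
    Returns the names of the entries that were left out.
    """
    removed = []
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            if info.filename.lower().endswith(IMAGE_EXTENSIONS):
                removed.append(info.filename)
                continue
            # Entries are copied in their original order with their original
            # compression method, so an EPUB's "mimetype" entry stays first
            # and uncompressed.
            zout.writestr(info, zin.read(info.filename),
                          compress_type=info.compress_type)
    return removed


# Small in-memory demonstration with made-up contents.
demo_src = io.BytesIO()
with zipfile.ZipFile(demo_src, "w") as z:
    z.writestr("mimetype", "application/epub+zip")
    z.writestr("OEBPS/chapter1.html", "<p>Some text</p>")
    z.writestr("OEBPS/images/cover.jpg", b"not really a jpeg")
demo_dst = io.BytesIO()
print(strip_images(demo_src, demo_dst))
```

With real files you would pass paths instead, for example `strip_images("book.zip", "book-noimages.zip")`, where both file names are placeholders.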
02-06-2013, 10:55 PM | #2 |
Grand Sorcerer
Posts: 12,167
Karma: 73448616
Join Date: Nov 2007
Location: Toronto
Device: Nexus 7, Clara, Touch, Tolino EPOS
|
You are aware that ePubs (and most, if not all other eBook formats) are actually compressed already? So there is a strong probability that compressing them again will actually cause the files to grow in size?
Likewise graphical images are also mostly compressed internally... |
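The point about recompression can be checked with a small Python sketch. It deflates some stand-in text once, then deflates the result again. The word list, `sample_text` and `compress_twice` are made up for this illustration, and the text is random words rather than a real book, so the exact numbers will differ from real ebooks; on input like this the second pass typically gains little or nothing.

```python
import random
import zlib

# Made-up vocabulary standing in for book text.
WORDS = ("the", "reader", "book", "page", "and", "of", "image", "format",
         "text", "file", "size", "small", "large", "with", "for", "chapter")


def sample_text(n_words=8000, seed=0):
    """Return random words joined by spaces, as bytes (fixed seed)."""
    rng = random.Random(seed)
    return " ".join(rng.choice(WORDS) for _ in range(n_words)).encode("ascii")


def compress_twice(data, level=9):
    """Return (once, twice): data deflated once, and that result deflated again."""
    once = zlib.compress(data, level)
    twice = zlib.compress(once, level)
    return once, twice


text = sample_text()
once, twice = compress_twice(text)
print(f"original    : {len(text)} bytes")
print(f"compressed  : {len(once)} bytes ({len(once) / len(text):.0%} of original)")
print(f"recompressed: {len(twice)} bytes ({len(twice) / len(once):.0%} of first pass)")
```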
02-07-2013, 01:11 AM | #3 |
Evangelist
Posts: 450
Karma: 343115
Join Date: Nov 2009
Location: Romania
Device: PW2 2014
|
Most efficient format, as in the one that takes up the least amount of space? Listen, don't worry about space. 500 GB/1 TB drives are not that expensive and e-books don't take up that much space anyway. Maybe if you have several thousands of them... but even then it's still manageable (probably under 10 GB).
And why would you want to remove images? That just isn't right... For ePubs, I guess you could delete the content of the Images folder, or maybe the folder itself. But will it still be a valid ePub then? I don't know. The images were put there for a reason. Removing them is like making soup without any oil or salt. Have you ever tried it? It tastes awful. Last edited by DSpider; 02-07-2013 at 01:14 AM. |
02-07-2013, 06:16 AM | #4 |
350 Hoarder
Posts: 3,574
Karma: 8281267
Join Date: Dec 2010
Location: Midwest USA
Device: Sony PRS-350, Kobo Glo & Glo HD, PW2
|
I've run across a few books that were excessively bloated to 3-5x the normal epub file size because of uncompressed images. They included images for every book by an author (which I really don't care to see when I'm reading), and multiple cover images. There's sometimes a background image for certain pages... I don't need the faded publishing house name on the publishing page for instance, but if it's a small enough file size, I'll leave it. Once I delete all those types of crap images, I compress the images that are left, which usually reduces them to as little as 12% of their original size (without quality loss, it can be done very easily), and I'm back to an epub that's under 500KB as it should be.
I never delete images essential to the text, dividers between sections, publishing logos, etc that are really part of the book. But I see no reason to use up space on my reader with images of 30 other books. I just open those types of books in Sigil and delete the images from there, drag the ones that stay into my graphic program, compress and save, and delete the pages that have all the images on them. |
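To see whether a given book is bloated by images before opening it in Sigil, something like the following Python sketch could help. It only reads the zip directory of an epub and reports how many bytes the image entries take up. `image_report` and the extension list are hypothetical names for this example, not a Sigil or calibre feature.

```python
import io
import zipfile

# Assumed list of image extensions.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def image_report(epub, top=5):
    """Return (image_bytes, total_bytes, largest) for a zip-based ebook.

    Sizes are the compressed sizes recorded in the archive, i.e. roughly what
    each entry costs on disk, not counting the zip headers themselves.
    largest is a list of (name, size) for the biggest image entries.
    """
    with zipfile.ZipFile(epub) as z:
        infos = z.infolist()
    images = [i for i in infos
              if i.filename.lower().endswith(IMAGE_EXTENSIONS)]
    total_bytes = sum(i.compress_size for i in infos)
    image_bytes = sum(i.compress_size for i in images)
    biggest = sorted(images, key=lambda i: i.compress_size, reverse=True)[:top]
    return image_bytes, total_bytes, [(i.filename, i.compress_size) for i in biggest]


# In-memory demonstration with made-up contents.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as z:
    z.writestr("OEBPS/chapter1.html", "<p>Some text</p>")
    z.writestr("OEBPS/images/ad-for-other-book.jpg", b"x" * 500)
image_bytes, total_bytes, biggest = image_report(demo)
print(f"{image_bytes} of {total_bytes} bytes are images")
for name, size in biggest:
    print(f"  {size:>8}  {name}")
```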
02-09-2013, 10:02 PM | #5 | |
Blacklisted
Posts: 13
Karma: 10
Join Date: Feb 2013
Device: palm pixi
|
First, to whomever undeleted this thread, thank you!
Quote:
Yes, that some formats are already compressed was the whole point. I was just curious how efficient they are at storing text. To know the answer I *had* to remove the images, because otherwise I was also measuring a separate variable: image compression, which can be adjusted all over the place and wasn't what I wanted to measure.
BTW the command I gave was wrong. PM me if you want to do this test yourself. (One needs */*.jpg etc. also.)
I completely agree it is morally wrong to remove images. In my case I wanted to measure something. I should probably be burned at a stake for doing that. But, OTOH, I once read about an sf author who was very upset about the picture the publisher had drawn for his book. It was against the meme of the book. You know how they try to sex up stuff these days. The author lost out.
Also, my reader creates a library in some mystery place on the hdd, and was going super slow. And the HDD was out of disk space. My hdd is always low on space. I blamed a recipe book with like 100 hd images. Can clog up your computer & your reader. |
|
02-09-2013, 10:31 PM | #6 |
350 Hoarder
Posts: 3,574
Karma: 8281267
Join Date: Dec 2010
Location: Midwest USA
Device: Sony PRS-350, Kobo Glo & Glo HD, PW2
|
Morally wrong to remove images? I must be a very very bad person.
Anyway, my reader doesn't have an SD slot (don't really need it or want it, I only load about 100 books at a time anyway since I prefer to keep all my books in one location in my Calibre library on my PC). But even so that doesn't mean I want epubs over 2000KB when they're usually around 300-400KB. I never delete images that are part of the storyline or any illustrations, even fancy section separators. I will compress them though, there's no reason they need to be 300KB in size when they will look the same if compressed properly down to 60KB. Yes, you can do that even for color images for tablets and see no difference.
What I will always continue to remove is advertising for every other book the author ever wrote, sometimes several authors if from the same publishing house, with a cover image included for each. Then there's the cover page on page 1... a few pages later there's another cover image, same image, just a slightly different size so they can't even use the same one, more senseless bulk. Then further on at the back where there's often "About the Author" blurbs, etc. they include yet another cover image... yes, same image but again a different size. That's just senseless bloat to the book and I'll continue to prune those out. If they have 5 different sizes for the same image for section separators, I'll eliminate 4 of them and use one in all locations. Easy to do with Sigil.
Btw, this thread was never deleted. If I remember, I think you originally posted it (incorrectly) in General Discussion and a mod nicely moved it over to the Workshop where it belonged. Perhaps you just didn't find it after the move. But I don't think I've ever seen a thread deleted here. |
02-10-2013, 01:04 AM | #7 |
Blacklisted
Posts: 13
Karma: 10
Join Date: Feb 2013
Device: palm pixi
|
Hehe. It's almost gotten to the point where if you rip a page out of a book, you get a knock on the door from the publisher. And 1000 years ago books were somewhat illegal, so it could be worse.
I respectfully disagree. My stats page showed I had zero posts instead of one, like I had. Google had indexed the original page. So my post was in a deleted mode for some time. So, IMO, either you are wrong or the software running this board is not working correctly, and the # of posts you have showing is inaccurate. I'm sure. If you search another post of mine you might get more info, but this discussion is off limits according to rule 12, so we will have to leave it at that I guess. Umm, you know, if it's been deleted, uhh, you might not know it existed. http://en.wikipedia.org/wiki/Self-selection_bias Last edited by jasontaylor7; 02-10-2013 at 01:51 AM. |
02-10-2013, 02:47 AM | #8 | ||
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
I am reasonably sure that if your post had been actually deleted you would have received a private message about why the deletion had occurred.
Quote:
Looking at your first post a board level moderator removed a link Quote:
This "auto moderation" state is only for folks with very limited posts (fewer than 10 posts is my SWAG) with an HTML link in the post. I see (but can't act on) posts in a moderation state as a forum level moderator in the calibre forum, but most folks would never see these posts until cleared by a board level moderator or deleted as spam. I hope this clears up your mystery. Last edited by DoctorOhh; 02-10-2013 at 03:34 AM. |
||
02-10-2013, 10:08 AM | #9 |
Blacklisted
Posts: 13
Karma: 10
Join Date: Feb 2013
Device: palm pixi
|
The fact that the post was cached by google implied my post was deleted to all but the mods. I've posted elsewhere. The software used here isn't that unique. The anti-spam feature in board software you describe automatically prevents any initial posts with hyperlinks from ever being displayed until they are reviewed. Mine was up. So I disagree with your theory that software automatically placed my post into a holding pattern after it was up for some time. In fact, I find it very disturbing that you would posit that a mod didn't put it there on purpose. Also, while in the "limbo" state you describe, from my perspective, the post was deleted, since no pm was made to me. That the person who deleted it *might* have intended for it to have been reposted later would be better, but I see no testimony here from any such mod, and it doesn't change the perception to all except the mods, which is most of the community here. Also, the notion that it was merely moved but not deleted and later restored violates the way the word "moved" is commonly used in the computer industry (in which a copy is first made in step 1, and then in step 2 the original is deleted, causing 2 versions to temporarily exist, not zero, as was the case for me.) Lastly, your apparent impressive boundless desire and persistent need to make it seem like this board is like god's gift to mankind or that my post was never effectively temporarily deleted is at least as suspicious as your need to, e.g., largely deny the existence of deleted posts (something this board's rules clearly state do exist), or to deny the various issues and imperfections of the free program, calibre. I mean, I didn't invent the verb, "Calibre-ized." The software is good, like this board, but, as you yourself admitted, it has issues, and like other things, including the efficiency of the pdf format, is not perfect.
Last edited by jasontaylor7; 02-10-2013 at 10:34 AM. |
02-10-2013, 01:26 PM | #10 |
A Hairy Wizard
Posts: 3,095
Karma: 18727053
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
Jason...I usually hate jumping in the middle of an argument...but it seems you are painting people with a pretty broad brush. DoctorOhh only replied once and posited a rational explanation for what "might" have happened. You seem to jump all over him and then start attacking others that are only trying to help.
Perhaps you may wish to "tone it down" a little?? Cheers! |
02-10-2013, 11:19 PM | #11 | |||
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
Quote:
Since the zero posts from being in moderation is still a viable reason you didn't see your post I would guess that Ripplinger is correct. Quote:
Quote:
Last edited by DoctorOhh; 02-10-2013 at 11:22 PM. |
|||
02-12-2013, 08:20 PM | #12 | |
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
Quote:
You are not taking into account that many "compression" formats have header or other framing information they add to the file. Zip (while not technically a compression algorithm) and TCR are such formats. Every time you compress, it creates a zip header and a list of file entries. Zip a zip file 100 times and at a certain point you will start seeing the file size increase. FYI, I wrote the TCR compression implementation used by calibre.
Another issue I see with your test is the formats you're using. Let's take HTMLZ, PMLZ, TXT and TCR. HTMLZ and PMLZ both contain formatting information, while TXT and TCR are text only (no formatting). So your test doesn't take formatting into account. Your test is really, "Most efficient ereader format for storing only text without formatting." I would argue that formatting is part of the book and losing formatting (I wouldn't argue images) is detrimental. For example, removing new lines so you have a stream of characters on a single line will produce a smaller file than with your test. However, is a single line of text acceptable?
Some formats do lend themselves to compression more so than others. A binary file like RB or MOBI is going to be harder to compress compared to a TXT file. A TXT file (especially a written work like a book) is going to have a lot of repetition.
That said, I'm not saying figuring out which compression is best for ebooks is bad. I'm just saying your testing methodology needs some work. |
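The "zip a zip" effect is easy to try out. The Python sketch below wraps some stand-in text in a zip, then wraps that zip in another zip, and so on, recording the size at each layer. `sample_text`, `zip_bytes` and `nested_sizes` are names invented for this illustration, and the input is random words rather than a real book. The expectation is that the first layer shrinks the text and later layers mostly add header bytes.

```python
import io
import random
import zipfile

# Made-up vocabulary standing in for book text.
WORDS = ("the", "reader", "book", "page", "and", "of", "image", "format",
         "text", "file", "size", "small", "large", "with", "for", "chapter")


def sample_text(n_words=8000, seed=0):
    """Return random words joined by spaces, as bytes (fixed seed)."""
    rng = random.Random(seed)
    return " ".join(rng.choice(WORDS) for _ in range(n_words)).encode("ascii")


def zip_bytes(name, data):
    """Return a zip archive (as bytes) holding one deflated entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as z:
        z.writestr(name, data)
    return buf.getvalue()


def nested_sizes(data, layers):
    """Zip data, then zip the result, and so on; return the size at each step.

    The first element is the size of the original data.
    """
    sizes = [len(data)]
    for n in range(layers):
        data = zip_bytes(f"layer{n}.zip", data)
        sizes.append(len(data))
    return sizes


for layer, size in enumerate(nested_sizes(sample_text(), 5)):
    print(f"after {layer} zip layer(s): {size} bytes")
```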
|
02-12-2013, 10:19 PM | #13 |
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
I want to clarify a few points I made.
When I'm referring to the loss of formatting I'm looking at the starting file size. A file with more information will typically be larger than a file with less information. Think of comparing a blu-ray to a vhs tape. The content may be the same but there is a huge difference in quality and amount of information. A file with less data will end up smaller after compression than one with more data. So the comparison between the given formats is not a good apples to apples comparison in this regard. This test really should look at the overall compression ratio. That is the percentage shrunk from the original size. Any other comparison isn't really valid.
My binary format comment has a few facets. You also need to keep in mind that some formats are already compressed. This can lead to reduced compression when compressed again vs if the data itself was uncompressed. Compression (typically) looks for repeated patterns. Compressing once will remove many patterns, making subsequent compressions less effective until it cannot find any patterns and will not be able to reduce the size any more.
Which leads to the issue of binary formats and testing only with the gzip (gz) format. This is only one compression format. It works great for text and is an all around good compression format. However, there are other compression formats that work better than gzip for binary data. There are other compression formats that work better in general but that's beside the point. You're only looking at one compression format, and while one ebook format, due to its nature, compresses very well with gzip, you can hardly say that ebook format has the best compression. Another compression format that works better with binary data could compress some of the other formats better than gzip can. I don't mean by producing smaller files but by producing a better compression ratio for the given files.
Finally there is a difference between a compression format and a compression algorithm.
gzip is a compression format, not a compression algorithm. gzip uses the deflate compression algorithm, which just so happens to be one of the algorithms used by the zip format (the main one, and the one required by the epub standard). Which leads to the fact that a gzip and a zip compressed file, even using the same algorithm, will end up with different sizes because they have different header/structural components. To truly compare you need to take this into account. But this becomes complicated when you look at formats like TCR that are a compression format, an ebook format, and an algorithm all at once.
So really all that's been shown in this test is that the smallest file format, which is known to be a format that compresses well with deflate, ends up giving the smallest compressed file size. Larger files, with more data, in a format that does not compress as well with deflate yield a larger file size. |
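The difference between the deflate algorithm and the gzip and zip containers can be made concrete with a short Python sketch. It compresses the same bytes three ways: as a bare deflate stream, as gzip, and as a one-entry zip. `raw_deflate` and `container_sizes` are hypothetical helper names, and the sample input is made up. All three use deflate at level 9, so any size difference comes from the wrapping.

```python
import gzip
import io
import zipfile
import zlib


def raw_deflate(data, level=9):
    """Deflate data with no container: negative wbits means no header or trailer."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


def container_sizes(data, name="book.txt"):
    """Return the byte size of data as raw deflate, gzip, and a one-entry zip."""
    gz = gzip.compress(data, compresslevel=9)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED, compresslevel=9) as z:
        z.writestr(name, data)
    return {
        "raw deflate": len(raw_deflate(data)),
        "gzip": len(gz),
        "zip": len(buf.getvalue()),
    }


# Made-up sample input; any bytes will do for comparing the overhead.
sample = b"Call me Ishmael. Some years ago, never mind how long precisely. " * 200
for label, size in container_sizes(sample).items():
    print(f"{label:>11}: {size} bytes")
```

The gap between the three numbers stays roughly constant regardless of input size, which is why it matters most when comparing small files.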
02-12-2013, 11:02 PM | #14 |
Resident Curmudgeon
Posts: 73,983
Karma: 128903378
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
From what I have seen, ePub is generally smaller than Mobi (unless it has embedded fonts or better quality images). ePub converted to KF8 (AZW3) is always smaller. So overall, ePub is the format of choice if you want a smaller eBook.
|