Old 09-07-2012, 07:43 PM   #1
Cephas Atheos
Member
Cephas Atheos is on a distinguished road
Posts: 11
Karma: 50
Join Date: Sep 2012
Location: In the hills around Melbourne, Australia
Device: Kindle DX
Question Silly question about preferred formats

G'day everyone,

I have a fairly large library of DRM-free personal ebook files, but I'm trying to figure out the "best" format to import into calibre when more than one source format is available.

Typically, I have duplicates in .htmlz or SVG/Zip formats. The ZIP format is always a much larger file, but when comparing the files within calibre, there don't seem to be any major visible differences, at least in terms of readability, images, and so on.

Eventually, all of these will be converted to PDF and plain text for archival purposes.

My inclination as a restorer/archivist is to use the source format with the larger data size (in other words, the zip format), but is there any point to me doing this? In other words, is the zip format just a less efficient storage alternative to the htmlz format, and (all else being equal) do they contain essentially the same (identical) core data?

Thanks for wading through this... Any and all comments and suggestions are most welcome.

-Pete
P.S. Yes, I'm too lazy to perform a byte-by-byte comparison of the contents of each format, although I think I have the tools to sort of do that...
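
A member-by-member comparison is less work than a byte-by-byte one. The sketch below is only an illustration, assuming both files are ordinary zip containers (HTMLZ is one; whether the "SVG/Zip" files are plain zips is an assumption). The function names are hypothetical and not part of calibre. It compares the uncompressed contents, so differences in compression level alone do not count as differences in data.

```python
import hashlib
import zipfile


def archive_summary(path):
    """Map each member name to (uncompressed size, SHA-256 of its contents)."""
    summary = {}
    with zipfile.ZipFile(path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            digest = hashlib.sha256(archive.read(info)).hexdigest()
            summary[info.filename] = (info.file_size, digest)
    return summary


def compare_archives(path_a, path_b):
    """Group member names by whether their uncompressed contents match."""
    a, b = archive_summary(path_a), archive_summary(path_b)
    shared = set(a) & set(b)
    return {
        "identical": sorted(n for n in shared if a[n] == b[n]),
        "different": sorted(n for n in shared if a[n] != b[n]),
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
    }
```

If everything of interest lands under "identical", the larger file is probably just stored with less compression. Entries under "only_in_b" would point to extra material, such as separate image files, that explains the size.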
Old 09-08-2012, 03:24 PM   #2
Adoby
Handy Elephant
Adoby ought to be getting tired of karma fortunes by now.
Posts: 1,736
Karma: 26785668
Join Date: Dec 2009
Location: Southern Sweden, far out in the quiet woods
Device: Thinkpad E595, Ubuntu Mate, Huawei Mediapad 5, Bouye Likebook Plus
My preference is epub, then mobi. Or PDF. I never convert to/from PDF, but frequently from mobi to epub if I don't have easy access to the book as epub. Never from epub to mobi, since I mainly read books as epub or PDF.

In rare cases I also use other formats, but then usually edit and manually convert to epub using a combination of Calibre and Sigil, if needed.
Old 09-08-2012, 04:51 PM   #3
Agama
Guru
Agama ought to be getting tired of karma fortunes by now.
Posts: 776
Karma: 2751519
Join Date: Jul 2010
Location: UK
Device: PW2, Nexus7
If your source formats give equally good results then I wouldn't have thought it matters which one you select.

However, I am interested in why you have selected PDF and plain text for archival purposes. These would be some of the last formats that I would choose for an archive: PDF because of the lack of reflow; plain-text because of the lack of formatting.

ePub would give you reflow and also supports a lot of formatting. Maybe you have other requirements that require PDF and plain text?
Old 09-13-2012, 06:25 AM   #4
fratermus
e-bookworm
fratermus ought to be getting tired of karma fortunes by now.
Posts: 86
Karma: 630090
Join Date: Sep 2012
Device: PW2, K3, KF2, Touch (dying)
Quote:
Originally Posted by Agama View Post
However, I am interested in why you have selected PDF and plain text for archival purposes.
Same question jumped out at me.

I'd love to hear the OP's thoughts on the criteria used in choosing archival formats. Not intending to judge -- I'm genuinely interested in the kinds of challenges archivers and collectors face.

Last edited by fratermus; 09-13-2012 at 06:26 AM. Reason: speeling :-)
Old 09-13-2012, 07:23 AM   #5
Cephas Atheos
Member
Cephas Atheos is on a distinguished road
Posts: 11
Karma: 50
Join Date: Sep 2012
Location: In the hills around Melbourne, Australia
Device: Kindle DX
[Note : My apologies to anyone actually interested in this thread, and particularly to Agama and fratermus : I've had a technically challenging week (I have a number of embedded medical devices to help manage a spinal problem, but one of those failed this week) and so I'm quite literally flat on my back at the moment... and it wasn't just an internal "technical difficulty" : the wifi router chose this week to munch its own brain, so I've been wifiless until a couple hours ago. I don't know which device failure was worse... So I wasn't being intentionally rude, I swear! Please accept my apologies for the delay, and I hope the drivel below is enough to go on with... - Pete]

Primarily, I use plain text as a fallback, worst-case-scenario archive format. It's been around since microprocessors started using 8-bit data paths, and all my text editing and retrieval tools are Unicode aware, so multiple source languages aren't an issue from that perspective. I've had some "pointed" questions asked of me by various folks with an axe to grind about ASCII, but to be honest, I'm quite happy to use ASCII if the source material's in English.

The reason I also use PDF as an archive option has more to do with the original intent (and the longevity) of the portable document format. I have some material that was encoded with Acrobat 1.0, and that's still perfectly readable in the latest readers on the iPhone, iPad, Macbook pro, and all my other notebooks (HP Omnibooks, HP palmtop, Jornada, and Kindle DX), as well as my main system running Win7 x64.

So it's been - up until now, anyway - a bit of a no-brainer as far as having a "universal" format that preserves both the full document and all the metadata I can poke a stick at!

Since storage isn't an issue, I have no need to compress any material, nor do I have to worry about changing compression algorithms or incompatibilities in that area. While that may sound artificial and contrived, a few weeks ago I had to try and unpack a CP/M .lbr archive containing some technical manuals in WordStar 3.3 format, and I ended up writing my own decompressor since no modern archiving utility on any platform still handles that format! But back in the day, there simply was nothing better than WordStar, and no better archive/library manager than LBR! But things change in the long run, I guess!

So I hope that helps clarify why I selected and use what are (based on recent, unrelated, comments in a couple of documentation forums I try to participate in) fairly universally ignored (or at least, misunderstood) document formats for my long-term archiving.

However, as a rational and sceptical person, I'm always open to suggestions for better document archive formats (by "better", I suppose I mean more powerful/flexible, more widely supported, with stronger pedigrees and broader future possibilities on more platforms). So if anyone has any suggestions, by all means, I'd like to hear your ideas!

Hopefully this is helpful!

[EDIT]
Oh, I forgot to answer your point about reflow and rendering speed...

As far as reflow goes, my primary interest, as an extremely fast reader, is quick display of maximum text, as well as fast access and navigation, and I haven't reached the limit of PDF readers yet. The Kindle DX is the touchstone there for me. It's faster when displaying commercial ebooks (MOBI, AZW, EPUB), but like most ebook readers, it's terribly limited in terms of margin and font options in those modes. So while the HTML format is faster in terms of reflow and rendering, it's a real waste of time for me!

That's another reason I prefer PDF formats for reading on the Kindle - the resolution is good enough that I can publish to PDF using a small enough font size and small enough margins that I don't have to turn the page every 5-10 seconds. Since I normally read at around 800-1,000 wpm, that translates to roughly 30-45 seconds per physical paperback page, depending on font size and layout. Of course, I'm slowing down as I age... I used to read flat out at around 1,800 wpm, but my eyes ain't what they used to be, so I've slowed down since getting reading glasses. But I still read many times faster than most people, so volume (maximum text density) is still critically important to me, more so than raw display update speed.

For me, a typical MOBI or EPUB commercial text, at maximum 'native' resolution, has me pressing the 'next' button about every 18-20 seconds on the DX (minimum font size, minimum margins, maximum text per line). But that flattens the battery in about 4 days! (Boy was I unhappy when I found that out. Amazon technical support informed me that they tested the battery life with a 'typical' page turn time of between 50 seconds and 2 minutes per page at the default text settings. But I was turning the page between about every 5 and 11 seconds at the default settings. So much for "battery life up to a month"! I later tried both a standard Kindle someone lent me, and a Nook I tried in a demo, and I was flipping pages 6-12 seconds apart, which was a total PITA...)
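
The arithmetic behind those page-turn figures is simple enough to sketch. This is only an illustration: the function names are hypothetical, and the words-per-page values in the usage note are assumptions chosen to match the 30-45 second range quoted above, not measurements.

```python
def seconds_per_page(words_on_page, words_per_minute):
    """Time spent on one page or screen at a given reading speed."""
    return 60.0 * words_on_page / words_per_minute


def page_turns_per_hour(seconds_between_turns):
    """How often the 'next' button gets pressed in an hour of reading."""
    return 3600.0 / seconds_between_turns
```

For example, an assumed 500-word page at 1,000 wpm takes 30 seconds, and an assumed 600-word page at 800 wpm takes 45 seconds. A turn every 20 seconds works out to 180 turns per hour, against 30 to 72 per hour for the 50-second to 2-minute interval that the battery-life testing reportedly used.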

So the DX - which is what I read 95% of my digital text on - has a PDF rendering speed, at the lowest font size and narrowest margins, much faster than I can keep up with. So rendering and reflow aren't the deciding factor for me because the limitations are the size and resolution limitations on the devices (that I can afford, anyway) that have technically faster text reflow and page rendering speeds. In fact, I can read far more, far more quickly, on the DX than I can on anything else in production, including my laptops and my main system, and any speed improvement inherent in the other devices is irrelevant because of the resolution limitations. Plus, the limits most devices (including the Kindle) put on higher text density and font sizes mean that I don't care how fast they render and reflow the text, because I'm completely distracted pressing the 'next page' button instead of reading.

But that's just me, and I understand that real people (with lives and friends and stuff) enjoy and appreciate the other devices and the clever and complex (and fast!) rendering and displaying they do. It's just not for me.

Hopefully that addresses the non-technical reasons for the choice of my archive formats...

Sorry for the long boring story.

Last edited by Cephas Atheos; 09-13-2012 at 09:41 AM. Reason: Apology for delay
Old 09-14-2012, 02:26 AM   #6
fratermus
e-bookworm
fratermus ought to be getting tired of karma fortunes by now.
Posts: 86
Karma: 630090
Join Date: Sep 2012
Device: PW2, K3, KF2, Touch (dying)
Thanks for the thorough answer. The most surprising aspect (in a good way) is the way formats affect the user experience in terms of speed. I am in no danger of outrunning my old KK. :-)
Old 09-14-2012, 03:20 AM   #7
Agama
Guru
Agama ought to be getting tired of karma fortunes by now.
Posts: 776
Karma: 2751519
Join Date: Jul 2010
Location: UK
Device: PW2, Nexus7
Yes, thanks for the full answer. I also keep plain text as a fallback option but I encode in utf8, just to make some of the extended characters a bit easier to input and visually better in Notepad++, (e.g. non-breaking space, m-dash, n-dash, smart-quotes.)
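
The characters mentioned above are exactly the ones a strict ASCII archive cannot hold. The short sketch below shows how each encodes in UTF-8 and that a strict ASCII encode rejects it. The helper name `encoding_report` is hypothetical.

```python
samples = {
    "non-breaking space": "\u00a0",
    "en dash": "\u2013",
    "em dash": "\u2014",
    "left smart quote": "\u201c",
    "right smart quote": "\u201d",
}


def encoding_report(text):
    """Return (UTF-8 byte length, whether the text survives a strict ASCII encode)."""
    try:
        text.encode("ascii")
        ascii_ok = True
    except UnicodeEncodeError:
        ascii_ok = False
    return len(text.encode("utf-8")), ascii_ok


for name, char in samples.items():
    size, ascii_ok = encoding_report(char)
    print(f"{name}: {size} UTF-8 byte(s), ASCII-safe: {ascii_ok}")
```

Plain English text is byte-for-byte identical in ASCII and UTF-8, so saving the fallback copy as UTF-8 costs nothing for ASCII-only sources.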

I wasn't aware of the Kindle DX, (looks a nice size screen), and that it supports ePub - this seems very non-Amazon.

Last edited by Agama; 09-14-2012 at 03:29 AM.
Old 09-14-2012, 03:26 AM   #8
TimW
Wizard
TimW ought to be getting tired of karma fortunes by now.
Posts: 1,022
Karma: 6824104
Join Date: May 2011
Location: Southeastern Kentucky
Device: KK3G, KPW1, Sony PRST1, Sony PRS350, iPod Touch 5G
Quote:
Originally Posted by Agama View Post
Yes, thanks for the full answer. I also keep plain text as a fallback option but I encode in utf8, just to make some of the extended characters a bit easier to input and visually better in Notepad++, (e.g. non-breaking space, m-dash, n-dash, smart-quotes.)

I wasn't aware of the Kindle DX, (look a nice size screen), and that it supports ePub - this seems very non-Amazon.
The DX doesn't do ePub unless it has been hacked.
Old 09-14-2012, 06:42 AM   #9
Cephas Atheos
Member
Cephas Atheos is on a distinguished road
Posts: 11
Karma: 50
Join Date: Sep 2012
Location: In the hills around Melbourne, Australia
Device: Kindle DX
Quote:
Originally Posted by TimW View Post
The DX doesn't do ePub unless it has been hacked.
You're spot on, Tim. So I've learned to use calibre to convert ePub straight to PDF for the Kindle. So far in my (admittedly limited) travels, ePub and MOBI seem to be battling it out for top spot (again, at least in terms of my fancies). So unless there's a specific reason to convert to MOBI, I don't mind pretty much any format, as long as calibre can deal with it. I also use Paper2 for technical articles, which is another reason to stay consistent with the PDF format, though for different reasons.

Getting back to the OT, it seems that with modern tools like calibre and Sigil and so on, it's pretty much down to a preference for conversion speed or conversion features that defines which source format I "prefer". Barring any other comments to the contrary, it does seem to be six of one and half a dozen of the other, when it comes to sources. My only rule is being able to textify and PDFify to my (very loose) specs, and I honestly can't tell the difference between a PDF produced from HTMLZ and one produced from the same content, but in SVG format.
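
For anyone wanting to script the "textify and PDFify" step, calibre ships a command-line tool, `ebook-convert`, which picks the output format from the target file's extension. The wrapper below is a rough sketch and not a calibre API: the function names and folder layout are hypothetical, and no conversion options are set, so margins and font sizes would still need tuning to taste.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, target, extra_options=()):
    """Build the argument list for calibre's ebook-convert command-line tool."""
    return ["ebook-convert", str(source), str(target), *extra_options]


def archive_copies(source, out_dir):
    """Convert one book to PDF and plain text; return the requested output paths."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH; is calibre installed?")
    source, out_dir = Path(source), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    targets = [out_dir / f"{source.stem}{ext}" for ext in (".pdf", ".txt")]
    for target in targets:
        # check=True raises CalledProcessError if a conversion fails
        subprocess.run(build_convert_command(source, target), check=True)
    return targets
```

Running `archive_copies` on the HTMLZ copy and then on the SVG/Zip copy of the same book, into separate folders, would give two PDF/TXT pairs to compare side by side.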

There are obviously important differences in terms of the features and functions of HTMLZ and SVG, but so far they're irrelevant to me - which is actually a pretty nice place to be. I'm sure that will change though!

Thanks to everyone who's responded. It's been really instructive.
Old 10-16-2012, 05:14 PM   #10
Cephas Atheos
Member
Cephas Atheos is on a distinguished road
Posts: 11
Karma: 50
Join Date: Sep 2012
Location: In the hills around Melbourne, Australia
Device: Kindle DX
Now that I understand PDF better, and have done some trial document evaluations with standardised text, I can see Agama's point about flexibility and reflow.

For anyone still reading this thread, I'll just say that PDF documents are the textual equivalent of taking a photograph of your document displayed on your monitor. You can still see all the relationships between objects displayed, but you can no longer move anything around, not even from one line to another.

Since I tend to re-edit and generate my own documents, I've never had an issue with reflow - it's always been there, even before WYSIWYG editing. And since most of my devices have a more-or-less equivalent display area and resolution, PDF was fine since it bypassed the limitations of minimum text sizes and so on.

But as an archive solution, I guess I'll keep looking. ePUB looks pretty good, in terms of definition and support (and I'm a huge fan of the Dublin Core metadata standard because of my work with audio recordings), but HTML rendering on the DX is abysmal, so for me it's a matter of an additional step to generate MOBI formatted copies of all my docs just for the Kindle.
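
On the Dublin Core point: an EPUB stores its metadata as `dc:` elements inside an OPF package file, which is plain XML and readable with standard tools. The sketch below pulls those fields out of an OPF string. The sample document and the function name `dublin_core_fields` are made up for illustration; a real OPF would first have to be extracted from the EPUB's zip container.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set, in ElementTree's {uri}tag form
DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical, minimal OPF document used only for this example
SAMPLE_OPF = """<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Example Manual</dc:title>
    <dc:creator>A. N. Author</dc:creator>
    <dc:language>en</dc:language>
    <dc:identifier id="id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
  </metadata>
</package>"""


def dublin_core_fields(opf_text):
    """Collect every Dublin Core element in an OPF document as {name: [values]}."""
    root = ET.fromstring(opf_text.encode("utf-8"))
    fields = {}
    for element in root.iter():
        if isinstance(element.tag, str) and element.tag.startswith(DC):
            name = element.tag[len(DC):]
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields


print(dublin_core_fields(SAMPLE_OPF))
```

Because the fields live in readable XML, they can still be recovered with a text editor if EPUB readers ever disappear, which fits the plain-text-fallback approach described in this thread.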

At least now that I have better tools, I have more control over the amount and quality of text displayed, which makes my life a little easier.

Either way, I'm still using plain text as my touchstone "gold standard", since it retains the content, which is the most important aspect of archiving for me, since images and other gewgaws are irrelevant in the vast majority of book data I'm storing. Context I can figure out later, but as I said, EPUB is looking pretty good, with MOBI for the Kindle.

Thanks again to everyone who made suggestions and comments, and I hope this might be useful for anyone else reading this.

Tags
archiving, file duplication, format questions


All times are GMT -4.

MobileRead.com is a privately owned, operated and funded community.