Old 02-06-2010, 12:43 AM   #2
Solitaire1
Samurai Lizard
Posts: 14,966
Karma: 70029956
Join Date: Nov 2009
Device: NookColor, Nook Glowlight 4
Quote:
Originally Posted by CallOfCth'reader
I'm at the point where I need to get rid of a number of books, due to space considerations. They are almost all second-hand books, in a condition where they wouldn't grace anyone's bookshelf, and almost every one is not available in e-book format to buy.

I don't mind destroying the books, so that I can get flat scans of the pages (and at least it makes the paper easier to recycle), but what is the best format to archive them to?

My immediate thought was archive to PDF, and then convert to EPUB as and when I have the time and inclination to do it. How good is OCRing from PDFs? Or should I just archive the scans themselves, and OCR as and when?
I can't speak to everything, but you may want to save your archive in several formats rather than relying on just one. For example, you could save the OCR results as plain text, RTF, and OpenDocument Text. Each will preserve the text, and RTF and OpenDocument Text will also preserve some of the formatting. For each of these formats there should always be tools readily available to open them. Plain text, for example, can be opened by an application that, as far as I know, comes preloaded on every computer. Also, you could save the scans as graphic files (like .jpg) to preserve the look of the original page.
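
If you are comfortable with a little scripting, the several-formats idea can be roughed out like this. It is only a sketch, and it assumes the OCR text for a page is already sitting in a Python string. The function names (rtf_escape, save_archive_copies) and the file naming are made up for illustration and are not part of any OCR tool. It writes plain text and a bare-bones RTF file; OpenDocument Text is left out because it needs a zipped package, which is easier to produce from a word processor.

```python
from pathlib import Path


def rtf_escape(text: str) -> str:
    """Escape plain text so it can sit in the body of a minimal RTF file."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are RTF control characters.
            out.append("\\" + ch)
        elif ch == "\n":
            out.append("\\par\n")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Non-ASCII: emit \uN? per UTF-16 code unit (N is signed 16-bit).
            raw = ch.encode("utf-16-le")
            for i in range(0, len(raw), 2):
                unit = int.from_bytes(raw[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append(f"\\u{unit}?")
    return "".join(out)


def save_archive_copies(page_text: str, base_name: str, out_dir: str) -> list:
    """Write the same OCR text as .txt and .rtf; return the paths written."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)

    txt_path = folder / f"{base_name}.txt"
    txt_path.write_text(page_text, encoding="utf-8")

    rtf_path = folder / f"{base_name}.rtf"
    rtf_doc = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\n"
    rtf_doc += rtf_escape(page_text) + "\n}"
    rtf_path.write_text(rtf_doc, encoding="ascii")

    return [txt_path, rtf_path]
```

Calling save_archive_copies(text, "book1_page001", "archive") would leave book1_page001.txt and book1_page001.rtf next to each other in the archive folder, and the page scan itself can be stored alongside them under the same base name.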

As far as PDF goes, I view it as a good final destination format since it preserves all of the formatting. However, I don't keep PDFs as my archive copy. Rather, I save my archive copy in other formats (mostly OpenDocument Text) and generate the PDF from that. The main reason is that a PDF is laid out for one page size, so it tends to look good at that size and not so good at others. With the archive source I can adjust the formatting and make a PDF appropriate for whatever use I need (such as on my computer screen, on my reader, or on paper).

I hope this helps.