MobileRead Forums > E-Book Formats > Workshop

Notices

Reply
 
Thread Tools Search this Thread
02-05-2010, 09:57 PM   #1
CallOfCth'reader
Cultist
Posts: 195
Karma: 8624438
Join Date: Jun 2009
Location: UK
Device: Sony PRS 505, Kobo Mini, Kobo Glo, Kobo Forma, Kindle DX
Best way to archive a book - PDF?

I'm at the point where I need to get rid of a number of books because of space. They are almost all second-hand books, in a condition where they wouldn't grace anyone's bookshelf, and almost none of them can be bought in e-book format.

I don't mind destroying the books, so that I can get flat scans of the pages (and at least it makes the paper easier to recycle), but what is the best format to archive them to?

My immediate thought was archive to PDF, and then convert to EPUB as and when I have the time and inclination to do it. How good is OCRing from PDFs? Or should I just archive the scans themselves, and OCR as and when?
02-06-2010, 12:43 AM   #2
Solitaire1
Samurai Lizard
Posts: 14,254
Karma: 66666666
Join Date: Nov 2009
Device: NookColor
I can't speak to everything, but you may want to save your archive in several formats rather than relying on just one. For example, you could save the OCR results as plain text, RTF, and OpenDocument Text. Each will preserve the text, and RTF and OpenDocument Text will also preserve some of the formatting. For each of these formats there should always be tools readily available to open them. Plain text, for example, can be opened by an application that, as far as I know, comes preloaded on every computer. You could also save the scans as graphic files (like .jpg) to preserve the look of the original pages.
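
As a rough illustration of the multi-format idea above, here is a minimal Python sketch that writes the same OCR text out as plain text and as a bare-bones RTF file. The function names (`rtf_escape`, `to_rtf`, `save_archive_copies`) and the file layout are made up for this example, not taken from any particular tool, and the RTF it produces carries no formatting beyond paragraph breaks. OpenDocument Text is left out here because it is a zipped XML package and is easier to produce from a word processor.

```python
import struct
from pathlib import Path


def rtf_escape(text: str) -> str:
    """Escape plain text so it can sit inside an RTF document body."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are RTF control characters.
            out.append("\\" + ch)
        elif ch == "\n":
            # Treat each newline as a paragraph break.
            out.append("\\par\n")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Non-ASCII: emit \uN? with N as a signed 16-bit UTF-16 unit.
            for (unit,) in struct.iter_unpack("<h", ch.encode("utf-16-le")):
                out.append(f"\\u{unit}?")
    return "".join(out)


def to_rtf(text: str) -> str:
    """Wrap escaped text in a minimal RTF header (assumed font: Times New Roman)."""
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\\f0\\fs24 "
    return header + rtf_escape(text) + "}"


def save_archive_copies(text: str, stem: str, out_dir: str) -> list:
    """Write <stem>.txt and <stem>.rtf into out_dir and return both paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    txt_path = out / f"{stem}.txt"
    rtf_path = out / f"{stem}.rtf"
    txt_path.write_text(text, encoding="utf-8")
    rtf_path.write_text(to_rtf(text), encoding="ascii")
    return [txt_path, rtf_path]
```

Calling `save_archive_copies(ocr_text, "my_book", "archive")` would leave `archive/my_book.txt` and `archive/my_book.rtf` side by side, so the text survives even if one of the formats becomes awkward to open.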

As far as PDF goes, I view PDF as a good final destination format since it preserves all of the formatting. However, I don't keep PDFs as my archive copy. Rather, I save my archive copy in other formats (mostly OpenDocument Text) and generate the PDF from that. The main reason for this is that PDFs tend to be good at one size, and not so good at other sizes. But I can take the archive source, adjust the formatting, and make a PDF appropriate for whatever use I need (such as on my computer screen, on my reader, or on paper).

I hope this helps.
02-06-2010, 01:52 AM   #3
rmm200
Groupie
Posts: 195
Karma: 414
Join Date: Jan 2010
Location: Bend, OR
Device: Sony PRS-600
I really suggest RTF or EPUB for archival purposes. PDF will never convert well to EPUB. As Solitaire1 said, PDFs should be viewed as a final destination, not something you generate other formats from.

Robert
02-06-2010, 04:10 AM   #4
Jellby
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
In addition to the text versions, I would keep a PDF made from the raw (or minimally pre-processed) scanned images, at least until you've read the book and fixed all possible issues. Too often the OCR gets things wrong and you really need to check the printed book (or the scans) to see what's really there.
02-06-2010, 09:59 AM   #5
chainring
Addict
Posts: 210
Karma: 1000659
Join Date: Jan 2009
Location: Sunnyvale, CA
Device: Kindle Voyage, Kobo Aura H2O, PRS-650 (black), Kindle 3G
In addition to what Jellby has already said, that raw PDF (no OCR performed) can later be run through a program that takes PDF input and does the OCR itself (Abbyy FineReader, for example). You have the archive, can trash (err, recycle) the book and move on to another one, then come back for the OCR at a later point.
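
One way to keep track of this "scan now, OCR later" workflow is a small script that lists which archived PDFs have no text version yet. This is only a sketch under an assumed layout: every book is a `name.pdf` of raw scans in one folder, and the OCR output, once done, is saved next to it as `name.txt`. The function name `pending_ocr` is hypothetical.

```python
from pathlib import Path


def pending_ocr(archive_dir: str) -> list:
    """Return the raw-scan PDFs in archive_dir that have no sibling .txt yet.

    Assumed layout (not from any tool): book.pdf holds the scans and
    book.txt, if present, holds the finished OCR text.
    """
    root = Path(archive_dir)
    return sorted(
        pdf for pdf in root.glob("*.pdf")
        if not pdf.with_suffix(".txt").exists()
    )


if __name__ == "__main__":
    # Print the books still waiting for OCR in the current folder.
    for pdf in pending_ocr("."):
        print(pdf.name)
```

Running it from the archive folder prints the books that still need an OCR pass, so nothing gets forgotten once the paper copies are gone.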
All times are GMT -4.