MobileRead Forums > E-Book Formats > Workshop

Old 04-10-2009, 03:33 PM   #1
maynard
Connoisseur
Posts: 78
Karma: 10
Join Date: Apr 2009
Device: iPhone, PRS-505, iLiad, Entourage Edge
From physical to digital

I have a somewhat large library of real books. They're a real PITA to carry around, and even worse to ship for a household move. So, I'd like to consider scanning in my book collection for use on an ereader. Some questions:

- Do most people just scan to .jpg and format for the screen?

- If so, about how large of an image file is each page?

- What tools do you use to batch process the files?

- auto crop and rendering for a specific device's resolution


Or are folks using OCR to convert to text?

- Do you notice significant errors creeping into the text?

- What do you do to fix those errors?

- How do you find obvious text errors without manual editing?

BTW: I just bought a Sony PRS 505, but I expect to buy one of the larger full page units as soon as they hit the market (or the wireless iRex unit with a functional firmware hits).
Old 04-10-2009, 04:18 PM   #2
pepak
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
Quote:
Originally Posted by maynard
- Do most people just scan to .jpg and format for the screen?
Don't know about most people, but I OCR the scans and take pains to proofread and correct them.

Quote:
- If so, about how large of an image file is each page?
In 300DPI PNG, which I use, something between 1.5 and 3 MB per page, depending on complexity. Much less with JPEG, obviously.
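
For a rough sense of where those numbers come from, the raw (uncompressed) size of a scan follows directly from page size, resolution, and bit depth. The sketch below is only an estimate under stated assumptions: the 5 x 8 inch page and the function name are hypothetical, and PNG or JPEG compression will bring the real file size well below these figures.

```python
def uncompressed_page_bytes(width_in, height_in, dpi, bytes_per_pixel):
    """Raw (uncompressed) size of a scanned page in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel

# Hypothetical 5 x 8 inch page (an assumption, not a figure from the thread).
gray_300 = uncompressed_page_bytes(5, 8, 300, 1)   # 8-bit grayscale
color_300 = uncompressed_page_bytes(5, 8, 300, 3)  # 24-bit RGB
gray_600 = uncompressed_page_bytes(5, 8, 600, 1)   # doubling DPI quadruples pixels

print(gray_300, color_300, gray_600)
```

Doubling the resolution quadruples the pixel count, which is worth keeping in mind when choosing between 300 and 600 DPI for a whole library.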

Quote:
- Do you notice significant errors creeping into the text?
Way too many, yes.

Quote:
- What do you do to fix those errors?
They need to be hand-fixed.

Quote:
- How do you find obvious text errors without manual editing?
A few typical errors can be found and fixed with regular expressions, but most of them require a manual approach.
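
As an illustration of the regular-expression approach, here is a minimal Python sketch. The specific patterns are assumptions about what "typical" OCR errors look like (they are not rules given in the thread), and each one can produce false positives, so the output still needs proofreading.

```python
import re

# Each pair is (pattern, replacement). These are illustrative guesses at
# common OCR errors, not rules taken from the thread.
OCR_FIXES = [
    # Word split by a hyphen at a line break: "sil-\nent" -> "silent".
    # Caution: this also joins genuinely hyphenated compounds.
    (re.compile(r"(?<=[a-z])-\n(?=[a-z])"), ""),
    # Digit 1 between lowercase letters is usually a misread "l".
    (re.compile(r"(?<=[a-z])1(?=[a-z])"), "l"),
    # Digit 0 between lowercase letters is usually a misread "o".
    (re.compile(r"(?<=[a-z])0(?=[a-z])"), "o"),
    # Stray space before punctuation: "word ," -> "word,".
    (re.compile(r" +([,.;:!?])"), r"\1"),
]

def apply_ocr_fixes(text):
    """Apply every fix in order and return the corrected text."""
    for pattern, replacement in OCR_FIXES:
        text = pattern.sub(replacement, text)
    return text

sample = "The o1d house was sil-\nent , and the d0or stood open ."
print(apply_ocr_fixes(sample))
```

Keeping the rules in a list makes it easy to add or disable a pattern per book, since the errors an OCR engine makes tend to vary with the typeface.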
Old 04-10-2009, 04:46 PM   #3
Steven Lyle Jordan
Grand Sorcerer
Posts: 8,478
Karma: 5171130
Join Date: Jan 2006
Device: none
Guys, a method I've found to improve the quality of the OCR process involves photocopying the book's pages first, expanding the page image up to letter/A4 size. Then scan those letter-sized pages into an OCR scanner. Many of the better OCR scanners can allow standard-sized paper to be fed into them and read at high speed, removing the need to manually scan each page (though you'll still end up doing that at the earlier copier stage). And the expanded letters will be easier for the OCR program to read, resulting in fewer errors.

Personally, I feel the two-step photocopy-scan process is worth it for scanned pages with fewer errors.

Occasionally, you luck out and discover that a particular error happens regularly, and you can fix it with a "find-and-replace all" process. But you should still go through every page manually.
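
One possible way to spot such recurring errors before reaching for "find-and-replace all" is to count suspicious tokens across the whole text. The sketch below is an assumption-laden heuristic, not a method described in the thread: it flags tokens that mix letters and digits, and the function name and threshold are hypothetical. It will also flag legitimate tokens such as "10pm", so the list is a set of candidates to review, not a list of confirmed errors.

```python
import re
from collections import Counter

def suspect_tokens(text, min_count=2):
    """Return (token, count) pairs for letter/digit mixes seen at least min_count times."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    mixed = [
        t for t in tokens
        if re.search(r"[A-Za-z]", t) and re.search(r"\d", t)
    ]
    counts = Counter(mixed)
    return [(tok, n) for tok, n in counts.most_common() if n >= min_count]

sample = "He said he11o twice: he11o. Then c1ose the door at 10pm."
print(suspect_tokens(sample))
```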

Last edited by Steven Lyle Jordan; 04-10-2009 at 04:49 PM.
Old 04-10-2009, 04:52 PM   #4
pepak
Quote:
Originally Posted by Steve Jordan
Guys, a method I've found to improve the quality of the OCR process involves photocopying the book's pages first, expanding the page image up to letter/A4 size.
You could just as well scan in higher resolution, e.g. 600 DPI.
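
The equivalence can be put in numbers: enlarging a page and then scanning the copy captures roughly as many pixels per inch of the original as scanning the original at a proportionally higher resolution. The sketch below is idealized and the page widths are assumptions; it ignores any detail lost in the photocopier itself.

```python
def effective_dpi(scan_dpi, enlargement):
    """Pixels captured per inch of the ORIGINAL page after enlarging, then scanning."""
    return scan_dpi * enlargement

# Hypothetical: a 5.5 in wide paperback page enlarged to fill an 8.5 in letter sheet.
factor = 8.5 / 5.5

enlarged_then_300 = round(effective_dpi(300, factor))
direct_600 = effective_dpi(600, 1.0)

print(enlarged_then_300, direct_600)
```

Under these assumptions the enlarge-then-scan route lands between a direct 300 DPI and a direct 600 DPI scan.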

Quote:
Many of the better OCR scanners can allow standard-sized paper to be fed into them and read at high-speed, removing the need to manually scan each page
Actually, the scanning process is a lot less demanding than I originally thought. With the Plustek OpticBook 3600, I am doing slightly more than 100 pages every 20 minutes, getting a normal-sized book completely scanned in about an hour.
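
The arithmetic behind that estimate is simple enough to sketch. The default rate below is the 100 pages per 20 minutes quoted above; the 300-page book length and the function name are assumptions for illustration.

```python
def scan_minutes(pages, pages_per_batch=100, minutes_per_batch=20):
    """Estimated scanning time in minutes at a fixed pages-per-batch rate."""
    return pages * minutes_per_batch / pages_per_batch

# Hypothetical 300-page book at the quoted rate.
print(scan_minutes(300))
```

Passing a different rate (for example, an autofeed scanner's) gives a quick comparison when estimating the time for a whole shelf.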
Old 04-10-2009, 05:42 PM   #5
maynard
Do you cut the book up with a razor and then feed the pages through the feeder? I have a Brother multifunction unit with a feeder that supports double-sided scanning. Or are you doing everything possible to save the original copy and binding?
Old 04-11-2009, 12:23 AM   #6
pepak
With OpticBook, there is no need to damage the book.
Old 04-11-2009, 10:44 AM   #7
Steven Lyle Jordan
Quote:
Originally Posted by pepak
Actually, the scanning process is a lot less demanding than I originally thought. With the Plustek OpticBook 3600, I am doing slightly more than 100 pages every 20 minutes, getting a normal-sized book completely scanned in about an hour.
Professional autofeed scanners (the kind used by document printers like the Xerox DocuTech and similar platforms, found in many on-demand print shops like Kinko's) can scan 100 pages in 2 minutes or less, and with the right OCR software, generate scanned images of those in under 10 minutes. (Not that the OpticBook rate is bad, I'm just saying there are faster methods that also work well.)

Usually, the only catch is getting shops that have this equipment to let you use it, as they tend to assume that scanning a book for OCR is probably illegal...
Old 04-11-2009, 12:51 PM   #8
pepak
Quote:
Originally Posted by Steve Jordan
(Not that the OpticBook rate is bad, I'm just saying there are faster methods that also work well.)
They are, but
1) I can't afford the non-destructive ones, and
2) I don't want to use the destructive ones.
Old 04-11-2009, 03:27 PM   #9
Steven Lyle Jordan
Quote:
Originally Posted by pepak
They are, but
1) I can't afford the non-destructive ones, and
2) I don't want to use the destructive ones.
Right. But remember, I said to photocopy the pages first, which does not have to be destructive... and run the photocopied pages through the scanner. Trust me, I have done this, and it's not so bad, and not that hard on the book (provided it doesn't have a flimsy spine).
Old 04-12-2009, 11:50 AM   #10
chumbucket
Member
Posts: 11
Karma: 16
Join Date: Aug 2008
Device: Sony PRS-505, Sony PRS-T1
I'm torn as to which method to use to convert some books so that I can read them on my Sony 505. Should I plunk down the cash for an OpticBook scanner or rig up a homebrew setup to take pictures of the pages? I want to save money but I also want the fastest method. I was contemplating getting a new camera soon anyway.
What is the width of the lip on the OpticBook scanner, anyway?
Old 04-12-2009, 11:59 AM   #11
pepak
Quote:
Originally Posted by chumbucket
What is the width of the lip on the OpticBook scanner, anyway?
Not sure what you mean by "lip", but I assume it means "how much space do I need between the spine and the start of the text". With the OpticBook, you need at least 6 millimeters or so, with 8-10 millimeters being comfortable enough. It depends on the tightness of the book, of course, and I have a much easier time with hardcovers than with paperbacks.
Old 04-12-2009, 01:39 PM   #12
chumbucket
Quote:
Originally Posted by pepak
Not sure what you mean by "lip", but I assume it means "how much space do I need between the spine and the start of the text". With the OpticBook, you need at least 6 millimeters or so, with 8-10 millimeters being comfortable enough. It depends on the tightness of the book, of course, and I have a much easier time with hardcovers than with paperbacks.
Do you find that you have to hold the book down to get it to stop curling up? Is that why you say that hardcovers work better? How does the OpticBook scanner work with ABBYY FineReader? How long would you say it takes to scan, say, 100 pages or so?

Thanks!
Old 04-12-2009, 02:28 PM   #13
pepak
Quote:
Originally Posted by chumbucket
Do you find that you have to hold the book down to get it to stop curling up?
I probably wouldn't have to hold it, but I get better results if I do.

Quote:
Is that why you say that hardcovers work better?
Mostly because hardcovers are a lot more tolerant of being opened fully without getting damaged. Paperbacks, I tend to read while opened less than 90 degrees to preserve the spine.

Quote:
How does the OpticBook scanner work with ABBYY FineReader?
I have never tried scanning from inside FineReader (*), but when I scanned in the application provided with the scanner and then imported the images into FineReader, it worked just fine.

*) The scanner and its software were apparently created by someone who thought long and hard about how it would be easiest to scan books, and it shows - it really is very comfortable and easy to do. Most of my scanning is done without looking at the screen, with just occasional glances to make sure everything is still fine.

Quote:
How long would you say it takes to scan say 100 pages or so?
It really depends on the book, but I average 100 pages every 20 minutes (give or take a minute).
Old 04-12-2009, 03:24 PM   #14
Jellby
frumious Bandersnatch
Posts: 7,514
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Quote:
Originally Posted by pepak
Mostly because hardcovers are a lot more tolerant of being opened fully without getting damaged. Paperbacks, I tend to read while opened less than 90 degrees to preserve the spine.
It's a matter of which kind of binding they have. Hardcovers are usually sewn, while paperbacks are just glued (or whatever the right terms are).