Thread: Scanning books
12-19-2022, 04:56 PM   #9
Tex2002ans
Quote:
Originally Posted by Karellen
Oh wow. I just spotted this... https://imgur.com/gallery/8WvxKsm

I had always assumed the books were torn apart and fed through a sheet feeder.

There are 4 types, from 2 either/or choices:
  • Destructive vs. Non-Destructive
  • Scanning vs. Photographing

Destructive is way cheaper + faster, but you lose the book. You cut the binding off + feed the loose pages into a sheet feeder.

Non-Destructive is more expensive + slower, but the book stays whole. You flip the pages, taking a photo (or scan) of each page.

Scanning is slower, like the video you linked to. This will flatten the pages + try to create a very good scan. You don't need fancy lighting setups or anything, because the scanner itself makes sure there's "consistent lighting" across the entire page.

Photographing is faster, but you need to do a lot more error-correcting later. (Warped pages, bad lighting, etc.)

B&W vs. Color adds a whole other issue. (Color is harder+slower.)

- - -

There are various cost-levels with each of these categories.

What you linked to is probably a very expensive machine, but it means someone doesn't have to be standing there flipping the pages or pushing buttons.

But, things can be done at decent quality very cheaply nowadays, especially with that fancy phone in your pocket.

See anonlivros's great tutorial:

and my post-processing advice to turn it into a PDF->Ebook.

He shows you how you could use a:
  • Smartphone
  • Gooseneck holder
  • Lamp/Light

to help take photographs of your books.

Yes, you'll still have to flip the pages, but he gives quite a few tips on how to speed up the process (like using your voice to activate the camera!).

- - -

Quote:
Originally Posted by bcbob
Seems like it could easily pull pages out of the binding. Here’s an article about how Google does it using cameras and image processing software to undo the distortion.

https://www.npr.org/sections/library...t_7508978.html

Archive.org also has fantastic articles/videos showing off their process.

and, if you want to see them digitizing vinyl albums:

Quote:
Originally Posted by tomsem
I bought an overhead scanner, mostly to digitize sheet music. It doesn't actually 'scan', rather captures image with a camera. [...]

I have yet to use it for a real project, but I think it's better than anything out there for under $500. My only worry is that there is no 3rd party software that works with it:

Yes, that's the issue with a lot of those all-in-one solutions. They try to tie you to their specific software.

This type of lock-in is completely unneeded, especially nowadays.

With high-quality cameras in cellphones now, you can replicate a lot of this stuff. (See anonlivros's tutorial above.)

Quote:
Originally Posted by tomsem
It has multiple LED lights to eliminate shadows and they sell a fold-up light box that goes over everything, with interior reflective surface to further even out the light and ensure the best results.

Yes, "even lighting" is one of the major issues with camera-based systems.

To a human, things may look fine... but when you try to convert color photo->B&W, you could get some serious "haloing".
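To make the lighting problem a bit more concrete, here is a rough Python sketch. It is a toy example, not taken from any scanning tool. It builds one synthetic row of pixels where the paper fades from bright to dim, then compares a single fixed cutoff against a cutoff relative to each pixel's neighborhood. Every name and number in it (`make_row`, `cutoff=128`, `ratio=0.75`, and so on) is a made-up assumption, and it shows the general effect of uneven lighting on B&W conversion rather than haloing specifically.

```python
def make_row(width=40, left=220.0, right=90.0, ink_every=5):
    """Synthetic scanline: paper brightness fades left to right.

    Every `ink_every`-th pixel is "ink", drawn at 40% of the local paper
    brightness. Returns (pixel values, ground-truth ink flags).
    """
    row, truth = [], []
    for x in range(width):
        paper = left + (right - left) * x / (width - 1)
        is_ink = (x % ink_every == 2)
        truth.append(is_ink)
        row.append(paper * 0.4 if is_ink else paper)
    return row, truth


def global_threshold(row, cutoff=128.0):
    """One fixed cutoff for the whole row. True means 'black'."""
    return [v < cutoff for v in row]


def local_threshold(row, radius=4, ratio=0.75):
    """Cutoff relative to the mean brightness around each pixel."""
    out = []
    for x in range(len(row)):
        lo, hi = max(0, x - radius), min(len(row), x + radius + 1)
        window = row[lo:hi]
        local_mean = sum(window) / len(window)
        out.append(row[x] < local_mean * ratio)
    return out


def count_errors(predicted, truth):
    """Number of pixels classified differently from the ground truth."""
    return sum(p != t for p, t in zip(predicted, truth))


if __name__ == "__main__":
    row, truth = make_row()
    print("fixed cutoff errors:", count_errors(global_threshold(row), truth))
    print("local cutoff errors:", count_errors(local_threshold(row), truth))
```

With these made-up numbers, the fixed cutoff turns the dim side of the "paper" black, while the neighborhood-based cutoff does not. Even lighting up front (like that light box) means you need less of this kind of compensation later.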

I don't have an image on hand of haloing, but you can see a related concept of the previous-page bleeding through to the front in my recent post:

In the "Minimize the Colors" section, you can see:
  • Photograph
    • Left = Original photo + Right = Color corrected.
  • Dark Mode
    • In Inverted/Dark Mode, you can see the "grayish haze" much more easily on the left.

That's one issue where a high-quality scanner may take care of it automatically. It may auto-adjust the lighting to try to capture ONLY the front page, while ignoring the light text bleeding through.

While a human can look at the photo + easily "ignore" that, and still be able to read the text... the computer might have a hard time doing OCR or making the photos B&W. (Leading to less accurate text and/or you'll get extremely bloated filesizes.)
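As a rough illustration of cleaning that up after the fact, here is a small Python sketch of a simple "levels" adjustment. It is only one possible approach and not necessarily what any particular scanner or tool does. The function name `apply_levels` and the black/white points (60 and 180) are hypothetical values picked for the example; real photos need their own values.

```python
def apply_levels(pixels, black_point=60, white_point=180):
    """Stretch grayscale values so that anything at or above `white_point`
    becomes pure white (255) and anything at or below `black_point`
    becomes pure black (0). Values in between are scaled linearly.
    """
    span = white_point - black_point
    out = []
    for v in pixels:
        scaled = (v - black_point) * 255 / span
        out.append(int(round(min(255, max(0, scaled)))))
    return out


if __name__ == "__main__":
    # Hypothetical grayscale samples: paper (~235), faint bleed-through
    # from the other side (~200), anti-aliased edge (90), ink (~40).
    page = [235, 228, 240, 205, 198, 90, 40, 35]
    cleaned = apply_levels(page)
    print("distinct shades before:", len(set(page)))
    print("distinct shades after: ", len(set(cleaned)))
```

The light gray bleed-through gets pushed to the same white as the paper, so there are fewer distinct shades left for the OCR (or the compressor) to deal with.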

Quote:
Originally Posted by andyh2000
PS This used to be the go to place for book scanning but it looks like the web site is fast succumbing to bit rot: https://www.diybookscanner.org/

Yes, that was/is THE place to learn a lot about book digitizing.

Not much has changed in all those years though.

The same problems still exist:
  • Turning pages
  • Flattening pages
  • Making sure the camera doesn't move
    • And is facing the book perfectly straight-on.
  • Consistent lighting
  • Dewarping pages
  • [...]

But the surrounding tools have gotten much cheaper/faster/better.

Like back then, you needed expensive cameras to get "high enough DPI" for good OCR. Now, a cellphone in your pocket can take just as good (if not better) photos.
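For a back-of-the-envelope check on "high enough DPI", here is a small Python sketch. The numbers are assumptions for illustration only: a 4000x3000 pixel (12 MP) photo, a 9x6 inch page, the page filling about 90% of the frame, and the commonly cited rule of thumb of roughly 300 DPI for OCR. The function name `effective_dpi` is made up for this example.

```python
def effective_dpi(px_long, px_short, page_long_in, page_short_in, fill=0.9):
    """Estimate the DPI a photo gives you on the page itself.

    px_long / px_short: photo dimensions in pixels.
    page_long_in / page_short_in: page dimensions in inches.
    fill: fraction of the frame that the page actually covers.
    The lower of the two axes is the limiting one.
    """
    dpi_long = px_long * fill / page_long_in
    dpi_short = px_short * fill / page_short_in
    return min(dpi_long, dpi_short)


if __name__ == "__main__":
    dpi = effective_dpi(4000, 3000, 9.0, 6.0)
    print("estimated DPI:", round(dpi))
    print("meets a ~300 DPI target:", dpi >= 300)
```

Under those assumptions it works out to about 400 DPI. This ignores lens sharpness, focus, and motion blur, so treat it as an upper bound rather than a guarantee.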

Phones also have voice-activation, so you don't need those extra buttons/foot-pedals.

Phones also have gyroscopes, so you can line the camera up straight-on more easily + don't need to do as much trapezoidal correction afterward.
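For anyone curious what "trapezoidal correction" involves, here is a minimal pure-Python sketch of the usual idea: find the perspective transform that maps the four corners of the skewed page onto a clean rectangle. This is a generic textbook approach, not the method of any specific app. The function names and the corner coordinates are hypothetical, and a real tool would also resample every pixel, which this skips.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest value in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """3x3 perspective transform mapping four src corners onto four dst corners."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(h, x, y):
    """Apply the transform to a single point."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


if __name__ == "__main__":
    # Hypothetical page corners as seen in a tilted photo (a trapezoid),
    # listed clockwise from top-left, and the rectangle we want instead.
    photo_corners = [(100, 50), (500, 50), (580, 750), (20, 750)]
    flat_corners = [(0, 0), (600, 0), (600, 800), (0, 800)]
    h = homography(photo_corners, flat_corners)
    for corner in photo_corners:
        print(corner, "->", warp_point(h, *corner))
```

The straighter the phone is held to begin with, the closer this transform is to a plain scale-and-shift, and the less the text gets stretched.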

Phones have fancy screens, so you don't need to be hooked up to a computer + can instantly see the photos and correct there, etc., etc.

Last edited by Tex2002ans; 12-19-2022 at 05:16 PM.