Old 01-21-2023, 01:36 PM   #5
Tex2002ans
Last month I described a lot of this book digitizing process in detail:

You pretty much have these basic steps:
  • 1. Scanning / Taking the photos.
  • 2. Normalizing / Cleaning up the images.
  • 3. OCRing / Converting.

Each of these has its own tools + enhancements you can do to make things better.

Like Turtle91 said, DIY Book Scanner is where you can learn a lot about the scanning side of things. (For example, using V-shaped plexiglass to press the pages down means much less dewarping in the 2nd stage!)

Getting better input in those initial stages helps, because it becomes the basis for ALL FUTURE stages.

(If your original images are crap/warped, this requires much more work in Stage 2 + Stage 3: much more time spent dewarping/correcting, OCR will be much less accurate, etc.)

- - -

Side Note: With sheet music, I'd suspect you REALLY want your papers straight, so that the bars will appear completely horizontal. (It'll be very easy for the dewarping algorithms to make those look wobbly.)

- - -

Quote:
Originally Posted by tomsem
And I don't yet know how OCR compares with other products, including ABBYY, which the scanning software includes.
FineReader is much better.

It can also detect images/text/tables + headers/footers, etc.

(No idea how it would deal with sheet music though. It would most likely get completely confused because of the complicated layouts.)

Quote:
Originally Posted by tomsem
If the target is ePub (or even fixed layout ePub) then Acrobat need not apply.
Sheet music like this can't be created as EPUB. Sadly, it will have to stay as PDF.

Quote:
Originally Posted by tomsem
I'm not quite as ready to throw Photoshop under the bus. Importing and exporting layers like this is pretty slow for some reason, but at least it works.
You can use whatever tools you want for whatever stages you want.

Some will bring more misery than others. :P

For my post-processing stage, I prefer using:
  • Scan Tailor Advanced

It:
  • will help you crop/align/resize pages
  • has built-in dewarping/despeckling
  • has multiple color/grayscale -> B&W algorithms
  • [...]
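
To give a feel for what a grayscale -> B&W algorithm does, here is a minimal sketch of Otsu's method, one well-known global thresholding technique. This is only an illustration in plain Python: the function names are my own, and it is not taken from Scan Tailor Advanced's code or meant to describe exactly what that tool does internally.

```python
def otsu_threshold(pixels):
    """Pick the 0-255 cutoff that best separates dark from light pixels.

    `pixels` is a flat list of grayscale values (0 = black, 255 = white).
    Pixels <= the returned value are treated as "ink".
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0    # number of pixels at or below the candidate cutoff
    sum_dark = 0  # sum of their gray values
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue
        w_light = total - w_dark
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance (up to a constant factor).
        var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def to_bw(pixels, threshold):
    """Map each grayscale pixel to pure black (0) or pure white (255)."""
    return [0 if p <= threshold else 255 for p in pixels]
```

A global cutoff like this works fine on evenly lit pages. Pages with shadows or uneven lighting are where the other algorithms (and manual tweaking) earn their keep.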

Most importantly, you can easily tweak variables on a per-page basis.

If one page has too many speckles? Raise the strength.

If one page has uneven lighting (or is slightly brighter than the others)? Well, tweak the B&W strength.
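
If you ever script this yourself, the same "defaults + per-page tweaks" idea could be sketched roughly like this. Everything here is hypothetical (the setting names, the simple fixed threshold, the neighbour-counting despeckle); it is a toy in plain Python, not how Scan Tailor Advanced actually implements any of it.

```python
# Project-wide defaults (hypothetical names and values).
DEFAULTS = {"threshold": 128, "despeckle": 1}


def settings_for(page, overrides):
    """Merge the defaults with any tweaks recorded for this page number."""
    return {**DEFAULTS, **overrides.get(page, {})}


def binarize(gray, threshold):
    """gray: rows of 0-255 ints. Returns rows of 0 (black) / 255 (white)."""
    return [[0 if p < threshold else 255 for p in row] for row in gray]


def despeckle(bw, strength):
    """Whiten black pixels that have fewer than `strength` black neighbours."""
    h, w = len(bw), len(bw[0])
    out = [row[:] for row in bw]
    for y in range(h):
        for x in range(w):
            if bw[y][x] != 0:
                continue
            neighbours = sum(
                1
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dy or dx)
                and 0 <= y + dy < h
                and 0 <= x + dx < w
                and bw[y + dy][x + dx] == 0
            )
            if neighbours < strength:
                out[y][x] = 255
    return out


def process(page, gray, overrides):
    """Clean one page using the defaults plus that page's overrides."""
    s = settings_for(page, overrides)
    return despeckle(binarize(gray, s["threshold"]), s["despeckle"])
```

So a speckly page 12 and an overly bright page 40 might get `overrides = {12: {"despeckle": 3}, 40: {"threshold": 150}}`, while every other page just uses the defaults.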

- - -

And, if needed, you can always toss it into Photoshop afterwards and do whatever extra refinements you want there.

Quote:
Originally Posted by tomsem
Looking ahead, I'm also planning a screen capture based tool chain, which should be more amenable to scripting.
Perhaps you can automate some of these steps/stages, but when reality hits, a lot of these pages will require manual tweaks + elbow grease.

Anyway, I'm looking forward to hearing more from you. Always good to learn more about people's image-cleaning routines.