#1 |
Enthusiast
Posts: 30
Karma: 10
Join Date: Nov 2023
Device: Sony PRS-T3
|
book scanning - best practices?
I recently scanned my first book (Fast Food Nation) on a flatbed scanner using naps2, at 600dpi in greyscale (only the cover was scanned in colour). Some pages were not perfectly straight, so the result doesn't look professional, though I trimmed each image so that no shadow from the curved paper is visible. This took me about a day.
Result
The PDF is searchable thanks to the built-in OCR in naps2. The generated PDF is about 500MB, which is unreasonable. I was able to reduce it to around 50MB with a PDF shrinking app called Densify, at the cost of some quality. I am not sure if I am doing things the correct way. I am probably not, right?
Questions
* I would like some tips & tricks to make the job easier / quicker.
* I would like to make an epub instead of a PDF. What is the best way to go about this?
* I would like to be able to extract the OCR text separately, instead of only having it searchable inside the PDF.
I am looking for any tips & tricks that you might be willing to share to make scanning books easier / quicker / more efficient. |
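For a rough sense of where a figure like 500MB comes from: the raw size of a scan is just pixel count times bytes per pixel. Below is a minimal sketch of that arithmetic. The 5.5 x 8.25 inch page and the 300-page count are made-up illustrative values, not taken from the post above, and a real PDF is compressed, so the actual file will be smaller than these raw numbers.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel=8):
    """Uncompressed size of one scanned page in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Hypothetical paperback page size and page count (assumptions for illustration).
PAGE_W_IN, PAGE_H_IN, PAGES = 5.5, 8.25, 300

for dpi in (600, 300):
    per_page = raw_page_bytes(PAGE_W_IN, PAGE_H_IN, dpi)
    total_mb = per_page * PAGES / 1_000_000
    print(f"{dpi} dpi: {per_page / 1_000_000:.1f} MB per page, "
          f"{total_mb:.0f} MB raw for {PAGES} pages")
```

Halving the dpi quarters the raw data, and storing text pages as 1-bit black and white instead of 8-bit greyscale cuts it by a further factor of eight before any compression is applied.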
#2 |
Wizard
Posts: 1,611
Karma: 9500498
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
|
Not sure if you want to continue using PDF or convert to epub, but this thread is quite good, and in my post here I detail the steps I used to scan a book and create an epub, with links to the software.
Last edited by Karellen; 01-20-2025 at 01:12 AM. Reason: fix link |
#3 |
Enthusiast
Posts: 30
Karma: 10
Join Date: Nov 2023
Device: Sony PRS-T3
|
Fantastic, thank you Karellen. I am mostly interested in epub because it displays so much better in my ereader, but I don't mind having both options available to me. I have installed ScanTailor Advanced, tesseract-ocr and gImageReader in Linux & will try them soon.
|
#4 |
A Hairy Wizard
Posts: 3,347
Karma: 20171571
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
I think it was mentioned in the thread Karellen linked, but you can find lots of information on scanning books at DIY Book Scanner. In addition to the software, they can help with your hardware setup… some of the more advanced setups boast several hundred pages per hour scanned and processed!
|
#5 |
Enthusiast
Posts: 43
Karma: 14828
Join Date: Feb 2023
Device: Boox Page, Kobo Aura SE
|
|
#6 |
Still reading
Posts: 14,010
Karma: 105092227
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
|
No-one needs Adobe Acrobat.
|
#7 |
Resident Curmudgeon
Posts: 79,740
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
|
#8 |
Enthusiast
Posts: 43
Karma: 14828
Join Date: Feb 2023
Device: Boox Page, Kobo Aura SE
|
For reading, it should absolutely be avoided; however, there is no open source alternative to Adobe ClearScan.
|
#9 |
A Hairy Wizard
Posts: 3,347
Karma: 20171571
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
When all is said and done, OCR is still not perfect, no matter which software you use. You will still have to read through the text and make manual corrections. The key is getting the clearest scan/image that you can to begin with. Please see the DIY Book Scanner site for in-depth discussions about how to get the best scan.
Cheers!
Last edited by Turtle91; 01-22-2025 at 06:06 PM. |
#10 |
Fanatic
Posts: 516
Karma: 2268308
Join Date: Nov 2015
Device: none
|
|
#11 |
A Hairy Wizard
Posts: 3,347
Karma: 20171571
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
|
#12 |
Evangelist
Posts: 450
Karma: 3886916
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Kobo Forma
|
I made this book scanner years ago out of scrap wood. I have had a variety of lights and cameras on it, including the somewhat ridiculous-looking LED floodlight and old video camera in the picture. But it does the job... I can comfortably scan a page about every 10 seconds.
The V-tray and the glass on top of the book keep it nice and flat... no need to correct for curl or keystoning or whatever. Resolution is totally up to how I set the camera. 300dpi is usually fine for tesseract OCR.
OCRFeeder is the tesseract front-end I use. I always OCR page-by-page to handle things like double or triple columns, advertisements, "continued on page 107" and so on. Also, if there is a real scan/OCR problem, I discover it ON THAT PAGE, not later, buried somewhere in 100,000 words.
This gives me jpg images directly, no need to mess with PDF nonsense. I do use ScanTailor sometimes if the original physical book is horrible. Then: OCR the images, text into Writer for proofing and styling, and straight to epub with Sigil or Calibre. |
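For anyone who prefers a script to a GUI front-end, the page-by-page approach described above could be sketched roughly like this. It is not the poster's OCRFeeder setup. It assumes the `tesseract` command-line program is installed and on the PATH, that the page images are .jpg files, and the folder names `pages` and `text` are hypothetical.

```python
import subprocess
from pathlib import Path


def tesseract_command(image_path, output_base, lang="eng"):
    """Build the tesseract CLI call; tesseract writes <output_base>.txt."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_pages(image_dir, text_dir, lang="eng"):
    """OCR every .jpg in image_dir in name order, one text file per page."""
    image_dir, text_dir = Path(image_dir), Path(text_dir)
    text_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(image_dir.glob("*.jpg")):
        output_base = text_dir / image.stem
        # check=True raises on the first page tesseract fails on,
        # so a bad scan is noticed at that page rather than much later.
        subprocess.run(tesseract_command(image, output_base, lang), check=True)
        written.append(text_dir / (image.stem + ".txt"))
    return written


if __name__ == "__main__":
    # "pages" and "text" are hypothetical folder names.
    for txt in ocr_pages("pages", "text"):
        print(txt)
```

Because each page ends up in its own .txt file, the OCR text is available separately from any PDF, and a garbled page can be rescanned and re-OCR'd on its own before everything is proofed together.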