11-05-2021, 10:49 AM | #1 |
Wizard
Posts: 2,827
Karma: 10700629
Join Date: May 2016
Location: Canada
Device: Onyx Nova
|
Automation of scanning on phone or tablet
I want to scan an old hardback using a phone or tablet, both Android. I could use an iPad if absolutely necessary, but I prefer Android.
Key point: I want an app that will automatically take a scan every x seconds without user intervention. That way, I only need to turn the pages, and the app will photograph each page at regular intervals. Does such an app exist on Android? If not, does it exist on iPad? If you have any other tips, I'm all ears. I am thinking of building some kind of frame to cradle the book and hold the tablet over it. |
11-05-2021, 01:25 PM | #2 |
A Hairy Wizard
Posts: 3,094
Karma: 18727053
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
https://www.diybookscanner.org/
They have all the answers to your questions: scanners, software, cameras, etc. |
11-05-2021, 04:05 PM | #3 |
Wizard
Posts: 2,827
Karma: 10700629
Join Date: May 2016
Location: Canada
Device: Onyx Nova
|
Many thanks.
|
11-05-2021, 08:08 PM | #4 |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
anonlivros wrote a great "Photographs->Ebook" tutorial earlier this year:
Tutorial-from Paper Book to Ebook PDF - 400 pages in 4 hours
His method used a bendable "gooseneck mount" + cellphone. On the phone, you can set it to take photos based on a noise. You can then easily:
- turn page
- make noise
- turn page
- [...]
until your photographs are complete.
From there, you can do the usual Raw-Images -> Cropping/Cleanup -> OCR. Within that MR thread, I also described in detail how to use ScanTailor Advanced for that middle stage.
Last edited by Tex2002ans; 11-05-2021 at 08:11 PM. |
11-06-2021, 10:12 AM | #5 |
Wizard
Posts: 2,827
Karma: 10700629
Join Date: May 2016
Location: Canada
Device: Onyx Nova
|
Thanks. Unfortunately the author's Portuguese tutorial is a PDF stored as a file on GitHub, and as such, Google Translate cannot translate it. Perhaps I am missing something. I've downloaded it and uploaded it to Google Docs, but still no translation option. I'll try and work it out.
From gleaning the resulting posts, it seems the OP uses voice-activated software to scan, which is a great idea, and probably superior to automating the scans at regular intervals. I will look through the Portuguese source to work out what app he is using (he's using a phone, so presumably he's using an app). |
11-06-2021, 10:54 AM | #6 | |
Wizard
Posts: 2,827
Karma: 10700629
Join Date: May 2016
Location: Canada
Device: Onyx Nova
|
Thank you again for the link. It contains more information than I knew I needed.
The guide includes information on how to trigger by sound and by timer. Doubtless there is more essential information in there that I will parse through. For reference to others, here are the translated sections: Quote:
|
|
11-06-2021, 07:36 PM | #7 | ||
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
It's all pure text. Right here: https://github.com/anonlivros/anonlivros You can toss that into Google Translate (or DeepL, or Bing Translate, or whatever). It should translate the original Portuguese -> English for you. Quote:
You'll also want to peruse the diybookscanner forums, just like Turtle91 recommended. That site has lots of different designs+builds+tips as well. Like one enhancement you may want is a V-shaped mount to hold the book + plexiglass in order to press pages flat. This will help keep pages perfectly straight, so you don't have to deal with wavy/bent text. (This can be somewhat corrected in software, but it's better to not introduce those errors in the first place.) Last edited by Tex2002ans; 11-06-2021 at 07:40 PM. |
||
11-07-2021, 11:11 AM | #8 |
Addict
Posts: 387
Karma: 1638210
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Jutoh, Kobo Forma
|
Here is my DIY scanner effort from about 8 years ago. Made from scrap wood. I've used many different lights and cameras over the years. This current arrangement shows an LED floodlight and an ancient video camera (materials on hand...) and it does a great job. Obviously I use the video camera in single frame mode! The center "V" tray has two pieces of glass (from dollar store picture frames) to keep the pages flat, and that is really important for OCR.
The reason for the floodlight: with cheap old cameras I found lots of light (~high f-stop) improves depth of field and focus. |
11-08-2021, 07:16 AM | #9 |
Junior Member
Posts: 5
Karma: 40142
Join Date: Apr 2021
Device: none
|
Additional tutorial comments
Hi! I am the creator of the tutorial mentioned.
I'm glad the tutorial continues to help more people. Thanks for sharing, Tex!
Tips:
1) About the app used to take photos: over time I found a better one. The app is called 'Open Camera': https://play.google.com/store/apps/d...hl=pt_BR&gl=US
It is better because:
- It allows you to generate images with minimal compression, which increases quality (more definition in the outline of the letters). It provides higher-quality images than the other apps I've tested.
- It allows you to lock the camera's focus, which eliminates the risk of blurry photos. This risk exists when the 'auto focus adjustment' option is enabled: while you turn the page, the camera may try to readjust the focus, and the photograph may be taken during this adjustment, resulting in a blurry photograph.
2) For the sound trigger, I suggest whistling low, like the chirping of a chick. It's more practical than speaking a syllable.
3) I also strongly recommend using ScanTailor to dewarp and crop, generating B&W images. This step must come before ABBYY FineReader.
4) I also recommend equalizing the size of the page content. When photographing left and right pages separately, rather than both pages simultaneously, there will be a page size distortion of up to 10% between consecutive pages, depending on book thickness. The distortion is most severe at the beginning and end of the book, and milder closer to the middle pages. This natural distortion is caused by the difference in distance between the page and the camera. Some ebook producers don't mind this size distortion, as their ultimate goal is to produce an EPUB. (Like Tex hehe) But if you care to eliminate the visual distortion, producing a PDF with better aesthetics, I've produced a little Python script that fixes it from just 4 pieces of information. Check the post: Page Size Equalizer-To single-page capture camera scan
I know, I know... The tutorial needs to be updated.
I've learned some interesting things over the last few months, especially from sharing experience with Tex. (To whom I owe a gratitude that I couldn't properly express in this post.) I hope your digitization projects help a lot of people. Keep sharing. |
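To make tip 4 concrete, here is a minimal Python sketch of why the size difference appears and how a resize factor could be derived. This is not anonlivros's Page Size Equalizer script. It assumes a simple pinhole model (apparent size is inversely proportional to camera-to-page distance), a book lying flat under a camera at a fixed height, and a stack thickness that grows linearly as leaves are turned. The function name and all four parameters are hypothetical and chosen only for illustration.

```python
def page_scale_factors(camera_height_cm, book_thickness_cm, total_leaves, leaves_turned):
    """Return (left_scale, right_scale) resize factors for one opening.

    Assumptions (illustrative only, not taken from the tutorial):
    - the camera sits camera_height_cm above the table, looking straight down;
    - the left stack grows linearly from 0 to book_thickness_cm as leaves are turned;
    - apparent page size is inversely proportional to camera-to-page distance.
    Multiplying an image by its factor normalises it to the mid-book distance.
    """
    if total_leaves <= 0:
        raise ValueError("total_leaves must be positive")
    if not 0 <= leaves_turned <= total_leaves:
        raise ValueError("leaves_turned must be between 0 and total_leaves")

    fraction_turned = leaves_turned / total_leaves
    left_stack = book_thickness_cm * fraction_turned
    right_stack = book_thickness_cm * (1 - fraction_turned)

    # Reference distance: both stacks equal, i.e. the middle of the book.
    reference_distance = camera_height_cm - book_thickness_cm / 2

    left_distance = camera_height_cm - left_stack
    right_distance = camera_height_cm - right_stack

    # A page farther from the camera looks smaller, so it needs a factor > 1.
    return left_distance / reference_distance, right_distance / reference_distance


if __name__ == "__main__":
    # Hypothetical setup: camera 40 cm above the table, 4 cm thick book, 200 leaves.
    for turned in (0, 100, 200):
        left, right = page_scale_factors(40, 4, 200, turned)
        print(f"leaves turned={turned:3d}  left x{left:.3f}  right x{right:.3f}")
```

Under these assumed numbers, the left and right pages differ most at the first and last openings and match in the middle, which is consistent with the behaviour described in tip 4.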
11-08-2021, 12:09 PM | #10 | |
the rook, bossing Never.
Posts: 11,150
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
I use Open Camera. Quality ultimately depends on the camera.
BUT my 2002-vintage flatbed scanner with optional ADF gives close to 30 megapixels at 600 dpi for a larger book. The advantage of a camera is that a V-shaped holder avoids damaging the spine. Pirates and professional scanners of cheap, common books cut off the spine and use a duplex ADF. Obviously, if the camera has a good lens and decent resolution, and the page size is A5 or smaller or a V support is needed, then the camera wins on DPI. Very even lighting is needed. Skinny fluorescent tubes and a diffuser, as on older light boxes, are good. LED sources may need to be further away. Glass or plastic to hold pages flat is needed if not using a scanner. All the problems solved nearly 50 years ago! Quote:
In 2000 a really good copy typist was still cheaper than Scan + OCR + human proofread/edit for a quality transcription. Such people are now rare. I stopped printing to paper or proofing on a PC screen nearly seven years ago. I make an ebook and proof-correct on that (was Kindle, now Kobo) and read back the annotations. |
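The flatbed-versus-camera DPI comparison in this post comes down to simple arithmetic, sketched below in Python. The page dimensions and the 4000-pixel camera are assumed example values, not figures from the thread, and both function names are hypothetical.

```python
def scan_megapixels(width_in, height_in, dpi):
    """Megapixels produced by scanning a width_in x height_in inch area at dpi."""
    return (width_in * dpi) * (height_in * dpi) / 1_000_000


def camera_effective_dpi(pixels_long_side, page_long_side_in):
    """Effective DPI when the page's long side exactly fills the sensor's long side.

    Real captures will be somewhat lower, because the page rarely fills
    the whole frame and lens quality also limits usable detail.
    """
    return pixels_long_side / page_long_side_in


if __name__ == "__main__":
    # Assumed example: a 7 x 10 inch page scanned at 600 dpi.
    print(f"{scan_megapixels(7, 10, 600):.1f} MP")

    # Assumed example: a camera with 4000 pixels on its long side,
    # framing a page whose long side is 10 inches.
    print(f"{camera_effective_dpi(4000, 10):.0f} dpi")
```

The smaller the page, the more pixels per inch a fixed-resolution camera delivers, which is why page size matters for whether the camera can match a scanner on DPI.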
|
11-08-2021, 11:40 PM | #11 | ||||||||
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
Quote:
And the steps would go:
This would get you digitized PDFs with text backend, which would allow you to search/copy/paste, etc. From there, you can do the usual PDF->EPUB steps. (Which are their own in-depth workflows.) Quote:
I didn't check on the Github to see if it was expanded since. And we definitely have to meetup and talk again before the end of the year! Quote:
Non-Destructive->Destructive
High->Low Quality
Fast->Slow Speed
High->Low Labor
Non-Destructive vs. Destructive
If the book must stay intact, then Scanner or Camera. If you don't mind destroying the book, then cutting the spine off + feeding it into the scanner as a stack of paper saves tons of time+labor.
* Note: If you have very fragile/large books, the Scanner may be too rough on the spine, so your only non-destructive choice is Camera + V-shaped holder.
Quality
The most important, because every later stage depends on this. Remember, digitizing books is an entire process, and getting pictures of pages is just Stage 1. High quality input:
Low quality images:
You may even have to redo a lot of your work when you stumble across a non-recoverable error later on. (Like a photograph/chart/graph being distorted beyond fixing... or horrible speckling that only appears when you try to B&W your image.)
* Note: 20+ years ago, most cameras were still too low DPI. The images may have been readable to a human, but feeding that lower-quality image through OCR, you'd get a much higher error rate.
Within the past 10 years though, the quality of your typical cellphone camera has dramatically improved. Now, everyone carries something in their pockets that may work "well enough". This is where anonlivros's gooseneck+cellphone method comes into play. It's an extremely cheap (<$50) way of reusing materials you most likely already have (cellphone + lamp)... getting you 80%+ of the way there.
* * *
Recommendation: If doing conversion professionally though, I'd recommend a superb quality scanning company like Golden Images. He was recommended by Hitch+me quite a few times over the years. The immaculate quality scans will save you all that time+labor in the long run. Higher quality input = better+faster conversion with fewer errors.
* * *
Speed
Cameras, when you get the workflow down, can take a few seconds per page. Scanners, at higher DPI, are very slow.
Labor
Scanners+Cameras require you to turn pages, make sure everything is lined up, hold books down, no fingers in the way, etc. With feed scanners, you stick in a stack of paper and can go off doing something else. Quote:
A lot of this was discussed a few months ago in: 2021: "Archive.org ePub" I even showed the difference between the auto-generated Archive.org "EPUB"s vs. EPUBs generated right out of FineReader with minimal intervention. Note: Although again, you can rapidly get 80%+ of the way there in your PDF->EPUB, that final quality push is what takes up the majority of time:
99.99% text accuracy also seems fantastic... until you actually read a book with .01% errors in it. (At least a few errors on every page.) Quote:
There was even an article on that court case a few days ago: Techdirt.com: "Internet Archive Would Like To Know What The Association Of American Publishers Is Hiding" and a few days before that, on the ridiculousness of "library ebook licenses": Techdirt.com: "Publishers Want To Make Ebooks More Expensive And Harder To Lend For Libraries; Ron Wyden And Anna Eshoo Have Questions" Quote:
Libraries already use these methods + Interlibrary Loans to loan scans of their own works (especially fragile/rare/non-public-facing material). This allows you to consolidate it, instead of every individual library having to (poorly) re-scan + re-digitize. Similarly with the Blind/Low-Vision/Disabilities associations + universities within each country all independently digitizing. Why? Why waste all that time and effort when you can do it once, in high-quality PDFs/EPUBs, then lend it from there? On universities digitizing, see the great webinar:
On Copyright/Libraries + "Permission Culture" see:
Side Note: Luckily, Internet Archive has been archiving scholarly articles as well: And speaking of digitizing + Link Rot... ~18% websites cited within scholarly articles are already dead: This knowledge needs to be preserved + easily searchable/accessible. Last edited by Tex2002ans; 11-09-2021 at 06:09 PM. |
||||||||
11-09-2021, 07:58 AM | #12 |
the rook, bossing Never.
Posts: 11,150
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Library schemes, fees per book loaned, and permissions vary by country and by media type. Not everyone is in the USA. USA laws do not apply outside the USA. Sadly, eBooks are rarely covered by the same rules as paper.
Also in most countries the Copyright Holder has the right to decide how or if copies are done. Not USA Courts or Google or the Internet Archive, no matter how virtuous they are at preserving information. From the point of view of Irish and UK law both Google and the Internet Archive have been breaking copyright (civil issue, not a criminal) for over a decade by scanning British and Irish works still in copyright. Also Google invented their own definition of "Orphan Works". The big USA Corporations, OTOH, are responsible for the disgraceful extensions of Life + term and Corporate Copyrights, DRM and DMCA. All immoral corporate "theft" that does nothing for creators and immediate children. |
11-09-2021, 07:39 PM | #13 | ||
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
You may also be interested in an upcoming debate (November 15, 2021) at the Soho Forum: https://www.thesohoforum.org/ Quote:
(I'm for digitizing and freeing up any and all works.) * * * Last night, I was also digging through the ol' podcasts: which also discussed the Public Domain 2019 celebration: Full video of the conference here: Keynote given by Creative Commons founder (Lawrence Lessig). And in the Techdirt podcast, they mentioned a fantastic talk by a blind person who explains the entire process from her point of view. (Lightning Talks began in part #2 of the conference at 3:06:00. Her talk started a few minutes later.) |
||
11-10-2021, 09:18 AM | #14 |
the rook, bossing Never.
Posts: 11,150
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
I don't agree with abolishing patents, design patents (UK Registered Designs) or copyright. That's anarchy. But the USPTO has been broken since the Victorian era, and instead of being reformed it has gotten worse. They don't do due diligence. They make more money from approval than rejection. The theory is that the courts should decide validity. That's weighted toward US companies and also those with deep pockets.
Creative Commons is not a solution. Anyone can have whatever relaxation of copyright they want, CC is just one of many templates often incorrectly applied. |
|