Old 06-14-2006, 10:58 PM   #1
Charles Gray
Member
Posts: 10
Karma: 30
Join Date: Jan 2006
Scanning paper (out of copyright) books.

I have many, MANY books. Some of them are out of copyright, and for others I was able to get permission to convert them to e-books so long as the e-book isn't distributed and the original copy is destroyed.
But that leaves the question of how to do it. Flatbed scanners seem destructive, and although I have a very good OCR program (ABBYY FineReader), the "lift" at the spine seems to cause problems. That's not a problem for the "scan and destroy" books, but my out-of-copyright pulps from the 1920s are a different matter (and rather important, as I'd like to read them, but too much reading will also destroy them). I didn't see any other place here to ask this question, so I was wondering if I could receive any help.
Old 06-16-2006, 02:38 AM   #2
ath
Addict
Posts: 222
Karma: 110
Join Date: Jun 2006
Location: Malmo, Sweden
Device: iLiad, Sony PRS-505, Kindle
Quote:
Originally Posted by Charles Gray
But that leaves the question of how to do it. Flatbed scanners seem destructive, and although I have a very good OCR program (ABBYY FineReader), the "lift" at the spine seems to cause problems.
Unless you have access to an overhead scanner, scanning is very probably going to be destructive to some extent.

Scanning books quickly means, unfortunately, cutting them up, and running them through a page-fed scanner.

You can scan page spreads with a flat-bed scanner, but it will stress the spine and the hinges of the book in a way that doesn't happen with ordinary reading. I've done several late 19th century books on a largish flatbed, and if the books don't break up entirely, the back cover is usually ripped afterwards, and some of the sections are starting. There is also some risk of ripping or folding a page due to clumsy handling.

There are scanners where the scanning area extends to the edge of the device (see Plustek OpticBook 3600, or the 3600 Plus if you're going for PDF -- and I think Xerox has/had a similar scanner). This lessens the stress on the spine, but because you scan one page at a time instead of a whole spread, it doubles the effort and time, and it also doubles the risk of damaging a page.

I know of some experiments with a camera (a digital camera is a kind of overhead scanner, and with a film camera you can often get decent scans made from the film), but it definitely requires more than just point-and-click. At a minimum you will need a good camera stand and good, even lighting. See Project Runeberg for more info.

Last edited by ath; 06-16-2006 at 03:19 AM.
Old 06-16-2006, 02:46 AM   #3
tribble
iLiad Maniac
Posts: 1,382
Karma: 2369
Join Date: Apr 2006
Location: Germany
Device: Bookeen Opus (i love that thing) and iPad (what an irony)
What about taking high-resolution photos of the pages, like the professional book scanners do? Then run a batch transform over the images to remove the distortion, and then do the OCR.
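To make the "batch transform" idea a bit more concrete, here is a minimal sketch in plain Python. It assumes each photo is already loaded as a list of rows of grayscale values and that the four corners of the page are known (for example, because the camera and cradle never move). The names `unwarp_quad` and `unwarp_batch` are made up for illustration. Note that this uses simple bilinear corner interpolation with nearest-neighbour sampling; it is not a true perspective correction, and it does nothing about page curl.

```python
def unwarp_quad(image, corners, out_w, out_h):
    """Resample the quadrilateral `corners` of `image` onto an out_w x out_h grid.

    image   -- list of rows, each a list of grayscale values (0-255)
    corners -- (top_left, top_right, bottom_left, bottom_right) as (x, y) pairs
    """
    (x00, y00), (x10, y10), (x01, y01), (x11, y11) = corners
    height, width = len(image), len(image[0])
    result = []
    for row in range(out_h):
        v = row / (out_h - 1) if out_h > 1 else 0.0
        out_row = []
        for col in range(out_w):
            u = col / (out_w - 1) if out_w > 1 else 0.0
            # Bilinear blend of the four corner positions.
            sx = ((1 - u) * (1 - v) * x00 + u * (1 - v) * x10
                  + (1 - u) * v * x01 + u * v * x11)
            sy = ((1 - u) * (1 - v) * y00 + u * (1 - v) * y10
                  + (1 - u) * v * y01 + u * v * y11)
            # Nearest-neighbour sample (round half up), clamped to the image.
            ix = min(max(int(sx + 0.5), 0), width - 1)
            iy = min(max(int(sy + 0.5), 0), height - 1)
            out_row.append(image[iy][ix])
        result.append(out_row)
    return result


def unwarp_batch(pages, corners, out_w, out_h):
    """Apply the same correction to every page shot from a fixed camera position."""
    return [unwarp_quad(page, corners, out_w, out_h) for page in pages]
```

In practice you would more likely let an imaging library do the warp; the point of the sketch is only that one set of corner coordinates can be reused for the whole batch when the setup is fixed.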
Old 06-22-2006, 08:16 AM   #4
DTM
Intentionally Left Blank
Posts: 171
Karma: 300106
Join Date: Feb 2006
Location: Royal Oak, MI, USA
Device: Nook STR
I'm sure you could get some help in the forum at the Distributed Proofreaders website. You may even want to run your projects through them, getting you an entire network of proofreaders.

Check it out at: www.pgdp.net
Old 09-28-2007, 08:34 AM   #5
ereszet
Zealot
Posts: 118
Karma: 306
Join Date: Sep 2007
Device: Sony PRS-500 Archos 704 wifi
From Paper to Digital Books

See my thread "do-it yourself repro v-cradle for paper books" in Reader Accessories
Old 09-28-2007, 09:44 AM   #6
RWood
Technogeezer
Posts: 7,233
Karma: 1596436
Join Date: Nov 2006
Location: Virginia, USA
Device: Sony PRS-500
There was a thread by Bob Russell about a scanner designed for bound books: the book hangs over the corner of the scanner so that a page can lie flat. It seemed to work well. I will look again for the article.
Old 09-28-2007, 12:45 PM   #7
ricdiogo
Gutenberger
Posts: 142
Karma: 700
Join Date: Jul 2007
Location: Lisbon, Portugal
Device: Cybook Gen 3
Quote:
Originally Posted by DTM
I'm sure you could get some help in the forum at the Distributed Proofreaders website. You may even want to run your projects through them, getting you an entire network of proofreaders.

Check it out at: www.pgdp.net
Charles Gray, DTM has given you great advice. You would also be helping to make more public domain ebooks freely available online at Project Gutenberg.

I also suggest you read Project Gutenberg's Scanning FAQ.
Old 10-19-2007, 06:30 PM   #8
Studio717
Addict
Posts: 208
Karma: 575
Join Date: Oct 2006
Location: California
Device: Various Kindles, iPhone, iPad, Galaxy 10.1
Quote:
Originally Posted by RWood
There was a thread by Bob Russell about a scanner designed for bound books: the book hangs over the corner of the scanner so that a page can lie flat. It seemed to work well. I will look again for the article.
This is the OpticBook 3600. I have one and it does a great job with scanning. The edge of the glass is almost at the very edge of the scanner, so except for too-tightly bound (or usually, in my experience, rebound) books, it does a beautiful job of capturing all the text.

Any flatbed scanner is going to take longer to scan than an overhead setup like ereszet's (which is a setup I'm trying to recreate myself for a large book I have), but the Opticbook is the best out there as far as I've found for a low-cost flatbed solution.
Old 01-22-2009, 03:17 PM   #9
latchkeyed
Member
Posts: 20
Karma: 10
Join Date: Jun 2008
Device: Cybook Gen3
Alternative photographing options

You can also check out what we're doing at http://bkrpr.org. We have instructions for putting together a camera mount using cheap consumer digital cameras and a V-shaped book cradle like ereszet's. Actually, I need to look into his version; mine is pretty much cobbled-together wood.
Old 02-26-2009, 05:01 PM   #10
glenn cornish
Member
Posts: 16
Karma: 26
Join Date: Dec 2008
Device: Sony e-book
Have I seen hand-held scanners which you can run over the page? If so, that would be slow, but non-destructive.

Glenn Cornish
Old 03-18-2009, 06:52 PM   #11
Prospect
Other
Posts: 142
Karma: 644
Join Date: Jan 2008
Location: Norway
Device: Cybook, Kindle
Does anyone have any clues as to what kind of magic I should ask my favourite image editor to perform in order to reduce the background noise of my scanned pages? I have tried working with saturation, hue, RGB channels, contrast, etc., and the result is better than the one straight from the scanner, but not as good as Google Books. My improvements are by chance, since I am clueless at this. Does anyone have some general advice on the matter, or perhaps a link?

Of course, how best to proceed depends on a lot of factors, but there should be some general rules or principles on the matter. (I am trying to use IrfanView, which has batch processing with advanced options. The point is to get the pages to my Cybook in one piece without any OCR.)
Old 03-18-2009, 06:54 PM   #12
zelda_pinwheel
zeldinha zippy zeldissima
Posts: 27,828
Karma: 908606
Join Date: Dec 2007
Location: Paris, France
Device: eb1150 & is that a nook in her pocket, or she just happy to see you?
Quote:
Originally Posted by Prospect
Does anyone have any clues as to what kind of magic I should ask my favourite image editor to perform in order to reduce the background noise of my scanned pages?
I don't know IrfanView, but in Photoshop I would try adjusting the levels (select the text as black, and a slightly noisy area of the page as white), and then the contrast.
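For anyone wondering what a levels adjustment actually does to the pixels, here is a minimal sketch in plain Python. It is an assumption about the general idea (a linear remap between a black point and a white point), not Photoshop's actual implementation, and the name `apply_levels` and the sample values are made up for illustration.

```python
def apply_levels(pixels, black_point, white_point):
    """Linearly remap grayscale values so that black_point -> 0 and white_point -> 255.

    Values at or below black_point become pure black; values at or above
    white_point become pure white, which is what wipes out light background noise.
    """
    if white_point <= black_point:
        raise ValueError("white_point must be greater than black_point")
    span = white_point - black_point
    adjusted = []
    for value in pixels:
        clipped = min(max(value, black_point), white_point)
        adjusted.append((clipped - black_point) * 255 // span)
    return adjusted


# Text sampled at about 40, noisy paper sampled at about 200 (assumed values).
print(apply_levels([30, 40, 120, 200, 230], black_point=40, white_point=200))
# prints [0, 0, 127, 255, 255]
```

Picking a slightly noisy patch of paper as the white point means that the noise and everything lighter than it is clipped to clean white.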
Old 03-19-2009, 02:57 PM   #13
DDHarriman
Guru
Posts: 851
Karma: 1200
Join Date: Feb 2008
Location: Almada, Portugal
Device: Cybook Gen3, Sony PRS 505, Kindle DXG and Samsung Galaxy Note
The OpticBook 3600 is the solution: cheap, easy, efficient!
Old 03-19-2009, 03:56 PM   #14
AnemicOak
Bookaholic
Posts: 10,333
Karma: 28806489
Join Date: Oct 2007
Location: Minnesota
Device: HDX 8.9, AuraHD, Nook HD+, Kindle 2,3,T , Opus, Nexus7, iPhone5, etc
Quote:
Originally Posted by Prospect
Does anyone have any clues as to what kind of magic I should ask my favourite image editor to perform in order to reduce the background noise of my scanned pages? I have tried working with saturation, hue, RGB channels, contrast, etc., and the result is better than the one straight from the scanner, but not as good as Google Books. My improvements are by chance, since I am clueless at this. Does anyone have some general advice on the matter, or perhaps a link?

Of course, how best to proceed depends on a lot of factors, but there should be some general rules or principles on the matter. (I am trying to use IrfanView, which has batch processing with advanced options. The point is to get the pages to my Cybook in one piece without any OCR.)

I've never scanned a book before, but I do use a scanner many times a day. Did you scan your pages as RGB, Grayscale or Line Art (B&W)? I'm not sure which would be best; it might depend on the quality of your source book.

It depends on what you mean by background noise, but have you tried adjusting the image's levels (that's what Photoshop calls it, anyway)? Maybe that's what you meant by RGB channels. Usually, if I have some speckles or something in the background (white) part of a scan, I can adjust the levels and get rid of them. Some software has a despeckle option that I'm told can be useful on some books, but it will also remove punctuation a lot of the time.
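Here is a small sketch of why a despeckle pass can eat punctuation. It assumes one simple way such a filter might work (dropping connected blobs of ink below a size limit on a black-and-white page); real programs may use a different method, and the name `despeckle` is made up for illustration.

```python
def despeckle(bitmap, min_size):
    """Remove connected groups of ink pixels (1s) smaller than min_size.

    bitmap -- list of rows of 0 (paper) / 1 (ink); neighbours are up/down/left/right.
    Returns a cleaned copy; the input is left unchanged.
    """
    height, width = len(bitmap), len(bitmap[0])
    cleaned = [row[:] for row in bitmap]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if bitmap[y][x] != 1 or seen[y][x]:
                continue
            # Collect one connected blob with an explicit stack (flood fill).
            component, stack = [], [(y, x)]
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and bitmap[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            # Blobs below the size limit are treated as dirt and erased.
            if len(component) < min_size:
                for cy, cx in component:
                    cleaned[cy][cx] = 0
    return cleaned


page = [
    [1, 0, 0, 0, 1],  # letter stroke on the left, a speck of dirt on the right
    [1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0],  # the lone 1 in the middle is a full stop
]
for row in despeckle(page, min_size=2):
    print(row)
```

The filter cannot tell the speck from the full stop: both are single isolated dots, so both disappear while the letter stroke survives.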
Old 03-19-2009, 03:58 PM   #15
Elfwreck
Grand Sorcerer
Posts: 5,142
Karma: 24387938
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Clié; PRS-505; EZR Pocket Pro, PRS-600, Kobo Mini
In most cases, it's best to scan books as line art. From there, you can play with the brightness & contrast settings (depending on the scanner) to get a better quality scan, and later use IrfanView or something like it if the pages need more editing.
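A rough illustration of what line art means and why the brightness setting matters: each pixel ends up as either ink or paper, depending on which side of a cutoff it falls. This is a minimal sketch in plain Python under that assumption; the name `to_line_art` and the sample values are made up, and real scanner drivers decide the cutoff in their own ways.

```python
def to_line_art(gray_rows, threshold=128):
    """Return rows of 1 (ink) / 0 (paper): pixels darker than threshold become ink."""
    return [[1 if value < threshold else 0 for value in row] for row in gray_rows]


# One row: solid ink, a faint stroke, yellowed paper, clean paper (assumed values).
row = [[20, 110, 180, 240]]
print(to_line_art(row, threshold=100))  # prints [[1, 0, 0, 0]]
print(to_line_art(row, threshold=128))  # prints [[1, 1, 0, 0]]
print(to_line_art(row, threshold=200))  # prints [[1, 1, 1, 0]]
```

With the cutoff too low the faint stroke is lost, and with it too high the yellowed paper turns black, which is roughly what playing with the brightness and contrast settings is trading off.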