Folks, wonderful discussion. I have been scanning and converting books for 10 years and have personally converted several hundred books, approaching 1,000 works. We have used everything from simple flatbed scanners to two-camera rigs to the most sophisticated automatic page-turning robots. We do this for a living because we publish and sell these ebooks. Of course we first get the OK from the author or publisher to do so - no copyright violations here at Lybrary.com.
Having done this for a long time I can share a couple of insights and tips:
1) ABBYY is the best OCR software as of today. It has been said here before, but I wanted to stress this. Of course, it is important that you spend time with it to learn all the little features, twists and tricks. I have been using the software since version 4.0 and it has been worth every penny. How you should use ABBYY really depends on the book you scan, so I can't make any general recommendations except to study all its features and experiment. When I started I converted the same book 5 times to test various approaches.
2) These days PDF can compress documents that were simply scanned and not proofread pretty well. When exporting PDF from ABBYY make sure to select 'Enable Mixed Raster Content'. This will bring down the file size by up to 10x. We recently converted a 900-page work (each page is letter sized) and the total size is only 90 MB, even though each page is a scanned page and we ran OCR without subsequent proofreading. That is merely 100 kB for each page - still more than a fully converted text page but not that much more. Another important tip here is to clean up your scanned pages. You want to have a white background as much as you can, no black borders, etc. All of this extra image content increases the file size.
3) If you don't want to bother with the scanning, you might want to look into scan services. There are companies that will scan a page for anywhere from 10 cents to 50 cents. In some cases this might be a better option than doing everything yourself.
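For anybody who wants to play with the numbers in tips 2 and 3, here is a rough back-of-the-envelope sketch in Python. The function names are made up for illustration, and I assume 1 MB = 1000 kB; the 900-page, 90 MB and 10-to-50-cent figures are the ones mentioned above, so plug in your own.

```python
def size_per_page_kb(total_mb, pages):
    """Average file size per page in kB (assumes 1 MB = 1000 kB)."""
    return total_mb * 1000 / pages


def scan_service_cost(pages, cents_low=10, cents_high=50):
    """Low and high cost in dollars for having a service scan `pages` pages.

    The default per-page prices are the 10 to 50 cent range quoted above.
    """
    return pages * cents_low / 100, pages * cents_high / 100


if __name__ == "__main__":
    # The 900-page, 90 MB example from tip 2.
    print(size_per_page_kb(90, 900))   # 100.0 kB per page
    # What a scan service might charge for the same 900 pages.
    print(scan_service_cost(900))      # (90.0, 450.0) dollars
```

So for a book of that size a scan service would run somewhere between $90 and $450, which you can weigh against your own time and equipment.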
Here is one thought I have been contemplating, but I haven't found a good and workable solution for it: It has been stated here that if you buy a book you are free to prepare your own digital version of it. I agree with that interpretation of copyright law. The next question is: what if my friend bought the same book? Can I give him my digital copy of it (remember, he has bought the same printed book)? Again, my interpretation is yes, because my friend has bought the same book and I can certainly share my own work of digitization with him. I couldn't do so with somebody who has not bought the book because that would be a clear violation of copyright law. Does anybody have an opinion on this legal question?
If the answer is yes, one can do that without violating copyright, then it might be possible to pool the resources of people who like to convert their books and share the results with others who have the same book. This would eliminate duplicate work. The real problem here is how to ensure that only those who have bought the printed work have access to the digitized work. I have not yet found a good answer to that. But once I do, I would love to create such an exchange platform for converted copyrighted books.
The only idea I have for checking whether somebody has the book is to ask them to mail in the first page or a part of that page. This way it is clear that the person has the physical book. And since the page is now removed, nobody else can use that copy to claim that he owns the book. Comments?