#46 |
Addict
Posts: 303
Karma: 1000702
Join Date: Sep 2009
Location: Chicago
Device: Nook ST, Kindle 2, Samsung Galaxy Stellar phone
|
|
#47 | |
Member
Posts: 11
Karma: 10
Join Date: Feb 2009
Location: Australia
Device: HanLin V3
|
Quote:
|
#48 |
Grand Sorcerer
Posts: 5,187
Karma: 25133758
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Pocketbook Touch HD3 (Past: Kobo Mini, PEZ, PRS-505, Clié)
|
There's a better way.
If your pages are the same size and layout, or close to it, you can save the text blocks you use and load them on all the pages at once. I have FineReader 7 Pro. How I'd do this:
- Go to a standard-looking page of your document.
- Press Ctrl-E to place zones on the page. Delete unwanted text/image blocks.
- Shape the wanted text block(s) to be just a bit bigger than the main text of the page; leave a bit of margin in case some pages are shifted to one side or the other.
- Image-->Save Blocks: save the blocks out (usually with the name of the book, so you remember which one it is).
- Select all pages in your book (or all besides the cover page & TOC, which may need different zoning).
- Image-->Load Blocks; apply to selected pages.
This will only work if your pages are substantially identical, but it'll save hours if they are. And it can be done to all pages, and then you can quickly flip through and look for any that need to be zoned differently. |
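For anyone who wants to see the "one template block, slightly oversized, reused on every page" idea outside the GUI, here is a minimal Python sketch. It is not FineReader's API or its saved-blocks file format; the `Zone` type, the function names, and all pixel values are made up for illustration.

```python
from typing import List, NamedTuple, Tuple


class Zone(NamedTuple):
    """A rectangle in pixels: left, top, right, bottom."""
    left: int
    top: int
    right: int
    bottom: int


def pad_zone(zone: Zone, margin: int, page_width: int, page_height: int) -> Zone:
    """Grow a zone by `margin` pixels on every side, clamped to the page edges."""
    return Zone(
        max(0, zone.left - margin),
        max(0, zone.top - margin),
        min(page_width, zone.right + margin),
        min(page_height, zone.bottom + margin),
    )


def apply_template(template: Zone, page_sizes: List[Tuple[int, int]], margin: int = 40) -> List[Zone]:
    """Reuse one template zone on every page, padded to tolerate slight shifts."""
    return [pad_zone(template, margin, width, height) for (width, height) in page_sizes]


# Hypothetical numbers: a text block measured on one typical page scan.
template = Zone(left=180, top=220, right=1500, bottom=2350)
pages = [(1700, 2550), (1700, 2550), (1520, 2550)]  # the last page was scanned narrower
zones = apply_template(template, pages)
for number, zone in enumerate(zones, start=1):
    print(number, zone)
```

Pages whose padded zone got clamped (like the narrow third page here) are good candidates for the manual flip-through check.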
#49 | |
Groupie
Posts: 162
Karma: 24658
Join Date: Sep 2009
Device: PRS-505
|
Quote:
|
#50 |
Member
Posts: 13
Karma: 10
Join Date: Sep 2009
Device: none
|
Folks, wonderful discussion. I have been scanning and converting books for 10 years and have personally converted several hundred books, approaching 1,000 works. We have used everything from simple flatbed scanners to two-camera rigs to the most sophisticated automatic page-turning robots. We do this for a living because we publish and sell these ebooks. Of course, we first get the OK from the author or publisher to do so; no copyright violations here at Lybrary.com.
Having done this for a long time, I can share a couple of insights and tips:
1) ABBYY is the best OCR software as of today. It has been said here before, but I wanted to stress this. Of course, it is important that you spend time with it to learn all the little features, twists and tricks. I have been using the software since version 4.0 and it has been worth every penny. How you should use ABBYY really depends on the book you scan, so I can't make any general recommendations except to study all its features and experiment. When I started, I converted the same book 5 times to test various approaches.
2) These days PDF can compress documents that were simply scanned and not proofread pretty well. When exporting PDF from ABBYY, make sure to select 'Enable Mixed Raster Content'. This will bring down the file size by up to 10x. We recently converted a 900-page work (each page is letter-sized) and the total size is only 90 MB, even though each page is a scanned page and we ran OCR without subsequent proofreading. That is merely 100 kB per page: still more than a fully converted text page, but not that much more. Another important tip here is to clean up your scanned pages. You want a white background as much as you can, no black borders, etc. All of this image content increases the file size.
3) If you don't want to bother with the scanning, you might want to look into scan services. There are companies that scan a page for anywhere from 10 to 50 cents. In some cases this might be a better option than doing everything yourself.
Here is one thought I am contemplating, but I haven't found a good and workable solution for it: It has been stated here that if you buy a book you are free to prepare your own digital version of it. I agree with that interpretation of copyright law. The next question is: what if my friend bought the same book? Can I give him my digital copy of it (remember, he has bought the same printed book)?
Again, my interpretation is yes, because my friend has bought the same book and I can certainly share my own work of digitization with him. I couldn't do so with somebody who has not bought the book, because that would be a clear violation of copyright law. Does anybody have an opinion on this legal question? If the answer is yes, one can do that without violating copyright, then it might be possible to pool the resources of people who like to convert their books and share the results with others who have the same book. This would eliminate duplicate work. The real problem here is how to ensure that only those who have bought the printed work have access to the digitized work. I have not yet found a good answer to that. But once I do, I would love to create such an exchange platform for converted, copyrighted books. The only idea I have for checking whether somebody has the book is to ask them to mail in the first page or a part of the page. This way it is clear that the person has the physical book. And since the page is now removed, nobody else can use this copy to claim that he owns the book. Comments? |
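The numbers in tips 2 and 3 above are easy to re-run for your own book. A minimal Python sketch of that arithmetic, using the post's figures (900 pages, 90 MB, 10 to 50 cents per page) and assuming 1 MB = 1000 kB; the function names are made up for illustration:

```python
def per_page_kb(total_mb: float, pages: int) -> float:
    """Average size per page in kB, taking 1 MB = 1000 kB."""
    return total_mb * 1000 / pages


def service_cost_range(pages: int, low_cents: int = 10, high_cents: int = 50) -> tuple:
    """Lowest and highest cost in dollars for a per-page scan service."""
    return (pages * low_cents / 100, pages * high_cents / 100)


pages = 900
print(per_page_kb(90, pages))      # 100.0 kB per page, matching the post
print(service_cost_range(pages))   # (90.0, 450.0) dollars for the same book
```

So the 900-page work would cost somewhere between $90 and $450 to hand off to a service at those rates.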
#51 | |
Fanatic
Posts: 551
Karma: 1121392
Join Date: May 2008
Location: USA
Device: HTC One M8
|
Quote:
|
|
#52 | |
Grand Sorcerer
Posts: 5,187
Karma: 25133758
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Pocketbook Touch HD3 (Past: Kobo Mini, PEZ, PRS-505, Clié)
|
Quote:
Opinion: Copyright law is psychotic and takes no notice of common sense and reasonable prevention of unnecessary effort. "Fair use" is not actually defined. It is described as having allowances for educational purposes, and parody, and de minimis use (except in music), and is acknowledged to cover other uses which have been checked against the four factors. There is no equation to use to decide if a particular use is, or is not, acceptable.
The practical side of things: If you scan & convert a book for your friend, nobody knows, and nobody cares. If you start a book club with converted versions of the Harry Potter books for "everyone who can prove ownership of the physical copy," which sounds like a very reasonable (which does not mean "legal") format-shifting option, you can bet you'll be facing a lawsuit as fast as Warner Brothers can draft the C&D order.
Will it be successful? Pointless question. The real question is: How much time, money & lawyer resources do you have to spend on this? |
|
#53 |
Guru
Posts: 714
Karma: 1014039
Join Date: May 2007
Device: Sony PRS-500, Sony PRS-505, Kindle 3, Sony PRS350, iPad 64GB
|
It's too bad that you can't use the power of the masses for this kind of job. Proofreading... if only you could open something like a wiki for people to edit the pages...
Or even better, attach your scanned files to reCAPTCHA |
#54 |
Wizard
Posts: 3,671
Karma: 12205348
Join Date: Mar 2008
Device: Galaxy S, Nook w/CM7
|
@lybrary I really like this idea, the sharing of OCR'd books. I think a good way to validate is just to email a photo with an ID and the book. (Not crazy about having to deface a book just to share an OCR; bookstores might get a bit irritated if the first page of all their books is missing.)
I do agree with Elfwreck: the real risk is not whether you are in the right, the real risk is being sued. I think a way to reduce the risk would be to discuss things over a forum but only share/verify on a personal level. =X= |
#55 | |
Wizard
Posts: 2,592
Karma: 4290425
Join Date: Jun 2009
Location: Foristell, Missouri, USA
Device: Nokia N800, PRS-505, Nook STR Glowlight, Kindle 3, Kobo Libra 2
|
Quote:
|
|
#56 |
Grand Sorcerer
Posts: 8,478
Karma: 5171130
Join Date: Jan 2006
Device: none
|
I'd like to share a tip that has improved my scan output quality, and minimized errors in final text, and it's not as bad as it sounds: Add a scan step to your process.
Specifically, use a good photocopier to create letter/A4-sized pages of your books. If your book page is smaller than letter/A4, set the copier to enlarge the copy to fit the page. That way, you get larger letters, clearer spaces and punctuation, making the OCR process easier. You can also take advantage of any copier image controls to improve text/background contrast on the pages, further improving character legibility.
The advantage of this is that you can then feed those letter/A4 sheets through a high-quality professional scanner... they are optimized for letter/A4 page processing, and most will give you 300-600 DPI TIF image files.
I've done this in the past, typically taking 10-30 minutes to copy the pages of an average book, depending on the copier type. The rest of the process takes about as long, but if your scanner has an automatic feeder, it can scan 50-100 pages a minute, and save you even more time in the scan process. Not to mention generating fewer errors in OCR.
FYI: Sorry, I don't have the access to copiers and scanners that I used to, so I can't recommend brands...
Last edited by Steven Lyle Jordan; 10-07-2009 at 01:14 PM. Reason: Said "JPG" when I meant to say "TIF". Sorry! |
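To put rough numbers on the enlarge-then-scan tip above, here is a small Python sketch. The book page size, the quarter-inch border, and the sheet count are assumptions for illustration, and "effective DPI" only means scanner samples per inch of the original page; the copier's own optics still limit how much real detail survives.

```python
LETTER_IN = (8.5, 11.0)   # US letter, inches
A4_IN = (8.27, 11.69)     # A4 (210 x 297 mm), rounded to inches


def enlargement_factor(page_w, page_h, sheet=LETTER_IN, border=0.25):
    """Largest scale that fits the book page on the sheet, leaving a border on each side."""
    usable_w = sheet[0] - 2 * border
    usable_h = sheet[1] - 2 * border
    return min(usable_w / page_w, usable_h / page_h)


def effective_dpi(scan_dpi, factor):
    """Scanner samples per inch of the ORIGINAL page after enlargement."""
    return scan_dpi * factor


def scan_minutes(sheets, pages_per_minute):
    """Time to run the copied sheets through an automatic feeder."""
    return sheets / pages_per_minute


# Hypothetical paperback page of 4.25 x 7.0 inches, one book page per sheet.
factor = enlargement_factor(4.25, 7.0)
print(factor)                      # copier enlargement setting, as a ratio
print(effective_dpi(300, factor))  # what a 300 DPI scan is worth on the original
print(scan_minutes(300, 50), scan_minutes(300, 100))
```

With these assumed sizes, the height is the limiting dimension, so the copier setting comes out at 150%.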
#57 |
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
|
Do not use JPEG files for scans; the JPEG format is ill-suited to images with sharp edges such as text or line art. It is best to use a lossless format such as TIFF or PNG (or PDF, as long as it's not using JPEG compression for images). Check the FineReader manual for more advice on how to get the best scans for OCR.
Another option is to use a digital camera; ABBYY has some tips about that too. |
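If you have a folder of scans and aren't sure what your scanner software actually saved, the file extension can mislead. A minimal Python sketch that checks the leading "magic" bytes instead; the folder name in the usage line is a placeholder, and the sketch only identifies the container, so it cannot tell you whether a TIFF or PDF uses JPEG compression inside.

```python
from pathlib import Path

# Well-known leading byte signatures for common scan formats.
SIGNATURES = [
    (b"\xff\xd8\xff", "jpeg"),        # lossy
    (b"\x89PNG\r\n\x1a\n", "png"),    # lossless
    (b"II*\x00", "tiff"),             # little-endian TIFF
    (b"MM\x00*", "tiff"),             # big-endian TIFF
    (b"%PDF", "pdf"),
]


def sniff_format(header: bytes) -> str:
    """Return a format name based on the first bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def report(folder: str) -> dict:
    """Map each file name in `folder` to its sniffed format."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            with path.open("rb") as handle:
                results[path.name] = sniff_format(handle.read(8))
    return results


# Usage (placeholder folder name):
# for name, kind in report("scans").items():
#     print(name, kind)
```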
#58 |
Grand Sorcerer
Posts: 8,478
Karma: 5171130
Join Date: Jan 2006
Device: none
|
|
#59 | |
Wizard
Posts: 4,293
Karma: 529619
Join Date: May 2007
Device: iRex iLiad, DR800SG
|
Quote:
Further, while you are not allowed to give your digital copy to your friend, it's probably true that your friend can hire you to digitize his copy of the printed book. Even though the end result is exactly the same, this time it is probably legal. Nobody claimed that current copyright law makes any sense. Of course, the other question is would the copyright holder care or even know that you gave your friend a copy? Probably not. But, technically, it would be infringement. |
|
#60 |
Fanatic
Posts: 551
Karma: 1121392
Join Date: May 2008
Location: USA
Device: HTC One M8
|
I've tried doing the same pages (mass market paperback text) as JPEG and TIFF, and the error count after OCR with FineReader wasn't significantly different. It may make more of a difference for illustrations.
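For anyone wanting to repeat that kind of comparison, one way to count errors is the edit distance between each OCR output and a hand-corrected reference. This is only a sketch of one possible measure, not necessarily how the count above was made; the sample strings are invented and say nothing about which format does better.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def char_error_rate(ocr_text: str, reference: str) -> float:
    """Errors per reference character."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(ocr_text, reference) / len(reference)


# Invented sample strings standing in for two OCR runs of the same page.
reference = "the quick brown fox"
ocr_a = "the quick brovvn fox"
ocr_b = "the qu1ck brown fox"
print(edit_distance(ocr_a, reference), char_error_rate(ocr_a, reference))
print(edit_distance(ocr_b, reference), char_error_rate(ocr_b, reference))
```

Run it over the same set of pages for each format and compare the totals; a few pages either way is probably noise.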
|
|
| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| calibre crashes when scanning and adding books | oncdoc | Calibre | 8 | 04-21-2010 03:03 PM |
| Scanning books - New need help | Sporadic | Workshop | 9 | 04-19-2009 01:11 PM |
| Scanning paper (out of copyright) books. | Charles Gray | Workshop | 18 | 03-25-2009 02:06 PM |
| Scanning books | Nate the great | Lounge | 10 | 11-04-2007 01:20 AM |
| Scanning books from your own library | Alexander Turcic | Deals and Resources (No Self-Promotion or Affiliate Links) | 13 | 06-16-2006 12:28 AM |