09-30-2009, 07:31 PM | #16 | |
Banned
Posts: 5,100
Karma: 72193
Join Date: Feb 2009
Location: South of the Border
Device: Coffin
|
Quote:
http://buy.abbyy.com/content/freemac/default.aspx There's no demo (that I can see), so you wouldn't know whether you were buying a pig in a poke. |
|
09-30-2009, 07:44 PM | #17 |
Fanatic
Posts: 551
Karma: 1121392
Join Date: May 2008
Location: USA
Device: HTC One M8
|
Gazza, I've done something very similar to what you describe on several dozen of my books. I have an Opticbook and it can scan 5-6 pages per minute. Started out with the Finereader Sprint (crippled Finereader version) that came with the scanner, but soon upgraded to Abbyy Finereader 8.0, and later 9.0 - either is a vast improvement.
All I really want to do is generate a plain old .rtf file (I read on a PDA, so fancy formatting would be wasted), so after recognizing the text in Finereader I generally save to Word (using the option to not save headers and footers, which gets rid of page numbers and running heads) and correct the errors there. Finereader has spellchecking capabilities, and lets you compare the scans with the output onscreen, but for my purposes Word is quicker, at least if I have the hard copy in front of me for comparison. The search and replace functions are particularly useful. At this stage I am just glancing through the text rather than reading it. After that I save the document in rich text format. Then exit, reopen it in Wordpad, and resave it to reduce Word-induced file bloat (this might not be necessary if you saved to .txt). Then move it to my PDA and read it, bookmarking any remaining errors. Finally, fix the errors in the Word file, repeat the conversion, and replace the previous version on my PDA. It's not as sophisticated as what many people here do, but for my needs it's fine. Last edited by wayrad; 09-30-2009 at 07:51 PM. |
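The search-and-replace pass described above can also be scripted once the text is saved as plain text. This is only a rough sketch in Python, not anything Finereader or Word does: the correction table is a hypothetical set of typical OCR confusions and would need to be built from the errors actually seen in a given book.

```python
import re

# Hypothetical correction table: illustrative guesses at common OCR
# confusions, not a list taken from Finereader or from any particular book.
CORRECTIONS = [
    (re.compile(r"\btbe\b"), "the"),             # 'h' misread as 'b'
    (re.compile(r"\bwitb\b"), "with"),
    (re.compile(r"(?<=[a-z])1(?=[a-z])"), "l"),  # digit one inside a word
    # Rejoin words split across lines. Caution: this also joins genuinely
    # hyphenated words that happen to break at the hyphen.
    (re.compile(r"(\w)-\n(\w)"), r"\1\2"),
    (re.compile(r"[ \t]{2,}"), " "),             # collapse runs of spaces
]


def clean_ocr_text(text: str) -> str:
    """Apply each correction in order and return the cleaned text."""
    for pattern, replacement in CORRECTIONS:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(clean_ocr_text("tbe wor1d was witb us, con-\ntinued  here"))
```

A pass like this only catches errors that follow a pattern; anything else still needs the read-through on the PDA.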
|
09-30-2009, 07:46 PM | #18 |
Bookaholic
Posts: 14,391
Karma: 54969924
Join Date: Oct 2007
Location: Minnesota
Device: iPad Mini 4, AuraHD, iPhone XR +
|
I've only scanned one book, You're Stepping On My Cloak & Dagger by Roger Hall, but I agree ABBYY works great. I only had a handful of errors to correct for the entire book. Didn't take as long as I thought it would either.
|
09-30-2009, 08:03 PM | #19 | |
Zealot
Posts: 138
Karma: 372
Join Date: Apr 2008
Location: New York, NY
Device: Sony PRS-600, Nook Color, iPad
|
Quote:
- Ed |
|
10-01-2009, 12:47 AM | #20 |
01000100 01001010
Posts: 1,889
Karma: 2400000
Join Date: Mar 2009
Device: Polyamorous
|
I've had a few paper books scanned in. I found the best results by scanning the books to PDF, then using ABBYY to convert the PDF to text.
|
|
10-01-2009, 02:27 AM | #21 |
Zealot
Posts: 143
Karma: 35
Join Date: Jan 2009
Location: Osaka, Japan
Device: Kindle 3
|
I am currently engaged in a 10,000+ page bilingual OCR project.
I'm about a fifth of the way in, and the process is becoming more streamlined as I progress. I was using the company copier for a while, which produced a nice monochrome 600 dpi PDF. However, some of the volumes are so thick and heavy that, in the end, I decided to do the remainder by hand rather than risk damaging the books and my wrists. I now use a makeshift frame to hold the book open; a 1 cm thick clear acrylic sheet to flatten the page; two lamps for illumination; and a 10 MP digital camera at a distance of around 50 cm (to avoid barrel distortion) to take the shots. Unlike the PDFs from the copier, the images need a little extra post-processing for pain-free OCR (gamma adjustment, then conversion to monochrome), but I have got that down to a fine art too. Obviously the resulting images can't compare with the 600 dpi of the copier but, fortunately, the original text is quite large, so it still works well. Next comes the proofreading of the output... Last edited by Mr. Dalliard; 10-01-2009 at 02:30 AM. |
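For anyone wanting to see what the gamma-then-monochrome step amounts to, here is a minimal sketch in plain Python working on a grid of 8-bit grayscale values. The gamma value and threshold are assumptions for illustration only (the post does not give the settings used), and a real workflow would more likely run this in an image editor or imaging library.

```python
def gamma_correct(pixels, gamma):
    """Apply out = 255 * (in / 255) ** (1 / gamma) to 8-bit grayscale values.

    With this convention, gamma < 1 darkens the midtones and gamma > 1
    lightens them.
    """
    inv = 1.0 / gamma
    # Precompute a lookup table so each pixel is a single list index.
    lut = [round(255 * (v / 255) ** inv) for v in range(256)]
    return [[lut[v] for v in row] for row in pixels]


def to_monochrome(pixels, threshold=128):
    """Map each pixel to pure black (0) or pure white (255)."""
    return [[255 if v >= threshold else 0 for v in row] for row in pixels]


def prepare_for_ocr(pixels, gamma=0.5, threshold=128):
    """Gamma adjustment followed by thresholding (assumed default values)."""
    return to_monochrome(gamma_correct(pixels, gamma), threshold)


if __name__ == "__main__":
    page = [[0, 128, 255]]
    print(gamma_correct(page, 0.5))
    print(prepare_for_ocr(page))
```

Darkening the midtones before thresholding pushes faint grey strokes toward black, which is one plausible reason to do the steps in this order; the right values depend on the lighting and the paper.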
10-01-2009, 11:47 PM | #22 |
Addict
Posts: 260
Karma: 274
Join Date: Apr 2006
Location: Gig Harbor, Washington
Device: BeBook One, PocketBook 360, Kindle Paperwhite, Kobo Aura One
|
I've just "stumbled" onto this thread -- and it's like a gift from the heavens! So many of the books I'd like to reread on my BeBook are sitting on our shelves gathering dust (and occupying more space than my wife wishes to keep allocated to them), and they are not available for purchase as ebooks (and may never be in my lifetime -- I AM getting on in years!). Thanks to all of your guidance in this thread, I can now scan and convert them into ebook form. I've already ordered the book scanner recommended above, and will probably also purchase a more advanced version of the ABBYY software than the Finereader version that comes with it. Thanks to all of you in advance.
|
10-02-2009, 12:17 AM | #23 | |
Grand Sorcerer
Posts: 19,832
Karma: 11844413
Join Date: Jan 2007
Location: Tampa, FL USA
Device: Kindle Touch
|
Quote:
http://www.abbyy.com/finereader_for_mac/ BOb |
|
10-02-2009, 02:33 AM | #24 |
Member
Posts: 10
Karma: 15
Join Date: Sep 2009
Device: iPod Touch
|
All of which is utterly fascinating. Some more information. Agreed that scanning in a book is a waste of time. But I work mainly in Australia, the UK and China. In China we have a situation where a young chap has had to return to his aged parents out in the boondocks. (The wrong word as, in fact, it is Tagalog meaning 'wooded place' but it will do.) It is possible for me to ship him a thousand or so books at a time and for him to scan them in for me. My wife insists on getting the books back -- I see no logic in this -- but even then the cost is minimal.
The cost comes in the proofreading. If we use the right scanner and the right software -- and I learn something every time I access this forum -- we should get pretty clean copy because you can mask the pages so the headers and footers and page numbers are not scanned in. Say we get four books a day, 20 a week. That should keep us happy. I thought the only option for OCR was (dammit, the name has skipped my brain for the moment) but now I shall seriously look at ABBYY. The idea of building an automatic flash page reader appeals tremendously if I can get someone to do it for me. I cannot hold a screwdriver straight. The problem then will be proofreading. If Google cannot get it right -- and it hasn't -- what chance for mere mortals? Finally (how the man does go on) the law of copyright is perfectly clear that if you buy a book you can copy it for your own use. Publishers lie in their teeth to make you believe otherwise but that is in the Berne and Geneva Convention. That does not mean you can copy it and then put it on one of these BitTorrent thingies. But you can do it for your own use. For certain sure. Gareth Powell in Sydney, where it cannot make up its mind about the weather |
10-02-2009, 04:14 AM | #25 |
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
Hi Gareth,
I have a lot of experience in proof-reading books, and believe me, you cannot properly proof-read a book in an hour. The only way to proof-read is to compare the original book and the electronic text, side by side, and look at every word, every punctuation mark, etc. I am currently undertaking the mammoth task of proof-reading all the Charles Dickens books that I've created and uploaded here at MobileRead. I'm currently nearing the end of "David Copperfield", a book which I started proof-reading two months ago, and have spent approximately 2 hours a day on, 7 days a week, since then. That's about 120 hours of work to proof-read one book, and it's still not finished. Proper proof-reading is enormously "labour intensive", and there aren't any shortcuts. PLEASE don't use text format for your scanned books; you'll lose all the formatting, which adds so much to the book. Some "rich" format such as HTML will be enormously better. |
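To put numbers on that, here is the arithmetic as a tiny Python sketch. The 60 days is an assumed reading of "two months", and the 20 books a week is the scanning rate suggested earlier in the thread; most books are far shorter than "David Copperfield", so the weekly figure is an upper bound for illustration, not a prediction.

```python
def proofreading_hours(hours_per_day, days):
    """Total hours spent at a steady daily rate."""
    return hours_per_day * days


# About two months at 2 hours a day, 7 days a week (60 days is an assumption).
hours_per_book = proofreading_hours(2, 60)

# Scanning rate proposed earlier in the thread: four books a day, 20 a week.
books_per_week = 20

# Proofreading hours generated by one week of scanning, if every book
# took as long as this one (it would not; this is a worst case).
weekly_backlog_hours = hours_per_book * books_per_week

if __name__ == "__main__":
    print(hours_per_book, weekly_backlog_hours)
```

Even at a tenth of that per book, a week of scanning would produce several weeks of full-time proofreading for one person.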
10-02-2009, 04:47 AM | #26 |
Grand Sorcerer
Posts: 9,707
Karma: 32763414
Join Date: Dec 2008
Location: Krewerd
Device: Pocketbook Inkpad 4 Color; Samsung Galaxy Tab S6
|
Proofreading would depend on how true to the printed word you want your electronic word...
I've scanned books and my proofreading consists of reading the book and annotating errors (using a touchscreen reader). I'll go back to the source document and update it after I've finished the book. Layout I'm not too bothered with, generally, but I'm mostly reading contemporary novels, which have a basic layout even in print form. I, too, would not recommend text as your source document, but HTML. |
10-02-2009, 07:54 AM | #27 |
Fanatic
Posts: 551
Karma: 1121392
Join Date: May 2008
Location: USA
Device: HTC One M8
|
By "masking the pages" to get rid of page numbers, do you mean adjusting the scan area or putting the edge of the book off the glass part of the scanner? I used to do that and it slows things down enormously, as well as not working very well because of page to page variability in the placing of the numbers. Finereader does let you crop the edges off the onscreen images, but this must be done page by page because of the aforementioned variability. This was probably the main factor driving my upgrade to later versions, which have the "don't save headers and footers" option when saving to Word. Perhaps there's a better way; if so I'd love to hear it.
As far as format goes, if these books are purely for your personal use, by all means do whatever suits you and your reading device. What works for someone else may not be the best for you. Last edited by wayrad; 10-02-2009 at 08:08 AM. |
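On the page-number problem above: one alternative to masking or cropping the images is to filter the recognized text afterwards, since the position of the number on the page no longer matters once it is a line of text. The following is a rough Python sketch under assumptions of my own, not a description of how Finereader's option works: each page is a list of text lines, page numbers sit alone on the first or last line, and a running head counts as one if the same top line (ignoring digits) appears on at least `min_repeats` pages.

```python
import re
from collections import Counter

# A line holding nothing but a 1-4 digit number, optionally wrapped in
# punctuation or spaces, e.g. "12" or "- 13 -".
PAGE_NUMBER = re.compile(r"^\W*\d{1,4}\W*$")


def _normalize(line):
    """Drop digits and case so 'TITLE 12' and 'TITLE 13' compare equal."""
    return re.sub(r"\d+", "", line).strip().lower()


def strip_headers_and_footers(pages, min_repeats=3):
    """Return a copy of pages with page numbers and running heads removed."""
    first_lines = Counter(_normalize(p[0]) for p in pages if p)
    running_heads = {h for h, n in first_lines.items() if h and n >= min_repeats}

    cleaned = []
    for page in pages:
        lines = list(page)
        # Top of page: a bare page number or a repeated running head.
        if lines and (PAGE_NUMBER.match(lines[0])
                      or _normalize(lines[0]) in running_heads):
            lines.pop(0)
        # Bottom of page: only a bare page number is removed.
        if lines and PAGE_NUMBER.match(lines[-1]):
            lines.pop()
        cleaned.append(lines)
    return cleaned


if __name__ == "__main__":
    sample = [
        ["DAVID COPPERFIELD 12", "It was a dark night.", "12"],
        ["DAVID COPPERFIELD 13", "More text.", "- 13 -"],
        ["DAVID COPPERFIELD 14", "Even more.", "He turned 40 that year."],
    ]
    print(strip_headers_and_footers(sample))
```

The obvious risk is a genuine line of text that is only a number (a year on its own line, say) at the very top or bottom of a page, so the result still wants a glance before the originals are discarded.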
10-02-2009, 08:36 AM | #28 | |
Zealot
Posts: 138
Karma: 372
Join Date: Apr 2008
Location: New York, NY
Device: Sony PRS-600, Nook Color, iPad
|
Quote:
I've tried the Mac version & the Windows version. Right now, I'm running the Windows version on my Mac (inside a virtual machine). - Ed |
|
10-02-2009, 09:24 AM | #29 |
Fanatic
Posts: 551
Karma: 1121392
Join Date: May 2008
Location: USA
Device: HTC One M8
|
Do you use the page crop feature, or is there a better way?
|
10-02-2009, 09:30 AM | #30 | |
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
|
BTW, ABBYY is having a promo right now: buy FR9 and get a free upgrade to FR10 in a few weeks:
Quote:
Achievements in OCR Accuracy and Performance; ADRT® analyses a multipage document as a single entity; 3rd Generation Camera OCR: Reads Phone Camera Photos; Enhanced Usability – New Quick Tasks and Interface Revisions; Saving E-books to HTML Chapters and Flexible HTML; Powerful PDF Compression; Further Improvements in Page Layout Analysis; New Recognition Languages – Korean and Yiddish |
|
|