Old 09-30-2009, 07:31 PM   #16
Moejoe
Banned
Posts: 5,100
Karma: 72193
Join Date: Feb 2009
Location: South of the Border
Device: Coffin
Quote:
Originally Posted by Hellmark View Post
Problem is, ABBYY only makes for Windows. OSX and Linux users are screwed. Tesseract is opensource, with native ports to those OS's.
There is a version for Mac (I just discovered): it's Finereader Express Edition.

http://buy.abbyy.com/content/freemac/default.aspx There's no demo (I can see) so you wouldn't know if you were buying a pig or not.
Old 09-30-2009, 07:44 PM   #17
wayrad
Fanatic
Posts: 551
Karma: 1121392
Join Date: May 2008
Location: USA
Device: HTC One M8
Gazza, I've done something very similar to what you describe on several dozen of my books. I have an Opticbook and it can scan 5-6 pages per minute. Started out with the Finereader Sprint (crippled Finereader version) that came with the scanner, but soon upgraded to Abbyy Finereader 8.0, and later 9.0 - either is a vast improvement.

All I really want to do is generate a plain old .rtf file (I read on a PDA, so fancy formatting would be wasted), so after recognizing the text in Finereader I generally save to Word (using the option to not save headers and footers, which gets rid of page numbers and running heads) and correct the errors there. Finereader has spellchecking capabilities, and lets you compare the scans with the output onscreen, but for my purposes Word is quicker, at least if I have the hard copy in front of me for comparison. The search and replace functions are particularly useful. At this stage I am just glancing through the text rather than reading it.

After that I save the document in rich text format. Then exit, reopen it in Wordpad, and resave it to reduce Word-induced file bloat (this might not be necessary if you saved to .txt). Then move it to my PDA and read it, bookmarking any remaining errors. Finally, fix the errors in the Word file, repeat the conversion, and replace the previous version on my PDA.

It's not as sophisticated as what many people here do, but for my needs it's fine.

Last edited by wayrad; 09-30-2009 at 07:51 PM.
Old 09-30-2009, 07:46 PM   #18
AnemicOak
Bookaholic
Posts: 14,391
Karma: 54969924
Join Date: Oct 2007
Location: Minnesota
Device: iPad Mini 4, AuraHD, iPhone XR +
I've only scanned one book, You're Stepping On My Cloak & Dagger by Roger Hall, but I agree ABBYY works great. I only had a handful of errors to correct for the entire book. Didn't take as long as I thought it would either.
Old 09-30-2009, 08:03 PM   #19
edembowski
Zealot
Posts: 138
Karma: 372
Join Date: Apr 2008
Location: New York, NY
Device: Sony PRS-600, Nook Color, iPad
Quote:
Originally Posted by Moejoe View Post
There is a version for Mac (I just discovered): it's Finereader Express Edition.

http://buy.abbyy.com/content/freemac/default.aspx There's no demo (I can see) so you wouldn't know if you were buying a pig or not.
The Windows version really is much better. I've used both, and now I have a Windows VM running on my Mac. Windows is there just for ABBYY.

- Ed
Old 10-01-2009, 12:47 AM   #20
doreenjoy
01000100 01001010
Posts: 1,889
Karma: 2400000
Join Date: Mar 2009
Device: Polyamorous
I've had a few paper books scanned in. I found the best results by scanning the books to PDF, then using ABBYY to convert the PDF to text.
Old 10-01-2009, 02:27 AM   #21
Mr. Dalliard
Zealot
Posts: 143
Karma: 35
Join Date: Jan 2009
Location: Osaka, Japan
Device: Kindle 3
I'm currently engaged in a 10,000+ page bilingual OCR project.
I'm about a fifth of the way in, and the process is becoming more streamlined as I progress.

I was using the company copier for a while, which produced a nice monochrome 600 dpi PDF. However, some of the volumes are so thick and heavy that, in the end, I decided to do the remainder by hand, rather than risk damaging the books, and my wrists.

I now use a makeshift frame, to hold the book open; a 1cm thick clear acrylic sheet, to flatten the page; two lamps, for illumination; and a 10Mp digital camera at a distance of around 50cm - to avoid barrel distortion - to take the shots.

Unlike the PDFs from the copier, a little extra post-processing of the images is required for pain-free OCRing (gamma adjustment > monochrome), but I have got that down to a fine art too. Obviously the resulting images can't compare with the 600 dpi of the copier, but, fortunately, the original text is quite large anyway, so it still works well.
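
As a rough illustration of that two-step sequence (not necessarily how it was done for this project), here is a minimal Python sketch. It works on plain lists of 8-bit grey values rather than real image files, and the function names, the gamma of 1.8 and the threshold of 128 are all assumptions chosen for the example.

```python
def gamma_correct(pixels, gamma):
    """Apply gamma correction to 8-bit grayscale values (0-255)."""
    # Build a lookup table once: out = 255 * (in / 255) ** (1 / gamma)
    table = [round(255 * (v / 255) ** (1 / gamma)) for v in range(256)]
    return [[table[v] for v in row] for row in pixels]


def to_monochrome(pixels, threshold=128):
    """Map each grayscale value to pure black (0) or pure white (255)."""
    return [[255 if v >= threshold else 0 for v in row] for row in pixels]


# A tiny made-up "photo": dark ink (40), grey shadow (110), paper (200).
page = [
    [200, 110, 40],
    [110, 200, 110],
]

# Step 1: lift the mid-tones so shadows on the paper move toward white.
brightened = gamma_correct(page, gamma=1.8)

# Step 2: threshold to black and white for the OCR engine.
mono = to_monochrome(brightened, threshold=128)

for row in mono:
    print(row)
```

With these made-up values the grey shadow pixels end up white while the ink stays black; thresholding the same page without the gamma step would turn the shadows black as well. Real photos would need the gamma and threshold tuned by eye.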

Next comes the proofreading of the output.....

Last edited by Mr. Dalliard; 10-01-2009 at 02:30 AM.
Old 10-01-2009, 11:47 PM   #22
ascherjim
Addict
Posts: 260
Karma: 274
Join Date: Apr 2006
Location: Gig Harbor, Washington
Device: BeBook One, PocketBook 360, Kindle Paperwhite, Kobo Aura One
I've just "stumbled" onto this thread -- and it's like a gift from the heavens! My wife and I have so many books sitting on our shelves gathering dust (and occupying more space than my wife wishes to allocate to them much longer) that I'd like to reread on my BeBook. They are not available for purchase as ebooks (and may never be in my lifetime -- I AM getting on in years!), but thanks to all of your guidance in this thread I can now scan and convert them into ebook form. I've already ordered the book scanner recommended above, and will probably also purchase a more advanced version of the ABBYY software than the Finereader version that comes with it. Thanks to all of you in advance.
Old 10-02-2009, 12:17 AM   #23
pilotbob
Grand Sorcerer
Posts: 19,832
Karma: 11844413
Join Date: Jan 2007
Location: Tampa, FL USA
Device: Kindle Touch
Quote:
Originally Posted by Moejoe View Post
Now if only I could find an OCR on 'buntu or mac that worked as well as ABBYY.
Um.... why don't you use ABBYY Fine Reader for Mac? Seems like it might work as well...

http://www.abbyy.com/finereader_for_mac/

BOb
Old 10-02-2009, 02:33 AM   #24
gazza
Member
Posts: 10
Karma: 15
Join Date: Sep 2009
Device: iPod Touch
All of which is utterly fascinating. Some more information. Agreed that scanning in a book is a waste of time. But I work mainly in Australia, the UK and China. In China we have a situation where a young chap has had to return to his aged parents out in the boondocks. (The wrong word as, in fact, it is Tagalog meaning 'wooded place' but it will do.) It is possible for me to ship him a thousand or so books at a time and for him to scan them in for me. My wife insists on getting the books back -- I see no logic in this -- but even then the cost is minimal.
The cost comes in the proofreading. If we use the right scanner and the right software -- and I learn something every time I access this forum -- we should get pretty clean copy because you can mask the pages so the headers and footers and page numbers are not scanned in. Say we get four books a day, 20 a week. That should keep us happy.
I thought the only option for OCR was (dammit, the name has skipped my brain for the moment) but now I shall seriously look at ABBYY.
The idea of building an automatic flash page reader appeals tremendously if I can get someone to do it for me. I cannot hold a screwdriver straight. The problem then will be proofreading. If Google cannot get it right -- and it hasn't -- what chance for mere mortals?
Finally (how the man does go on) the law of copyright is perfectly clear that if you buy a book you can copy it for your own use. Publishers lie in their teeth to make you believe otherwise but that is in the Berne and Geneva Convention. That does not mean you can copy it and then put it on one of these BitTorrent thingies. But you can do it for your own use. For certain sure.

Gareth Powell in Sydney where it cannot make up its mind about the weather
Old 10-02-2009, 04:14 AM   #25
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Hi Gareth,

I have a lot of experience in proof-reading books, and believe me, you cannot properly proof-read a book in an hour. The only way to proof-read is to compare the original book and the electronic text, side by side, and look at every word, every punctuation mark, etc. I am currently undertaking the mammoth task of proof-reading all the Charles Dickens books that I've created and uploaded here at MobileRead. I'm currently nearing the end of "David Copperfield", a book which I started proof-reading two months ago, and have spent approximately 2 hours a day on, 7 days a week, since then. That's about 120 hours of work to proof-read one book, and it's still not finished.

Proper proof-reading is enormously "labour intensive", and there aren't any shortcuts.
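
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 60-day figure is an assumption standing in for "two months", and the function name is made up for illustration. Scaling the same rate to a batch of 20 books a week is a deliberate worst case, since most books are far shorter than "David Copperfield".

```python
def proofreading_hours(days, hours_per_day):
    """Total hours spent when working every day at a steady rate."""
    return days * hours_per_day


# Assumption: "two months" is taken as roughly 60 days.
one_book = proofreading_hours(days=60, hours_per_day=2)

# Worst case: every book in a 20-book week is as long as this one.
books_per_week = 20
weekly_backlog = one_book * books_per_week

print(f"One long novel: about {one_book} hours")
print(f"{books_per_week} such books: about {weekly_backlog} hours of proof-reading")
```

Even if a typical book needed only a tenth of that effort, a week of scanning would still generate far more proof-reading than one person can do in a week.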

PLEASE don't use text format for your scanned books; you'll lose all the formatting, which adds so much to the book. Some "rich" format such as HTML will be enormously better.
Old 10-02-2009, 04:47 AM   #26
Sweetpea
Grand Sorcerer
Posts: 9,707
Karma: 32763414
Join Date: Dec 2008
Location: Krewerd
Device: Pocketbook Inkpad 4 Color; Samsung Galaxy Tab S6
Proofreading would depend on how true to the printed word you want your electronic word...

I've scanned books and my proofreading consists of reading the book and annotating errors (using a touchscreen reader). I'll go back to the source document and update it after I've finished the book.

Layout I'm not too bothered with, generally, but I'm mostly reading contemporary novels, which have a basic layout even in print form.

I too, would not recommend text as your source document, but HTML.
Old 10-02-2009, 07:54 AM   #27
wayrad
Fanatic
Posts: 551
Karma: 1121392
Join Date: May 2008
Location: USA
Device: HTC One M8
By "masking the pages" to get rid of page numbers, do you mean adjusting the scan area or putting the edge of the book off the glass part of the scanner? I used to do that and it slows things down enormously, as well as not working very well because of page to page variability in the placing of the numbers. Finereader does let you crop the edges off the onscreen images, but this must be done page by page because of the aforementioned variability. This was probably the main factor driving my upgrade to later versions, which have the "don't save headers and footers" option when saving to Word. Perhaps there's a better way; if so I'd love to hear it.
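
One possible alternative, if the page numbers and running heads survive into the recognized text, is to strip them afterwards instead of cropping each image. The Python sketch below is only an illustration and not a Finereader feature: it assumes the OCR output is plain text, that each page number sits alone on its own line, and that the running heads are known in advance. The function name and the sample text are made up.

```python
import re

# Assumption: a page number is a line holding nothing but 1-4 digits,
# optionally wrapped in dashes, e.g. "17" or "- 17 -".
PAGE_NUMBER = re.compile(r"^\s*-?\s*\d{1,4}\s*-?\s*$")


def strip_page_furniture(text, running_heads=()):
    """Drop lines that are only a page number or a known running head."""
    heads = {h.strip().lower() for h in running_heads}
    kept = []
    for line in text.splitlines():
        if PAGE_NUMBER.match(line):
            continue  # bare page number
        if line.strip().lower() in heads:
            continue  # running head repeated at the top of a page
        kept.append(line)
    return "\n".join(kept)


# Made-up OCR output with a running head and two page numbers.
sample = "RUNNING HEAD\nFirst line of text.\n- 17 -\nIt happened in 1849, he said.\n42"
cleaned = strip_page_furniture(sample, running_heads=["Running Head"])
print(cleaned)
```

Numbers inside a sentence are left alone, but a genuine line that consists only of a number (a bare chapter number, say) would also be dropped, so the result still needs a quick check against the hard copy.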

As far as format goes, if these books are purely for your personal use, by all means do whatever suits you and your reading device. What works for someone else may not be the best for you.

Last edited by wayrad; 10-02-2009 at 08:08 AM.
Old 10-02-2009, 08:36 AM   #28
edembowski
Zealot
Posts: 138
Karma: 372
Join Date: Apr 2008
Location: New York, NY
Device: Sony PRS-600, Nook Color, iPad
Quote:
Originally Posted by wayrad View Post
By "masking the pages" to get rid of page numbers, do you mean adjusting the scan area or putting the edge of the book off the glass part of the scanner? I used to do that and it slows things down enormously, as well as not working very well because of page to page variability in the placing of the numbers. ...
Using Finereader, it takes me about 10-15 minutes to take out the page numbers. I find it well worth the time, since it's annoying to me having page numbers in the middle of the text. I think it's mostly personal preference there.

I've tried the Mac version & the Windows version. Right now, I'm running the Windows version on my Mac (inside a virtual machine).

- Ed
Old 10-02-2009, 09:24 AM   #29
wayrad
Fanatic
Posts: 551
Karma: 1121392
Join Date: May 2008
Location: USA
Device: HTC One M8
Quote:
Originally Posted by edembowski View Post
Using Finereader, it takes me about 10-15 minutes to take out the page numbers. I find it well worth the time, since it's annoying to me having page numbers in the middle of the text. I think it's mostly personal preference there.
Do you use the page crop feature, or is there a better way?
Old 10-02-2009, 09:30 AM   #30
igorsk
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
BTW, ABBYY is having a promo right now: buy FR9 and get a free upgrade to FR10 in a few weeks:
Quote:
FineReader 10 will be available soon. In the meantime, we offer you the option to purchase FineReader 9.0 Professional Edition today and you will receive a free version of FineReader 10 once it is available.
New features in FR10:
Achievements in OCR Accuracy and Performance
ADRT® analyses a multipage document as a single entity
3rd Generation Camera OCR: Reads Phone Camera Photos
Enhanced Usability – New Quick Tasks and Interface Revisions
Saving E-books to HTML Chapters and Flexible HTML
Powerful PDF Compression
Further Improvements in Page Layout Analysis
New Recognition Languages – Korean and Yiddish
All times are GMT -4.