|10-24-2007, 11:01 AM||#1|
Recovering Gadget Addict
Join Date: May 2004
Location: Pittsburgh, PA
OCRopus - Google's open source Linux software
Google officially released the alpha version of their open source OCR software yesterday. ArsTechnica has more of the technical details and a hands-on review.
"Google's involvement in the project is motivated by the company's interest in digitizing printed documents. Open-source OCR technology could be valuable in many other contexts as well. Government agencies that want to digitize paper records, for instance, could one day benefit from OCRopus. Although OCRopus is weak in many areas, it has some real potential."
In terms of current quality, "OCRopus was able to provide readable output in about half of our tests." You can see more details in the ArsTechnica article, but it sounds like they have some work to do. Not sure if the beta expected in 2008Q1 addresses accuracy or not, but this is probably just the beginning.
Some of the tech tidbits shared:
* Built on HP's open-source Tesseract OCR engine
* Released under Apache License 2.0
* OpenFST library is used for language modeling
* Designed to be modular - to allow future support for non-Latin languages
* Developed in Lua
|10-24-2007, 10:14 PM||#2|
Join Date: Aug 2007
Device: Kobo h2o
Cool to hear that Google's developing with Lua. I've been learning the language lately, and I am really enjoying it.