09-14-2009, 02:55 PM   #31
Greg Anos
Grand Sorcerer
Posts: 11,256
Karma: 35000000
Join Date: Jan 2008
Device: Pocketbook
Quote:
Originally Posted by ahi
I have this notion in my head...

What about taking a given document, OCR-ing it with at least 3 different OCR programs, and then parsing the results in parallel, character by character (perhaps now and then making an adjustment if one of the streams is out of line due to an erroneously detected additional character), and always putting into the output stream the character that most of the OCR-ed texts agree on?

Obviously this won't help with anything that the various OCR programs get wrong in the same way... but it might minimize the amount of clean-up to be done thereafter.

How realistic is such an approach? Anybody here tried it before?

- Ahi

The idea is excellent, but I don't know of anybody who has written flexible parsing software for it. As a matter of fact, the idea could be used for any OCR'ed texts...

The big problem will be the differences in the embedded control sequences...
09-14-2009, 05:08 PM   #32
igorsk
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
Quote:
Originally Posted by Ralph Sir Edward
Igorsky, how does the Google Epub OCR conversion stack up to Finereader's?

Would it be easier to clean up Google's version or Finereader's?
I'm quite sure Finereader will beat Google for almost any particular book.
1) ABBYY has been working in this area for a long time, and they pretty clearly have the best commercial engine on the market. I'm not sure what Google is using (probably Tesseract), but they have been at it for only a couple of years. I admit Google has its own pool of PhDs, so the situation might change in the future.
2) Google cannot afford to manually tune the OCR for each book; the volume is just too big. With Finereader you can check the results and adjust settings as needed, which can help quite a lot.

Quote:
Originally Posted by ahi
I have this notion in my head...

What about taking a given document, OCR-ing it with at least 3 different OCR programs, and then parsing the results in parallel, character by character (perhaps now and then making an adjustment if one of the streams is out of line due to an erroneously detected additional character), and always putting into the output stream the character that most of the OCR-ed texts agree on?

Obviously this won't help with anything that the various OCR programs get wrong in the same way... but it might minimize the amount of clean-up to be done thereafter.

How realistic is such an approach? Anybody here tried it before?
Actually, yes. I've seen a couple of papers and it seems it does help, though setting it up is probably not trivial.
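
For what it's worth, the voting idea quoted above can be sketched in a few lines of Python. This is only an illustration under some assumptions, not the method from the papers I mentioned: the function names `align_to_reference` and `vote` are made up for this example, the first OCR output is arbitrarily treated as the reference, and the standard library's `difflib.SequenceMatcher` does the "adjustment" when one stream gains or loses a character. Real OCR output would first need its markup and control sequences stripped.

```python
from collections import Counter
from difflib import SequenceMatcher


def align_to_reference(reference: str, other: str) -> list[str]:
    """Map `other` onto the character slots of `reference`.

    slots[0] holds text `other` has before the reference starts;
    slots[i + 1] holds what `other` has in place of reference[i]
    ('' if the character is missing, several characters if it has extras).
    """
    slots = [""] * (len(reference) + 1)
    matcher = SequenceMatcher(None, reference, other, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        chunk = other[j1:j2]
        if tag == "insert":
            # Extra text in `other`: attach it to the preceding slot.
            slots[i1] += chunk
        elif tag in ("equal", "replace"):
            span = i2 - i1
            # One character per slot; any surplus goes into the last slot.
            for k in range(span - 1):
                slots[i1 + 1 + k] = chunk[k:k + 1]
            slots[i2] = chunk[span - 1:]
        # tag == "delete": `other` has nothing here, slots stay ''.
    return slots


def vote(texts: list[str]) -> str:
    """Majority vote per slot; ties fall back to the reference (texts[0])."""
    reference, others = texts[0], texts[1:]
    columns = [[""] + list(reference)]
    columns += [align_to_reference(reference, other) for other in others]
    winners = []
    for candidates in zip(*columns):
        # most_common keeps first-seen order on ties, so the reference wins.
        winners.append(Counter(candidates).most_common(1)[0][0])
    return "".join(winners)


# Three hypothetical OCR readings of the same line.
readings = ["The modem world", "The modern world", "The modern wor1d"]
print(vote(readings))  # The modern world
```

As the original post says, this does nothing for errors that all the engines make in the same way, and aligning everything against a single reference is the simplest possible choice rather than the best one.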
09-16-2009, 11:23 AM   #33
dwallbaum
Zealot
Posts: 100
Karma: 77
Join Date: May 2009
Device: Kindle2
Quote:
Originally Posted by ahi
I'd be grateful for some comments and opinions on "The Decline and Fall of the Roman Empire".
...
How good is the Project Gutenberg edition?

- Ahi
Ahi;

Just pulled down the Amazon DFRE $.99 version, a Kindle-specific version, and it is poorly done. The typeface is light, the fully justified type is not well implemented, and the paragraph spacing is odd... it could be the style of writing, but there are no breaks where there should be, and huge breaks where there don't need to be.

BUT, the Amazon version DOES contain footnotes, but they flow as part of the page, so you read every footnote before you continue on with the narrative. It may work for you if you are looking for an eVersion with footnotes.

Don