Thread: OCR engine
Old 05-03-2014, 02:46 AM   #54
Tex2002ans
Wizard
Quote:
Originally Posted by DebbyS
So for anyone actually proofreading, I suggest using "DPCustomMono" to maybe speed things up
That is a great tip for those who proofread with their eyes! (I do most of my fixing with regex + a quick pass with my eyes).

That font was recommended at Distributed Proofreaders. There is a page showing off this font compared to some others:

http://www.pgdp.net/c/faq/font_sample.php

Quote:
Originally Posted by DebbyS
If I had used TNR to begin with, I probably would miss a lot, particularly when "l" (el) is used in a date, such as 196O or l96o rather than 1960 (the book has small zeros, which confuses the OCR).
That is a very common OCR error, and it is pretty hard to spot with just your eyes in most fonts.

I use these four Regexes to catch those (I have these in my Saved Searches in Sigil, and then I just go through quickly one-by-one and decide on a case-by-case basis):

Search: [l]([0-9])
Replace: 1\1

Search: ([0-9])[l]
Replace: \11

Search: [oO]([0-9])
Replace: 0\1

Search: ([0-9])[oO]
Replace: \10

I believe Word uses a completely different Regex engine, but the spirit should be the same.
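
For anyone working outside Sigil, here is a rough Python sketch of the same four searches. The function names `find_candidates` and `fix_all` are made up for illustration, and this is only one way to do it, not how Sigil works internally. Note that Python needs `\g<1>` in the replacement wherever a literal digit follows the group, otherwise `\11` would be read as group 11.

```python
import re

# The four search/replace pairs from the post, in Python syntax.
# \g<1> is used instead of \1 so a following literal digit is not
# mistaken for part of the group number.
RULES = [
    (re.compile(r"[l]([0-9])"), r"1\g<1>"),   # l before a digit  -> 1
    (re.compile(r"([0-9])[l]"), r"\g<1>1"),   # l after a digit   -> 1
    (re.compile(r"[oO]([0-9])"), r"0\g<1>"),  # o/O before a digit -> 0
    (re.compile(r"([0-9])[oO]"), r"\g<1>0"),  # o/O after a digit  -> 0
]

def find_candidates(text):
    """List (offset, matched text, proposed fix) for every hit, for manual review."""
    hits = []
    for pattern, replacement in RULES:
        for m in pattern.finditer(text):
            hits.append((m.start(), m.group(0), m.expand(replacement)))
    return sorted(hits)

def fix_all(text):
    """Apply every rule until nothing changes. No review step, so use with care."""
    previous = None
    while previous != text:
        previous = text
        for pattern, replacement in RULES:
            text = pattern.sub(replacement, text)
    return text

if __name__ == "__main__":
    sample = "Published in 196O, reprinted l96o."
    for offset, found, proposed in find_candidates(sample):
        print(offset, repr(found), "->", repr(proposed))
    print(fix_all(sample))
```

`find_candidates` is the closer match to the case-by-case approach, since it only reports hits and leaves the decision to you. `fix_all` will happily turn something like "Vol1" into "Vo11", so it is only safe on text where you already know every hit is a real OCR error.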

Quote:
Originally Posted by DebbyS
From time to time in my work I scan books for a local publisher who will eventually turn them into ebooks.
Fantastic, keep up the good work. All the books must be digitized!