OCR
I am trying Tesseract. Overall, the results so far are excellent, but a few mistakes appear.
Sometimes the faulty words contain a digit, for example mo1 instead of moi in French. These words usually do not contain a hyphen (-).
Confusions of this kind may appear (this is just an example):
5 → S, 1 → i, 0 → O
2 → Z, 4 → A, 8 → B
I'd like to use a regex that detects complete words containing one or more digits (and perhaps some special characters that I could add to the regex, such as €), so that I can check them quickly.
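To make the request concrete, here is a minimal sketch in Python of the kind of regex I have in mind. It assumes that a "word" is a run of letters, digits and the listed extra characters, and that a word made only of digits (a year, a page number) should not be flagged. The names EXTRA, build_pattern and find_suspects are made up for this illustration.

```python
import re

# Extra characters to treat as suspicious inside a word.
# Assumption: only the euro sign for now; add more characters to this string as needed.
EXTRA = "€"

def build_pattern(extra: str = EXTRA) -> re.Pattern:
    esc = re.escape(extra)
    token = rf"[\w{esc}]"    # any character that may belong to a word
    suspect = rf"[\d{esc}]"  # a digit or one of the extra characters
    letter = r"[^\W\d_]"     # any Unicode letter (so accented letters count)
    return re.compile(
        rf"(?<!{token})"        # not preceded by a word character: start of word
        rf"(?={token}*{suspect})"  # the word contains at least one suspect character
        rf"(?={token}*{letter})"   # ...and at least one letter (skips plain numbers)
        rf"{token}+"            # consume the complete word
        rf"(?!{token})"         # not followed by a word character: end of word
    )

def find_suspects(text: str, extra: str = EXTRA) -> list[str]:
    """Return the complete words that mix letters with digits or extra characters."""
    return build_pattern(extra).findall(text)

if __name__ == "__main__":
    sample = "Donne-mo1 le livre de 2019, c'est à mo€ et pas à toi."
    print(find_suspects(sample))
```

Since the hyphen is not part of the word characters here, a hyphenated compound is checked piece by piece, which fits the observation that the faulty words usually have no hyphen. If plain numbers should be flagged too, the lookahead that requires a letter can simply be dropped.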
Last edited by roger64; 08-13-2019 at 07:08 AM.