MobileRead Forums - View Single Post

roger64 · 08-13-2019, 07:03 AM

OCR

I am trying Tesseract. Overall, the results so far are excellent, but a few mistakes appear.

Sometimes a faulty word contains a digit, for example mo1 instead of moi in French. These words also usually do not contain a hyphen.

Confusions of this kind may appear (this is just an example):

5 → S    1 → i    0 → O
2 → Z    4 → A    8 → B

I'd like to use a regex that detects whole words containing one or more digits (and perhaps some special characters that I could add to the regex, such as €), so that I can check them quickly.
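To make the request concrete, here is a minimal Python sketch of one possible pattern. It is only an illustration under assumptions of mine: the function name `suspect_words` and the `extra` parameter are hypothetical, and "word" is taken to mean a run of letters, digits, and the extra characters. A word is flagged only when it contains at least one letter and at least one digit or extra character, so plain numbers such as 2019 or prices such as 10€ are skipped. Because the hyphen is not part of a word here, the two sides of a hyphenated word are checked separately.

```python
import re

def suspect_words(text, extra="€"):
    """Return words that mix letters with digits or with chars from `extra`."""
    token = r"\w" + re.escape(extra)    # characters allowed inside a word
    suspect = r"\d" + re.escape(extra)  # characters that make a word suspicious
    pattern = re.compile(
        rf"(?<![{token}])"              # must start at the beginning of a word
        rf"(?=[{token}]*[^\W\d_])"      # the word contains at least one letter
        rf"(?=[{token}]*[{suspect}])"   # ...and at least one digit or extra char
        rf"[{token}]+"                  # consume the whole word
    )
    return pattern.findall(text)

sample = "C'est à mo1, en 2019, le 5alon coûte 10€ ou l€s."
print(suspect_words(sample))  # ['mo1', '5alon', 'l€s']
```

More characters can be added through `extra`, for instance `suspect_words(text, extra="€$")`. The same pattern might work in an editor's regex search, but lookbehind support and Unicode handling differ between regex engines, so that would need checking.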

Last edited by roger64; 08-13-2019 at 07:08 AM.