Old 01-29-2021, 08:35 AM   #10
JSWolf
Resident Curmudgeon
Posts: 80,228
Karma: 148951761
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by Ghitulescu View Post
Code:
\b([A-Za-z]) ([A-Za-z]) ([A-Za-z]) ([A-Za-z])\b
as most words are 4+ letters long. Also, since the text is in a foreign language, eliminating I (first person) would have been counterproductive: lots of foreign glyphs are OCRed as I (for instance ïìîı, because they are longer than i), and l is also read as I in sans-serif fonts.

I could live with a handful of 3-letter-long "escapees".

I know it was called letterspacing, but using that term would have forced me to rewrite the sentence once again. I tried to use simple words.
The OCR, however, inserts spaces.
Again, do it by hand. What if you have something like "o n a bus"? You would end up with "ona bus".

You cannot regex this away. You have to do it by hand, because otherwise you will combine letters/words you do not want to.

Use the regex for searching. But do the fixing by hand.
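
To show what I mean, here is a rough Python sketch (not anything from the quoted post apart from the 4-letter pattern itself). The 3-letter pattern, the `find_candidates` helper and the sample sentence are made up for illustration. It only lists the hits so you can look at each one; the last lines show what a blind replace would do to "o n a bus".

```python
import re

# Pattern quoted above: four single letters separated by single spaces.
SPACED_4 = re.compile(r"\b([A-Za-z]) ([A-Za-z]) ([A-Za-z]) ([A-Za-z])\b")

# Hypothetical 3-letter variant, used here only to reproduce the "o n a bus" case.
SPACED_3 = re.compile(r"\b([A-Za-z]) ([A-Za-z]) ([A-Za-z])\b")


def find_candidates(text, pattern, context=15):
    """Return (offset, snippet) pairs for manual review; the text is not modified."""
    hits = []
    for m in pattern.finditer(text):
        start = max(0, m.start() - context)
        end = min(len(text), m.end() + context)
        hits.append((m.start(), text[start:end]))
    return hits


# Made-up sample: one real letterspaced word and one run of short real words.
sample = "He sat o n a bus with his b o o k."

# Search only: print each hit with some context and decide by eye.
for offset, snippet in find_candidates(sample, SPACED_4):
    print(offset, repr(snippet))

# What a blind replace does with the 3-letter variant: real words get glued together.
blind = SPACED_3.sub(r"\1\2\3", "o n a bus")
print(blind)
```

The 4-letter pattern leaves "o n a bus" alone only because "bus" is not a single letter. Loosen the pattern to catch the 3-letter escapees and the replace merges "on a" into "ona", which is why the fixing still has to be done by hand.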