Quote:
Originally Posted by Ghitulescu
Code:
\b([A-Za-z]) ([A-Za-z]) ([A-Za-z]) ([A-Za-z])\b
as most words are 4+ letters long. Also, since the text is in a foreign language, eliminating I (first person) would have been counterproductive: lots of foreign glyphs are OCRed as I (for instance ïìîı, because they are longer than i), and l is also read as I in sans-serif fonts.
I could live with a handful of three-letter "escapees".
I know it is called letterspacing, but using that term would have forced me to rewrite the sentence once again. I tried to use simple words.
The OCR, however, inserts spaces.
Again, do it by hand. What if you have something like "o n a bus"? You would end up with "ona bus".
You cannot regex this away. You have to do it by hand, because a blanket replace will also join letters and words that you do not want joined.
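To make the risk concrete, here is a minimal Python sketch. Note the assumption: it uses a three-letter variant of the quoted pattern, which is my own illustration and not the pattern from the quote. The quoted four-letter pattern would not match "o n a bus" at all, because the "b" is followed by "u" and so fails the trailing \b. The function name naive_join is hypothetical.

```python
import re

# Hypothetical three-letter variant of the quoted pattern (an assumption,
# used only to illustrate the false-positive problem).
THREE_LETTERS = re.compile(r"\b([A-Za-z]) ([A-Za-z]) ([A-Za-z])\b")

def naive_join(text):
    """Blindly glue together every run of three spaced single letters."""
    return THREE_LETTERS.sub(r"\1\2\3", text)

print(naive_join("t h e bus"))  # the intended repair of a letterspaced word
print(naive_join("o n a bus"))  # a legitimate phrase that gets merged too
```

The regex cannot tell the two inputs apart, since both are three single letters separated by spaces. The shorter the pattern, the more often real one-letter words get swallowed.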
Use the regex for searching, but do the fixing by hand.
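A search-only pass could look roughly like the sketch below. It reuses the four-letter pattern from the quote, lists each hit with some surrounding text, and changes nothing. The function name find_candidates, the context width and the sample sentence are my own assumptions, not something from the original post.

```python
import re

# The four-letter pattern from the quoted post, unchanged.
LETTERSPACED = re.compile(r"\b([A-Za-z]) ([A-Za-z]) ([A-Za-z]) ([A-Za-z])\b")

def find_candidates(text, context=15):
    """Return (offset, matched text, surrounding text) for each hit.

    The text itself is never modified; the list is meant for manual review.
    """
    hits = []
    for m in LETTERSPACED.finditer(text):
        lo = max(0, m.start() - context)
        hi = min(len(text), m.end() + context)
        hits.append((m.start(), m.group(0), text[lo:hi]))
    return hits

# Made-up sample line standing in for OCR output.
sample = "The t e x t was scanned and we rode o n a bus."
for offset, match, around in find_candidates(sample):
    print(offset, repr(match), repr(around))
```

Each reported hit can then be checked by eye and corrected in the editor. The same idea works with the search function of any editor that supports regular expressions.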