Hi
My tools are Tesseract, gImageReader and LibreOffice (LO).
All images are in the attached zip file.
The sources are the two attached images, Pasteur 01.jpg and Pasteur 02.jpg. It's a scientific (admittedly old) text with italics, superscripts and some special characters, so nothing especially easy.
I took the following screenshots and produced the following files:
- écran gimagereader is what you get. You can correct the mistakes marked in red or carry on. I did not correct anything.
- écran gimagereader2 is what you get when you click to remove the line breaks.
- Pasteur.txt is the output from gimageReader.
- Pasteur.odt is what you get in LO when you import the file Pasteur.txt into your working model.
- checking.png shows how I proceed in the checking phase: I put the image on the left and the working model on the right.
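For anyone curious what removing the line ends amounts to, here is a minimal Python sketch. It is not gImageReader's actual code: the function name join_line_ends and the two rules it applies (a blank line separates paragraphs, a trailing hyphen marks a word split across two lines) are my own assumptions for illustration.

```python
import re


def join_line_ends(text: str) -> str:
    """Merge hard-wrapped OCR lines into paragraphs.

    Assumptions (hypothetical rules, not gImageReader's behaviour):
    - one or more blank lines separate paragraphs;
    - a line ending in "-" holds the first half of a split word.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for para in paragraphs:
        # Drop empty lines and surrounding whitespace inside the paragraph.
        lines = [line.strip() for line in para.splitlines() if line.strip()]
        merged = ""
        for line in lines:
            if merged.endswith("-"):
                # Rejoin a word hyphenated across the line break.
                merged = merged[:-1] + line
            elif merged:
                merged += " " + line
            else:
                merged = line
        joined.append(merged)
    return "\n\n".join(joined)


sample = "La fermen-\ntation est\nun acte.\n\nSecond para-\ngraphe."
print(join_line_ends(sample))
```

Note that this naive rule also merges genuine compound words that happen to break at a hyphen, so the result still needs checking against the image.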
I hope these images and screenshots will give you a fair idea of what Tesseract 4.1.1 can do now. The text of most fiction books is easier than this example.