I found over 200 uncorrected OCR errors in a book I bought last week.
A few common ones (sp means it shows up in a spelling/grammar check when exported to docx):
Wrong -> correct
hut -> but
! or i -> l (lower case L) sp
‘ -> “ (the closing quote ”) sp
fiat -> flat
names (sp if you make a custom name dictionary)
Extra spaces in words sp
Extra periods or commas sp
Missing end of line punctuation (sp with regex)
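
For the regex-type checks, here is a rough Python sketch of the sort of thing I mean. The function name, the patterns and the short word list are made up for illustration and are not from any particular tool. It will give false positives (headings, for example, have no end punctuation), so treat the output as a list of places to look at, not as corrections.

```python
import re

# Hypothetical watch list: real words that are often OCR misreads (from the list above).
SUSPECT_WORDS = {"hut": "but", "fiat": "flat"}

# Paragraph ends in a letter or digit, so no closing punctuation or quote.
MISSING_END = re.compile(r"[A-Za-z0-9]$")
# Exactly two periods in a row (not an ellipsis), or a doubled comma.
EXTRA_PUNCT = re.compile(r"(?<!\.)\.\.(?!\.)|,,")
# An exclamation mark with letters on both sides, likely a misread lower case L.
BANG_IN_WORD = re.compile(r"[A-Za-z]![A-Za-z]")


def flag_paragraph(text):
    """Return a list of (kind, detail) tuples for one paragraph of text."""
    flags = []
    stripped = text.strip()
    if stripped and MISSING_END.search(stripped):
        flags.append(("missing end punctuation", stripped[-20:]))
    for m in EXTRA_PUNCT.finditer(text):
        flags.append(("extra punctuation", m.group()))
    for m in BANG_IN_WORD.finditer(text):
        flags.append(("bang inside word", m.group()))
    for word in re.findall(r"[A-Za-z]+", text):
        fix = SUSPECT_WORDS.get(word.lower())
        if fix:
            flags.append(("suspect word", f"{word} -> {fix}?"))
    return flags


if __name__ == "__main__":
    sample = "He said nothing, hut the fiat road went on,, and on"
    for kind, detail in flag_paragraph(sample):
        print(f"{kind}: {detail}")
```

Words like hut and fiat are real words, so a spell checker will not catch them; a small watch list like this only flags them for a human to check in context.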
Most of the purchased books with lots of errors would not even pass a spell/grammar check in LibreOffice or look good on screen. Much of what is on Gutenberg is better.