Hello, I am dealing with a document that was probably scanned from paper, and many of its words have a hyphen between their letters. For example, in this Italian text:
Il gat-to saltò giù dal letto e si incam-minò verso la porta, do-ve lo stavo aspettan-do.
I would like to remove them in a semi-automatic process: a regex search highlights each candidate and, if it is not a false positive, I fix it manually by clicking Replace.
However, I can't seem to make it work.
I have found the sticky thread with saved searches, and a couple of them should do exactly this, but they don't seem to work for me.
For example, senhal posted this one in 2015:
Code:
"case_sensitive": false,
"dot_all": false,
"find": "(?s)([a-zàáèéìíòóùú])- *([a-zàáèéìíòóùú])(?![^<>]*>)(?!.*<body[^>]*>)",
"mode": "regex",
"name": "FIX: words with dash inside [del]",
"replace": "\\1\\2"
It correctly identifies the oddly hyphenated words, but when I click "Replace", the matched text is replaced with a literal \1\2 instead of the captured letters, which is wrong.
For example, this is what I get when I apply the search and replace above to the following words:
disprezzar-lo ---> disprezza\1\2o
na-va ---> n\1\2e
Etc. etc.
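A possible cause, sketched below in Python: the snippet above is JSON, where "\\1\\2" is the escaped spelling of \1\2. If the doubled backslashes are typed into the Replace field as they appear, a regex engine reads each \\ as one literal backslash, so no backreference is left. The sketch assumes the editor treats backslashes the way Python's re module does. It also drops the two trailing lookaheads from the pattern, since they only matter inside HTML. The name dehyphenate is just a hypothetical helper for this illustration.

```python
import re

# Pattern from the saved search, without the two HTML-only lookaheads.
# IGNORECASE mirrors "case_sensitive": false.
PATTERN = re.compile(r"([a-zàáèéìíòóùú])- *([a-zàáèéìíòóùú])", re.IGNORECASE)


def dehyphenate(text, replacement):
    """Replace every letter-hyphen-letter match with the given template."""
    return PATTERN.sub(replacement, text)


sample = ("Il gat-to saltò giù dal letto e si incam-minò verso la porta, "
          "do-ve lo stavo aspettan-do.")

# Single backslashes: \1 and \2 are backreferences to the captured letters.
fixed = dehyphenate(sample, r"\1\2")

# Doubled backslashes, copied as they appear in the JSON: each \\ is an
# escaped literal backslash, so the output contains the text \1\2.
broken = dehyphenate("disprezzar-lo", r"\\1\\2")

print(fixed)
print(broken)
```

If that assumption holds, the second call reproduces the disprezza\1\2o result above, and entering \1\2 with single backslashes in the Replace field should join the two halves of the word.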
Can anyone help?
Thank you so much
R.