11-14-2018, 06:22 AM | #1 |
Fanatic
Posts: 536
Karma: 1000000
Join Date: Dec 2011
Location: Rome, Italy
Device: Kindle PW5, Kindle PW4, Kindle 4 NT
Remove dashes within words
Hello, I am dealing with a document that was probably scanned from paper, and I find many words with a dash (the hyphen sign) between letters, something like this Italian text:
Il gat-to saltò giù dal letto e si incam-minò verso la porta, do-ve lo stavo aspettan-do.
(The cat jumped down from the bed and headed for the door, where I was waiting for it.)
I would like to remove them semi-automatically: a regex search that highlights each one, so that, if it is not a false positive, I can fix it by manually hitting Replace. However, I can't seem to make it work. I found the sticky post with saved searches, and there are a couple that should do exactly this, but they don't seem to work for me. For example, senhal wrote this one in 2015:
Code:
{
    "case_sensitive": false,
    "dot_all": false,
    "find": "(?s)([a-zàáèéìíòóùú])- *([a-zàáèéìíòóùú])(?![^<>]*>)(?!.*<body[^>]*>)",
    "mode": "regex",
    "name": "FIX: words with dash inside [del]",
    "replace": "\\1\\2"
}
If I apply the search and replace above to the following words, this is what I get:
disprezzar-lo ---> disprezza\1\2o
na-va ---> n\1\2e
And so on. Can anyone help?
Thank you so much,
R.
11-14-2018, 07:30 AM | #2 |
Grand Sorcerer
Posts: 24,908
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
The replace text should be (single backslashes; the doubled ones in the saved search are JSON escaping):
Code:
\1\2
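A minimal Python sketch of what is going on, assuming calibre's Replace box follows Python's replacement-template rules (the `disprezza\1\2o` output in post #1 is exactly what Python produces). It uses the standard `re` module; as far as I know calibre itself uses the third-party `regex` module, which treats these templates the same way. `join_with_review` is a hypothetical helper, not a calibre function: it imitates the semi-automatic workflow from post #1 by letting a callback accept or reject each hit.

```python
import json
import re

# senhal's "find" pattern from post #1; re.IGNORECASE stands in for
# "case_sensitive": false (an assumption about how calibre maps that option).
PATTERN = re.compile(
    r"(?s)([a-zàáèéìíòóùú])- *([a-zàáèéìíòóùú])(?![^<>]*>)(?!.*<body[^>]*>)",
    re.IGNORECASE,
)

# Typing the JSON text \\1\\2 into the Replace box: each "\\" is an escaped
# literal backslash, so no group is referenced and "\1\2" is inserted verbatim.
broken = PATTERN.sub(r"\\1\\2", "disprezzar-lo")
print(broken)  # disprezza\1\2o

# Decoding the JSON string "\\1\\2" from the saved-search file gives \1\2.
stored = json.loads(r'"\\1\\2"')
print(stored)  # \1\2

sample = ("Il gat-to saltò giù dal letto e si incam-minò verso la porta, "
          "do-ve lo stavo aspettan-do.")
fixed = PATTERN.sub(r"\1\2", sample)  # same as typing \1\2 in the Replace box
print(fixed)
# Il gatto saltò giù dal letto e si incamminò verso la porta, dove lo stavo aspettando.


def join_with_review(text, approve):
    """Hypothetical helper: join a hyphenated pair only if approve(hit) is True,
    where hit is the matched text (e.g. "t-t"); rejected hits stay unchanged."""
    def repl(m):
        return m.group(1) + m.group(2) if approve(m.group(0)) else m.group(0)
    return PATTERN.sub(repl, text)


# Toy reviewer: keep the genuine compound "semi-automatico", join the rest.
reviewed = join_with_review("un processo semi-automatico per il gat-to",
                            lambda hit: hit != "i-a")
print(reviewed)  # un processo semi-automatico per il gatto
```

In the real editor the "reviewer" is you looking at the highlighted hit and choosing Replace or Find next; the callback just stands in for that decision.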
11-14-2018, 07:46 AM | #3 |
Fanatic
Posts: 536
Karma: 1000000
Join Date: Dec 2011
Location: Rome, Italy
Device: Kindle PW5, Kindle PW4, Kindle 4 NT
That said, this raises a new question. I came up with a simpler search of the same kind, except that I didn't use the saved-search feature; I typed it directly into the search fields in the editor, with regular-expression mode selected. I searched for:
Code:
[a-z,A-Z]-[a-z,A-Z]
and replaced with:
Code:
\1\2
(just as you suggested). Why does this find what it is supposed to find, but when I hit "Replace" I get this error?
Code:
IndexError: no such group
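To see where the error comes from, here is a small Python reproduction. It assumes calibre resolves `\1` by asking the match object for group 1; that internal detail is my guess, but in Python that request on a pattern with no parentheses raises exactly `IndexError: no such group`. Python's own `re.sub` rejects the same template earlier, with `re.error`.

```python
import re

# The pattern from post #3: no parentheses, so it defines no capture groups.
user_pattern = re.compile(r"[a-z,A-Z]-[a-z,A-Z]")
print(user_pattern.groups)  # 0

m = user_pattern.search("gat-to")
print(m.group(0))  # t-t  (the search itself succeeds)

# A "\1" in the replacement asks the match for group 1, which does not exist.
group_error = None
try:
    m.group(1)
except IndexError as exc:
    group_error = exc
print("IndexError:", group_error)  # IndexError: no such group

# re.sub parses the template up front and refuses the missing group reference.
sub_error = None
try:
    user_pattern.sub(r"\1\2", "gat-to")
except re.error as exc:
    sub_error = exc
print(sub_error)
```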
11-14-2018, 07:55 AM | #4 |
Grand Sorcerer
Posts: 24,908
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
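The \1 and \2 refer to capture groups in the regex. These are defined by parentheses around sections of the pattern. Your search should be:
Code:
([a-z,A-Z])-([a-z,A-Z])
or, better, without the commas (inside square brackets a comma is a literal character to match, not a separator):
Code:
([a-zA-Z])-([a-zA-Z])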
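A quick Python check of the two suggested patterns, using the standard `re` module as a stand-in for calibre's regex engine. The string "uno,-due" is made up purely to show the comma problem.

```python
import re

with_comma = re.compile(r"([a-z,A-Z])-([a-z,A-Z])")
without_comma = re.compile(r"([a-zA-Z])-([a-zA-Z])")

# Both versions define two capture groups, so \1 and \2 now resolve.
print(with_comma.groups, without_comma.groups)  # 2 2
print(without_comma.sub(r"\1\2", "gat-to"))     # gatto

# The comma inside [...] is a literal character, so the first version
# also removes a hyphen that follows a comma.
print(with_comma.sub(r"\1\2", "uno,-due"))      # uno,due
print(without_comma.sub(r"\1\2", "uno,-due"))   # uno,-due
```

Note that `[a-zA-Z]` covers only unaccented letters, which is why senhal's pattern in post #1 lists à, è, ò and the other accented vowels explicitly.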
11-14-2018, 07:57 AM | #5 |
Wizard
Posts: 3,305
Karma: 10259306
Join Date: May 2016
Device: kobo forma, Kobo Libra, Huawei media Tab, fire HD10, PW3 HDX8.9,
The replace string is looking for two groups in round brackets and not finding any. Try this as the Find expression:
Code:
([a-z,A-Z])-([a-z,A-Z])
(I see the answer was already posted while I was typing.)
11-14-2018, 08:01 AM | #6 |
Fanatic
Posts: 536
Karma: 1000000
Join Date: Dec 2011
Location: Rome, Italy
Device: Kindle PW5, Kindle PW4, Kindle 4 NT
Great, thank you both!
Tags: regex, request
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Dashed Dashes -- Befuddled by EN and EM Dashes (Apple Pages to EPUB) | planewryter | Conversion | 1 | 07-22-2012 09:52 PM
Question- Hypens (dashes) insterted between words? | sn0fl8k3 | Calibre | 16 | 08-21-2010 04:47 AM
BD and dashes problem | Otter | Sony Reader | 1 | 09-25-2007 05:47 AM