Old 11-14-2018, 06:22 AM   #1
Ruskie_it
Fanatic
Posts: 536
Karma: 1000000
Join Date: Dec 2011
Location: Rome, Italy
Device: Kindle PW5, Kindle PW4, Kindle 4 NT
Remove dashes within words

Hello, I am dealing with a document that was probably scanned from paper, and I find many words with a hyphen between letters. For example, in this Italian text:

Il gat-to saltò giù dal letto e si incam-minò verso la porta, do-ve lo stavo aspettan-do.

I would like to remove them in a semi-automatic process: a regex search that highlights each one, so that if it is not a false positive I can fix it manually by hitting Replace.
However, I can't seem to make it work.
I found the sticky post with saved searches, and a couple of them should do exactly this, but they don't work for me.
For example, senhal wrote this one in 2015:

Code:
"case_sensitive": false, 
      "dot_all": false, 
      "find": "(?s)([a-zàáèéìíòóùú])- *([a-zàáèéìíòóùú])(?![^<>]*>)(?!.*<body[^>]*>)", 
      "mode": "regex", 
      "name": "FIX: words with dash inside [del]", 
      "replace": "\\1\\2"
It correctly identifies the oddly hyphenated words, but when I click "Replace" the offending text is replaced with a literal \1\2, which is wrong.
For example, this is what I get when I apply the search and replace above to the following words:

disprezzar-lo ---> disprezza\1\2o
na-va ---> n\1\2e

Etc. etc.
Can anyone help?

Thank you so much
R.
Old 11-14-2018, 07:30 AM   #2
davidfor
Grand Sorcerer
Posts: 24,908
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
The replace text should be:

Code:
\1\2
With what you have, you are escaping the backslash, so it puts a single backslash in for each pair.
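
To illustrate the escaping point, here is a small Python sketch. It assumes that the editor's replacement templates behave like those of Python's standard `re` module (calibre's own engine may differ in details), and the pattern is a simplified version of the saved search, without the tag lookaheads. The doubled backslashes are how the value is stored inside the JSON file; the decoded form is what belongs in the Replace field.

```python
import json
import re

# Simplified from the saved search (tag lookaheads dropped for illustration).
PATTERN = re.compile(r"([a-zàáèéìíòóùú])- *([a-zàáèéìíòóùú])", re.IGNORECASE)

# Inside the JSON file the backslashes are doubled.
stored = r'"\\1\\2"'            # the value as it appears in the JSON text
decoded = json.loads(stored)     # the four characters \1\2

word = "disprezzar-lo"

# Pasting the JSON form into the Replace field: each \\ is an escaped
# backslash, so the output contains the literal text \1\2.
wrong = PATTERN.sub(r"\\1\\2", word)

# Using the decoded form: \1 and \2 refer to the captured letters.
right = PATTERN.sub(decoded, word)

print(wrong)
print(right)
```

The first result reproduces the broken output reported above for "disprezzar-lo", and the second joins the two halves of the word.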
Old 11-14-2018, 07:46 AM   #3
Ruskie_it
Quote:
Originally Posted by davidfor View Post
The replace text should be:

Code:
\1\2
With what you have, you are escaping the backslash, so it puts a single backslash in for each pair.
Ok (it works). Thank you SO much!

But this raises a new question. I came up with a less powerful search of the same kind.
This time I didn't use the saved search feature; I typed directly into the search fields in the editor.
I searched for: [a-z,A-Z]-[a-z,A-Z]
And I replaced with: \1\2 (just like you suggested)
Mode set to Regular Expression.

Why does this find what it is expected to find, but when I hit "Replace" I get:
IndexError: no such group

??
Old 11-14-2018, 07:55 AM   #4
davidfor
The "\1" and "\2" refer to capture groups in the regex. These are defined by parentheses around sections of the pattern. Your search should be:

Code:
([a-z,A-Z])-([a-z,A-Z])
But I don't think you want the commas. With them, the pattern will also match a comma before or after a hyphen. It probably won't matter, as I wouldn't expect to find that in the text, but the search should probably be:

Code:
([a-zA-Z])-([a-zA-Z])
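
A short Python sketch of both points, for anyone who wants to try them outside the editor. It uses Python's standard `re` module as a stand-in, so the exact error the editor raises may differ; here the "no such group" message is shown by asking a match for a group that the pattern never defined. The sample sentence is adapted from the first post.

```python
import re

text = "Il gat-to saltò giù dal letto, do-ve lo stavo aspettan-do."

no_groups = re.compile(r"[a-zA-Z]-[a-zA-Z]")
with_groups = re.compile(r"([a-zA-Z])-([a-zA-Z])")

# Without parentheses the pattern still matches, but group 1 does not exist.
match = no_groups.search(text)
try:
    match.group(1)
    group_error = None
except IndexError as exc:
    group_error = str(exc)

# With parentheses, \1 and \2 are the letters on either side of the hyphen.
fixed = with_groups.sub(r"\1\2", text)

# A comma inside the brackets is a literal comma, so ",-a" also matches.
sample = "x ,-a"
comma_hit = re.search(r"[a-z,A-Z]-[a-z,A-Z]", sample) is not None
comma_miss = re.search(r"[a-zA-Z]-[a-zA-Z]", sample) is None

print(group_error)
print(fixed)
print(comma_hit, comma_miss)
```

Note that `[a-zA-Z]` does not cover accented letters such as "ò", so a hyphen next to one would be skipped; the character class from the saved search handles those.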
Old 11-14-2018, 07:57 AM   #5
stumped
Wizard
Posts: 3,305
Karma: 10259306
Join Date: May 2016
Device: kobo forma, Kobo Libra, Huawei media Tab, fire HD10, PW3 HDX8.9,
The replace string is looking for two parenthesized groups and not finding any. Try FIND ([a-z,A-Z])-([a-z,A-Z])

(I see the answer was already posted while I was typing.)
Old 11-14-2018, 08:01 AM   #6
Ruskie_it
Great, thank you both!