Old 11-14-2018, 07:22 AM   #1
Ruskie_it
Addict
Posts: 352
Karma: 376110
Join Date: Dec 2011
Location: Rome, Italy
Device: Kindle PW2, Kindle 4 NT, Cybook Odissey
Remove dashes within words

Hello, I am dealing with a document that was probably scanned from paper, and I find many words with a dash (the hyphen sign) between letters. Something like this Italian text:

Il gat-to saltò giù dal letto e si incam-minò verso la porta, do-ve lo stavo aspettan-do.

I would like to remove them in a semi-automatic process: a regex search that highlights each one, so that if it is not a false positive I can fix it manually by hitting Replace.
However, I can't seem to make it work.
I have found the sticky message with saved searches, and there are a couple that should do exactly this, but they don't seem to work for me.
For example, senhal in 2015 wrote this one:

Code:
"case_sensitive": false,
"dot_all": false,
"find": "(?s)([a-zàáèéìíòóùú])- *([a-zàáèéìíòóùú])(?![^<>]*>)(?!.*<body[^>]*>)",
"mode": "regex",
"name": "FIX: words with dash inside [del]",
"replace": "\\1\\2"
It correctly identifies oddly dashed words, but when I click "Replace" I get a literal \1 and \2 in place of the offending text, which is wrong.
For example, if I apply the search and replace function above to the following words, see what I get:

disprezzar-lo ---> disprezza\1\2o
na-va ---> n\1\2e

Etc. etc.
Can anyone help?

Thank you so much
R.
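For readers unfamiliar with the two negative lookaheads at the end of senhal's pattern, here is a minimal sketch of what they appear to do. It uses Python's standard `re` module as a stand-in for the editor's regex engine (an assumption), and the sample HTML is made up for illustration. The first lookahead skips hyphens that sit inside a tag; the second skips anything that still has a `<body ...>` tag ahead of it, which in practice means the head section.

```python
import re

# senhal's pattern, with "case_sensitive": false mapped to re.IGNORECASE.
# (?![^<>]*>)        -> not inside a tag (no '>' reachable before the next '<')
# (?!.*<body[^>]*>)  -> no <body ...> tag later on, so the match is in the body
pattern = re.compile(
    r"(?s)([a-zàáèéìíòóùú])- *([a-zàáèéìíòóùú])(?![^<>]*>)(?!.*<body[^>]*>)",
    re.IGNORECASE,
)

# Hypothetical sample file, not taken from the thread.
html = (
    '<html><head><title>My e-book</title><meta name="x-y"/></head>'
    '<body class="main-text"><p>Il gat-to saltò giù dal letto</p></body></html>'
)

# Collect the text of every match the pattern accepts.
hits = [m.group(0) for m in pattern.finditer(html)]

# Apply the replacement the way the Replace field is meant to work.
fixed = pattern.sub(r"\1\2", html)

print(hits)
print(fixed)
```

Only the hyphen in the paragraph text is touched; the ones in the title, the attribute value and the class name are left alone. Genuine compounds in the body text would still match, which is why checking each hit before replacing is a sensible habit.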
Old 11-14-2018, 08:30 AM   #2
davidfor
Grand Sorcerer
Posts: 15,687
Karma: 25902256
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo,Aura H2O,Glo HD,Aura ONE,Clara HD,Forma;tolino epos
The replace text should be:

Code:
\1\2
With what you have, you are escaping the backslash, so it puts a single backslash in for each pair.
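The difference can be reproduced outside the editor. The sketch below assumes Python's standard `re` and `json` modules behave like the editor does for this case; the shortened find pattern and the variable names are only for illustration. In the saved-search file the doubled backslash is JSON escaping, so it decodes to `\1\2`. Typed directly into the Replace field, the doubled backslash is regex escaping and produces a literal backslash.

```python
import json
import re

# Fragment shaped like the saved search quoted above (find pattern shortened).
# Inside JSON every backslash is written twice.
saved = json.loads(r'{"find": "([a-z])- *([a-z])", "replace": "\\1\\2"}')

pattern = re.compile(saved["find"], re.IGNORECASE)

# After JSON decoding the replacement is \1\2, i.e. two group references.
good = pattern.sub(saved["replace"], "disprezzar-lo")

# Typing \\1\\2 straight into a Replace field: each \\ is an escaped
# backslash, so the output contains the characters \1\2 literally.
bad = pattern.sub(r"\\1\\2", "disprezzar-lo")

print(saved["replace"])  # \1\2
print(good)              # disprezzarlo
print(bad)               # disprezza\1\2o
```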
Old 11-14-2018, 08:46 AM   #3
Ruskie_it
Quote:
Originally Posted by davidfor View Post
The replace text should be:

Code:
\1\2
With what you have, you are escaping the backslash, so it puts a single backslash in for each pair.
Ok (it works). Thank you SO much!

But then again, this makes me ask a new question. I came up with a less powerful search of the same kind.
This time I didn't use the saved search feature; I typed directly into the search fields of the editor.
I searched for: [a-z,A-Z]-[a-z,A-Z]
And I replaced with: \1\2 (just like you suggested)
Regular Expression search set.

Why does this find what it is expected to find, but then when I hit "Replace" I get:
IndexError: no such group

??
Old 11-14-2018, 08:55 AM   #4
davidfor
The "\1" and "\2" refer to capturing groups in the regex. These are defined by parentheses around sections of the pattern. Your search should be:

Code:
([a-z,A-Z])-([a-z,A-Z])
But, I don't think you want the commas. With them, it will match a comma before or after a hyphen. It probably won't matter as I wouldn't expect to find that. But, it probably should be:

Code:
([a-zA-Z])-([a-zA-Z])
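Both points can be checked with a few lines of Python. This is a sketch using the standard `re` module, on the assumption that it behaves like the editor's engine here; the sample strings are made up. Without parentheses there is no group 1 to refer to, and with the commas in the character class a comma next to a hyphen also matches.

```python
import re

text = "Il gat-to"

# No parentheses: the pattern matches, but it defines no groups,
# so asking for group 1 fails.
m = re.search(r"[a-z,A-Z]-[a-z,A-Z]", text)
try:
    m.group(1)
    error = None
except IndexError as exc:
    error = str(exc)

# With capturing groups, \1 and \2 have something to refer to.
with_groups = re.sub(r"([a-zA-Z])-([a-zA-Z])", r"\1\2", text)

# The commas inside [...] are literal characters, not separators.
comma_hit = re.search(r"([a-z,A-Z])-([a-z,A-Z])", "uno,-due")
no_comma_hit = re.search(r"([a-zA-Z])-([a-zA-Z])", "uno,-due")

print(error)               # no such group
print(with_groups)         # Il gatto
print(comma_hit.group(0))  # ,-d
print(no_comma_hit)        # None
```

Note that `[a-zA-Z]` does not cover accented letters such as ò or ù, so for Italian text the longer character class from the saved search is the safer choice.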
Old 11-14-2018, 08:57 AM   #5
stumped
Wizard
Posts: 1,127
Karma: 1326852
Join Date: May 2016
Device: Samsung tab s , fire HDX 8.9, fire hd 8
The replace string is looking for two parenthesized groups and not finding any. Try FIND: ([a-z,A-Z])-([a-z,A-Z])

(I see the answer got posted already, while I was typing.)
Old 11-14-2018, 09:01 AM   #6
Ruskie_it
Great, thank you both!