Old 02-23-2023, 05:52 AM   #1
F4in7_
Junior Member
F4in7_ began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Feb 2023
Device: none
Question Requesting help to figure out how to get this to work using regular expressions.

I'm bad at describing how to make this work, but right now I have more than a dozen saved searches trying to align ellipses to each other including quotation marks.

Here's a list of them, and I'm figuring out a way to simplify them.
  • ' …' to '…'
  • '. . .' to '… '
  • ' . . . ' to '…'
  • ' . . .' to '…'
  • '*…' to '…'
  • '([A-z])…([A-z])' to '\1… \2'
  • '‘… ' to '‘…'
  • '… ’' to '…’'
  • '‘. . . ' to '‘…'
  • ' . . .’' to '…’'
  • '“… ' to '“…'
  • '… ”' to '…”'
  • '‘ … ' to '‘…'
  • ' … ’' to '…’'
  • ' . . . ’' to '…’'
  • ' … ”' to '…”'
  • ' . . . ”' to '…”'
Old 02-24-2023, 01:50 AM   #2
hihohahi
Junior Member
hihohahi began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Feb 2023
Device: none
I have a similar issue, but with Arabic numerals. They randomly show up in my epub. I don't know any way to solve it, so I made a simple AHK macro script and just run it on every file.

Send, ^f ;press ctrl+F
Send {٢} ;type number 2 in arabic in FIND
Send {Enter} ;navigate to REPLACE
Send {Tab}
Send {2} ;type number 2
Send {Enter} ;navigate to REPLACE ALL
Send {Tab}
Sleep 300 ;wait 300 ms between keystrokes, otherwise it goes too fast
Send {Tab}
Sleep 300
Send {Tab}
Sleep 300
Send {Tab}
Sleep 300
Send {Space}
Sleep 300
Send {Enter}
Sleep 300

The above replaces ٢ with 2. Repeat this for 0-9. It runs in about 20 seconds.
Old 02-24-2023, 04:10 AM   #3
EbookMakers
Enthusiast
EbookMakers began at the beginning.
 
Posts: 26
Karma: 38
Join Date: Nov 2019
Location: Paris, France
Device: none
Quote:
Originally Posted by hihohahi View Post
I made a simple AHK macro script and just run it on every file.
This macro executes a find/replace. It seems to me that you use it mainly to remember the find/replace. You could use the Search / Saved search command instead: it opens a window that also lets you keep find/replace pairs between sessions.

You can create 10 saved searches, one per digit 0-9, and apply each with Replace all.

Example:
Add search in the Saved search window.
name: Arabic_to_Latin_number_2
Find: ٢
Replace : 2
Mode: Normal (or regex)
and click on: Done

Same for the other numbers.

You can successively select the saved searches and click on Replace all.
You can also select all 10 searches and then click on Replace all; the 10 searches are executed one after another. But I cannot guarantee that the result will be the same.

Last edited by EbookMakers; 02-24-2023 at 04:39 AM.
Old 02-24-2023, 04:22 AM   #4
BetterRed
null operator (he/him)
BetterRed ought to be getting tired of karma fortunes by now.
 
Posts: 21,706
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by F4in7_ View Post
I'm bad at describing how to make this work, but right now I have more than a dozen saved searches trying to align ellipses to each other including quotation marks.

Here's a list of them, and I'm figuring out a way to simplify them.
  • ' …' to '…'
    . . . <snip> . . .
  • ' . . . ”' to '…”'
Perhaps you could do something with this plugin -->> Editor Chains

Quote:
Automate various tasks in Calibre's ebook editor. Plugin allows to chain multiple actions together. You can choose from Plugin specific actions, calibre builtin actions, or create you own actions through the plugin's module editor.
Or the Editor's Regex function mode, see ==>> Function mode for Search & replace in the Editor

BR
Old 02-24-2023, 05:31 AM   #5
lomkiri
Groupie
lomkiri ought to be getting tired of karma fortunes by now.

Posts: 167
Karma: 1497966
Join Date: Jul 2021
Device: N/A
Quote:
Originally Posted by F4in7_ View Post
I'm bad at describing how to make this work, but right now I have more than a dozen saved searches trying to align ellipses to each other including quotation marks.

Here's a list of them, and I'm figuring out a way to simplify them.
If I understand correctly, you want to turn three dots (with or without spaces) into an ellipsis, always remove a space before it, and always remove a space after it unless a letter follows.

You can do this with a regex-function:
Code:
Search:
\s?(?:(?:\. ?\. ?\. ?)|…) ?(.)?

Replace (in regex-function mode):

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    end = match[1] or ''   # group 1 is None when nothing follows the ellipsis
    space = ' ' if end.isalpha() else ''   # put a space only before a letter
    return '…' + space + end
I don't understand items 2 and 5 in your list.
> '. . .' to '… '
It seems that here you want to add a space, whereas everywhere else (except before a letter) you want to remove it? I didn't handle this case because I didn't understand its purpose.

> '*…' to '…'
I don't understand this *. Is it a wildcard? I didn't handle it either, but if you want to match a literal star, the search should be:
Code:
\s?\*?(?:(?:\. ?\. ?\. ?)|…) ?(.)?
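
A quick way to see what the search and function above do is to run them outside calibre with Python's `re` module. This is only a sketch: it assumes `re` behaves like calibre's regex engine for these patterns, it uses a one-argument `replace(match)` instead of the full calibre signature, and `normalize` and the sample strings are made up for illustration.

```python
import re

# Search expressions from the post: the base one, and the one that also drops a '*'.
BASE = re.compile(r'\s?(?:(?:\. ?\. ?\. ?)|…) ?(.)?')
WITH_STAR = re.compile(r'\s?\*?(?:(?:\. ?\. ?\. ?)|…) ?(.)?')

def replace(match):
    # Same logic as the regex-function, minus calibre's extra arguments.
    end = match[1] or ''                  # group 1 is None if nothing follows
    space = ' ' if end.isalpha() else ''  # a space only before a letter
    return '…' + space + end

def normalize(text, pattern=BASE):
    # Hypothetical helper: apply the function to every match, like Replace all.
    return pattern.sub(replace, text)

print(normalize('Wait . . . what'))     # Wait… what
print(normalize('one…two'))             # one… two
print(normalize('the end . . .”'))      # the end…”
print(normalize('word*…', WITH_STAR))   # word…
```

Because the pattern also captures the character after the ellipsis, that character has to be written back by the function, which is why `end` is appended to the result.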
___________
Edit 1: made the last character optional.
The caveat is that it will also replace every existing ellipsis with itself, but that shouldn't cost any noticeable time. If you don't want this, it can be avoided by using two different regex-functions.

Edit 2: added how to remove a star

Last edited by lomkiri; 02-24-2023 at 06:37 AM.
Old 02-24-2023, 05:56 AM   #6
lomkiri
Groupie
lomkiri ought to be getting tired of karma fortunes by now.

Posts: 167
Karma: 1497966
Join Date: Jul 2021
Device: N/A
Quote:
Originally Posted by hihohahi View Post
I have a similar issue, but with Arabic numerals. They randomly show up in my epub. I don't know any way to solve it, so I made a simple AHK macro script and just run it on every file.
You can do it with a "Replace all" and a simple regex-function:

Code:
search: 
[\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669]

Replace: 
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    str_in  = '\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669'
    str_out = '0123456789'
    table = str.maketrans(str_in, str_out)
    return match[0].translate(table)
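
The same idea can be tried outside calibre with plain Python. The sketch below assumes the standard `re` module and a one-argument `replace(match)`; `to_ascii_digits` is a made-up helper name, and the character class is written as the range U+0660 to U+0669, which covers the same ten characters listed above.

```python
import re

# Arabic-Indic digits U+0660..U+0669 mapped to ASCII 0-9.
ARABIC_INDIC = '\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669'
TABLE = str.maketrans(ARABIC_INDIC, '0123456789')
DIGIT = re.compile('[\u0660-\u0669]')

def replace(match):
    # Translate the single matched digit through the table.
    return match[0].translate(TABLE)

def to_ascii_digits(text):
    # Hypothetical helper: the equivalent of one Replace all pass.
    return DIGIT.sub(replace, text)

print(to_ascii_digits('Chapter \u0662'))   # Chapter 2
```

One pass handles all ten digits, so ten separate saved searches are not needed for this case.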
Old 02-24-2023, 12:05 PM   #7
enuddleyarbl
Guru
enuddleyarbl ought to be getting tired of karma fortunes by now.

Posts: 776
Karma: 1538394
Join Date: Sep 2013
Device: Kobo Forma
Quote:
Originally Posted by lomkiri View Post
If I understand correctly, you want to turn three dots (with or without spaces) into an ellipsis, always remove a space before it, and always remove a space after it unless a letter follows.

You can do this with a regex-function :
Code:
— search : 
\s?(?:(?:\. ?\. ?\. ?)|…) ?(.)?

— replace (in mode regex-function):

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    end = match[1] or ''   # be sure is not None
    space = ' ' if end.isalpha() else ''
    return '…' + space + end
It's been on my agenda to try to standardize the various ellipses I run across in books. Your Search, above, pretty much solves that for me. But, for my use, I find that I don't need the Regex-Function. With a minor modification to your search, a plain old Replace works for what I need:
Code:
Search: \s?(?:(?:\.\s?\.\s?\.\s?)|…)(\s?.)?
Replace (as Regex): …\1
(EDIT: I've replaced the spaces with \s because some books use non-breaking spaces, which a plain space didn't pick up.)
For the Search, all I've done is move the opening parenthesis of your 2nd group (well, the only capturing group) so that it includes the space, if present. For the Replace, I just replace whatever form of spaces and ellipsis the book already has with an actual ellipsis, and tack on whatever the book currently has following it.
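
For anyone who wants to try this plain replace outside calibre, here is a small sketch using Python's `re` module (assumed here to behave like calibre's engine for this pattern; the helper names and sample strings are invented). It also shows one thing worth checking on your own books: when the source uses three spaced dots, the last `\s?` inside the three-dot alternative takes the trailing space before the capturing group can see it. `VARIANT` is a hypothetical alternative, not from the thread, that leaves that space to the group.

```python
import re

# Pattern as posted, and a hypothetical variant without the final \s? after the third dot.
POSTED = re.compile(r'\s?(?:(?:\.\s?\.\s?\.\s?)|…)(\s?.)?')
VARIANT = re.compile(r'\s?(?:\.\s?\.\s?\.|…)(\s?.)?')

def keep_spacing(text, pattern=POSTED):
    # Plain regex replace: an ellipsis followed by whatever group 1 captured.
    return pattern.sub(r'…\1', text)

print(keep_spacing('Wait … what'))                # Wait… what
print(keep_spacing('Wait\u00a0… what'))           # Wait… what
print(keep_spacing('Wait . . . what'))            # Wait…what
print(keep_spacing('Wait . . . what', VARIANT))   # Wait… what
```

Whether the third result is a problem depends on the house style you are aiming for; if the trailing space after spaced dots should survive, something like the variant may be closer to the intent described above.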

The problem with ellipses is that there don't seem to be any hard-and-fast rules for them. The best I've found (and there are a lot of contradictions) are:
  1. If it's the old-fashioned 3-dot ellipsis with spaces between the dots, then there should be spaces before and after the ellipsis.
  2. If the 3 dots are without intervening spaces, then there shouldn't be spaces before or after.
  3. Sentences ending with an ellipsis and a period or comma should have the period/comma first and then the ellipsis.
  4. Sentences ending with question or exclamation marks should have the ellipsis first and then the question/exclamation mark.
Personally, I think rule 3, above, is for the birds. In general, trailing ellipses seem to be for thoughts trailing off. Not for sentences trailing off (which is what the rule implies to me). And, for rules 1 and 2, we're replacing those 3 dots with an actual ellipsis, so they don't really apply.

Soft rules I've found for ellipses seem to say there should always be spaces before and after unless one bumps up against a closing quote. In that case, the ellipsis bumps right up to the closing quote.

But most of what I see in actual books has opening ellipses right up against the start of sentences, closing ellipses right up against the end of sentences, and embedded ellipses adjacent to the previous bit of sentence, with a space before the sentence resumes (I think).

And, that's why I changed your search and replace. If I've worked through it correctly, all I've got it doing is stripping off any leading spaces, replacing any 3-dot ellipses with real ellipses and using whatever trailing spaces the author/publisher decided to use.

That seems to do what I need. So, thanks for the regex. It sure made things easier for me.

EDIT: and what is it with Pratchett and ellipses? I just opened a book of his to test this and there are 828 ellipses in the book.

Last edited by enuddleyarbl; 02-24-2023 at 12:54 PM.
Old 02-24-2023, 02:25 PM   #8
lomkiri
Groupie
lomkiri ought to be getting tired of karma fortunes by now.

Posts: 167
Karma: 1497966
Join Date: Jul 2021
Device: N/A
The regex-function was necessary because the OP was asking for a space when the following character is a letter, and no space otherwise. But if you decide to keep the spacing as the publisher wrote it (with or without a space), you're right, there's no need for a function.

But if the purpose is to unify the notation, maybe it's good to decide when to add or strip a space? In that case, a function can do the job (for example, if we want to strip spaces between an ellipsis and quotation marks, and force a space elsewhere).
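
As a rough illustration of such a function, here is one possible sketch in plain Python. The rules are my assumptions, not something settled in this thread: no space before the ellipsis, no space between an opening curly quote and the ellipsis, no space between the ellipsis and a closing quote or other punctuation, and exactly one space before a following letter or digit. `PATTERN` and `tidy` are invented names, and the one-argument `replace(match)` would need calibre's full signature to be used in regex-function mode.

```python
import re

# Group 1: optional opening curly quote. Then at most one plain or
# non-breaking space, three dots or a real ellipsis, at most one space,
# and group 2: the optional following character.
PATTERN = re.compile(
    r'([‘“])?[ \u00a0]?(?:\.[ \u00a0]?\.[ \u00a0]?\.|…)[ \u00a0]?(.)?'
)

def replace(match):
    opener = match[1] or ''     # None when no opening quote precedes
    follower = match[2] or ''   # None at the end of a line
    # Assumed rule: a space only before a letter or digit, and never
    # when the ellipsis sits right after an opening quote.
    needs_space = follower.isalnum() and not opener
    return opener + '…' + (' ' if needs_space else '') + follower

def tidy(text):
    # Hypothetical helper: the equivalent of one Replace all pass.
    return PATTERN.sub(replace, text)

print(tidy('‘ … word'))         # ‘…word
print(tidy('said … ”'))         # said…”
print(tidy('one…two'))          # one… two
```

Only curly quotes are handled, because a straight quote does not say whether it opens or closes.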

You're also perfectly right about \s instead of spaces; I should have done that, but I was replying to his specific request…

Anyway, I'm glad I could help you with this regex.

Last edited by lomkiri; 02-24-2023 at 02:34 PM.
All times are GMT -4.