Old 02-27-2024, 02:09 AM   #15
lomkiri
OK, your initial explanation was really unclear. I understood that the sentence you wanted to select was
<p class="calibre8"> <span class="calibre3">« A long sentence », said the man. « Then a second part. »</span> </p>
and I wrote the regex for it:
(<p class="calibre8"> <span class="calibre3">)(?<! — )(«[^»]+»,.*?\.) (« )

BUT

Let's see if I really get what you mean. I understand that you want to target this sentence:
<p class="calibre8"> <span class="calibre3">He came through the door. « I'm here », he said.</span> </p>

and transform it to
<p class="calibre8"> <span class="calibre3">He came through the door.</span></p>
<p class="calibre8"> <span class="calibre3"> — « I'm here », he said.</span> </p>


but not this one:
<p class="calibre8"> <span class="calibre3"> — « Sentence ending with a comma, » said the man. « Then a second part. »</span> </p>

Is that right? (As pdurrant said, you should have made your request this way, with examples and counter-examples; it's much easier to understand.)

In that case, I'll do it a little differently, with the help of a regex-function.

The regex will be:

Code:
(<p class="calibre8"> <span class="calibre3">)(\s—\s«.+?,\s»\s)?([^.]+.) («)
Explanation:
group 1: <p class="calibre8"> <span class="calibre3">
group 2: \s—\s« bla,\s»\s
(Note: I assume that this dialog must end with ", »". If it ends with " »," it won't be captured by group 2 and the paragraph will be split. If you don't want this, i.e. the mandatory comma before the closing quote, change the regex accordingly.)
group 3: the sentence before the next quote (whether group 2 exists or not)
not in any group (won't be kept if we split): <space>
group 4: «

Group 2 is optional, since it has the form (expr)?. When it is missing, match[2] is empty in the function, so the function tests this value to decide whether the line has to be split (group 2 empty) or not.
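
To see the groups in action outside calibre, here is a minimal sketch that uses Python's standard re module (an assumption: calibre's editor uses the regex module, which should behave the same way for this pattern). The pattern is copied verbatim from above; the names PATTERN, to_split, to_keep, m_split and m_keep are only illustrative.

```python
import re

# The pattern from the post, split over three lines for readability.
PATTERN = re.compile(
    r'(<p class="calibre8"> <span class="calibre3">)'  # group 1: paragraph opening
    r'(\s—\s«.+?,\s»\s)?'                              # group 2: optional leading dialog
    r'([^.]+.) («)'                                    # group 3, a space, group 4
)

# The example that should be split, and the counter-example that should not.
to_split = '<p class="calibre8"> <span class="calibre3">He came through the door. « I\'m here », he said.</span> </p>'
to_keep = '<p class="calibre8"> <span class="calibre3"> — « Sentence ending with a comma, » said the man. « Then a second part. »</span> </p>'

m_split = PATTERN.search(to_split)
m_keep = PATTERN.search(to_keep)

# Group 2 is absent in the first case (None, which is falsy) and present in the second.
print(repr(m_split[2]), repr(m_split[3]), repr(m_split[4]))
print(repr(m_keep[2]), repr(m_keep[3]), repr(m_keep[4]))
```

An unmatched optional group comes back as None rather than as an empty string, but both are falsy, so a plain "if match[2]:" test works either way.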

You said that you may have some &nbsp;, so I used \s instead of <space>; it matches all kinds of whitespace.
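
A quick check of that point, again sketched with Python's standard re module on ordinary text strings (an assumption, not calibre itself): \s matches the real no-break space character U+00A0, but it does not match the six-character entity text "&nbsp;" if that is what is literally written in the HTML source.

```python
import re

NBSP = "\u00a0"  # the actual no-break space character

# \s matches both an ordinary space and a real no-break space.
space_matches = bool(re.fullmatch(r"\s", " "))
nbsp_matches = bool(re.fullmatch(r"\s", NBSP))

# The entity written out as text contains no whitespace character at all.
entity_matches = bool(re.search(r"\s", "&nbsp;"))

print(space_matches, nbsp_matches, entity_matches)
```

If the file contains the written-out entity rather than the character, the regex would have to match that text explicitly.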

"Dot all" must be unchecked (in french : Le point correspond à tout)

OLD CODE, DON'T USE IT: (see why in my next post)
The function, which is self-explanatory (comments begin with #), is:
Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    if match[2]:
        # match[2] (the group 2) is " — « blabla, » "
        # If we have a group 2 (match[2] not empty) don't do anything (match[0] is the whole selection)
        return match[0]

    else:
        # We don't have a group 2, so the paragraph must be split
        # (the paragraph was selected by the regex, so we have a dialog in it)
        # match[1] is the html code for the beginning of the paragraph
        return match[1] + match[3] + '</span></p>\n\n  ' + match[1] + ' — ' + match[4]

        # or, if you want only a line break : 
        # return match[1] + match[3] + '<br/>'  + ' — ' + match[4]
NEW VERSION TO BE USED:
Code:
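def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    import regex
    # True when the text before the quote (group 3) already starts with an em-dash,
    # i.e. a dialog paragraph that group 2 did not capture
    abort_if_starting_with_emdash = regex.match(r'\s—', match[3])

    if match[2] or abort_if_starting_with_emdash:
        # Leave the paragraph as it is (match[0] is the whole selection)
        return match[0]

    else:
        # Split: close the current paragraph and open a new one for the dialog
        return match[1] + match[3] + '</span></p>\n\n  ' + match[1] + ' — ' + match[4]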
To execute this function, select "regex-function" in the drop-down (instead of "regex"), click on Create/Edit and paste the text of the function.
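
If you want to try the function before running it on a book, here is a minimal standalone sketch. It makes several assumptions: it uses Python's standard re module instead of regex, the helper run() and the three sample strings are mine, and the dummy values passed for number, file_name, etc. are only placeholders for what calibre supplies. It is not how calibre itself calls the function.

```python
import re

PATTERN = re.compile(
    r'(<p class="calibre8"> <span class="calibre3">)(\s—\s«.+?,\s»\s)?([^.]+.) («)'
)

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # Same logic as the new version above, with stdlib re instead of regex.
    abort_if_starting_with_emdash = re.match(r'\s—', match[3])
    if match[2] or abort_if_starting_with_emdash:
        return match[0]
    return match[1] + match[3] + '</span></p>\n\n  ' + match[1] + ' — ' + match[4]

def run(text):
    # Hypothetical helper: placeholder arguments stand in for calibre's.
    return PATTERN.sub(lambda m: replace(m, 1, 'test.html', None, None, None, None), text)

OPEN = '<p class="calibre8"> <span class="calibre3">'
to_split = OPEN + "He came through the door. « I'm here », he said.</span> </p>"
to_keep = OPEN + " — « Sentence ending with a comma, » said the man. « Then a second part. »</span> </p>"
# Starts with an em-dash but the dialog ends with " »," so group 2 does not capture it.
already_dialog = OPEN + " — « I'm here », he said. « And more. »</span> </p>"

print(run(to_split))
print(run(to_keep) == to_keep)
print(run(already_dialog) == already_dialog)
```

The first string is split into two paragraphs, the second is left alone because group 2 matched, and the third is left alone because of the extra em-dash check in the new version.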

You've got the idea; I guess you'll be able to adapt it if your needs are slightly different.
Since you seem to need some complex substitutions, you should find a tutorial on regexes; there is quite a good one in the help section of the calibre website. The site whose URL I gave in my first message is a reference, not a tutorial; it is not meant for learning the basics.
The site https://regex101.com may help you to construct your regexes (select PCRE as the flavor).

Last edited by lomkiri; 02-27-2024 at 10:11 AM. Reason: Correction of the function