OK, your initial explanation was really unclear. I understood that the sentence you wanted to select was:
<p class="calibre8"> <span class="calibre3">« A long sentence », said the man. « Then a second part. »</span> </p>
and I made this regex for it:
(<p class="calibre8"> <span class="calibre3">)(?<! — )(«[^»]+»,.*?\.) (« )
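If you want to check what this regex captures outside calibre, here is a quick sketch with Python's standard `re` module. The only assumption is that `re` behaves like calibre's `regex` module for this simple pattern; the sample string is the example paragraph above.

```python
import re

# The first regex from this post, copied unchanged
FIRST_PATTERN = re.compile(
    r'(<p class="calibre8"> <span class="calibre3">)(?<! — )(«[^»]+»,.*?\.) (« )'
)

# The example paragraph quoted at the top of this post
sample = ('<p class="calibre8"> <span class="calibre3">« A long sentence », said the man.'
          ' « Then a second part. »</span> </p>')

m = FIRST_PATTERN.search(sample)
print(m[2])  # « A long sentence », said the man.
```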
But let's see if I really get what you mean. I understand that you want to target this sentence:
<p class="calibre8"> <span class="calibre3">He came through the door. « I'm here », he said.</span> </p>
and transform it into:
<p class="calibre8"> <span class="calibre3">He came through the door.</span></p>
<p class="calibre8"> <span class="calibre3"> — « I'm here », he said.</span> </p>
but not this one:
<p class="calibre8"> <span class="calibre3"> — « Sentence ending with a comma, » said the man. « Then a second part. »</span> </p>
Is that right? (As pdurrant said, you should have made your requests this way, with examples and counter-examples; it's much easier to understand.)
In that case, I'll do it a little differently, with the help of a regex-function.
The regex will be:
Code:
(<p class="calibre8"> <span class="calibre3">)(\s—\s«.+?,\s»\s)?([^.]+\.) («)
Explanation:
group 1: <p class="calibre8"> <span class="calibre3">
group 2 (optional): \s—\s« bla,\s»\s
(Note: I assume that this dialog ends with ", »". If it ends with " »," instead, it won't be matched by group 2, and the old version of the function below would split the paragraph. If you don't want the comma before the closing quote to be mandatory, change the regex accordingly.)
group 3: the sentence before the next quote (whether or not group 2 exists)
not in any group (won't be kept if we split): <space>
group 4: «
Group 2 can be missing, since it has the form (expr)?. If it is missing, match[2] will be empty (None) in the function, so the function tests this value to decide whether the line has to be split (group 2 empty) or not.
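To see the two cases side by side, here is a small check with Python's standard `re` module, using the regex above with its final dot written as `\.` so that group 3 ends on a real full stop. Using `re` is an assumption for testing outside calibre (calibre itself uses the `regex` module); the two sample strings are the example paragraphs from the beginning of this post.

```python
import re

# The regex from this post (final dot as \.), without the DOTALL flag
PATTERN = re.compile(
    r'(<p class="calibre8"> <span class="calibre3">)'
    r'(\s—\s«.+?,\s»\s)?'
    r'([^.]+\.) («)'
)

narration = ('<p class="calibre8"> <span class="calibre3">'
             "He came through the door. « I'm here », he said.</span> </p>")
dialog = ('<p class="calibre8"> <span class="calibre3"> — « Sentence ending with a comma, »'
          ' said the man. « Then a second part. »</span> </p>')

m1 = PATTERN.search(narration)
m2 = PATTERN.search(dialog)
print(repr(m1[2]), repr(m1[3]))  # None 'He came through the door.'
print(repr(m2[2]), repr(m2[3]))  # ' — « Sentence ending with a comma, » ' 'said the man.'
```

So match[2] is None for the narration paragraph (split it) and non-empty for the dialog paragraph (leave it alone).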
You said that you may have some non-breaking spaces, so I used \s instead of <space>: it matches all kinds of spaces, including non-breaking ones.
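A quick check of that point with Python's standard `re` module. This assumes calibre's `regex` module treats `\s` the same way for these characters; `NARROW_NBSP` is an extra case I added because French texts often use a narrow no-break space inside « ».

```python
import re

NBSP = '\u00a0'          # non-breaking space
NARROW_NBSP = '\u202f'   # narrow no-break space

# A literal space in the pattern only matches the ordinary space U+0020...
assert re.fullmatch(' ', NBSP) is None
# ...while \s in a str pattern matches both kinds of no-break space
print(bool(re.fullmatch(r'\s', NBSP)), bool(re.fullmatch(r'\s', NARROW_NBSP)))  # True True
```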
"Dot all" must be unchecked (in french : Le point correspond à tout)
OLD CODE, DON'T USE IT: (see why in my next post)
The function, which is self-explanatory (comments begin with #), is:
Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    if match[2]:
        # match[2] (the group 2) is " — « blabla, » "
        # If we have a group 2 (match[2] not empty) don't do anything (match[0] is the whole selection)
        return match[0]
    else:
        # We don't have a group 2, so the paragraph must be split
        # (the paragraph was selected by the regex, so we have a dialog in it)
        # match[1] is the html code for the beginning of the paragraph
        return match[1] + match[3] + '</span></p>\n\n ' + match[1] + ' — ' + match[4]
        # or, if you want only a line break:
        # return match[1] + match[3] + '<br/>' + ' — ' + match[4]
NEW VERSION TO BE USED:
Code:
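def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    import regex
    # Group 2 fails on a dialog whose quote ends with " »," instead of ", »";
    # in that case group 3 itself starts with " —", so detect it here
    abort_if_starting_with_emdash = regex.match('\\s—', match[3])
    if match[2] or abort_if_starting_with_emdash:
        # Dialog paragraph: don't change anything (match[0] is the whole selection)
        return match[0]
    else:
        # Narration followed by a quote: close the paragraph after group 3
        # and start a new paragraph with " — «"
        return match[1] + match[3] + '</span></p>\n\n ' + match[1] + ' — ' + match[4]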
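To try the new version outside calibre, here is a small test harness. It is only a sketch under a few assumptions: the function body is copied with the standard `re` module instead of `regex`, the regex has its final dot written as `\.`, `apply_replace` and the `TEXT_*` samples are hypothetical names of mine, and the extra arguments calibre passes to `replace()` are filled with dummy values (the function only reads `match`).

```python
import re

# The regex from this post (final dot as \.), without the DOTALL flag
PATTERN = re.compile(
    r'(<p class="calibre8"> <span class="calibre3">)'
    r'(\s—\s«.+?,\s»\s)?'
    r'([^.]+\.) («)'
)


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # Same logic as the new version above, using re instead of regex
    abort_if_starting_with_emdash = re.match(r'\s—', match[3])
    if match[2] or abort_if_starting_with_emdash:
        return match[0]
    return match[1] + match[3] + '</span></p>\n\n ' + match[1] + ' — ' + match[4]


def apply_replace(html):
    # Hypothetical stand-in for calibre: call replace() for every match,
    # passing dummy values for the arguments the function ignores
    return PATTERN.sub(lambda m: replace(m, 1, 'test.html', None, None, {}, None), html)


PREFIX = '<p class="calibre8"> <span class="calibre3">'
# Narration then a quote: should be split
TEXT_A = PREFIX + "He came through the door. « I'm here », he said.</span> </p>"
# Dialog ending with ", »": should be left alone (group 2 matches)
TEXT_B = PREFIX + ' — « Sentence ending with a comma, » said the man. « Then a second part. »</span> </p>'
# Dialog ending with " »,": should be left alone (the em dash check)
TEXT_C = PREFIX + " — « I'm here », he said. « Come in. »</span> </p>"

print(apply_replace(TEXT_A))
```

Only TEXT_A is split in two; TEXT_B and TEXT_C come back unchanged. With the old version, TEXT_C would have been split as well, since its group 2 is empty.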
To run this function, select "regex-function" in the drop-down (instead of "regex"), click on Create/Edit and paste the text of the function.
You get the idea; I guess you will be able to adapt it if your needs are slightly different.
Since you seem to need some complex substitutions, I think you should find a regex tutorial; there is quite a good one in the help on the calibre website. The site whose URL I gave in my first message is a reference, not a tutorial: it is not meant for learning the basics.
The site https://regex101.com may help you build your regexes (select PCRE as the flavor).