MobileRead Forums > E-Book Software > Calibre > Editor
02-27-2024, 08:58 AM #16
reinsley
Quote:
Originally Posted by lomkiri
OK, your initial explanation was really unclear.
Shame on me: I answered two helpers in the same post and got tangled up in my explanations. That said, asking a question is a good way to clear one's head, but next time I'll tidy up before posting.

Quote:
Originally Posted by lomkiri
BUT

Let's see if I really get what you mean: I understand that you want to target this sentence
<p class="calibre8"> <span class="calibre3">He came through the door. « I'm here », he said.</span> </p>

and transform it to
<p class="calibre8"> <span class="calibre3">He came through the door.</span></p>
<p class="calibre8"> <span class="calibre3"> — « I'm here », he said.</span> </p>


but not this one:
<p class="calibre8"> <span class="calibre3"> — « Sentence ending with a comma, » said the man. « Then a second part. »</span> </p>

Is that right?
Right.

Quote:
Originally Posted by lomkiri
You've got the idea; I guess you will be able to adapt it if your needs are slightly different.
Since you seem to need some complex substitutions, you should probably find a tutorial on regexes; there is quite a good one in the help on the calibre website. The site whose URL I gave in my first message is a reference, not a tutorial; it is not meant for learning the basics.
The site https://regex101.com can help you build your regexes (select PCRE as the flavor).
I need to do my homework to perfect the formula with regex101 and the other reference site.
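Besides regex101, the patterns in this thread can be checked with a few lines of Python, which is closer to what calibre runs (the function posted later in this thread imports Python's `regex` module). A minimal sketch using the standard-library `re`, which accepts the patterns used here; `find_all` is a hypothetical helper name:

```python
import re


def find_all(pattern, text):
    """Return (start, end, matched_text) for each non-overlapping match of pattern."""
    return [(m.start(), m.end(), m.group(0)) for m in re.finditer(pattern, text)]


# First sample paragraph from the examples in this post
sample = ('<p class="calibre8"> <span class="calibre3">'
          "He came through the door. « I'm here », he said.</span> </p>")

# The spot to select: a full stop, a space, then an opening guillemet
for start, end, found in find_all(r"\. «", sample):
    print(start, end, repr(found))
```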


I'm on the right track.
FYI: \s does not seem to match the non-breaking spaces (nbsp) in my text, so I'll normalize all the text before running the regex.
Issue: a line starting with an em dash is still selected by the following regex.
Lines starting with text are correctly highlighted and formatted.
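Before normalizing, it may be worth checking what those spaces actually are. In Python's `re`, with str patterns, `\s` matches Unicode whitespace, which includes U+00A0 (no-break space) and U+202F (narrow no-break space); the `regex` module used by calibre is assumed here to behave the same. One possible explanation for `\s` not matching, not verified against this book, is that the spaces are stored as entities such as `&nbsp;`. A short sketch showing both points, plus a hypothetical `SPACE` pattern piece that accepts every form without rewriting the text:

```python
import re

NBSP = "\u00a0"   # no-break space
NNBSP = "\u202f"  # narrow no-break space

# With str patterns, re's \s covers Unicode whitespace, including both characters
print(bool(re.fullmatch(r"\s", NBSP)), bool(re.fullmatch(r"\s", NNBSP)))

# An entity is six ordinary characters, so \s cannot match it
print(bool(re.fullmatch(r"\s", "&nbsp;")))

# Hypothetical pattern piece: a plain space, either no-break space,
# or the equivalent HTML entities (named, decimal 160, decimal 8239)
SPACE = r"(?:[ \u00a0\u202f]|&nbsp;|&#160;|&#8239;)"
pattern = "«" + SPACE

samples = ["« plain", "«\u00a0nbsp", "«\u202fnarrow", "«&nbsp;entity"]
for s in samples:
    print(bool(re.match(pattern, s)), repr(s))
```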

Here's the search:
(<p class="calibre8"> <span class="calibre3">)(\s—\s«.+?,\s»\s)?([^.]+.) (« )
and the replace:
\1\2\3</span> </p>
<p class="calibre8"> <span class="calibre3"> — «
in regular expression mode. It also finds the paragraphs that start with an em dash.
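To see why the em-dash paragraph gets picked up, the search pattern can be run over the two English examples with Python's standard `re` (a sketch; it assumes calibre's engine behaves the same on these simple lines). Group 2 is optional, so both paragraphs match; the only difference is whether group 2 is filled in. A plain replacement template cannot act on that difference, but a replace function can, which is what lomkiri's reply does.

```python
import re

# reinsley's search pattern, copied from the post
SEARCH = r'(<p class="calibre8"> <span class="calibre3">)(\s—\s«.+?,\s»\s)?([^.]+.) (« )'

lines = {
    "narration first": '<p class="calibre8"> <span class="calibre3">'
                       "He came through the door. « I'm here », he said.</span> </p>",
    "dialog first": '<p class="calibre8"> <span class="calibre3">'
                    " — « Sentence ending with a comma, » said the man. « Then a second part. »</span> </p>",
}

for label, line in lines.items():
    m = re.search(SEARCH, line)
    # Both lines match; only group 2 tells them apart
    print(f"{label}: match={m is not None}, group 2={m[2]!r}, group 3={m[3]!r}")
```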


Some examples, as they appear in the calibre HTML page:

<body>
<p class="calibre8"> <span class="calibre3">He came through the door. « I'm here », he said.</span> </p>
<p>comment : I need to find and select . «</p>
<span class="calibre3">*</span>
<p class="calibre8"> <span class="calibre3"> — « Sentence ending with a comma, » said the man. « Then a second part. »</span> </p>
<p>comment : no need to find and select . «</p>
<span class="calibre3">*</span>
<p class="calibre8"> <span class="calibre3"> — «*Je répète donc ma question*», reprit le directeur, faussement calme. « L’information vous parait-elle authentique*?*»</span> </p>
<p>comment : no need to find and select . «</p>
<span class="calibre3">*</span>
<p class="calibre8"> <span class="calibre3">Axel échangea un regard en coin avec les autres. « Bon, maintenant que nous sommes là, autant aller jeter un coup d’œil sur place, non*?*»</span> </p>
<p>comment : I need to find and select . «</p>
<span class="calibre3">*</span>
<p class="calibre8"> <span class="calibre3">Le médecin esquissa une grimace. « Je dirais une quinzaine d’heures. La nuit dernière, sans doute. Ce matin, au plus tard.*»</span> </p>
<p>comment : I need to find and select . «</p>
<span class="calibre3">*</span>
<p class="calibre8"> <span class="calibre3">Le patron fut le premier à répondre. « Les deux adolescents cela ne fait aucun doute.*»</span> </p>
<p>comment : I need to find and select . «</p>
<span class="calibre3">*</span>
<p class="calibre8"> <span class="calibre3"> — « Donc la voie vous semble la plus vraisemblable, c’est ça*?*», reprit le conseiller à la sécurité.</span></p>
<p>comment : no need to find and select . «</p>
</body>


The CSS:
.calibre3 {
font-size: 1em
}
.calibre8 {
margin-bottom: 0%;
margin-top: 0%;
text-align: justify
}

Thank you very much for the follow-up. Best regards.

02-27-2024, 10:04 AM #17
lomkiri
====== Important ======

I realized that a paragraph starting with " — «" and containing no further dialog would be split, which is not wanted (and it also means you cannot run the function twice without bad effects). So the new function is:

Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    import regex

    # Modify this if you change the way dialog lines begin (here: a space, then an em dash)
    abort_if_starting_with_emdash = regex.match(r'\s—', match[3])

    # Leave the paragraph untouched if it already starts with a dialog line
    if match[2] or abort_if_starting_with_emdash:
        return match[0]

    # Otherwise close the narration paragraph and open a new one for the dialog
    return match[1] + match[3] + '</span></p>\n\n  ' + match[1] + ' — ' + match[4]
This one can be executed twice, and it does not split a paragraph that contains only one dialog either.
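To try the function outside calibre, here is a minimal harness, assuming it is paired with reinsley's search pattern from the previous post (lomkiri's own pattern from earlier in the thread is not reproduced here). It swaps calibre's `regex` module for the standard-library `re` so it runs anywhere, and passes dummy values for the arguments calibre would normally supply; `run_once` is a hypothetical stand-in for Replace All. The third sample is hypothetical too: it has no ", »", so group 2 fails and only the em-dash check keeps it intact.

```python
import re

# reinsley's search pattern (assumed pairing with the function)
SEARCH = r'(<p class="calibre8"> <span class="calibre3">)(\s—\s«.+?,\s»\s)?([^.]+.) (« )'


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # lomkiri's logic, with the stdlib re standing in for calibre's regex module
    abort_if_starting_with_emdash = re.match(r'\s—', match[3])
    if match[2] or abort_if_starting_with_emdash:
        return match[0]  # leave dialog-first paragraphs untouched
    return match[1] + match[3] + '</span></p>\n\n  ' + match[1] + ' — ' + match[4]


def run_once(html):
    """Rough stand-in for Replace All: call replace() on every match.

    The None/{} arguments are dummies; replace() never reads them.
    """
    return re.sub(SEARCH, lambda m: replace(m, 0, None, None, None, {}, None), html)


PREFIX = '<p class="calibre8"> <span class="calibre3">'
narration_first = PREFIX + "He came through the door. « I'm here », he said.</span> </p>"
dialog_comma = PREFIX + " — « Sentence ending with a comma, » said the man. « Then a second part. »</span> </p>"
dialog_question = PREFIX + " — « Sentence ending with a question? » said the man. « Then a second part. »</span> </p>"

once = run_once(narration_first)
print(once)
print(run_once(once) == once)  # a second pass changes nothing
print(run_once(dialog_comma) == dialog_comma, run_once(dialog_question) == dialog_question)
```

On the second pass the pattern does find " —" followed by " « " inside the new dialog paragraph; it is `abort_if_starting_with_emdash` that turns that match into a no-op, which is why the function is safe to run twice.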

In any case, it is advisable to always make a backup of your file before applying new regexes in "Replace all" mode, and to check the result with the "See what was changed" button.

Other considerations:
— In French, you need a non-breaking space or a narrow non-breaking space after "«" and before "»".
— It is better (but not mandatory) to avoid spaces at the beginning and end of a paragraph, so <p class="calibre8"> <span class="calibre3"> — « blabla »</span> </p> becomes <p class="calibre8"><span class="calibre3">— « blabla »</span></p>. But apply the function first, because it relies on the form with spaces, or else modify the regex (and the variable "abort_if_starting_with_emdash" in the function).
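Both suggestions can be applied as a final clean-up pass once the paragraphs are split. A minimal Python sketch, assuming the `calibre8`/`calibre3` classes used throughout this thread (`strip_paragraph_edges` and `guillemet_spaces` are hypothetical helper names): the first strips ordinary spaces at the edges of each paragraph, the second turns a plain space after « or before » into a narrow no-break space (U+202F), or a regular one (U+00A0) via `space`. It should run after the splitting function, since the search pattern and `abort_if_starting_with_emdash` expect the spaced form, and the pattern's `(« )` group expects a plain space after «.

```python
import re

# Only ordinary spaces, tabs and line breaks count as "edge" spaces here.
# U+00A0 and U+202F are left alone, so French typographic spaces survive.
EDGE = r"[ \t\r\n]*"


def strip_paragraph_edges(html):
    """Remove plain spaces at the start of calibre8/calibre3 paragraphs
    and before any closing </span></p> pair."""
    html = re.sub(r'<p class="calibre8">' + EDGE + r'<span class="calibre3">' + EDGE,
                  '<p class="calibre8"><span class="calibre3">', html)
    return re.sub(EDGE + r"</span>" + EDGE + r"</p>", "</span></p>", html)


def guillemet_spaces(html, space="\u202f"):
    """Replace a plain space after « and before » with a no-break space (narrow by default)."""
    return html.replace("« ", "«" + space).replace(" »", space + "»")


paragraph = '<p class="calibre8"> <span class="calibre3"> — « blabla »</span> </p>'
print(guillemet_spaces(strip_paragraph_edges(paragraph)))
```

Restricting `EDGE` to ASCII whitespace is deliberate: `\s` would also remove no-break spaces, so a paragraph whose only content is a no-break space would be emptied.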

Last edited by lomkiri; 02-27-2024 at 10:20 AM.

02-27-2024, 12:51 PM #18
reinsley
Well, my first regex function... I thought regular expression mode was a Himalaya to climb.

I won't prolong the suspense. Your function works like clockwork, like a Swiss cuckoo clock. It's a marvel.

I can't thank you enough, and pdurrant too, of course. Your function does the job.

I need to dig into the Python language.

Have a nice day.


Nota bene: you're right about the non-breaking spaces after "«" and before "»". I was thinking of removing them for processing and then putting them back. I'll also keep your good idea of checking the changes.

All times are GMT -4.