MobileRead Forums > E-Book Software > Calibre > Editor
07-02-2020, 10:10 PM   #31
kovidgoyal
creator of calibre
Posts: 40,732
Karma: 18247461
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
That's because the builtin functions are not simple standalone functions you can learn from; they use other calibre code. However, if you want to see them, look in function_replace.py in the calibre source code.
12-01-2020, 07:10 PM   #32
EbookMakers
Enthusiast
Posts: 25
Karma: 38
Join Date: Nov 2019
Location: Paris, France
Device: none
Automatically fill in the <title> tag of text pages

In the <head> section, the absence of a <title> tag causes an epubcheck error. You may also find something like this (depending on the language):
<title>Unknown</title> or <title></title>

In these cases, the regex-function looks for the title in the metadata and uses it to fill in the <title> tag in the <head> section of each xhtml page. If the title in the metadata is empty, or itself has the default value “Unknown”, the function leaves the page as it is. You can then fill in the <dc:title> tag in the opf, save the epub, reopen it in the editor and run the regex-function again.

The function is commented. If the epub is in a language other than English or French, you must adapt both the regex and the function by adding your language's equivalent of “Unknown”.

The regex:

Code:
<title>(?:[Ii]nconnu\(e\)|[Uu]nknown)?</title>|<head>(?:(?!<title).)+\K(</head>)
Enable the “Dot all” option (the dot also matches newlines).

The function:

Code:
# Run the function with this regex:
# <title>(?:[Ii]nconnu\(e\)|[Uu]nknown)?</title>|<head>(?:(?!<title).)+\K(</head>)
# with "Dot all" enabled

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # Regex-function to fill in a <title> tag in xhtml files (with <dc:title> of the opf)
    # if there is no such tag, or if it is empty or 'Unknown' (or a localized equivalent)
    from html import escape  # standard library: escapes &, < and > in the title

    # This tuple and the regex should be adapted to the language of the epub.
    # Add to this tuple the string to target, in your language.
    #  +++ Must be in lower case +++, since the string in the test is lowercased
    no_title = ('unknown', 'inconnu(e)')

    # 'is_dc_title' is true if metadata.title is defined and is not a placeholder
    # Warning: if there is no <dc:title> in the opf, metadata.title will take
    # the value 'Unknown' or its localized value (e.g. 'Inconnu(e)' in French)
    is_dc_title = bool(metadata.title) and metadata.title.lower() not in no_title

    # No capturing group: the matched <title> is empty or 'Unknown'
    # (group 1 is only captured when </head> is reached without finding <title>)
    if not match.group(1):
        if is_dc_title:
            # Replace only the tag itself, so the existing indentation is kept
            title = "<title>" + escape(metadata.title, quote=False) + "</title>"
        else:
            title = match.group()

    # Group 1 captured (</head>), thus the <title> tag is missing
    else:
        if is_dc_title:
            title = "  <title>" + escape(metadata.title, quote=False) + "</title>\n" + match.group(1)
        else:
            title = match.group(1)
            ######## Shall we fill in a tag if none? ###########
            # Uncomment the line below if you want to write <title></title>
            # when the <title> tag is missing and <dc:title> is not defined.
            # If it stays commented, the tag will still be missing.
            # title = "  <title></title>\n" + title

    return title

Last edited by EbookMakers; 12-08-2020 at 04:31 AM.
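To try the same logic outside calibre, here is a minimal standalone sketch using Python's standard re module. Assumptions: re does not support \K (calibre's editor uses the regex module, which does), so in this sketch the missing-title case captures the whole <head> content as group 1 and </head> as group 2 instead; fill_title() and NO_TITLE are hypothetical names, and escaping &, < and > in the title is an addition of this sketch.

```python
import re
from html import escape

# Placeholders that count as "no title" (lower case), as in the regex-function
NO_TITLE = ('unknown', 'inconnu(e)')

# Same idea as the calibre regex, but without \K (unsupported by Python's re):
# alternative 1 matches an empty/'Unknown' <title>; alternative 2 captures the
# whole <head> content (group 1) and </head> (group 2) when there is no <title>.
PATTERN = re.compile(
    r'<title>(?:[Ii]nconnu\(e\)|[Uu]nknown)?</title>'
    r'|(<head>(?:(?!<title).)*?)(</head>)',
    re.DOTALL)

def fill_title(xhtml, dc_title):
    """Hypothetical helper mimicking the regex-function: fill <title> from dc_title."""
    usable = bool(dc_title) and dc_title.lower() not in NO_TITLE
    new_tag = '<title>%s</title>' % escape(dc_title, quote=False) if usable else ''

    def repl(m):
        if not usable:
            return m.group()            # nothing to fill in: leave the text as it is
        if m.group(2) is None:          # empty or placeholder <title>...</title>
            return new_tag
        # <head> had no <title>: insert one just before </head>
        return m.group(1) + '  ' + new_tag + '\n' + m.group(2)

    return PATTERN.sub(repl, xhtml)
```

For example, fill_title('<head><title>Unknown</title></head>', 'My Book') gives '<head><title>My Book</title></head>', while a page whose <title> already has real text is left untouched.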
09-22-2021, 06:11 PM   #33
Ted Friesen
Enthusiast
Posts: 42
Karma: 10
Join Date: May 2016
Device: Kindle
Quote:
Originally Posted by kovidgoyal View Post
That's because builtin function are not simple standalone functions you can learn from, they use other calibre code, however if you want to see them look in function_replace.py in the calibre source code.
Found them. Thank you.
10-17-2021, 05:19 PM   #34
firsikov
Junior Member
Posts: 8
Karma: 10
Join Date: Oct 2021
Device: Kindle Voyager
Hi there!

I have code like this:
Code:
<p class="text">some text</p>
<blockquote class="email">
   <p class="text">some <i>text</i></p>
   <p class="text"><b>some</b> text</p>
   <p class="text">some text</p>
</blockquote>
<p class="text">some text</p>
I need to change the class of the p tags only inside the blockquote.

A regex search string like "<blockquote class="email">(.*?)<p class="text">(.*?)</blockquote>" only works if the blockquote contains a single p tag.

I don't understand how to create the correct search string, or even how to search Google for it.

Hoping for your help. Thanks.
10-17-2021, 06:09 PM   #35
Brett Merkey
Not Quite Dead
Posts: 181
Karma: 654170
Join Date: Jul 2015
Device: Paperwhite 4; Galaxy Tab
@firsikov:
Using regex can be difficult and tricky at times. Consider another way to accomplish the same thing, without changing the HTML code.

What you seem to want is to control the look of text within a particular type of blockquote. Consider using CSS contextual styles:

Code:
blockquote.email p.text {put your desired styles here}
This form of style selector will only affect the precise text you want to change.
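Why the contextual rule should win even if p.text is already defined earlier in the stylesheet: CSS compares selector specificity first (number of ids, then classes, then type names) and only uses source order to break ties; the exception is an !important declaration in the earlier rule, which would still override it. The sketch below uses a deliberately simplified, hypothetical specificity() helper (plain tag, .class and #id parts only; attribute selectors, pseudo-classes and combinators are ignored) to show the comparison:

```python
import re

def specificity(selector):
    """Return (ids, classes, types) for a simple selector like 'blockquote.email p.text'.

    Simplified sketch: only handles tag names, .class and #id parts separated by spaces.
    """
    ids = classes = types = 0
    for compound in selector.split():
        # A compound such as 'p.text' is an optional tag name followed by .class/#id parts
        if re.match(r'[A-Za-z][\w-]*', compound):
            types += 1
        ids += len(re.findall(r'#[\w-]+', compound))
        classes += len(re.findall(r'\.[\w-]+', compound))
    return (ids, classes, types)

# Tuples compare element by element, just like CSS specificity
print(specificity('p.text'), specificity('blockquote.email p.text'))
```

Here p.text is (0, 1, 1) and blockquote.email p.text is (0, 2, 2), so the contextual rule wins wherever it appears in the stylesheet.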
10-18-2021, 05:21 AM   #36
firsikov
Junior Member
Posts: 8
Karma: 10
Join Date: Oct 2021
Device: Kindle Voyager
Quote:
Originally Posted by Brett Merkey View Post
@firsikov:
Using regex can be difficult and tricky at times. Consider another way to accomplish the same thing, without changing the HTML code.

What you seem to want is to control the look of text within a particular type of blockquote. Consider using CSS contextual styles:

Code:
blockquote.email p.text {put your desired styles here}
This form of style selector will only affect the precise text you want to change.
This was the first thought that came to my mind, but for some reason it didn't help. The problem is that the p.text style is already defined earlier in the stylesheet, in a different way. It may be necessary to use Python here, but I'm not a programmer.
10-18-2021, 06:02 AM   #37
davidfor
Grand Sorcerer
Posts: 24,252
Karma: 45541596
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
Quote:
Originally Posted by firsikov View Post
[…]
And i need to change class of P tag only inside of blockquote tag.

Regex search string, like "<blockquote class="email">(.*?)<p class="text">(.*?)</blockquote>" work only if blockquote have only one p tag.
[…]
Just to check: you want to change the class of the paragraphs within the blockquote from "text" to something else? If so, your search does work, but you need to have the "Dot all" option selected. The replace string is:

Code:
<blockquote class="email">\1<p class="newclassname">\2</blockquote>
But I would do this with:

Code:
(<blockquote class="email">.*?<p class=")text(">.*?</blockquote>)
And the substitution is:

Code:
\1newclassname\2
This basically puts the constant parts into groups and leaves the part you want to change outside them. This also needs the "Dot all" option to be selected.

With both, you have to run "Replace all" multiple times: each pass only makes one change in each blockquote.
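A rough Python simulation of the second search/replace, to show why several passes are needed. Assumptions: Python's re with re.DOTALL stands in for calibre's regex engine with "Dot all", and SAMPLE, replace_all_once() and replace_until_stable() are made-up names for illustration.

```python
import re

# davidfor's find expression; re.DOTALL plays the role of calibre's "Dot all" option
FIND = re.compile(r'(<blockquote class="email">.*?<p class=")text(">.*?</blockquote>)', re.DOTALL)
REPLACE = r'\1newclassname\2'

SAMPLE = '''<p class="text">a</p>
<blockquote class="email">
   <p class="text">b</p>
   <p class="text">c</p>
</blockquote>
<p class="text">d</p>'''

def replace_all_once(html):
    """One "Replace all" pass: re.sub replaces each non-overlapping match once."""
    return FIND.sub(REPLACE, html)

def replace_until_stable(html):
    """Repeat the pass until nothing changes; return the text and the number of passes that changed it."""
    passes = 0
    while True:
        new_html = replace_all_once(html)
        if new_html == html:
            return html, passes
        html, passes = new_html, passes + 1
```

On SAMPLE this takes two passes and leaves paragraphs a and d alone. With two blockquotes in the same file, the later passes start matching from one blockquote into the next and also rename the paragraphs between them, which is the problem reported further down the thread.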
10-18-2021, 06:44 AM   #38
lomkiri
Connoisseur
Posts: 73
Karma: 107742
Join Date: Jul 2021
Device: N/A
Quote:
Originally Posted by davidfor View Post
For both, you have to run them multiple times. If you do "Replace all", it only makes one change in each blockquote.
It is possible to do everything in one pass, with "Replace all", using a regex-function.
Select the "Regex-function" mode.

Your "Find" field is:
(<blockquote class="email">)(.*?)(</blockquote>)

Create the regex-function with this code:
Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # group(1) = opening <blockquote>, group(2) = its content, group(3) = </blockquote>
    # Rename the class of the paragraphs in the content only
    inside_block = match.group(2).replace(
                '<p class="text"',
                '<p class="newtext"')
    return match.group(1) + inside_block + match.group(3)
Obviously, you have to replace "text" and "newtext" with the names you need.
Be careful: after the closing double quote in "text" and "newtext" there is a single quote that ends the Python string; it is mandatory to keep it.

Then you can click "Replace all".
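To see why this needs only one pass, here is a small Python harness. Assumption for illustration: calibre normally calls replace() itself with all the listed arguments; here re.sub calls it once per match with dummy None values, since the function only uses match. replace_all() and SAMPLE are hypothetical names.

```python
import re

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # lomkiri's regex-function: rename the class only inside the matched blockquote
    inside_block = match.group(2).replace('<p class="text"', '<p class="newtext"')
    return match.group(1) + inside_block + match.group(3)

# The same "Find" expression, with re.DOTALL standing in for "Dot all"
FIND = re.compile(r'(<blockquote class="email">)(.*?)(</blockquote>)', re.DOTALL)

def replace_all(html):
    """Hypothetical stand-in for "Replace all": one call of replace() per match."""
    return FIND.sub(lambda m: replace(m, None, None, None, None, None, None), html)

SAMPLE = '''<p class="text">a</p>
<blockquote class="email">
   <p class="text">b</p>
   <p class="text">c</p>
</blockquote>
<p class="text">d</p>'''

print(replace_all(SAMPLE + '\n' + SAMPLE))
```

Each match covers exactly one blockquote (the lazy (.*?) stops at the first </blockquote>), so every paragraph inside is renamed at once and running it again changes nothing. Nested blockquotes would break this, since the inner </blockquote> would end the match early.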
10-18-2021, 08:30 AM   #39
firsikov
Junior Member
Posts: 8
Karma: 10
Join Date: Oct 2021
Device: Kindle Voyager
Quote:
Originally Posted by davidfor View Post
Just to check, you want to change the class of the paragraph within the blockquote from "text" to something else? […]
For both, you have to run them multiple times. If you do "Replace all", it only makes one change in each blockquote.
Thanks a lot, it works. But if you have code like this:
Code:
<p class="text">some text</p>
<blockquote class="email">
   <p class="text">some <i>text</i></p>
   <p class="text"><b>some</b> text</p>
   <p class="text">some text</p>
</blockquote>
<p class="text">some text</p>
<p class="text">some text</p>
<blockquote class="email">
   <p class="text">some <i>text</i></p>
   <p class="text"><b>some</b> text</p>
   <p class="text">some text</p>
</blockquote>
<p class="text">some text</p>
The first opening <blockquote> and the last closing </blockquote> will match too. When I finish the "Replace all" passes, the result looks like this:
Code:
<p class="text">some text</p>
<blockquote class="email">
   <p class="newclass">some <i>text</i></p>
   <p class="newclass"><b>some</b> text</p>
   <p class="newclass">some text</p>
</blockquote>
<p class="newclass">some text</p>
<p class="newclass">some text</p>
<blockquote class="email">
   <p class="newclass">some <i>text</i></p>
   <p class="newclass"><b>some</b> text</p>
   <p class="newclass">some text</p>
</blockquote>
<p class="text">some text</p>
10-18-2021, 09:32 AM   #40
davidfor
Grand Sorcerer
Posts: 24,252
Karma: 45541596
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
Quote:
Originally Posted by firsikov View Post
Thanks a lot. It works. But! If you have code like that: […]
First opened <blockquote> and last closed </blockquote> will work too. […]
Sorry, I didn't run it all the way through. I just ran enough "Replace all" passes to check that it was making the changes you wanted. But it will eventually match across the blockquotes. You need to block that, but I can't think of how to do that at the moment.
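One possible way to block it (a suggestion, not something tested in this thread): a "tempered" dot, (?:(?!</blockquote>).)*?, which consumes any character except the start of a closing </blockquote>, so neither gap can run from one blockquote into the next. It is sketched here with Python's re; the same lookahead syntax should work in calibre's editor with "Dot all" on, but try it on a copy first. GAP, FIND, SAMPLE and replace_until_stable() are illustrative names. Several passes are still needed; the regex-function in #38 does it in one.

```python
import re

# Each '.*?' of the earlier expression becomes a tempered dot that refuses
# to step over '</blockquote>', so a match can never leave its own blockquote.
GAP = r'(?:(?!</blockquote>).)*?'
FIND = re.compile(r'(<blockquote class="email">' + GAP + r'<p class=")text(">' + GAP + r'</blockquote>)',
                  re.DOTALL)

def replace_until_stable(html):
    """Run "Replace all" passes until nothing changes; return (text, passes that changed it)."""
    passes = 0
    while True:
        new_html = FIND.sub(r'\1newclassname\2', html)
        if new_html == html:
            return html, passes
        html, passes = new_html, passes + 1

SAMPLE = '''<p class="text">a</p>
<blockquote class="email">
   <p class="text">b</p>
   <p class="text">c</p>
</blockquote>
<p class="text">d</p>'''
```

With two copies of SAMPLE in one file, the paragraphs between the blockquotes now keep their "text" class.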
10-18-2021, 11:58 AM   #41
lomkiri
Connoisseur
Posts: 73
Karma: 107742
Join Date: Jul 2021
Device: N/A
Quote:
Originally Posted by firsikov View Post
Thanks a lot. It works. But! If you have code like that:[…]
Just a question: why don't you want to use my solution? It works like a charm on the example you gave, in a single pass over the whole book (or over the current page, if you choose "Current file").

Screenshot of the diff-screen: https://i.imgur.com/v5oxiFy.jpeg
10-18-2021, 12:24 PM   #42
firsikov
Junior Member
Posts: 8
Karma: 10
Join Date: Oct 2021
Device: Kindle Voyager
Quote:
Originally Posted by lomkiri View Post
Just a question: why don't you want to use my solution? It works like a charm on the example you gave, in one shot on the whole book (or on the page, if you chose "current file")

Screenshot of the diff-screen: https://i.imgur.com/v5oxiFy.jpeg
This is weird. At first I tried to do as you told me, and nothing worked for me: the text in the nested tag kept displaying with its old styling.

But now, when I wanted to take a screenshot to show you, everything worked as I needed!

Anyway, thanks for the advice.
10-18-2021, 01:02 PM   #43
lomkiri
Connoisseur
Posts: 73
Karma: 107742
Join Date: Jul 2021
Device: N/A
Quote:
Originally Posted by firsikov View Post
This is weird. At first I tried to do as you told me, and nothing worked for me..
Maybe because "Dot all" wasn't checked? Group 2 of the regex must be able to pass through newlines ("Dot all" means that (.*?) won't stop at the next newline).
My mistake if that's the case: I should have told you to check it, but since davidfor had already said so, I thought it was unnecessary to repeat it.

Anyway, it's fine if your problem is solved :-).
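A quick way to see the effect of "Dot all", using Python's re module (re.DOTALL is Python's name for the same option; SAMPLE is made-up input):

```python
import re

FIND = r'(<blockquote class="email">)(.*?)(</blockquote>)'
SAMPLE = '<blockquote class="email">\n   <p class="text">x</p>\n</blockquote>'

# Without DOTALL, '.' does not match '\n', so (.*?) cannot reach </blockquote>
without_dotall = re.findall(FIND, SAMPLE)
# With DOTALL, the whole blockquote is matched and group 2 holds its content
with_dotall = re.findall(FIND, SAMPLE, flags=re.DOTALL)

print(without_dotall, with_dotall)
```

Without the flag nothing is found, so the regex-function is never called and the file stays unchanged.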
10-19-2021, 12:00 PM   #44
firsikov
Junior Member
Posts: 8
Karma: 10
Join Date: Oct 2021
Device: Kindle Voyager
Quote:
Originally Posted by lomkiri View Post
Maybe because "Dot all" wasn't checked? The group 2 of the regex must be able to pass through newlines (dot all means that (.*?) won't stop at next newline).
My mistake if it's the case, I should have told you to check it, but as Davidfor already said it, I thought it was unnecessary to repeat it.

Anyway, it's fine if your problem is solved :-).
Suddenly I found a great tool: Adobe Dreamweaver. Unfortunately, it's not free. It would be cool if Mr. Kovid added such a tool to his editor.

10-19-2021, 03:48 PM   #45
Doitsu
Grand Sorcerer
Posts: 5,357
Karma: 22014947
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by firsikov View Post
Suddenly I found a great tool. Adobe Dreamweaver. Unfortunately it's not free. It would be cool if Mr. Kovid added such a tool to his editor.
There's a plugin by DiapDealer that offers similar options: Diap's Editing Toolbag
Tags
conversion, errors, function, ocr, spelling

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.