#16 |
creator of calibre
Posts: 45,388
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The way those functions work is that they uppercase the contents of any groups in the find expression. You have specified a group that matches H1. You need to specify a group that matches the actual content, like this:
<[Hh][1-6]>(.+?)</[Hh][1-6]>
If you want a case-changing function that ignores text in tag definitions in the matched text, then you will need to write one yourself. The builtin functions won't do that because they are for general-purpose use, not specifically for changing text between tags. |
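For anyone following along outside the editor, here is a minimal sketch of the "only the group gets changed" behaviour described above, using plain Python `re`. The helper `change_group_case` is hypothetical (it is not calibre's code), and `str.title` is only a rough stand-in for calibre's title-casing.

```python
import re

# The find expression from the post: group 1 is the text between the heading tags.
HEADING = re.compile(r"<[Hh][1-6]>(.+?)</[Hh][1-6]>")


def change_group_case(match, func):
    """Hypothetical helper: rebuild the match with func applied to group 1 only."""
    whole = match.group(0)
    # Offsets of group 1 relative to the start of the whole match.
    start = match.start(1) - match.start()
    end = match.end(1) - match.start()
    return whole[:start] + func(match.group(1)) + whole[end:]


sample = "<h1>TEST1 TEST1</h1><p>NOW IS THE TIME</p>"
result = HEADING.sub(lambda m: change_group_case(m, str.title), sample)
print(result)
```

The heading tags and the paragraph are left alone because they sit outside group 1.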
#17 | |||
Wizard
Posts: 1,087
Karma: 447222
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
Thanks for your patience and the explanations.
Quote:
Using your Find works exactly as advertised, and it correctly finds and highlights the Hx tags.
Quote:
1. It replaces tag markers ('<' and '>') with what is treated like normal text
2. It does not TitleCase the text that it does find
Quote:
So I assume that <[Hh][1-6]>(.+?)</[Hh][1-6]> would make the \1 group for the Replace just the red text in the Before below?
Before:
Code:
<h1>TEST1 TEST1 TEST1 TEST1 TEST1 </h1>
<p>NOW IS THE TIME and this should remain mixed case</p>
<h1>TEST2 TEST2 TEST2 <br/><br/>TEST3 TEST3 </h1>
<p>NOW IS THE TIME and this should remain mixed case</p>
<h1>TEST4 <i>TEST4 TEST4 TEST4</i> TEST4 </h1>
After:
Code:
<h1>Test1 Test1 Test1 Test1 Test1 </h1>
<p>NOW IS THE TIME and this should remain mixed case</p>
<h1>TEST2 TEST2 TEST2 <br/><br/>TEST3 TEST3 </h1>
<p>NOW IS THE TIME and this should remain mixed case</p>
<h1>TEST4 <i>TEST4 TEST4 TEST4</i> TEST4 </h1>
2. I don't understand why the same logic isn't applied to the second and third headings, so that all text between the Hx tags is made title case, or why < and > are replaced with entities that end up being treated like normal text. |
#18 |
creator of calibre
Posts: 45,388
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The logic is simple:
*Everything* that matches the expression inside the brackets is made upper case. Furthermore, the function treats all that text as plain text, not a mix of HTML and plain text. That means that, because the output of the function is being put into an HTML file, < and > get replaced by entities. In other words, that function is not designed to be used in the way you are trying to use it. You need to come up with a function that understands that it could be operating on a mixture of HTML tags and plain text, and so restricts itself to only the plain text parts. |
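As a rough illustration of both points (plain-text output getting its < and > turned into entities, and a function that restricts itself to the text parts), here is a standalone sketch. It does not use calibre at all: `apply_to_text_only` and the `TAG` pattern are hypothetical names, the tag pattern is deliberately naive (it would break on a `>` inside an attribute value), and `str.title` stands in for a real title-casing function.

```python
import html
import re

# Naive tag matcher; the capturing group makes re.split keep the tags.
TAG = re.compile(r"(<[^>]+>)")


def apply_to_text_only(fragment, func):
    """Hypothetical helper: apply func to the text between tags, never to the tags."""
    parts = TAG.split(fragment)
    # With one capturing group, odd-numbered parts are tags, even-numbered are text.
    return "".join(part if i % 2 else func(part) for i, part in enumerate(parts))


inner = "TEST4 <i>TEST4 TEST4</i> TEST4 "

# Treating the whole group as plain text: the tag markers become entities.
escaped = html.escape(inner, quote=False)
print(escaped)

# Changing case only in the text parts: the tags survive untouched.
fixed = apply_to_text_only(inner, str.title)
print(fixed)
```

Note that `str.title` is cruder than a proper title-case routine (it capitalises every word, including small ones such as "of"), so treat it purely as a placeholder.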
#19 |
creator of calibre
Posts: 45,388
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I have created a builtin function that does that for you; it will be in the next release.
https://github.com/kovidgoyal/calibr...151ff7a9946577 |
#20 |
Interested in the matter
Posts: 421
Karma: 426094
Join Date: Dec 2011
Location: Spain, south coast
Device: Pocketbook InkPad 3
|
Paul,
What this expression, <[Hh][1-6]>(.+?)</[Hh][1-6]>, looks for is: <an h (or H) followed by a digit from 1 to 6>anything</another h followed by another digit>. In your headings, the br in <br/> and the i in </i> are not an h followed by a digit, so they are not what the closing part of the expression is looking for. If you want to use regex, I recommend visiting this website: http://www.regular-expressions.info/tutorial.html |
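To see what the expression captures for each kind of heading in the thread, a quick check with Python's `re` module can help. This is only a sketch with shortened sample text, run outside calibre.

```python
import re

HEADING = re.compile(r"<[Hh][1-6]>(.+?)</[Hh][1-6]>")

# Shortened versions of the three headings from the earlier post.
sample = (
    "<h1>TEST1 TEST1 </h1>"
    "<p>NOW IS THE TIME</p>"
    "<h1>TEST2 <br/><br/>TEST3 </h1>"
    "<h1>TEST4 <i>TEST4</i> TEST4 </h1>"
)

# findall returns the contents of group 1 for every match.
groups = HEADING.findall(sample)
for text in groups:
    print(repr(text))
```

The second and third groups include the `<br/>` and `<i>` tags, because the lazy `.+?` keeps going until it reaches a real closing heading tag.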
#21 |
Wizard
Posts: 1,087
Karma: 447222
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
@kovid -- THANKS!!!! I can see I'll have to learn at least a little Python.
I was confused by the apparently different treatment of the TitleCase function between the first (simplest) sentence "Where It Worked Just Fine" and the second and third where IT LEFT EVERYTHING IN UPPER CASE.
@jbacelar -- The Find Kovid gave me seems to work fine. It would select all this H1 text, including the <h1> and </h1> ...
<h1>TEST2 TEST2 TEST2 <br/><br/>TEST3 TEST3 </h1>
After the Replace:
<h1>TEST2 TEST2 TEST2 <br/><br/>TEST3 TEST3 </h1>
What was confusing me was that the text was not in title case. I understand the replaced entities now.
I believe that Kovid's new built-in function is the only way to handle these types of cases. |
#22 |
Nameless Being
|
Title-case text built-in function
I'm also having trouble with the "Title-case text (ignore tags)" built-in function. I've wrapped all the UPPER case text that I want to convert to Title case in <h2> tags and am using the search parameter "(?s)<h\d>(.+?)</h\d>".
Applying "Replace-all" results in a deletion of all H tags and the intervening text. No conversion, just deletion. Editing the built-in function, this is what I see:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    return ''
Shouldn't there be more to it? |
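The deletion is what you would expect from that listing: whatever the function returns replaces the matched text, and this one returns an empty string. A small sketch with plain Python `re` shows the same effect; the signature here is trimmed to `match` plus catch-all arguments, and the sample text is made up.

```python
import re

# The search expression from the post above.
pattern = re.compile(r"(?s)<h\d>(.+?)</h\d>")


def replace(match, *args, **kwargs):
    # Same body as the empty template: every match is swapped for nothing.
    return ''


sample = "<h2>CHAPTER ONE</h2>\n<p>Body text.</p>"
stripped = pattern.sub(replace, sample)
print(repr(stripped))
```

So an empty template in the function box means the whole match, tags included, disappears on Replace.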
#23 |
creator of calibre
Posts: 45,388
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
|
#24 |
Wizard
Posts: 1,087
Karma: 447222
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
@Ted - Your PM
I actually run two steps: one to upper case headings, and then a second to title case them. These are my saved searches, and this is the function listing for 'Title case text - Ignore tags':
Code:
from calibre.utils.titlecase import titlecase
from calibre.ebooks.oeb.polish.utils import apply_func_to_html_text

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    '''Title-case matched text, ignoring the text inside tag definitions.'''
    return apply_func_to_html_text(match, titlecase)
|
#25 |
Nameless Being
|
Title-case text built-in function
Thanks Paul it worked!!
Question: When I Create/Edit built-in functions is there supposed to be some code there? |
#26 |
Wizard
Posts: 1,087
Karma: 447222
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
I have mine as a Saved Search, but you can also do it ad hoc.
There is code there that defines the function, but I don't know Python, so I never created any. I guess you can create your own function. The Calibre Users' Manual is one of the best I've seen in a long time: https://manual.calibre-ebook.com/function_mode.html
Last edited by phossler; 06-29-2020 at 07:25 PM. |
#27 |
Nameless Being
|
Thanks for the code Paul.
It sounds like you didn't code the function you sent me, but that it was "built-in". When I choose any of the dozen or so built-in functions, the code is always the same:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    return ''
Does your installation of Calibre actually display appropriate function code when you choose different built-in functions? If so, any ideas why? |
#28 | |
Wizard
Posts: 1,087
Karma: 447222
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
Yes, it was one of the built in ones that Calibre supplies
Quote:
Don't know why. If I look at the 'code' for the function, I see the attached |
|
#29 |
Nameless Being
|
Built-in regex-functions code missing
From recent replies, it seems that clicking on create/edit regex-function should reveal the code for built-in functions. My installation (Calibre 4.19 64-bit on Microsoft Windows [Version 10.0.18363.900]) does not.
Any ideas why? Have I turned something off inadvertently? Have I failed to install some module? Should I still be using OS/2? Having some functions (more than what's in the manual) to play with will help me learn enough to fix my epub book library. Any help you can give me (samples of code and search strings) will be greatly appreciated. |
#30 | ||||
Grand Sorcerer
Posts: 24,905
Karma: 47303824
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Quote:
Quote:
Quote:
Quote:
|