02-14-2025, 04:33 PM   #22
KevinH
Sigil Developer
Quote:
Originally Posted by nabsltd
The only time I use Calibre's function S&R is for case conversion, since it's one thing that a plain regex just can't do. Since I'm a pretty heavy editor, and do a lot of complex search and replace, I suspect that is true of most everyone else, too.
So this is not done with the Find button followed by the Replace button, with the case-change flags operating on capture groups, right? I.e., these flags, used in the Replace field before and after capture groups such as \1, \2, etc.

Code:
    // Case changes can be:
    // \l Lower case next character.
    // \u Upper case next character.
    // \L Lower case until \E.
    // \U Upper case until \E.
    // \E End case modification.
    // * Note: case changes cannot stop within a segment. Meaning
    // a \L within a \U will be ignored and the \U will be honored until

Or by clicking the Case Change toolbar icon after clicking the Find button?


This is done using the Replace All button, correct?
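
To make the first option concrete, here is a rough Python sketch of how the case-change flags listed above could be applied to a replacement template. It is an illustration only, not Sigil's or calibre's actual code: expand_replacement and replace_all are made-up names, and the handling of a \L inside a \U (ignored until \E) is my reading of the note in the comment block.

```python
import re


def expand_replacement(template, match):
    """Expand \\0..\\9 and the flags \\l \\u \\L \\U \\E in a replacement template.

    Hypothetical helper for illustration; not taken from Sigil or calibre.
    """
    out = []
    span_mode = None  # "L" or "U" while inside \L...\E or \U...\E
    next_mode = None  # "l" or "u": applies to the next character only
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "\\" and i + 1 < len(template):
            code = template[i + 1]
            i += 2
            if code in ("l", "u"):
                next_mode = code
                continue
            if code in ("L", "U"):
                # Assumption: a span flag inside an open span is ignored.
                if span_mode is None:
                    span_mode = code
                continue
            if code == "E":
                span_mode = None
                continue
            if code.isdigit():
                # Capture group reference; an unmatched group becomes "".
                text = match.group(int(code)) or ""
            else:
                text = code  # any other escaped character is kept literally
        else:
            text = ch
            i += 1
        if span_mode == "U":
            text = text.upper()
        elif span_mode == "L":
            text = text.lower()
        if next_mode and text:
            first = text[0].upper() if next_mode == "u" else text[0].lower()
            text = first + text[1:]
            next_mode = None
        out.append(text)
    return "".join(out)


def replace_all(pattern, template, text):
    """Replace every match of pattern in text using the flag-aware template."""
    return re.sub(pattern, lambda m: expand_replacement(template, m), text)
```

For example, replace_all(r"(\w+) (\w+)", r"\U\1\E \u\2", "hello world") gives "HELLO World" with this sketch.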

And exactly which case conversions do you need:

Title Case
Upper Case
Lower Case
Capitalize

Any others?

And how smart is the calibre titlecase conversion?

Title casing can actually be quite hard to get right, depending on which styles are common in which languages. For example, in English, should you capitalize little words such as by, of, on, in, etc. when they are not at the start of a line? Some say no, others yes. And what about Roman numerals and abbreviations in a title? They must be specially handled (e.g., some combinations of Roman numerals also spell words, such as MIX). You can find whole language-dependent Python modules on GitHub that still make mistakes.
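
To illustrate those two problems, here is a minimal Python sketch. It is a deliberately naive toy, not the calibre or python-titlecase implementation: SMALL_WORDS, naive_titlecase, and looks_like_roman are names I made up, and the small-word list is just one possible style choice.

```python
import re

# Assumption: one possible English "small word" list; style guides differ.
SMALL_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for",
               "in", "of", "on", "or", "the", "to"}

# Classic pattern for Roman numerals from 1 to 3999.
ROMAN_RE = re.compile(
    r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
)


def naive_titlecase(text):
    """Capitalize each word, keeping small words lower case unless first."""
    result = []
    for index, word in enumerate(text.split()):
        lower = word.lower()
        if index != 0 and lower in SMALL_WORDS:
            result.append(lower)
        else:
            result.append(lower[:1].upper() + lower[1:])
    return " ".join(result)


def looks_like_roman(word):
    """True if word could be read as a Roman numeral (case-insensitive)."""
    return bool(word) and ROMAN_RE.fullmatch(word.upper()) is not None


print(naive_titlecase("the return of the king"))  # The Return of the King
print(naive_titlecase("CHAPTER XIV"))             # Chapter Xiv (numeral broken)
print(looks_like_roman("XIV"))                    # True
print(looks_like_roman("mix"))                    # True, yet "mix" is also a word
```

The naive version mangles XIV, but a rule that upper-cases anything matching the Roman numeral pattern would turn "how to mix paint" into "How to MIX Paint". Telling the two cases apart needs context that a simple rule does not have.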

So really well-done title casing truly needs to be done by hand (IMHO): use Find to detect each potential title, look at its text word by word, and then fix it using your own style conventions or those of the editor or publisher.

Update:

Even the most complete version of titlecase that I could find for Python with a license we could use in Sigil:

https://github.com/ppannuto/python-t...se/__init__.py

has issues with many things:

https://github.com/ppannuto/python-titlecase/issues/96


And these are all just issues in English. With other languages, the results would be much worse.

Last edited by KevinH; 02-14-2025 at 05:42 PM.