Old 01-16-2022, 06:30 PM   #1
Peter Blaise
How do you AUTO-CONVERT YELLING to Leading Caps?

I use Tools > Check spelling.

I spend 5-clicks on each YELLING word to convert it to Leading Caps:

1 - left-click on YELLING-WORD under the spellcheck [ Word ] list
2 - left-click on the appropriate matching replacement word under [ Change selected word to: ]
3 - right-click on the word as it appears under [ Change selected word to: ]
4 - right-click and select [ Change case > Title case ] from two pop-up/down menus
5 - left-click [ Change selected word to: ]

Repeat those 5 steps for every YELLING WORD in the ebook to convert them one at a time to Leading Caps.

This is arduous, time-consuming, and spellcheck quirks require careful attention - is there a better way?

Spellcheck quirks wise, for example,
- sometimes the appropriate replacement word is shown in italics,
- sometimes the list of possible replacement words is not alphabetical,
- sometimes the default suggested word is close but wrong,
- sometimes the default suggested replacement word is the same as the original word,
... so, a sub-question here might be:
... is there any way to normalize spellcheck's quirks?

But I digress.

I can use Find, Mode: [ Regex-function ] search in two iterative steps for every YELLING-to-Leading-Caps occasion that I want to change, but that is not much better than the spellcheck scheme, time-wise.

For example:

Step 1:
Find, Mode: [ Regex-function ] search for
<(p)[^>]*>.+?</\1>
( or <([Hh][1-6])[^>]*>.+?</\1> for headings )
... click [ Find ] repeatedly to move through the ebook,
... when YELLING is found and highlighted, select [ Replace ] via
[ Lower-case text (ignore tags) ]

Step 2:
click in the editing white space above the current selection to unselect it,
Find, Mode: [ Regex-function ] search again for
<(p)[^>]*>.+?</\1>
( or <([Hh][1-6])[^>]*>.+?</\1> for headings )
... when the previous section is re-found and highlighted, select [ Replace ] via
[ Title-case text (ignore tags) ]

It would be nice to twin-sequence convert in one step, all lower case, then Leading Caps, but Leading Caps [ Title-case ] does not work on YELLING - so, maybe another sub-question:
... is there a way to fix [ Title-case ] so we can select our target case in one step?
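
For anyone scripting this outside the editor's built-in functions, the lower-then-Leading-Caps sequence described above can be collapsed into one pass. What follows is only a minimal sketch in plain Python, assuming tags must be left untouched; `leading_caps` is a made-up name, not a calibre function, and this is not how calibre's own [ Title-case text (ignore tags) ] is implemented.

```python
import re

TAG = re.compile(r"(<[^>]+>)")  # capturing group, so re.split keeps the tags


def leading_caps(html):
    """Lower-case the text between tags and capitalize each word, in one pass."""
    parts = TAG.split(html)
    for i, part in enumerate(parts):
        if i % 2 == 0:  # even slots are text, odd slots are the tags themselves
            # str.capitalize() upper-cases the first character and lowers the rest
            parts[i] = re.sub(r"\S+", lambda m: m.group(0).capitalize(), part)
    return "".join(parts)
```

Calling `leading_caps("<p>DON'T <b>YELL</b> AT ME</p>")` leaves the tags alone and rewrites only the text. A word that starts with a quote mark or bracket stays fully lower-case, because `capitalize()` only looks at the first character, so this would still need a manual check.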

But I digress.

- - - - -

So, any clues?

Does anyone have a routine, a script, a program/extension, an add-in scheme to AUTO-CONVERT YELLING to Leading Caps throughout an entire ebook during Calibre editing?

- - - - -

Background - for dyslexia-type readers, ALL CAPS interferes with word recognition, versus Leading Caps, smalls, descenders, ascenders, which are all better suited to support quick word shape recognition for the reader.

Plus, Internet-wise, we all know that YELLING is dis-inviting of the audience's attention.

- - - - -

Thanks for your consideration, for letting me explore this, and share, I look forward to insights.
Old 01-16-2022, 09:48 PM   #2
davidfor
I think a better search would be:

Code:
\b([[[:upper:]]-]{2,})\b
That finds words that are made up only of uppercase letters and hyphens. The minimum length is two characters. I can't get one to match a sentence that only contains uppercase letters. So, dropping the hyphen might make sense, and using:

Code:
\b([[:upper:]]{2,})\b
With both, you need the "Case sensitive" option to be turned on.

Then I simply used the "Title-case text" regex function to convert the matched words to title case. That seemed to work in one step for each word. I was able to step through the test book just pressing "Replace and find" with a couple of "Finds" for some acronyms.
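
To try the same idea outside the editor, here is a rough sketch in plain Python. It is an approximation, not the editor's behaviour: the standard `re` module has no `[[:upper:]]`, so the sketch matches any word of two or more characters and then checks `str.isupper()` instead. The name `title_case_shouting` is made up for the example.

```python
import re

WORD = re.compile(r"\b\w{2,}\b")  # any word of two or more characters


def title_case_shouting(text):
    """Turn WORDS made only of uppercase letters into Leading Caps."""
    def fix(m):
        word = m.group(0)
        # isalpha() rejects digits/underscores; isupper() requires all caps
        if word.isalpha() and word.isupper():
            return word.capitalize()
        return word
    return WORD.sub(fix, text)
```

Like the search above, this also rewrites acronyms ("OK" becomes "Ok"), which is why stepping through with "Replace and find" and skipping some matches is still needed.
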
Old 01-19-2022, 01:18 PM   #3
Peter Blaise
Thanks davidfor - that reduces or changes the clicks, though still not automatic

In response to davidfor who suggested:
[ Find: ]
\b([[[:upper:]]-]{2,})\b
... for 2+CHARACTER-CAPS-WITH-HYPHENS
\b([[:upper:]]{2,})\b
... for 2+CHARACTER-CAPS-WITHOUT-HYPHENS
... plus [✓] Case sensitive
... [ Mode: ] [ Regex-function > Title-case text (ignore tags) ]
... then [ Find ] and [ Replace and Find ]
... the user making editing decisions on each found word depending on actual word contents.

Yes, thank you, that reduces 5-clicks per word in [ Check spelling ] or the resulting 5-or-so-clicks required by my alternative 2-step of ( 1 ) search and make lower, then ( 2 ) re-search and make Leading Caps, taking either routine down to 1 or two clicks per CAPS word or phrase, manually, throughout the whole ebook.

But, unlike fixing in [ Check spelling ], I must fix each occurrence of a YELLING word, or additionally make a separate series of [ Replace all ] for words that I know reoccur, such as replacing all JOHN with John.

And, unlike my 2-step, which finds entire phrases, this finds one word at a time.

So, on the one hand, the \b([[:upper:]]{2,})\b suggestion works.

On the other hand, I still have to scan the entire ebook manually, make a decision on every single WORD, manually, so this is still not an automatic whole-ebook-conversion solution.

But a significant improvement nonetheless.

Thank you.

Does anyone else have any other routine that they use to change YELLING to Leading Caps or Sentence caps?

I presume a thoroughly well-developed multi-line iterative interdependent program would include lookups to leave words like US and NJ and FISA and the like - known abbreviations, and maybe have a user toggle for CAPS-WITH-HYPHEN to either Caps-with-hyphen or Caps-With-Hyphen, plus awareness of punctuation, such that titles of works and headings get Leading Caps, and first lines of paragraphs get Sentence caps.

I suppose I could use text-to-speech, then speech-to-text ... but I would lose all formatting.

Why do people STILL YELL?

Thanks everyone for exploring this and sharing.
Old 01-19-2022, 04:26 PM   #4
lomkiri
- Selecting whole sentences:
You could add space and comma to your search string:
Code:
\b(\p{Lu}[\p{Lu}\s,-]+)\b
(note: \p{Lu} has the same meaning as [[:upper:]]; you may use one or the other)
In this case, words like JOHN or FIFA will be targeted and transformed.
If an acronym with dots (F.I.F.A.) is inside the sentence, the selection will stop when reaching it.

- Excluding from the transformation the words not recognized by the dictionary:
Use the search string David gave you:
\b([[:upper:]]{2,})\b
with this regex-function:
Code:
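def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    word = match.group(1)  # the all-caps word captured by the search
    if dictionaries.recognized(word):
        # Known word: keep the first letter, lower-case the rest
        return word[0] + word[1:].lower()
    return word  # not in the dictionary: leave it unchanged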
This will transform only the recognized words. The last "return" leaves the non-recognized words as they are, it's up to you to do another treatment on them.

- Another possibility is to write all the capitalized words that the dictionary does not recognize to a temp file, and then decide what you want to do with them (you can do that in a regex-function; you could store them in a Python set, and write the set out on the last pass of the function).
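
A sketch of that collect-then-write idea, hedged heavily: it assumes, as I understand the calibre manual, that `data` is a dict kept between calls and that setting `replace.call_after_last_match = True` makes the function run once more with `match` set to None. Check both against your calibre version. `OUT_PATH` and the "unknown" key are names invented for the example.

```python
import os
import tempfile

# Hypothetical output location; change it to wherever you want the list.
OUT_PATH = os.path.join(tempfile.gettempdir(), "unrecognized_caps.txt")


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    if match is None:
        # Extra call after the last match: write out everything collected.
        with open(OUT_PATH, "w", encoding="utf-8") as f:
            f.write("\n".join(sorted(data.get("unknown", set()))))
        return
    word = match.group(1)
    if dictionaries.recognized(word):
        return word[0] + word[1:].lower()
    # Not recognized: remember it and leave the text unchanged.
    data.setdefault("unknown", set()).add(word)
    return word


# Assumption: this attribute asks calibre for the final call with match=None.
replace.call_after_last_match = True
```

After a [ Replace all ], the file should hold one unrecognized all-caps word per line, sorted, ready to be reviewed by hand.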

If you want a more refined treatment, you'll have to work out how to deal with the exceptions and translate that logic into your regex-function.

Suggestion: you could also surround the whole capitalized sentence with the tag <small>SENTENCE</small>; it will be much less aggressive, and small caps are often used as an acceptable emphasis. You can do that by slightly modifying the regex-function I wrote above.

Last edited by lomkiri; 01-19-2022 at 07:44 PM.
Old 01-19-2022, 08:43 PM   #5
davidfor
Quote:
Originally Posted by lomkiri View Post
- Selecting whole sentences:
You could add space and comma to your search string:
Code:
\b(\p{Lu}[\p{Lu}\s,-]+)\b
(note: \p{Lu} has the same meaning as [[:upper:]]; you may use one or the other)
In this case, words like JOHN or FIFA will be targeted and transformed.
If an acronym with dots (F.I.F.A.) is inside the sentence, the selection will stop when reaching it.
I just tried this and it picked up "I" by itself. Which I suppose if you are using title case for the matched words works, but, not if you are just changing them to lower case. But, it does feel wrong as it basically gives a lot of false positives. I tried:

Code:
\b(\p{Lu}{2}[\p{Lu}\s,-]*)\b
That didn't pick up "I" by itself but it missed "I'M". (The book I tested on had a few "I'm GOING TO..." with the action dependent on exactly how angry they were. I didn't notice it when I read it as it worked.)
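
The difference between the two searches can be reproduced in plain Python. This is only an approximation: the standard `re` module does not support `\p{Lu}`, so `[A-Z]` stands in for it here (ASCII letters only), and `shouted` is a helper name made up for the example.

```python
import re

# ASCII stand-ins for the two searches discussed above
LOOSE = re.compile(r"\b([A-Z][A-Z\s,-]+)\b")      # one capital, then more
STRICT = re.compile(r"\b([A-Z]{2}[A-Z\s,-]*)\b")  # must start with two capitals


def shouted(pattern, text):
    """Return the matched phrases, with surrounding whitespace trimmed."""
    return [m.group(1).strip() for m in pattern.finditer(text)]


sample = "I said STOP RIGHT THERE, NOW please"
print(shouted(LOOSE, sample))
print(shouted(STRICT, sample))
print(shouted(STRICT, "I'M GOING TO SCREAM"))
```

The loose pattern reports the lone "I" as well as the shouted phrase, the strict one reports only the phrase, and on the last sample the strict pattern skips "I'M" and starts at "GOING". Both patterns also swallow the space after the phrase, which is why the helper trims it.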
Quote:
- Excluding from the transformation the words not recognized by the dictionary:
Use the search string David gave you:
\b([[:upper:]]{2,})\b
with this regex-function:
Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):    
    word = match.group(1)
    if dictionaries.recognized(word):
        return word[0] + word[1:].lower()
    return word
This will transform only the recognized words. The last "return" leaves the non-recognized words as they are, it's up to you to do another treatment on them.

- Another possibility is to write all the capitalized words that the dictionary does not recognize to a temp file, and then decide what you want to do with them (you can do that in a regex-function; you could store them in a Python set, and write the set out on the last pass of the function).

If you want a more refined treatment, you'll have to work out how to deal with the exceptions and translate that logic into your regex-function.
The issue I had there was that "USA" is in the dictionary. And I added "FIFA" to the ignored words and that meant it was in the dictionary. And the book I tested on had "CPR", "SOS", "TV" and a few others.
Quote:
Suggestion: you could also surround the whole capitalized sentence with the tag <small>SENTENCE</small>, it will be much less aggressive, small-caps are often used as an acceptable emphasis. You can do that modifying slightly the regex-function I wrote above.
Or use a span with a transform to lower case or capitalize.
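
As a rough sketch of the span idea, a regex-function used with the single-word search could look like the following. The class name "shout" is hypothetical, and the stylesheet rule it relies on (something like `.shout { text-transform: lowercase; }`) would have to be added to the book's CSS separately.

```python
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # Leave the text as it is and only mark it up; "shout" is a made-up class name
    return '<span class="shout">' + match.group(1) + '</span>'
```

One caveat: `text-transform: capitalize` only upper-cases the first letter of each word and leaves the other letters as they are, so on text that is already all caps it changes nothing visible. Only `lowercase` actually tones the shouting down.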


@Peter Blaise: As to automating this, I really don't think that is a good idea. There are far too many exceptions to the rule. Your best bet is not to do it from the spelling checker. Use the search, look at the words and then decide if you want to change it or skip to the next one.

And for the record, this is purely a technical exercise to me. I don't think changing a book in this way makes sense. If the author does this, it should be deliberate and for the emphasis. If they overdo it, there are usually other problems in the book and they are generally worse.
Old 01-19-2022, 09:32 PM   #6
Karellen
bookmark
Old 01-20-2022, 09:54 AM   #7
lomkiri
First of all, your message, David, reminds me that I forgot to include the straight quote and the curly quote in the main group of the regex.

Quote:
Originally Posted by davidfor View Post
I just tried this and it picked up "I" by itself. Which I suppose if you are using title case for the matched words works, but, not if you are just changing them to lower case.
Yes, I was not worried about "I" because the OP said he wanted to capitalize each word (1st letter uppercase). So "I" wasn't a concern.

If we want to exclude I\s and I' from the capture group, it starts to get tricky for me using regex (I tried (*SKIP)(*F) and negative look-ahead, but with only partial success). I would rather do this inside the regex function: we may lower everything and then capitalize back all \bi[\s'’].
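
A minimal sketch of that "lower everything, then put the I back" step, in plain Python. The helper name `lower_keep_i` is made up, and the look-ahead also accepts the end of the string, which is my own addition to the character set given above.

```python
import re

# A lone "i" followed by whitespace, a straight or curly apostrophe, or the end
PRONOUN_I = re.compile(r"\bi(?=[\s'\u2019]|$)")


def lower_keep_i(text):
    """Lower-case a shouted phrase but restore the pronoun "I"."""
    return PRONOUN_I.sub("I", text.lower())


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # calibre-style wrapper around the helper
    return lower_keep_i(match.group(1))
```

An "I" followed directly by a comma or a full stop is not restored by this pattern, and the first letter of the sentence is not re-capitalized, so both would need extra handling.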

Anyway, I totally agree with you that an automatic treatment will probably create lots of false negatives and false positives, and is not recommended. A solution could be to list all caps words in a text file (using a regex function), and use this file, after cleaning, as an index to tell the regex-function which words to process (or, on the contrary, which not to process; which one to choose depends on what the list looks like). But it would be a lot of work for the expected result, I guess (I mean the cleaning; the 2 regex-functions are quite easy to write).
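
A sketch of the second of those two functions, the one that consults the cleaned list. Here the list is read as "words to leave alone"; the file format (one word per line), the helper `load_word_list`, and the sample entries are all assumptions made for the example.

```python
def load_word_list(path):
    """Read one word per line, ignoring blank lines and stray spaces."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


# Hard-coded here for the example; in practice this could be
# load_word_list("caps_to_keep.txt") with a hypothetical file name.
KEEP_AS_IS = {"USA", "FIFA", "CPR", "SOS", "TV"}


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    word = match.group(1)
    if word in KEEP_AS_IS:
        return word  # listed exception: leave the caps alone
    return word[0] + word[1:].lower()
```

Used with the single-word search, "WATCH TV IN THE USA" would come out as "Watch TV In The USA". Inverting the test (process only listed words) gives the other variant mentioned above.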

Quote:
Originally Posted by davidfor View Post
Or use a span with a transform to lower case or capitalize.
Yes, this is clever, since we don't touch the text itself, but its style, so it's easy to give it one look or another. The problem with that approach is still "I", which must resist lower-casing, so the regex-function will have a hard job putting the tags in the right places.

Another problem is that many e-readers don't respect the "text-transform" or "font-variant" properties, and that is the case with mine. I don't know why, since it would be very easy to implement, but the fact remains: those properties are not bulletproof.

Last edited by lomkiri; 01-20-2022 at 10:03 AM.