02-11-2023, 04:30 PM | #1 |
Guru
Posts: 734
Karma: 1077122
Join Date: Sep 2013
Device: Kobo Forma
|
Finding Series of Capitalized Words?
Well, I thought I remembered a thread here about finding sequences of all capitalized words. But, I can't find it. So, I'll just start one.
In the Calibre Editor, I'm trying to find and select those sequences of all capitalized words that publishers sometimes stick in the first line of a chapter (or perhaps after some kind of scene break). Here's my current best shot (the Case-Sensitive box has to be checked for this): Code:
([A-Z0-9]+(?:\s[A-Z0-9\.,…’“”!?—-]+)+\b)
Can anyone come up with a way to include first words with punctuation? This is better than a sharp stick in the eye (i.e., better than manually finding and retyping the start of every chapter). But, I'd like to work through this. |
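One way to experiment with the expression above outside the editor is a small Python sketch. `ORIGINAL` is the pattern exactly as posted. `VARIANT` is a hypothetical adjustment that is not from the thread: it allows an optional opening quote, lets the first word carry the same punctuation as later words, and swaps the trailing `\b` for `(?!\w)` so a closing quote or comma can end the match. Python's `re` is used here as a stand-in; the Calibre editor's regex engine may differ in details.

```python
import re

# Pattern exactly as posted (case-sensitive by default in Python).
ORIGINAL = re.compile(r"([A-Z0-9]+(?:\s[A-Z0-9\.,…’“”!?—-]+)+\b)")

# Hypothetical variant (an assumption, not the poster's): optional opening
# quote, punctuation allowed inside the first word, and a negative lookahead
# instead of \b so the run may end on punctuation.
VARIANT = re.compile(
    r"[“‘]?[A-Z0-9][A-Z0-9\.,…’“”!?—-]*"
    r"(?:\s[A-Z0-9\.,…’“”!?—-]+)+(?!\w)"
)

def first_run(pattern, text):
    """Return the first all-caps run found by pattern, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None

plain = "THE QUICK BROWN fox jumped"
quoted = "“WELL, I NEVER,” she said."

print(first_run(ORIGINAL, plain))
print(first_run(ORIGINAL, quoted))   # misses the punctuated first word
print(first_run(VARIANT, quoted))
```

The variant will still produce false positives, so each hit needs a manual look.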
02-11-2023, 05:24 PM | #2 |
Resident Curmudgeon
Posts: 74,045
Karma: 129333562
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Or just learn to deal with it as it's done a lot.
|
02-12-2023, 02:58 AM | #3 |
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
So, what is a word (for your purposes)? I think it's something like:
Code:
[A-Z0-9][A-Z0-9\.,…’“”!?—-]*
Now you want a number of words separated by spaces, how about (untested):
Code:
({word}\s+)\b
Code:
([A-Z0-9][A-Z0-9\.,…’“”!?—-]*\s+)\b |
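The `{word}` substitution above can be tried in Python by building the pattern from a string. This is only a sketch: `AS_POSTED` is the untested suggestion as written, and `REPEATED` is an assumed adjustment (not from the thread) that adds a `+` to the group so that more than one word can match.

```python
import re

# "Word" as defined in the post: starts with a capital or digit,
# then any mix of capitals, digits and the listed punctuation.
WORD = r"[A-Z0-9][A-Z0-9\.,…’“”!?—-]*"

# The suggestion as posted: a single word plus trailing whitespace.
AS_POSTED = re.compile(rf"({WORD}\s+)\b")

# Assumed adjustment: repeat the word-plus-space unit one or more times.
REPEATED = re.compile(rf"((?:{WORD}\s+)+)\b")

sample = "IT WAS A DARK night."

print(repr(AS_POSTED.search(sample).group(1)))
print(repr(REPEATED.search(sample).group(1)))
```

Two things to keep in mind: the repeated form includes the trailing whitespace in the match, and it also matches a single capitalized word followed by a space, so ordinary sentence starts will show up as hits.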
02-12-2023, 09:09 AM | #4 |
the rook, bossing Never.
Posts: 11,173
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
And there are edge cases like FBI, NATO etc.
I personally gave up and just find each chapter start and manually edit. I'll remove stuff for small caps and drop caps at the start of a chapter (or anywhere) automatically as that seems safe. I found in the word processor that applying Sentence Case to first paragraphs doesn't work because of proper names and similar. |
02-12-2023, 11:18 AM | #5 |
Guru
Posts: 734
Karma: 1077122
Join Date: Sep 2013
Device: Kobo Forma
|
I'll add in some of those punctuation marks and see if the false positives from them outweigh the false negatives of not having them.
I don't have a problem with it picking up acronyms and other capitalized words. I've got a search to go through the book looking for individual capitalized words so I can check if they should be smallcapped or otherwise formatted. This will just supplement that. And, as you found in your word processor, I'll still have to manually check every sequence of all-cap words. Calibre doesn't have a regex function to Sentence Case things, so I use its Lower Case and its Title Case functions and look for proper names and similar issues to correct. This will just make it easier to find and do the initial conversion. |
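For what it's worth, a sentence-case conversion could be sketched as a custom function. The `replace` signature below is an assumption based on how the Calibre editor's function mode for search and replace is documented; `sentence_case` is a hypothetical helper name. It does nothing clever about proper names or acronyms, so the manual check described above is still needed.

```python
import re

def sentence_case(text):
    """Lower-case the text, then capitalize the first letter found."""
    lowered = text.lower()
    for i, ch in enumerate(lowered):
        if ch.isalpha():
            # Skip leading quotes or digits and capitalize the first letter.
            return lowered[:i] + ch.upper() + lowered[i + 1:]
    return lowered

# Assumed signature for a Calibre function-mode replacement; only `match`
# is used here, the other arguments are accepted and ignored.
def replace(match, number, file_name, metadata, dictionaries, data,
            functions, *args, **kwargs):
    return sentence_case(match.group())

print(sentence_case("“THE FBI AGENT"))  # the acronym is lower-cased too
```

Outside Calibre, the same helper can be used with `re.sub` by passing `lambda m: sentence_case(m.group())` as the replacement.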
02-12-2023, 12:18 PM | #6 |
the rook, bossing Never.
Posts: 11,173
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Good Luck!
|