01-20-2011, 02:38 AM | #1 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
upper case to sentence case conversion
Is there a way, with regex, to tidy books where the first few words of each new chapter are in upper case?
I think that looks naff on an ereader and would look better in sentence case, but coding that seems challenging. |
01-20-2011, 02:46 AM | #2 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
You should apply a smallcaps style or pseudo smallcaps {font-size:80%} to those capitalized words - the reason they're showing up like that is they were smallcaps in the original book, but the sizing clues are often lost during OCR conversion.
You can find them via regex and wrap them with span tags using search and replace. A regex something like this would be a good place to start:
Code:
([A-Z]{2,})
Code:
<span class="smlcps">\1</span>
You could also do something like this:
Code:
([A-Z])([A-Z]{2,})
Code:
\1<span class="lower">\2</span>
Code:
.lower { text-transform:lowercase; }
Last edited by ldolse; 01-20-2011 at 02:52 AM. |
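For anyone who would rather run the two search-and-replace pairs above outside an editor, here is a minimal Python sketch. The function names are made up for illustration, and the `smlcps` and `lower` classes are assumed to be defined in the book's stylesheet. Note that it runs over the raw markup, so a run of capitals inside a tag or attribute would be wrapped too.

```python
import re

def wrap_caps(html: str) -> str:
    """Wrap every run of two or more capitals in a 'smlcps' span."""
    return re.sub(r"([A-Z]{2,})", r'<span class="smlcps">\1</span>', html)

def lower_after_first(html: str) -> str:
    """Keep the first capital and wrap the rest of the word in a 'lower' span.

    The pattern needs one capital plus two or more, so only words of
    three or more capitals are touched.
    """
    return re.sub(r"([A-Z])([A-Z]{2,})", r'\1<span class="lower">\2</span>', html)

if __name__ == "__main__":
    sample = "NAOMI PHELPS did most of the talking."
    print(wrap_caps(sample))
    print(lower_after_first(sample))
```

Both helpers touch every all-caps word in the text they are given, not just chapter openings, so they are best applied to a selected passage.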
01-20-2011, 03:00 AM | #3 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
Hmm - but I want to restrict changes to those start-of-chapter instances, not de-capitalise words elsewhere in the text, and not have to do each chapter manually - could be 50-100 chapters!
Maybe I also have to look for the closing h2 tag leading into the chapter?
Code:
<h2 id="heading_id_3"><a class="calibre7" id="_Toc61566414">3</a></h2>
<p class="MsoNormal3">NAOMI PHELPS DID MOST OF THE TALKING WHILE FRANCES SAT THERE AND shivered. Our secretary got her hot coffee and an afghan. Her hands |
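One way to anchor on the heading, sketched in Python: match only an all-caps run that opens the first paragraph straight after a closing `</h2>`. This assumes every chapter follows the markup shown in the sample (a heading followed directly by a `<p>`); the function name and the `smlcps` class are illustrative, and the second sentence in the demo text is invented to show that later capitals are left alone.

```python
import re

CHAPTER_OPENING = re.compile(
    r"(</h2>\s*<p\b[^>]*>)"     # group 1: end of heading plus the opening <p ...>
    r"([A-Z][A-Z' ,.-]*[A-Z])"  # group 2: leading run of capitals, spaces, basic punctuation
    r"(?![a-z])"                # do not end the run inside a mixed-case word
)

def style_chapter_openings(html: str) -> str:
    """Wrap only the all-caps run that starts a chapter's first paragraph."""
    return CHAPTER_OPENING.sub(r'\1<span class="smlcps">\2</span>', html)

if __name__ == "__main__":
    demo = (
        '<h2 id="heading_id_3"><a class="calibre7" id="_Toc61566414">3</a></h2> '
        '<p class="MsoNormal3">NAOMI PHELPS DID MOST OF THE TALKING WHILE FRANCES '
        'SAT THERE AND shivered. She said OK.</p>'
    )
    print(style_chapter_openings(demo))
```

Because the pattern starts at `</h2>`, it runs over the whole file in one pass and handles all chapters at once, while capitals elsewhere in the text are never matched.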
01-20-2011, 03:09 AM | #4 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
That regex I provided looks for two or more capital letters in a row - so only all-caps words are affected, not individual capital letters. The second regex/replacement leaves the first letter alone (as written it needs three or more capitals, so two-letter words are skipped). You can tweak the regexes to include spaces as well.
|
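A possible version of the "include spaces" tweak, as a Python sketch: the pattern below lets a run continue across whitespace so that a whole string of capitalised words ends up in one span instead of one span per word. The pattern and function name are illustrative, not the regex from the post, and the `smlcps` class is assumed to exist in the stylesheet.

```python
import re

# An all-caps word of two or more letters, optionally followed by further
# all-caps words (including single letters such as I or A) separated by whitespace.
CAPS_RUN = re.compile(r"\b[A-Z]{2,}\b(?:\s+[A-Z]+\b)*")

def wrap_caps_run(text: str) -> str:
    """Wrap each multi-word all-caps run in a single 'smlcps' span."""
    return CAPS_RUN.sub(lambda m: f'<span class="smlcps">{m.group(0)}</span>', text)

if __name__ == "__main__":
    print(wrap_caps_run("THEN JED SAID TO FRED I KNOW what you mean"))
```

The word-boundary checks stop the run from swallowing the first letter of an ordinary capitalised word that happens to follow it.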
01-20-2011, 03:11 AM | #5 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
PS - thinking it through some more - there are probably too many special cases to solve for this to be worthwhile.
e.g. a chapter beginning "THEN JED SAID TO FRED I KNOW" would need to become "Then Jed said to Fred I know". Even a built-in sentence case converter like the one in MS Word, which would handle the capitalised I, would stumble on the proper names. |
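The proper-name problem is easy to reproduce. Below is a rough Python sketch of a sentence-case converter; the function and its `proper_names` parameter are hypothetical, and the name list has to be supplied by hand for each book, which is exactly the manual work being described. Words with punctuation attached (for example "FRED,") are not handled.

```python
def sentence_case(text: str, proper_names=frozenset()) -> str:
    """Lower-case a run of capitals, keeping the first word, 'I', and listed names capitalised.

    proper_names holds lower-case names supplied by the caller.
    """
    words = []
    for index, word in enumerate(text.lower().split()):
        if index == 0 or word == "i" or word in proper_names:
            word = word.capitalize()
        words.append(word)
    return " ".join(words)

if __name__ == "__main__":
    opening = "THEN JED SAID TO FRED I KNOW"
    print(sentence_case(opening))                   # names come out lower-cased
    print(sentence_case(opening, {"jed", "fred"}))  # names restored from the list
```

Without the name list the result is "Then jed said to fred I know", so fully automatic conversion would still need a proofreading pass.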
01-20-2011, 03:18 AM | #6 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
Which is another reason I recommended smallcaps. I'm sure if you look at the printed book you will find it's using smallcaps as well.
|
01-20-2011, 04:39 AM | #7 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
OK thanks - another gap in my typesetting knowledge revealed! I googled small caps and will experiment a little with conversions. I'd want something that works for epub & mobi - calibre viewer & Kindle.
Last edited by cybmole; 01-20-2011 at 05:01 AM. |
01-20-2011, 05:44 AM | #8 |
Guru
Posts: 655
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
|
I know this isn't what you'll want to hear, but I think it's easier to do the replace by hand, because of names and other special cases where capitals in the sentence need to be preserved.
After doing quite a few regex replaces, I often found I had to go back and change some of them by hand to restore a few capitals. |
01-20-2011, 06:03 AM | #9 | |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|