04-27-2012, 08:17 PM | #16 | |
Well trained by Cats
Posts: 29,792
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
So you need another set of patterns (BTW, you can only have 9 capture replacements):
Search:
Code:
(([A-Z])(.+) ([A-Z])(.+) ([A-Z])(.+))
Replace:
Code:
<text>\2\L\3\E \4\L\5\E \6\L\7\E</text>
Again: work on the pages and include the Title option in the chapter headings, then autogenerate the TOC.
|
04-27-2012, 08:35 PM | #17 | |
Zennist
Posts: 1,022
Karma: 47809468
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD+ & HD
|
ducks, I'm not sure what you mean by "capture replacements." Do you mean it will only work for titles that are 9 words or less? Some of the titles have 5, 7 or more words in them. If the max is 9, that's fine. I can always fix by hand the few titles that are over 9 words.
|
04-27-2012, 08:59 PM | #18 |
Well trained by Cats
Posts: 29,792
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Each word is 2 captures:
([A-Z]) is the first letter, and (.+) followed by a space is the rest of the word, with the space used to detect where the word ends. So my system works for 4 words maximum (4 words × 2 = 8 captures), and a pattern must be crafted for each of 4, 3, 2 and 1 words surrounded by a <text> tag. A regex guru might come up with a better pattern/replacement.
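To see the two-captures-per-word scheme outside Sigil, here is a rough Python sketch (an illustration, not Sigil's code). It assumes each title sits inside <text>...</text> and swaps (.+) for (\S+) so a word capture cannot swallow the separating space; the helper names are hypothetical. Because the pattern has a fixed number of word groups, you still need one pass per word count.

```python
import re

def fixed_word_pattern(n_words):
    # Two captures per word: ([A-Z]) for the initial, (\S+) for the rest.
    # (\S+) is an assumption standing in for (.+) so a capture stops at the space.
    word = r"([A-Z])(\S+)"
    return re.compile("<text>" + " ".join([word] * n_words) + "</text>")

def title_case_fixed(ncx, n_words):
    """Title-case only the <text> entries that have exactly n_words words."""
    pattern = fixed_word_pattern(n_words)

    def repl(m):
        g = m.groups()
        # Keep each initial, lowercase the rest of the word (like \2\L\3\E).
        words = [g[i] + g[i + 1].lower() for i in range(0, len(g), 2)]
        return "<text>" + " ".join(words) + "</text>"

    return pattern.sub(repl, ncx)

ncx = "<text>TITLE PAGE</text>\n<text>DEDICATION</text>"
# A replacement limited to \1..\9 caps this at 4 words (8 groups); Python's re
# has no such limit, but the pattern is still tied to one word count.
for n in (1, 2):
    ncx = title_case_fixed(ncx, n)
print(ncx)
```

Titles with a different word count are left alone by each pass, so entries that are longer than your largest pattern still need a hand fix.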
04-28-2012, 12:16 AM | #19 |
Zealot
Posts: 114
Karma: 5246
Join Date: Jul 2010
Device: none
|
Try doing it in two goes: open toc.ncx, and make sure the replacement setting is set to Current file.
First, leave the first letter of the first word capitalised and lower all the letters after it:
Find: <text>([A-Z])(.+)
Replace: <text>\1\L\2
Then capitalise the first letter of each word:
Find: <text>(.+) ([a-z])
Replace: <text>\1 \u\2
You'll need to click "Replace All" more than once (until it says "No replacements made"), because the greedy (.+) means each pass only capitalises the last lowercase word in each title; the number of clicks depends on the number of words in the longest title. Not elegant, but it seems to work. I guess there'll be some corner cases, e.g. a capitalised "The" or "To", but you can replace those easily afterwards. Note that if you regenerate the TOC in Sigil, all those changes will be gone, so make sure you don't need to generate the TOC again before doing this.
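A minimal Python sketch of the same two-step idea, assuming each <text> element is on its own line of toc.ncx (if the whole NCX were on one line, (.+) would run past </text>). Python has no \L or \u in replacement strings, so callbacks do the case changes, and the loop stands in for clicking "Replace All" until nothing changes. Function names are hypothetical.

```python
import re

def lower_after_initial(ncx):
    # Step 1: Find <text>([A-Z])(.+)  Replace <text>\1\L\2
    # '.' does not match a newline, so each line is handled separately.
    return re.sub(r"<text>([A-Z])(.+)",
                  lambda m: "<text>" + m.group(1) + m.group(2).lower(), ncx)

def capitalise_last_lower_word(ncx):
    # Step 2: Find <text>(.+) ([a-z])  Replace <text>\1 \u\2
    # Greedy (.+) reaches the LAST " x" on the line, so one word per pass.
    return re.subn(r"<text>(.+) ([a-z])",
                   lambda m: "<text>" + m.group(1) + " " + m.group(2).upper(), ncx)

def two_step_title_case(ncx):
    ncx = lower_after_initial(ncx)
    while True:
        ncx, count = capitalise_last_lower_word(ncx)
        if count == 0:  # the "No replacements made" point
            return ncx

print(two_step_title_case("<text>TITLE PAGE</text>\n<text>ABOUT THE AUTHOR</text>"))
```

The second entry comes out as "About The Author", which is the capitalised "The" corner case mentioned above.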
04-28-2012, 05:33 AM | #20 |
Connoisseur
Posts: 54
Karma: 37363
Join Date: Aug 2011
Location: Istanbul
Device: EBW1150, Nook STR
|
@PatNY: You can do what you want with
Search:
Code:
\b[A-Z]\K([^\s]+)(?=[^<>]*</text>)
Replace:
Code:
\L\1\E
For titles with accented (non-ASCII) capitals, use this Unicode-aware Search instead:
Code:
(*UCP)\b[[:upper:]]\K([^\s]+)(?=[^<>]*</text>)
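Python's built-in re module has no \K, so one way to reproduce this search there (a sketch, not a drop-in for Sigil) is to capture the initial capital in its own group and write it back unchanged in a callback. The sample line and function name are made up for illustration. Note that the hyphenated word comes out as Chocolate-glazed, because [^\s]+ runs straight through the hyphen.

```python
import re

# Sigil search:  \b[A-Z]\K([^\s]+)(?=[^<>]*</text>)   replace: \L\1\E
# Without \K: group 1 is the initial (kept), group 2 the rest (lowercased).
# The lookahead keeps matches inside <text>...</text> only.
PATTERN = re.compile(r"\b([A-Z])([^\s]+)(?=[^<>]*</text>)")

def title_case_text(ncx):
    return PATTERN.sub(lambda m: m.group(1) + m.group(2).lower(), ncx)

line = '<navLabel> <text>CHOCOLATE-GLAZED DONUTS</text> </navLabel>'
print(title_case_text(line))
```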
|
05-01-2012, 06:25 AM | #21 |
Zennist
Posts: 1,022
Karma: 47809468
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD+ & HD
|
Thanks to all who replied in this thread. I used Timur's solution and it works like a charm. Having all caps for TOC entries when there are hundreds of entries is just too hard to read, and using title case instead makes it so much better on my eyes.
There is just one situation left unfixed, which I think could be fixed. Whenever there are hyphenated words, the second word is not capitalized. For example, CHOCOLATE-GLAZED becomes Chocolate-glazed instead of Chocolate-Glazed. Is there any way to construct a separate regex that will make every first letter after a hyphen uppercase?
Also, I noticed that the latest version of Sigil has no search option for whole words only. I may be mistaken, but I thought it had that in the past. When I used "Replace All" to change "nut" to "Nut", I ended up with a lot of "PeaNut" and "ButterNut" results! I may be misremembering, but I could have sworn there used to be more search options. Or maybe I am missing the option to put the search window in an advanced mode?
05-01-2012, 11:44 AM | #22 |
Connoisseur
Posts: 54
Karma: 37363
Join Date: Aug 2011
Location: Istanbul
Device: EBW1150, Nook STR
|
Change this character class in the search pattern
Code:
[^\s]
to
Code:
[^\s-]
so that a hyphen also ends a word. To make a "whole word" match, enclose your search pattern between word boundary anchors \b, like this:
Search:
Code:
\bnut\b
Replace:
Code:
Nut
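The same two fixes in Python, as a rough sketch (helper names are hypothetical; re.escape is an addition so the word is matched literally). With the hyphen excluded from the class, \b fires again after the hyphen and the second half gets its own initial capital.

```python
import re

# Hyphen excluded from the "rest of the word" class: [^\s-] instead of [^\s].
TITLE = re.compile(r"\b([A-Z])([^\s-]+)(?=[^<>]*</text>)")

def title_case_text(ncx):
    return TITLE.sub(lambda m: m.group(1) + m.group(2).lower(), ncx)

def replace_whole_word(text, word, replacement):
    # \b...\b anchors: "nut" matches, "peanut" and "butternut" do not.
    return re.sub(r"\b" + re.escape(word) + r"\b", replacement, text)

print(title_case_text("<text>CHOCOLATE-GLAZED DONUTS</text>"))
print(replace_whole_word("peanut butter, nut bread, butternut", "nut", "Nut"))
```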
05-03-2012, 10:45 AM | #23 |
Zennist
Posts: 1,022
Karma: 47809468
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD+ & HD
|
Hello Timur, thanks very much for your last post. Both your fixes worked perfectly.
|
06-01-2012, 12:58 PM | #24 |
Zennist
Posts: 1,022
Karma: 47809468
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD+ & HD
|
\b[A-Z]\K([^\s-]+)(?=[^<>]*</text>)
The above formula continues to work perfectly in Sigil. However, when I try to use it in EditPad Pro, I get the following error message: "Unknown regex token \K. Use backslashes to escape metacharacters." In addition, the "\K" part of the formula is highlighted in red, so that's where the problem is. Does anyone know how to fix this so the formula works in this editing program too?
06-01-2012, 01:05 PM | #25 |
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Unfortunately, regex is not a single fixed language: programs that support it use different dialects. \K may be recognized by Sigil, but not by EditPad Pro. You'd have to read EditPad's documentation and see if there's anything equivalent to Sigil's \K.
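As a quick illustration of how dialects differ, this Python sketch feeds the patterns from this thread to Python's built-in re module, which is yet another dialect (neither Sigil's PCRE nor EditPad's engine). On Python 3.7 or later, only the plain [A-Z] lookbehind form compiles; \K, (*UCP) and \p{Lu} are all rejected. The dictionary keys are just labels.

```python
import re

# Patterns quoted in this thread, checked against Python's re only.
THREAD_PATTERNS = {
    "timur_k": r"\b[A-Z]\K([^\s-]+)(?=[^<>]*</text>)",
    "timur_ucp": r"(*UCP)\b[[:upper:]]\K([^\s]+)(?=[^<>]*</text>)",
    "lookbehind_ascii": r"(?<=\b[A-Z])([^\s-]+)(?=[^<>]*</text>)",
    "lookbehind_unicode": r"(?<=\b\p{Lu})([^\s-]+)(?=[^<>]*</text>)",
}

def accepted_by_python_re(pattern):
    """True if re.compile() accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

for name, pattern in THREAD_PATTERNS.items():
    print(name, "ok" if accepted_by_python_re(pattern) else "rejected")
```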
|
06-01-2012, 02:13 PM | #26 | |
Grand Sorcerer
Posts: 27,547
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
Code:
(?<=\b[A-Z])([^\s-]+)(?=[^<>]*</text>)
Or, to match any Unicode uppercase letter rather than just A-Z:
Code:
(?<=\b\p{Lu})([^\s-]+)(?=[^<>]*</text>)
EDIT: Any Replace expression you were using should remain the same. The lookbehind and lookahead assertions are not capture groups, as they might appear to be; you still have only one capture group in your entire expression.
Last edited by DiapDealer; 06-01-2012 at 02:41 PM.
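Python's built-in re accepts the [A-Z] version as written (it has no \p{Lu}), which makes it easy to check the point about capture groups: the compiled pattern reports exactly one group. A sketch; the callback plays the role of \L\1\E, and the function name is hypothetical.

```python
import re

# Accepted by Python's re because \b[A-Z] is exactly one character wide.
PATTERN = re.compile(r"(?<=\b[A-Z])([^\s-]+)(?=[^<>]*</text>)")

# Lookbehind and lookahead are not capture groups: only ([^\s-]+) counts.
print(PATTERN.groups)  # 1

def title_case_text(ncx):
    # Equivalent of the replacement \L\1\E: lowercase group 1 only.
    return PATTERN.sub(lambda m: m.group(1).lower(), ncx)

print(title_case_text("<text>CHOCOLATE-GLAZED DONUTS</text>"))
```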
|
06-02-2012, 02:28 PM | #27 |
Zennist
Posts: 1,022
Karma: 47809468
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD+ & HD
|
DiapDealer, thanks for your help. Your formula works well in Sigil, but not in EditPad Pro. It doesn't throw an error message like before, but it doesn't give the intended results. Using the same replace string that Timur had provided, I get this:
Code:
<navPoint class="chapter" id="navpoint-2" playOrder="1">
  <navLabel>
    <t\Lext>TITLE\E P\LAGE\E</text>
  </navLabel>
  <content src="OEBPS/002-titlepage.html"/>
</navPoint>
<navPoint class="chapter" id="navpoint-3" playOrder="2">
  <navLabel>
    <t\Lext>DEDICATION\E</text>
  </navLabel>
  <content src="OEBPS/003-dedicationpage.html"/>
</navPoint>
instead of the intended:
Code:
<navPoint class="chapter" id="navpoint-2" playOrder="1">
  <navLabel>
    <text>Title Page</text>
  </navLabel>
  <content src="OEBPS/002-titlepage.html"/>
</navPoint>
<navPoint class="chapter" id="navpoint-3" playOrder="2">
  <navLabel>
    <text>Dedication</text>
  </navLabel>
  <content src="OEBPS/003-dedicationpage.html"/>
</navPoint>
If someone can figure out a fix that works in EditPad Pro, great. Otherwise, I'll just continue to use Sigil to get title case in the metadata TOCs.
06-02-2012, 03:59 PM | #28 |
Grand Sorcerer
Posts: 27,547
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Actually, it looks like the "Find" is working as expected in your EditPad Pro example. It's the replace expression that's failing you there. It appears that JGsoft's regex engine doesn't support the \L \l \U \u \E case-conversion switches that Sigil uses with PCRE.
If your replace expression is \L\1\E for Sigil, try a replace expression of \L1 for EditPad Pro. It looks like EditPad Pro uses \Un, \Ln and \In instead, where n is your backreference number: U is for uppercase, L is for lowercase and I is for initial capital with the rest lowercase.
So to recap (I recommend copy/paste; there are no spaces in ANY of the following expressions):
Find (for Sigil or EditPad Pro):
Code:
(?<=\b\p{Lu})([^\s-]+)(?=[^<>]*</text>)
Replace (Sigil):
Code:
\L\1\E
Replace (EditPad Pro):
Code:
\L1
Last edited by DiapDealer; 06-02-2012 at 04:08 PM.
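To make the \Un / \Ln / \In description concrete, here is a small Python sketch of those three case operators applied to a backreference. It is a hypothetical helper based only on the description above, not EditPad's implementation, and it handles single-digit backreferences only. With \I, a simpler whole-word pattern (also an assumption for illustration) gives the same result as the lookbehind plus \L.

```python
import re

# Assumed semantics, per the description above (not EditPad's documentation):
# \Un -> uppercase, \Ln -> lowercase, \In -> initial capital, rest lowercase.
CASE_OPS = {"U": str.upper, "L": str.lower, "I": str.capitalize}

def expand(template, match):
    # Hypothetical helper: expands \n, \Un, \Ln, \In (n = one digit) in template.
    def one(ref):
        op, n = ref.group(1), int(ref.group(2))
        text = match.group(n) or ""
        return CASE_OPS[op](text) if op else text
    return re.sub(r"\\([ULI]?)(\d)", one, template)

ncx = "<text>CHOCOLATE-GLAZED DONUTS</text>"
lookbehind = r"(?<=\b[A-Z])([^\s-]+)(?=[^<>]*</text>)"
whole_word = r"\b([A-Z][^\s-]*)(?=[^<>]*</text>)"

print(re.sub(lookbehind, lambda m: expand(r"\L1", m), ncx))
print(re.sub(whole_word, lambda m: expand(r"\I1", m), ncx))
```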
06-02-2012, 09:04 PM | #29 |
Zennist
Posts: 1,022
Karma: 47809468
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD+ & HD
|
DiapDealer, you are a regex genius. I tested your fix in EditPad Pro and it works fine now. As someone who has tried to comprehend regex on multiple occasions, I marvel at your ability to not only understand it but to understand and work with multiple versions of it.
Why are there so many flavors of regex, and is one inherently better, more versatile or more powerful than the others? I will continue to use Sigil for any extensive TOC reconstruction, creation or transformation, as it has a more suitable interface for those jobs than a basic text editor. But sometimes all I need is a simple change from all caps to title case, and it's much more efficient to do that in EditPad Pro via calibre's tweak epub feature. So, many thanks for your help!
06-03-2012, 08:19 AM | #30 |
Grand Sorcerer
Posts: 27,547
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Cool. Glad it worked for you.
I thank you for the compliment, but I only consider my regex-fu to be passingly fair... there are some regex gods out there. I am an egg. But helping others helps me improve my skills. I actually find regexp construction addictive in a slightly weird way.
As for "why so many?" and which is "better," I don't really have an answer. Most of their differences are fairly minor, and the average user might never come across them. I prefer PCRE (Sigil), but mostly because of its widespread use (PHP, Apache, etc.) and open source origins. PCRE might have a slight edge in capabilities over JGsoft in that PCRE can do recursion. And that "\K" you were initially using is a biggie as well: because it drops everything matched before it from the final match, the part before it can be any length, which gets you around the "fixed-length-only lookbehind hurdle."
Keep plugging away at it. A little bit will stick each time!
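A small Python sketch of the "fixed-length-only lookbehind hurdle" (Python's built-in re has the same restriction and no \K). The example task, lowercasing everything after the first word of a <text> entry, is hypothetical; it just needs a keep-this-prefix part whose length varies. The workaround is to capture the prefix and write it back, which is the bookkeeping \K saves you in PCRE.

```python
import re

# "<text>" plus one word has no fixed length, so Python's re refuses it
# inside a lookbehind.
try:
    re.compile(r"(?<=<text>[^\s<]+)([^<]*)")
    variable_lookbehind_ok = True
except re.error:
    variable_lookbehind_ok = False

# Workaround without \K: capture the prefix (group 1) and put it back.
# In Sigil the search could presumably be <text>[^\s<]+\K([^<]*) with \L\1\E.
PREFIX_THEN_REST = re.compile(r"(<text>[^\s<]+)([^<]*)")

def lower_after_first_word(ncx):
    return PREFIX_THEN_REST.sub(lambda m: m.group(1) + m.group(2).lower(), ncx)

print(variable_lookbehind_ok)                             # False
print(lower_after_first_word("<text>Title Page</text>"))  # <text>Title page</text>
```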
|