MobileRead Forums > E-Book Software > Sigil

04-27-2012, 08:17 PM #16
theducks
Well trained by Cats
Posts: 29,782
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by PatNY
OK, ducks, I tried to adapt your formula to the specific issue I had by using this:

Find:
<text>(([A-Z])(.+) ([A-Z])(.+))</text>

Replace:
<text>\2\L\3\E \4\L\5\E</text>

And it's mostly working. However, if I have more than two words in the title, only the first and last words get an initial cap. The words in the middle are all lowercase.

So, for example, your solution will result in:

<text>Ice cream Rocks</text> instead of <text>Ice Cream Rocks</text>

So, do you know how to get every word in the title to be initial caps?
You have 3 words,
so you need another set of patterns (BTW, you can only have 9 capture replacements, \1 through \9):
Code:
<text>(([A-Z])(.+) ([A-Z])(.+) ([A-Z])(.+))</text>
Code:
<text>\2\L\3\E \4\L\5\E \6\L\7\E</text>
Note that the outermost ( ) is capture group \1 (the whole match is \0), which is why the word captures start at \2.

Again: work on the pages and include the Title option in the chapter headings, then autogenerate the TOC
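For anyone who wants to see the captures at work outside Sigil, here is a minimal Python sketch of the three-word Find/Replace pair. Python's re module has no \L...\E case switches in replacement strings, so a callback (the hypothetical three_word_title) does the lowercasing that Sigil's replacement does; it only illustrates the group numbering.

```python
import re

# theducks' three-word Find pattern. Group 1 is the outermost ( ), so the
# word captures are groups 2-7: (first letter, rest of word) for each word.
THREE_WORDS = re.compile(r"<text>(([A-Z])(.+) ([A-Z])(.+) ([A-Z])(.+))</text>")


def three_word_title(m: re.Match) -> str:
    # Stand-in for the Sigil replacement <text>\2\L\3\E \4\L\5\E \6\L\7\E</text>
    g = m.group
    return f"<text>{g(2)}{g(3).lower()} {g(4)}{g(5).lower()} {g(6)}{g(7).lower()}</text>"


print(THREE_WORDS.sub(three_word_title, "<text>ICE CREAM ROCKS</text>"))  # <text>Ice Cream Rocks</text>
print(THREE_WORDS.sub(three_word_title, "<text>TITLE PAGE</text>"))       # unchanged: needs 2 spaces
```

A two-word title is left alone because the pattern demands exactly two separating spaces, which is why a separate pattern is needed per word count.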

04-27-2012, 08:35 PM #17
PatNY
Zennist
Posts: 1,022
Karma: 47809468
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD+ & HD
Quote:
Originally Posted by theducks
(BTW you can only have 9 capture replacements
ducks, I'm not sure what you mean by "capture replacements." Do you mean it will only work for titles of 9 words or fewer? Some of the titles have 5, 7 or more words. If the max is 9, that's fine. I can always fix the few titles with more than 9 words by hand.

Quote:
Again: work on the pages and include the Title option in the chapter headings, then autogenerate the TOC
ducks, see my reply to capidamonte, where I explain why I want to limit editing to the toc.ncx. Also, this is not even about chapter headings. It's for subtitles within chapters, which can number in the hundreds. If I limit the changes to the toc.ncx file, I can see exactly what the regex has done to it. But if I run the regex on all the HTML files, there is no easy way to check that everything was done right, or to correct things if something goes wrong, because the subtitles will be scattered all over the place instead of sitting in the single toc.ncx file.

04-27-2012, 08:59 PM #18
theducks
Well trained by Cats
Posts: 29,782
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Each word takes 2 captures:
([A-Z]) is the first letter
(.+) <a space here> is the rest of the word, with the space used to detect the end of the word

Together with the outer group, 4 words use 9 captures (4 × 2 + 1), so my system works for 4 words maximum, and a separate pattern must be crafted for each of 4, 3, 2 and 1 words surrounded by a <text> tag.

A regex guru might come up with a better pattern/replacement.
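The capture counting can be sketched in Python. The helpers n_word_pattern and n_word_replacement below are hypothetical; they only build theducks-style Find and Replace strings for n words and report how many groups each Find pattern contains (2 per word plus the outer group).

```python
import re


def n_word_pattern(n: int) -> re.Pattern:
    # One ([A-Z])(.+) pair per word, words separated by a literal space,
    # all wrapped in an outer group (which becomes group 1).
    body = " ".join([r"([A-Z])(.+)"] * n)
    return re.compile("<text>(" + body + ")</text>")


def n_word_replacement(n: int) -> str:
    # Sigil-style replacement: \2\L\3\E for word 1, \4\L\5\E for word 2, ...
    parts = []
    for i in range(n):
        first, rest = 2 * i + 2, 2 * i + 3
        parts.append("\\" + str(first) + "\\L\\" + str(rest) + "\\E")
    return "<text>" + " ".join(parts) + "</text>"


for n in range(1, 6):
    # 2 captures per word + 1 outer group; past n = 4 this exceeds \1-\9.
    print(n, n_word_pattern(n).groups, n_word_replacement(n))
```

For n = 2 the builders reproduce the pattern and replacement PatNY posted; from n = 5 on, the group count passes 9.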

04-28-2012, 12:16 AM #19
Ahmad Samir
Zealot
Posts: 114
Karma: 5246
Join Date: Jul 2010
Device: none
Try doing it in two goes: open toc.ncx, and make sure the replacement setting is set to Current file.

First, keep the first letter of the first word capitalised and lowercase all the letters after it:
Find: <text>([A-Z])(.+)
Replace: <text>\1\L\2

Then capitalise the first letter of each word:
Find: <text>(.+) ([a-z])
Replace: <text>\1 \u\2

You'll need to click "Replace All" more than once (until it says "No replacements made"); the number of clicks depends on the number of words in the longest title. Not elegant, but it seems to work.

I guess there'll be some corner cases, e.g. "The" or "To" getting capitalised mid-title, but you can fix those easily afterwards.

Note that if you regenerate the TOC in Sigil, all those changes will be gone, so make sure you won't need to generate the TOC again before doing this.
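Ahmad's two-pass recipe, including the repeated "Replace All", can be simulated in Python as a sketch. Python's re has no \L or \u replacement switches, so lambdas do the case changes, and a loop stands in for clicking until "No replacements made"; the function name two_step_title_case is hypothetical.

```python
import re

STEP1 = re.compile(r"<text>([A-Z])(.+)")   # Find for step 1
STEP2 = re.compile(r"<text>(.+) ([a-z])")  # Find for step 2


def two_step_title_case(ncx: str) -> tuple[str, int]:
    # Step 1, one Replace All: <text>\1\L\2. (.+) runs to the end of the line,
    # so the already-lowercase </text> is "lowercased" too, with no change.
    ncx = STEP1.sub(lambda m: "<text>" + m.group(1) + m.group(2).lower(), ncx)
    # Step 2: <text>\1 \u\2. The greedy (.+) backs off only as far as the LAST
    # " x" on the line, so each Replace All fixes one word per title.
    clicks = 0
    while True:
        ncx, count = STEP2.subn(
            lambda m: "<text>" + m.group(1) + " " + m.group(2).upper(), ncx)
        if count == 0:  # "No replacements made"
            return ncx, clicks
        clicks += 1


result, clicks = two_step_title_case("<text>ICE CREAM ROCKS</text>\n<text>DEDICATION</text>")
print(result)
print(clicks)  # 2
```

The click count grows with the number of words in the longest title, matching Ahmad's description.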

04-28-2012, 05:33 AM #20
Timur
Connoisseur
Posts: 54
Karma: 37363
Join Date: Aug 2011
Location: Istanbul
Device: EBW1150, Nook STR
@PatNY: You can do what you want with

Search:
Code:
\b[A-Z]\K([^\s]+)(?=[^<>]*</text>)
Replace:
Code:
\L\1\E
If your text includes characters outside the ASCII range (like È, Ñ, etc.), you can use the Unicode-aware search pattern:

Search:
Code:
(*UCP)\b[[:upper:]]\K([^\s]+)(?=[^<>]*</text>)
Note: the (*UCP) option flag works in my Sigil, but at least one user has reported in the past that it caused Sigil to crash.
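Python's built-in re module does not support \K, so this Python sketch swaps in a different technique with the same effect on the text: it captures the leading capital as group 1 and writes it back unchanged, lowercasing only the rest of the word. The names WORD and title_case_toc are hypothetical.

```python
import re

# Timur's Sigil Find: \b[A-Z]\K([^\s]+)(?=[^<>]*</text>)   Replace: \L\1\E
# Without \K, the capital is captured as group 1 and written back as-is.
WORD = re.compile(r"\b([A-Z])([^\s]+)(?=[^<>]*</text>)")


def title_case_toc(ncx: str) -> str:
    # The lookahead only accepts a word if a </text> follows with no < or >
    # in between, which roughly means "inside a <text> element".
    return WORD.sub(lambda m: m.group(1) + m.group(2).lower(), ncx)


sample = """<navLabel>
  <text>TITLE PAGE</text>
</navLabel>
<content src="OEBPS/002-titlepage.html"/>"""
print(title_case_toc(sample))
```

The capitalised OEBPS in the src attribute is left alone because no </text> can be reached from there without crossing a < or >.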

05-01-2012, 06:25 AM #21
PatNY
Zennist
Posts: 1,022
Karma: 47809468
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD+ & HD
Thanks to all who replied in this thread. I used Timur's solution and it works like a charm. All-caps TOC entries are just too hard to read when there are hundreds of them, and title case is so much easier on my eyes.

There is just one case left that I think could also be fixed: whenever there are hyphenated words, the second word is not capitalized. For example, CHOCOLATE-GLAZED becomes Chocolate-glazed instead of Chocolate-Glazed.

So is there any way to construct a separate regex that will make every first letter after a hyphen uppercase?

Also, I noticed that the latest version of Sigil has no search option for whole words only. I may be mistaken, but I thought it had one in the past. When I tried a "Replace All" to change "nut" to "Nut", I ended up with a lot of "PeaNut" and "ButterNut" results!

I may be misremembering, but I could have sworn there used to be more search options. Or maybe I am missing an option that puts the search window in an advanced mode?

05-01-2012, 11:44 AM #22
Timur
Connoisseur
Posts: 54
Karma: 37363
Join Date: Aug 2011
Location: Istanbul
Device: EBW1150, Nook STR
Changing this character class in the search pattern

Code:
[^\s]
to

Code:
[^\s-]
capitalizes the first letter after a hyphen too.
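A small Python check of the hyphen change, writing the captured capital back because Python's re lacks \K (the lower_tails helper is hypothetical). Once - is excluded from the class, the word tail stops at the hyphen, and since \b holds between - and G, the G starts a new match.

```python
import re

# Capture the capital (group 1) and write it back, since Python's re has no \K.
OLD = re.compile(r"\b([A-Z])([^\s]+)(?=[^<>]*</text>)")   # tail may run past "-"
NEW = re.compile(r"\b([A-Z])([^\s-]+)(?=[^<>]*</text>)")  # tail stops at "-"


def lower_tails(pattern: re.Pattern, ncx: str) -> str:
    return pattern.sub(lambda m: m.group(1) + m.group(2).lower(), ncx)


print(lower_tails(OLD, "<text>CHOCOLATE-GLAZED</text>"))  # <text>Chocolate-glazed</text>
print(lower_tails(NEW, "<text>CHOCOLATE-GLAZED</text>"))  # <text>Chocolate-Glazed</text>
```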

To make a "whole word" match, enclose your search pattern between word-boundary anchors (\b), like this:

Code:
search: \bnut\b
replace: Nut
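A quick Python illustration of the difference (the sample titles are made up): a plain substring replace also changes "nut" inside longer words, while the \b anchors restrict the match to the whole word.

```python
import re

titles = "Peanut Brittle; Butternut Squash; nut loaf"

# Plain substring replace: also hits "nut" inside longer words.
print(titles.replace("nut", "Nut"))       # PeaNut Brittle; ButterNut Squash; Nut loaf
# \b...\b: only "nut" as a whole word is replaced.
print(re.sub(r"\bnut\b", "Nut", titles))  # Peanut Brittle; Butternut Squash; Nut loaf
```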

05-03-2012, 10:45 AM #23
PatNY
Zennist
Posts: 1,022
Karma: 47809468
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD+ & HD
Hello Timur, thanks very much for your last post. Both your fixes worked perfectly.

06-01-2012, 12:58 PM #24
PatNY
Zennist
Posts: 1,022
Karma: 47809468
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD+ & HD
\b[A-Z]\K([^\s-]+)(?=[^<>]*</text>)

The above formula continues to work perfectly in Sigil. However, when I try to use it in EditPad Pro, I get the following error message:

"Unknown regex token \K. Use backslashes to escape metacharacters."

In addition, the "\K" part of the formula is highlighted in red, so that's where the problem is.

Does anyone know how to fix this so the formula works in this editing program too?

06-01-2012, 01:05 PM #25
Jellby
frumious Bandersnatch
Posts: 7,515
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Unfortunately, regex is not a single fixed language, and programs that support it often use different dialects. \K may be recognized by Sigil but not by EditPad Pro. You'd have to read EditPad's documentation to see whether it has anything equivalent to Sigil's \K.

06-01-2012, 02:13 PM #26
DiapDealer
Grand Sorcerer
Posts: 27,546
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by PatNY
\b[A-Z]\K([^\s-]+)(?=[^<>]*</text>)
The \K just resets the start of the match, so the characters preceding it are not included in the returned match. Since the part of your regex before \K is fixed-length (PCRE requires each alternative in a lookbehind to be fixed-length), I'd suggest changing that first part to a lookbehind assertion, which achieves the same thing. I have no experience with EditPad Pro, but I assume the lookaround syntax is the same for the JGSoft regex engine as for Sigil's PCRE engine. Something like:

Code:
(?<=\b[A-Z])([^\s-]+)(?=[^<>]*</text>)
I'd also suggest changing the lookbehind portion to (?<=\b\p{Lu}) so that capital Unicode letters (like É or Ä) aren't excluded from the search. That makes the entire new regex:

Code:
(?<=\b\p{Lu})([^\s-]+)(?=[^<>]*</text>)
I believe that should accomplish the same thing as your original expression and has (I hope) the added benefit of working with both the JGSoft and PCRE regex engines.

EDIT: Any Replace expression you were using should remain the same. The lookbehind and lookahead assertions are not capture groups like they might appear. You still only have one capture group in your entire expression.
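DiapDealer's point that lookarounds do not add capture groups can be checked in Python. This sketch uses [A-Z] in place of \p{Lu}, because Python's built-in re does not support \p{...} classes; FIND and lower_tails are hypothetical names.

```python
import re

# DiapDealer's \K-free Find, with [A-Z] standing in for \p{Lu}.
FIND = re.compile(r"(?<=\b[A-Z])([^\s-]+)(?=[^<>]*</text>)")

# Lookbehind and lookahead are zero-width and capture nothing, so the
# pattern has exactly one group and \1 still means the same thing.
print(FIND.groups)  # 1


def lower_tails(ncx: str) -> str:
    # Equivalent of the Sigil Replace \L\1\E: the capital sits in the
    # lookbehind, outside the match, so only the captured tail is rewritten.
    return FIND.sub(lambda m: m.group(1).lower(), ncx)


print(lower_tails("<text>CHOCOLATE-GLAZED ICE CREAM</text>"))
```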

Last edited by DiapDealer; 06-01-2012 at 02:41 PM.

06-02-2012, 02:28 PM #27
PatNY
Zennist
Posts: 1,022
Karma: 47809468
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD+ & HD
DiapDealer, thanks for your help. Your formula works well in Sigil, but not in EditPad Pro. It doesn't throw an error message like before, but it doesn't give the intended results. Instead, using the same replace string that Timur had provided, I get this:

Code:
 <navPoint class="chapter" id="navpoint-2" playOrder="1">
      <navLabel>
        <t\Lext>TITLE\E P\LAGE\E</text>
      </navLabel>
      <content src="OEBPS/002-titlepage.html"/>
    </navPoint>
    <navPoint class="chapter" id="navpoint-3" playOrder="2">
      <navLabel>
        <t\Lext>DEDICATION\E</text>
      </navLabel>
      <content src="OEBPS/003-dedicationpage.html"/>
    </navPoint>
instead of this, which is what Sigil produces:

Code:
 <navPoint class="chapter" id="navpoint-2" playOrder="1">
      <navLabel>
        <text>Title Page</text>
      </navLabel>
      <content src="OEBPS/002-titlepage.html"/>
    </navPoint>
    <navPoint class="chapter" id="navpoint-3" playOrder="2">
      <navLabel>
        <text>Dedication</text>
      </navLabel>
      <content src="OEBPS/003-dedicationpage.html"/>
    </navPoint>
EditPad Pro seems to be using some finicky kind of regex. I had no idea there were various flavors of regex.

If someone can figure out a fix that works in EditPad Pro, great. Otherwise, I'll just continue to use Sigil to get title case in the metadata TOCs.

06-02-2012, 03:59 PM #28
DiapDealer
Grand Sorcerer
Posts: 27,546
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Actually, it looks like the "Find" is working as expected in your EditPad Pro example. It's the replace expression that's failing you there. It appears that JGSoft's regex engine doesn't like the \L, \l, \U, \u and \E switches that PCRE's engine uses for its case transformations.

If your replace expression is \L\1\E for Sigil, try a replace expression of \L1 for EditPad Pro. It looks like EditPad Pro uses \Un, \Ln and \In instead, where n is your backreference number: U is for uppercase, L is for lowercase and I is for initial capital with the rest lowercase.

So, to recap (I recommend copy/paste; there are no spaces in ANY of the following expressions)...
Find (for Sigil or EditPad Pro):
Code:
(?<=\b\p{Lu})([^\s-]+)(?=[^<>]*</text>)
Replace (Sigil):
Code:
\L\1\E
Replace (EditPad Pro):
Code:
\L1
I verified it works with EditPad Lite which is supposed to use the same regex engine as its big brother, EditPad Pro.
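To make the three EditPad Pro case modes concrete, here is a Python emulation of \Un, \Ln and \In as described above. It is a hypothetical helper written for illustration, not EditPad's JGSoft engine; \In is modeled with str.capitalize (first character uppercase, rest lowercase), and [A-Z] stands in for \p{Lu}, which Python's re does not support.

```python
import re

# Hypothetical emulation of the \Un, \Ln and \In replacement tokens.
CASE_MODES = {"U": str.upper, "L": str.lower, "I": str.capitalize}
TOKEN = re.compile(r"\\([ULI])(\d)")


def expand_editpad_style(template: str, m: re.Match) -> str:
    # Swap each \Un / \Ln / \In token for group n in that case; other text is kept.
    def convert(t: re.Match) -> str:
        text = m.group(int(t.group(2))) or ""
        return CASE_MODES[t.group(1)](text)
    return TOKEN.sub(convert, template)


FIND = re.compile(r"(?<=\b[A-Z])([^\s-]+)(?=[^<>]*</text>)")
print(FIND.sub(lambda m: expand_editpad_style(r"\L1", m), "<text>TITLE PAGE</text>"))  # <text>Title Page</text>
```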

Last edited by DiapDealer; 06-02-2012 at 04:08 PM.

06-02-2012, 09:04 PM #29
PatNY
Zennist
Posts: 1,022
Karma: 47809468
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD+ & HD
DiapDealer, you are a regex genius. I tested your fix in EditPad Pro and it works fine now. As someone who has tried to comprehend regex on multiple occasions, I marvel at your ability to not only understand it but to understand and work with multiple versions of it.

Why would there be so many flavors of regex, and is one inherently better, more versatile or more powerful than the others?

I will continue to use Sigil for any extensive TOC reconstruction, creation or transformation, as it has a more suitable interface for those jobs than a basic text editor. But sometimes all I need is a simple change from all caps to title case, and it's much more efficient to do that in EditPad Pro via calibre's Tweak ePub feature. So, many thanks for your help!

06-03-2012, 08:19 AM #30
DiapDealer
Grand Sorcerer
Posts: 27,546
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Cool. Glad it worked for you.

I thank you for the compliment, but I consider my regex-fu only passingly fair... there are some regex gods out there. I am an egg. But helping others helps me improve my skills. I actually find regexp construction addictive in a slightly weird way.

As for "why so many?" and which is "better," I don't really have an answer. Most of their differences are fairly minor, and the average user might never come across them. I prefer PCRE (Sigil), mostly because of its widespread use (PHP, Apache, etc.) and open-source origins. PCRE might have a slight edge in capabilities over JGSoft in that PCRE can do recursion. And that "\K" you were initially using is a biggie as well: it lets you get around the "fixed-length-only lookbehind" hurdle.
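The fixed-length lookbehind hurdle is easy to see in Python's re, which also only accepts fixed-width lookbehind. The sketch below (helper names hypothetical) shows a variable-width lookbehind being rejected and the usual workaround when \K is unavailable: capture the variable-length prefix and write it back. In Sigil the same intent could be written with \K as <text>\s*[A-Z]\K[^\s<]+, a made-up example rather than one from the thread.

```python
import re


def compiles(pattern: str) -> bool:
    # True if Python's re accepts the pattern, False if it raises re.error.
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# Fixed-length lookbehind (one capital letter): accepted.
print(compiles(r"(?<=\b[A-Z])[^\s<]+"))         # True
# Variable-length lookbehind (\s* can match any number of spaces): rejected.
print(compiles(r"(?<=<text>\s*[A-Z])[^\s<]+"))  # False

# Without \K or variable-length lookbehind: capture the prefix and write it back.
FIRST_WORD = re.compile(r"(<text>\s*[A-Z])([^\s<]+)")


def lower_first_word_tail(ncx: str) -> str:
    return FIRST_WORD.sub(lambda m: m.group(1) + m.group(2).lower(), ncx)


print(lower_first_word_tail("<text>  ICE CREAM</text>"))
```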

Keep plugging away at it. A little bit will stick each time!

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.