04-27-2012, 08:58 AM   #1
PatNY
Zennist
Posts: 523
Karma: 44760996
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD
A regex question

I have hundreds of subtitles throughout a cookbook, and would like to know if there is a way to do a search and replace to give all these titles an h3 designation:

Quote:
<p class="calibre8" id="filepos170176"><span class="calibre6 bold">Spicy Escarole with Croutons and Eggs</span></p>
Of course, the filepos information changes in every one, as does the actual text between the tags.

I only have a rudimentary understanding of regex and just tried a few things, but nothing would work. Any ideas?
04-27-2012, 09:38 AM   #2
Perkin
Fanatic
Posts: 557
Karma: 23783
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300
This assumes all the subtitles have the form <p class="calibre8" id="filepos######"><span class="calibre6 bold">, with no other paragraphs using that combination.

If so, you should be good to use this for search (regex mode):

Code:
<p class="calibre8" id="filepos\d+"><span class="calibre6 bold">(.+?)</span></p>
Replace with this

Code:
<h3>\1</h3>
If not, we'll need more details about what marks a paragraph as a subtitle, and whether any other paragraphs use the same markup.
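
For anyone who wants to try the pattern outside Sigil first, here is a minimal sketch in Python. The pattern and replacement are Perkin's; the function name `to_h3` and the `sample` string are only illustrative, and Python's `re` module is assumed to behave like Sigil's regex mode for this simple pattern.

```python
import re

# Perkin's search pattern, split over two lines for readability.
SUBTITLE = re.compile(
    r'<p class="calibre8" id="filepos\d+">'
    r'<span class="calibre6 bold">(.+?)</span></p>'
)

def to_h3(html: str) -> str:
    """Replace every matching subtitle paragraph with an <h3> heading."""
    return SUBTITLE.sub(r"<h3>\1</h3>", html)

# The example paragraph from the first post.
sample = (
    '<p class="calibre8" id="filepos170176">'
    '<span class="calibre6 bold">Spicy Escarole with Croutons and Eggs</span></p>'
)
print(to_h3(sample))
```

Paragraphs that lack either the calibre8 class, the filepos id, or the bold span are left untouched, which is why it matters that no ordinary paragraph uses the same combination.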
04-27-2012, 09:52 AM   #3
DiapDealer
Grand Sorcerer
Posts: 6,992
Karma: 24340264
Join Date: Jan 2010
Device: Kindle Fire HD, Kindle 2
You'll want to be careful that there are no nested span tags inside the <span class="calibre6 bold"> element. Given that those sections are already being used as headers, that would probably be pretty rare, but it could result in a bit of a mess if that regex came upon something like:
Code:
<p class="calibre8" id="filepos170176"><span class="calibre6 bold"><span class="italic">Spicy Escarole with Croutons and Eggs</span></span></p>
__________________
Politics: A strife of interests masquerading as a contest of principles. The conduct of public affairs for private advantage.
04-27-2012, 10:27 AM   #4
Perkin
Fanatic
Posts: 557
Karma: 23783
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300
Should be fairly safe though, as I was matching the opening p+span together with the closing span+p.

It would result in the italic span being carried into the header tag, which, while not necessarily wanted, wouldn't be illegal either.
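
The nested-span case can be checked with a short Python sketch. The pattern is the one posted earlier in the thread; `strip_spans` is a hypothetical helper, not something from the thread, shown only as one possible way to drop the inner span if it is unwanted.

```python
import re

SUBTITLE = re.compile(
    r'<p class="calibre8" id="filepos\d+">'
    r'<span class="calibre6 bold">(.+?)</span></p>'
)

# DiapDealer's example with an italic span nested inside the bold span.
nested = (
    '<p class="calibre8" id="filepos170176"><span class="calibre6 bold">'
    '<span class="italic">Spicy Escarole with Croutons and Eggs</span></span></p>'
)

# Plain replacement: the lazy group has to grow until it reaches "</span></p>",
# so the inner italic span ends up inside the heading.
kept = SUBTITLE.sub(r"<h3>\1</h3>", nested)

def strip_spans(match: re.Match) -> str:
    """Build the heading from the captured text with any span tags removed."""
    inner = re.sub(r"</?span\b[^>]*>", "", match.group(1))
    return f"<h3>{inner}</h3>"

flat = SUBTITLE.sub(strip_spans, nested)

print(kept)
print(flat)
```

The first result keeps the italic span inside the h3, as Perkin describes; the second is an optional cleanup for books where that styling is not wanted.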
04-27-2012, 02:43 PM   #5
PatNY
Zennist
Posts: 523
Karma: 44760996
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD
Perkin, thanks a million for your help. Your solution worked perfectly.

Kudos to all you regex experts. I've tried to wrap my head around this but I can never quite grasp it. Or I grasp the basic principles, then after a few months I've completely forgotten it as I use regex so infrequently!

Last edited by PatNY; 04-27-2012 at 03:24 PM.
04-27-2012, 04:45 PM   #6
PatNY
Zennist
Posts: 523
Karma: 44760996
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD
While there are some regex experts here, two more questions:

1) Is it possible to change all-caps text in a toc.ncx file to initial caps? For example I would like to change: <text>ICE CREAM</text> to <text>Ice Cream</text>

2) Is it possible to have some of the items in the metadata TOC underlined or italicized? I get the feeling this is not possible but I am not sure.
04-27-2012, 05:14 PM   #7
theducks
Staff to 4 Cats
Posts: 10,955
Karma: 2574555
Join Date: Aug 2009
Location: The (original) Silicon Valley, USA
Device: Galaxy Tab 2,Black Astak PEz, K4NT(now Wifes)
Not your answer, but is your TOC being generated by Sigil from headings? If so, instead of:
<h3>ICE CREAM</h3>

you could use:
<h3 title="Ice Cream">ICE CREAM</h3>
__________________
Using: Ubuntu(32 bit):Oneric,Precise and XPpro SP3, W7HP(64)- - Libre Office w/Writer2EPUB
04-27-2012, 05:45 PM   #8
PatNY
Zennist
Posts: 523
Karma: 44760996
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD
Hi ducks. Yes, the TOC.ncx is being auto-generated from the headings in the epub. I really want to keep the text in the html files the way it is: if a headline in the epub pages is all caps, I'd like to keep it that way. In the metadata TOC, however, a dense run of all-caps entries can be hard to read. I would like to know if one can use regex on the TOC.ncx file to turn the all-caps words into initial-caps words.
04-27-2012, 05:56 PM   #9
theducks
Staff to 4 Cats
Posts: 10,955
Karma: 2574555
Join Date: Aug 2009
Location: The (original) Silicon Valley, USA
Device: Galaxy Tab 2,Black Astak PEz, K4NT(now Wifes)
I don't know about the regex, but if you look closely at my example you will see that title= overrides what is between the h3 tags in the TOC, but leaves what is on the page alone.
<h3 title="Ice Cream with topping">ICE CREAM</h3>
In this version Sigil would create a TOC entry: Ice Cream with topping

But the page would still show plain vanilla: ICE CREAM
04-27-2012, 06:01 PM   #10
theducks
Staff to 4 Cats
Posts: 10,955
Karma: 2574555
Join Date: Aug 2009
Location: The (original) Silicon Valley, USA
Device: Galaxy Tab 2,Black Astak PEz, K4NT(now Wifes)
@ PatNY
Take a look at this page for Ideas: http://vim.wikia.com/wiki/Changing_c...ar_expressions
04-27-2012, 06:25 PM   #11
theducks
Staff to 4 Cats
Posts: 10,955
Karma: 2574555
Join Date: Aug 2009
Location: The (original) Silicon Valley, USA
Device: Galaxy Tab 2,Black Astak PEz, K4NT(now Wifes)
This works in the body for two-word headings (search pattern first, then the replacement):

Code:
<h3>(([A-Z])(.+) ([A-Z])(.+))</h3>

<h3 title="\2\L\3\E \4\L\5\E">\2\3 \4\5 </h3>
The Result is: <h3 title="Ice Cream">ICE CREAM </h3>
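
For headings with any number of words, the same title-attribute idea can be sketched in Python. This is an assumption-laden illustration rather than a Sigil recipe: Python's `re` has no `\L...\E` case operators, so the lower-casing is done in a callback, and the names `initial_caps` and `add_titles` are made up for the example. It also assumes the headings are plain `<h3>` elements with no attributes and no nested tags.

```python
import re

# A run of capitals, optionally with apostrophes inside (e.g. DON'T).
# Lowercase text such as the "amp" in "&amp;" is deliberately not matched.
WORD = re.compile(r"[A-Z]+(?:'[A-Z]+)*")

def initial_caps(text: str) -> str:
    """Turn each all-caps word into an initial-caps word."""
    return WORD.sub(lambda w: w.group(0).capitalize(), text)

def add_titles(html: str) -> str:
    """Add a title attribute to each plain <h3>, leaving the visible text alone."""
    def repl(m: re.Match) -> str:
        heading = m.group(1)
        # Escape double quotes so the attribute value stays well-formed.
        title = initial_caps(heading).replace('"', "&quot;")
        return f'<h3 title="{title}">{heading}</h3>'
    return re.sub(r"<h3>([^<]+)</h3>", repl, html)

print(add_titles("<h3>ICE CREAM</h3>"))
print(add_titles("<h3>ICE CREAM ROCKS</h3>"))
```

Headings that already carry a title attribute do not match the plain `<h3>` pattern, so running the function twice does not add a second attribute.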

04-27-2012, 06:59 PM   #12
PatNY
Zennist
Posts: 523
Karma: 44760996
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD
So you seem to have hit upon an answer, but how do I apply it to the TOC.ncx while I am editing it?

IOW, how do I turn "<text>ICE CREAM</text>" into "<text>Ice Cream</text>"
04-27-2012, 07:12 PM   #13
PatNY
Zennist
Posts: 523
Karma: 44760996
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD
OK, ducks, I tried to adapt your formula to the specific issue I had by using this:

Find:
<text>(([A-Z])(.+) ([A-Z])(.+))</text>

Replace:
<text>\2\L\3\E \4\L\5\E</text>

And it's mostly working. However, if I have more than two words in the title, only the first and last words get the initial cap; the words in the middle come out all lower case.

So, for example, your solution will result in:

<text>Ice cream Rocks</text> instead of <text>Ice Cream Rocks</text>

So, do you know how to get every word in the title to be initial caps?
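
Why only the first and last words are capitalised can be seen by looking at what each group captures. The sketch below uses Python, which has no `\L...\E` in replacement strings, so the `emulate` function (a name invented here) reproduces the replacement by hand with `.lower()`.

```python
import re

# The find pattern from the post above.
PATTERN = re.compile(r"<text>(([A-Z])(.+) ([A-Z])(.+))</text>")

m = PATTERN.search("<text>ICE CREAM ROCKS</text>")

# The first (.+) is greedy: it swallows everything up to the LAST space that is
# followed by a capital letter, so the middle word lands inside group 3.
groups = (m.group(2), m.group(3), m.group(4), m.group(5))
print(groups)

def emulate(match: re.Match) -> str:
    """Hand-written equivalent of the replacement \\2\\L\\3\\E \\4\\L\\5\\E."""
    return "<text>{}{} {}{}</text>".format(
        match.group(2), match.group(3).lower(),
        match.group(4), match.group(5).lower(),
    )

result = PATTERN.sub(emulate, "<text>ICE CREAM ROCKS</text>")
print(result)
```

Group 3 holds "CE CREAM", and all of it is lower-cased, which gives "Ice cream Rocks". A pattern with a fixed number of groups can only ever capitalise a fixed number of words, so a per-word approach is needed for titles of arbitrary length.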
04-27-2012, 07:17 PM   #14
capidamonte
Not who you think I am...
Posts: 332
Karma: 5337
Join Date: Jan 2010
Location: Honolulu
Device: Sony PRS-T1
He's telling you to change the code in the book body, in the HTML, not in the toc.ncx. Sigil will generate the toc.ncx entries for you, using the "title" attribute instead of the content between the header tags.
__________________
cap
notetab can fix it

It is difficult to get a man to understand something
when his job depends on not understanding it.

-- Upton Sinclair
04-27-2012, 08:14 PM   #15
PatNY
Zennist
Posts: 523
Karma: 44760996
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD
Yes, I know he's doing that. However, there are various reasons why I want to do it in the TOC.ncx.

First off, I only need to change the case of words in the TOC.ncx, so I don't see the need to alter other files. I want to make as few changes as possible in the html files. That way if anything goes wrong, it's easier to identify the problem and correct it if it's limited to just the one toc.ncx file rather than having to examine and sift through potentially hundreds of pages in html files.

Second, going forward, there could be literally dozens of permutations in the variety of the tags surrounding a title in the html files. That would mean the exact formula might have to change every time, depending on what the set of tags was. I would have a hard time adjusting that formula every time.

However, the "<text>WORDS GO HERE</text>" tag construction in the TOC.ncx file is the same all the time. So one formula should solve the problem, every time no matter what.

Ducks got me halfway there. I can change all-caps titles into initial-caps words in the TOC, but it doesn't work for all the words in a title -- just the first and last!
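
One way to handle every word, sketched here under the assumption that running a small Python script over the toc.ncx text is acceptable, is to match each `<text>` element and then capitalise its words one at a time in a callback. The function name `fix_ncx_labels` is invented for the example, and nothing here is specific to Sigil.

```python
import re

# A run of capitals, optionally with apostrophes inside (e.g. DON'T).
# Lowercase text such as the "amp" in "&amp;" is deliberately not matched.
WORD = re.compile(r"[A-Z]+(?:'[A-Z]+)*")

def fix_ncx_labels(ncx: str) -> str:
    """Convert all-caps words inside <text>...</text> to initial caps."""
    def repl(m: re.Match) -> str:
        label = WORD.sub(lambda w: w.group(0).capitalize(), m.group(1))
        return f"<text>{label}</text>"
    return re.sub(r"<text>(.*?)</text>", repl, ncx)

print(fix_ncx_labels("<text>ICE CREAM ROCKS</text>"))
print(fix_ncx_labels("<text>SALT &amp; PEPPER</text>"))
```

Two caveats: the book title in the ncx (inside docTitle) is also a `<text>` element, so it is converted too, and every all-caps word is treated alike, so small words such as AND become "And" and acronyms such as BBQ become "Bbq". Regenerating the TOC from the headings afterwards would presumably bring the all-caps entries back.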