Old 01-19-2013, 07:44 PM   #1
phossler
RegEx question (again)

I'm cleaning a book that someone has converted through Calibre a number of times.

It's mostly a lot of grunt work, but I've come across the following many times, not only in this book but in many others. The chapter number is the H1 and the chapter title is a plain paragraph, so the TOC just consists of a string of numbers.

If I can save the Find & Replace in Tools, I can use it whenever I need it.

The markup looks like this:

Code:
<body>
  <h1>TWO</h1>

  <p>TITLE OF CHAPTER</p>

After manually making changes (lots of trial and error), this seems to work best

Code:
<body>
  <h1>TWO<br />
  TITLE OF CHAPTER</h1>

or if I could be even more clever

Code:
<body>
  <h1>Two<br />
  Title Of Chapter</h1>

Paul
Old 01-19-2013, 10:19 PM   #2
cybmole
Is there an actual question in there somewhere?
Old 01-19-2013, 10:35 PM   #3
theducks
His TOC will look like mush
There needs to be a &nbsp; before the break to prevent:
ONETitle of Chapter
Old 01-20-2013, 02:05 AM   #4
DaleDe
Actually <h1>Two<br /> Title Of Chapter</h1> will likely work just fine.
Old 01-20-2013, 04:40 AM   #5
Doitsu
The following quick and dirty regex will replace:

Code:
  <h1>TWO</h1>

  <p>TITLE OF CHAPTER</p>
with:

Code:
     <h1 title="TITLE OF CHAPTER">TWO<br />
TITLE OF CHAPTER</h1>

Find: <h1>(.*?)</h1>\s+<p>(.*?)</p>
Replace: <h1 title="\2">\1<br />\n\2</h1>

It's not perfect, but it should save you some editing. (Sigil uses the title attribute for the ToC entry.)
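
For anyone who wants to try the expression outside Sigil, here is a minimal sketch of the same find and replace using Python's `re` module. Treat it as a stand-in only: Sigil uses PCRE rather than Python's engine, and the function name `merge_heading` and the sample string are made up for illustration. For this particular pattern the two engines should behave the same way.

```python
import re

# Doitsu's Find and Replace expressions from the post above, unchanged.
FIND = r'<h1>(.*?)</h1>\s+<p>(.*?)</p>'
REPLACE = r'<h1 title="\2">\1<br />\n\2</h1>'


def merge_heading(xhtml: str) -> str:
    """Fold the paragraph after each <h1> into the heading (hypothetical helper)."""
    # re.sub expands \1 and \2 to the captured groups and \n to a newline.
    return re.sub(FIND, REPLACE, xhtml)


sample = '  <h1>TWO</h1>\n\n  <p>TITLE OF CHAPTER</p>'
print(merge_heading(sample))
```

The pattern matches any `<h1>` that is followed directly by a `<p>`, so it is probably safer to step through the matches one at a time than to replace everything at once.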
Old 01-20-2013, 07:07 AM   #6
Sunlite
I have a solution that is nearly the same as Doitsu's, but it also handles the upper-case to lower-case conversion.

search for:
Code:
<h1>(.)(.*?)</h1>\s+<p>(.)(.*?)</p>
replace with:
Code:
<h1>\1\L\2\E<br /> \3\L\4\E</h1>
This will change
Code:
  <h1>TWO</h1>

  <p>TITLE OF CHAPTER</p>
into
Code:
<h1>Two<br /> Title of chapter</h1>
I can't think of a way to get title case for the title of the chapter without a second regex. Maybe someone else can.
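
The `\L...\E` case tokens work in Sigil's replace box, but Python's `re` module does not support them, so the sketch below emulates the same replacement with a function. It also shows one possible way to get title case in a single pass, which a replacement function can do outside Sigil; it does not answer the question of doing it in one Sigil regex. The names `sentence_case`, `title_case` and `fix_heading` are invented for this example.

```python
import re

FIND = r'<h1>(.)(.*?)</h1>\s+<p>(.)(.*?)</p>'


def sentence_case(first: str, rest: str) -> str:
    # Same effect as \1\L\2\E: keep the first character, lower-case the rest.
    return first + rest.lower()


def title_case(text: str) -> str:
    # Capitalize every space-separated word; naive, so "OF" becomes "Of".
    return ' '.join(word.capitalize() for word in text.split(' '))


def fix_heading(xhtml: str, title: bool = False) -> str:
    def repl(m):
        number = sentence_case(m.group(1), m.group(2))
        if title:
            chapter = title_case(m.group(3) + m.group(4))
        else:
            chapter = sentence_case(m.group(3), m.group(4))
        return f'<h1>{number}<br /> {chapter}</h1>'
    return re.sub(FIND, repl, xhtml)


sample = '<h1>TWO</h1>\n\n<p>TITLE OF CHAPTER</p>'
print(fix_heading(sample))              # <h1>Two<br /> Title of chapter</h1>
print(fix_heading(sample, title=True))  # <h1>Two<br /> Title Of Chapter</h1>
```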
Old 01-20-2013, 08:49 AM   #7
theducks
Quote:
Originally Posted by DaleDe
Actually <h1>Two<br /> Title Of Chapter</h1> will likely work just fine.
I don't use the leading-space solution on the title because it throws off the display when left-justified, and it is sometimes noticeable when centered:
Code:
Two
 Title of Chapter
Old 01-20-2013, 12:25 PM   #8
ElMiko
Just for the record:

Code:
<h1>TWO<br />
Chapter Title</h1>
won't result in "TWOChapter Title" as the TOC value. It will display as "TWO Chapter Title". There is no real reason to add either leading or trailing spaces. Of course, as has already been stated, you can specify the desired title value by using the title attribute. Alternatively, you could just edit the values directly in the .ncx file after generating the TOC.

Last edited by ElMiko; 01-20-2013 at 01:17 PM.
Old 01-20-2013, 01:47 PM   #9
theducks
Quote:
Originally Posted by ElMiko
Just for the record:

Code:
<h1>TWO<br />
Chapter Title</h1>
won't result in "TWOChapter Title" as the TOC value. It will display as "TWO Chapter Title". There is no real reason to add either leading or trailing spaces. Of course, as has already been stated, you can specify the desired title value by using the title attribute. Alternatively, you could just edit the values directly in the .ncx file after generating the TOC.
It runs together for me if I don't include some sort of space
Old 01-20-2013, 02:08 PM   #10
phossler
Thanks to everyone for taking the time to offer comments and make suggestions.

Based on what I learned, I changed what I thought I wanted to do.

I know it's not perfect, but this is what I ended up with:


Find:

Code:
<h1>(.)(.*?)</h1>\s+<p>(.)(.*?)</p>
Replace:

Code:
<h1 title="\1\2&nbsp;&ndash;&nbsp;\3\4">\1\2<br />\3\4</h1>
I don't quite grasp all the Find 'tokens' (esp. that pesky ?), but I'm working through it.

Thanks again

Paul
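
On the "pesky ?": placed after `*`, it makes the quantifier lazy, so `.*?` stops at the first place where the rest of the pattern can match instead of grabbing as much as possible. A small sketch in Python (used here only as a convenient way to run a regex; the sample line is made up) shows the difference when two headings share a line:

```python
import re

# Two headings on one line, as can happen in un-prettified source (made-up sample).
line = '<h1>ONE</h1><p>First</p><h1>TWO</h1><p>Second</p>'

greedy = re.findall(r'<h1>(.*)</h1>', line)   # .* runs to the LAST </h1>
lazy = re.findall(r'<h1>(.*?)</h1>', line)    # .*? stops at the FIRST </h1>

print(greedy)  # ['ONE</h1><p>First</p><h1>TWO']
print(lazy)    # ['ONE', 'TWO']
```

The `(.)` groups each capture exactly one character, which only matters when the replacement treats the first letter differently from the rest, as in the case-conversion version.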
Old 01-20-2013, 02:09 PM   #11
DiapDealer
Quote:
Originally Posted by theducks
It runs together for me if I don't include some sort of space
It probably all depends on your Clean Source settings. If it's turned completely off and your code starts life as one line:
Code:
<h2>Chapter One<br />The Title</h2>
Then the entry will be created in the NCX file (when clicking "Generate ToC") with no space...
Quote:
Chapter OneThe Title
If, however, you have at least Pretty Print turned on, then
Code:
<h2>Chapter One<br />The Title</h2>
becomes
Code:
<h2>Chapter One<br />
The Title</h2>
and the entry will be created in the NCX file with a space when clicking Generate ToC...
Quote:
Chapter One The Title
That's the way it works for me anyway.

Last edited by DiapDealer; 01-20-2013 at 02:22 PM.
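
One way to picture the difference is a sketch like the following, which strips the tags from a heading and collapses runs of whitespace. This is not Sigil's actual code, only an assumed model that happens to reproduce the two results described in this post; `toc_text` is a hypothetical name.

```python
import re


def toc_text(heading: str) -> str:
    """Assumed model of ToC label extraction: drop tags, collapse whitespace."""
    without_tags = re.sub(r'<[^>]+>', '', heading)
    # split() with no argument splits on any whitespace, including newlines.
    return ' '.join(without_tags.split())


one_line = '<h2>Chapter One<br />The Title</h2>'
pretty = '<h2>Chapter One<br />\nThe Title</h2>'

print(toc_text(one_line))  # Chapter OneThe Title
print(toc_text(pretty))    # Chapter One The Title
```

Under this model the space in the ToC entry comes from the line break that Pretty Print adds after `<br />`, not from the `<br />` itself.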
Old 01-20-2013, 02:10 PM   #12
ElMiko
EDIT:
Refer to DD's post

Last edited by ElMiko; 01-20-2013 at 02:14 PM.
Old 01-20-2013, 03:37 PM   #13
theducks
Quote:
Originally Posted by DiapDealer
It probably all depends on your Clean Source settings. If it's turned completely off and your code starts life as one line:
Code:
<h2>Chapter One<br />The Title</h2>
Then the entry will be created in the NCX file (when clicking "Generate ToC") with no space...


If, however, you have at least Pretty Print turned on, then
Code:
<h2>Chapter One<br />The Title</h2>
becomes
Code:
<h2>Chapter One<br />
The Title</h2>
and the entry will be created in the NCX file with a space when clicking Generate ToC...


That's the way it works for me anyway.
I have Pretty Print ON, so maybe I just started doing this with an older version and never changed (no harm seems to be done, so I never noticed).
All times are GMT -4.