Old 01-19-2013, 06:44 PM   #1
phossler
Wizard
Posts: 1,071
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
RegEx question (again)

Cleaning a book that someone has converted through Calibre a number of times.

Mostly it's a lot of grunt work, but I've come across the following many times, not only in this book but in many others. The chapter number is the H1 and the chapter title is a plain paragraph below it, so the TOC just consists of a string of numbers.

So if I can save the F&R (Find & Replace) in Tools, I can use it whenever I need it.

The markup looks like this:

Code:
<body>
  <h1>TWO</h1>

  <p>TITLE OF CHAPTER</p>

After manually making changes (lots of trial and error), this seems to work best:

Code:
<body>
  <h1>TWO<br />
  TITLE OF CHAPTER</h1>

Or, if I could be even more clever:

Code:
<body>
  <h1>Two<br />
  Title Of Chapter</h1>

Paul
Old 01-19-2013, 09:19 PM   #2
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Is there an actual question in there somewhere?
Old 01-19-2013, 09:35 PM   #3
theducks
Well trained by Cats
Posts: 29,689
Karma: 54369090
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
His TOC will look like mush.
There needs to be a &nbsp; before the break to prevent:
ONETitle of Chapter
Old 01-20-2013, 01:05 AM   #4
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Actually <h1>Two<br /> Title Of Chapter</h1> will likely work just fine.
Old 01-20-2013, 03:40 AM   #5
Doitsu
Grand Sorcerer
Posts: 5,582
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
The following quick and dirty regex will replace:

Code:
  <h1>TWO</h1>

  <p>TITLE OF CHAPTER</p>
with:

Code:
  <h1 title="TITLE OF CHAPTER">TWO<br />
TITLE OF CHAPTER</h1>

Find: <h1>(.*?)</h1>\s+<p>(.*?)</p>
Replace: <h1 title="\2">\1<br />\n\2</h1>

It's not perfect, but should save you some editing. (Sigil uses the title attribute for the toc.)
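
For anyone who wants to try the pattern outside Sigil, here is a minimal Python sketch of the same Find/Replace. It assumes Python's `re` module behaves closely enough to Sigil's regex engine for this particular pattern; the helper name `merge_heading` and the sample string are made up for illustration.

```python
import re

# The Find pattern from the post: a lazy capture of the <h1> text,
# some whitespace, then a lazy capture of the following <p> text.
FIND = re.compile(r"<h1>(.*?)</h1>\s+<p>(.*?)</p>")

# The Replace string from the post; Python templates also accept \1, \2 and \n.
REPLACE = r'<h1 title="\2">\1<br />\n\2</h1>'


def merge_heading(html: str) -> str:
    """Fold the paragraph after each <h1> into the heading (hypothetical helper)."""
    return FIND.sub(REPLACE, html)


sample = "<body>\n  <h1>TWO</h1>\n\n  <p>TITLE OF CHAPTER</p>\n"
print(merge_heading(sample))
```

Note that the pattern folds whatever paragraph follows an `<h1>`, whether or not it is a chapter title, so it is probably safer to step through the matches than to use Replace All blindly.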
Old 01-20-2013, 06:07 AM   #6
Sunlite
Addict
Posts: 206
Karma: 547516
Join Date: Mar 2008
Location: Berlin, Germany
Device: KObo Clara, Kobo Aura, PRS-T1, PB602, CyBook Gen3
I have a solution that is nearly the same as Doitsu's, but it also converts the upper case to lower case, keeping the first letter of the number and of the title capitalized.

search for:
Code:
<h1>(.)(.*?)</h1>\s+<p>(.)(.*?)</p>
replace with:
Code:
<h1>\1\L\2\E<br /> \3\L\4\E</h1>
This will change
Code:
  <h1>TWO</h1>

  <p>TITLE OF CHAPTER</p>
into
Code:
<h1>Two<br /> Title of chapter</h1>
I can't think of a way to get title case for the title of the chapter without a second regex. Maybe someone else can.
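
One way to get title case in a single pass is to do the replacement in a script rather than with a plain Replace string. The `\L...\E` case operators are not available in Python's `re` replacement templates, so the sketch below uses a replacement function instead. The helper names are hypothetical, and using `str.title()` is an assumption made for brevity: it capitalizes every word, including small ones such as "of".

```python
import re

FIND = re.compile(r"<h1>(.*?)</h1>\s+<p>(.*?)</p>")


def title_case_heading(match: re.Match) -> str:
    """Build the merged heading with both parts in title case (hypothetical helper)."""
    number = match.group(1).title()   # "TWO" -> "Two"
    title = match.group(2).title()    # "TITLE OF CHAPTER" -> "Title Of Chapter"
    return f"<h1>{number}<br />\n{title}</h1>"


def fix_headings(html: str) -> str:
    """Apply the title-casing replacement to every <h1>/<p> pair."""
    return FIND.sub(title_case_heading, html)


print(fix_headings("<h1>TWO</h1>\n\n<p>TITLE OF CHAPTER</p>"))
```

Be aware that `str.title()` also capitalizes the letter after an apostrophe, so titles with contractions or possessives would still need a manual check.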
Old 01-20-2013, 07:49 AM   #7
theducks
Well trained by Cats
Posts: 29,689
Karma: 54369090
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by DaleDe View Post
Actually <h1>Two<br /> Title Of Chapter</h1> will likely work just fine.
I don't use the leading-space solution on the title because it throws off the display when left-justified, and the offset is sometimes noticeable when centered:
Code:
Two
 Title of Chapter
Old 01-20-2013, 11:25 AM   #8
ElMiko
Addict
Posts: 320
Karma: 56788
Join Date: Jun 2011
Device: Kindle
Just for the record:

Code:
<h1>TWO<br />
Chapter Title</h1>
won't result in "TWOChapter Title" as the TOC value. It will display as "TWO Chapter Title". There is no real reason to add either leading or trailing spaces. Of course, as has already been stated, you can specify the desired title value by using the title attribute. Alternatively, you could just edit the values directly in the .ncx file after generating the TOC.
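
For the last option, editing the .ncx by hand is usually quickest, but a scripted version might look like the Python sketch below. The NCX fragment, the file path inside it, and the `relabel` helper are all made up for illustration; only the NCX namespace and the `navPoint`/`navLabel`/`text` structure come from the NCX format itself.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"
ET.register_namespace("", NCX_NS)  # keep the default namespace on output

# A cut-down, made-up NCX fragment; real files also carry <head> and <docTitle>.
SAMPLE_NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="navPoint-1" playOrder="1">
      <navLabel><text>TWO Chapter Title</text></navLabel>
      <content src="Text/chapter02.xhtml"/>
    </navPoint>
  </navMap>
</ncx>"""


def relabel(ncx_xml: str, old: str, new: str) -> str:
    """Replace one TOC label in an NCX document (hypothetical helper)."""
    root = ET.fromstring(ncx_xml)
    for text in root.iter(f"{{{NCX_NS}}}text"):
        if text.text == old:
            text.text = new
    return ET.tostring(root, encoding="unicode")


updated = relabel(SAMPLE_NCX, "TWO Chapter Title", "Two: Chapter Title")
print(updated)
```

Keep in mind that regenerating the TOC in Sigil would overwrite labels edited this way, which is one reason the title attribute may be the more durable fix.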

Last edited by ElMiko; 01-20-2013 at 12:17 PM.
Old 01-20-2013, 12:47 PM   #9
theducks
Well trained by Cats
Posts: 29,689
Karma: 54369090
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by ElMiko View Post
Just for the record:

Code:
<h1>TWO<br />
Chapter Title</h1>
won't result in "TWOChapter Title" as the TOC value. It will display as "TWO Chapter Title". There is no real reason to add either leading or trailing spaces. Of course, as has already been stated, you can specify the desired title value by using the title attribute. Alternatively, you could just edit the values directly in the .ncx file after generating the TOC.
It runs together for me if I don't include some sort of space.
Old 01-20-2013, 01:08 PM   #10
phossler
Wizard
Posts: 1,071
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
Thanks to everyone for taking time to offer comments and make suggestions

Based on what I learned, I did change what I thought I wanted to do

I know it's not perfect, but this is what I ended up with


Find:

Code:
<h1>(.)(.*?)</h1>\s+<p>(.)(.*?)</p>
Replace:

Code:
<h1 title="\1\2&nbsp;&ndash;&nbsp;\3\4">\1\2<br />\3\4</h1>
I don't quite grasp all the Find 'tokens' (esp. that pesky ?), but I'm working through it.

Thanks again

Paul
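
On the "pesky ?": placed after `*`, a `?` makes the quantifier lazy, so `.*?` matches as few characters as possible instead of as many as possible. The difference only shows up when more than one closing tag is within reach. A small Python sketch, using a made-up one-line sample:

```python
import re

# Two headings on one line, so a greedy match has room to overshoot.
line = "<h1>ONE</h1><h1>TWO</h1>"

greedy = re.findall(r"<h1>(.*)</h1>", line)   # .* grabs as much as it can
lazy = re.findall(r"<h1>(.*?)</h1>", line)    # .*? stops at the first </h1>

print(greedy)  # ['ONE</h1><h1>TWO']
print(lazy)    # ['ONE', 'TWO']
```

In the Find above, `(.)(.*?)` splits off the first character only so that the case-conversion version could lowercase the rest. Since the final Replace always writes `\1\2` and `\3\4` together, a single `(.*?)` per tag would do nearly the same job.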
Old 01-20-2013, 01:09 PM   #11
DiapDealer
Grand Sorcerer
Posts: 27,463
Karma: 192992430
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by theducks View Post
It runs together for me if I don't include some sort of space
It probably all depends on your Clean Source settings. If it's turned completely off and your code starts life as one line:
Code:
<h2>Chapter One<br />The Title</h2>
Then the entry will be created in the NCX file (when clicking "Generate ToC") with no space...
Quote:
Chapter OneThe Title
If, however, you have at least Pretty Print turned on, then
Code:
<h2>Chapter One<br />The Title</h2>
becomes
Code:
<h2>Chapter One<br />
The Title</h2>
and the entry will be created in the NCX file with a space when clicking Generate ToC...
Quote:
Chapter One The Title
That's the way it works for me anyway.
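
The behaviour described above is what you would expect if the ToC label is built by dropping the tags and keeping whatever text, line breaks included, sits between them. The Python sketch below imitates that with a naive tag strip. It is only an illustration of the idea, not Sigil's actual code, and `toc_label` is a hypothetical name.

```python
import re


def toc_label(heading_html: str) -> str:
    """Approximate a ToC label: strip tags, then collapse runs of whitespace."""
    text = re.sub(r"<[^>]+>", "", heading_html)   # remove tags, insert nothing
    return re.sub(r"\s+", " ", text).strip()      # a newline becomes one space


one_line = "<h2>Chapter One<br />The Title</h2>"
pretty = "<h2>Chapter One<br />\nThe Title</h2>"

print(toc_label(one_line))  # Chapter OneThe Title
print(toc_label(pretty))    # Chapter One The Title
```

Under that assumption, the `<br />` itself contributes nothing to the label; the space comes entirely from the line break that Pretty Print adds after it.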

Last edited by DiapDealer; 01-20-2013 at 01:22 PM.
Old 01-20-2013, 01:10 PM   #12
ElMiko
Addict
Posts: 320
Karma: 56788
Join Date: Jun 2011
Device: Kindle
EDIT:
Refer to DD's post

Last edited by ElMiko; 01-20-2013 at 01:14 PM.
Old 01-20-2013, 02:37 PM   #13
theducks
Well trained by Cats
Posts: 29,689
Karma: 54369090
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by DiapDealer View Post
It probably all depends on your Clean Source settings. If it's turned completely off and your code starts life as one line:
Code:
<h2>Chapter One<br />The Title</h2>
Then the entry will be created in the NCX file (when clicking "Generate ToC") with no space...


If, however, you have at least Pretty Print turned on, then
Code:
<h2>Chapter One<br />The Title</h2>
becomes
Code:
<h2>Chapter One<br />
The Title</h2>
and the entry will be created in the NCX file with a space when clicking Generate ToC...


That's the way it works for me anyway.
I have Pretty Print ON, so maybe I just started doing this with an older version and never changed (and no harm seems to be done, so I never noticed).