01-19-2013, 06:44 PM | #1 |
Wizard
Posts: 1,071
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
RegEx question (again)
I'm cleaning a book that someone has converted through Calibre a number of times.
Mostly a lot of grunt work, but I've come across the following many times, not only in this one but in many others. The chapter number is the H1, so the TOC just consists of a string of numbers. If I can save the F&R in Tools, I can use it whenever I need it. It looks like this:
Code:
<body> <h1>TWO</h1> <p>TITLE OF CHAPTER</p>
After manually making changes (lots of trial and error), this seems to work best:
Code:
<body> <h1>TWO<br /> TITLE OF CHAPTER</h1>
or, if I could be even more clever:
Code:
<body> <h1>Two<br /> Title Of Chapter</h1>
Paul |
01-19-2013, 09:19 PM | #2 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
Is there an actual question in there somewhere?
|
01-19-2013, 09:35 PM | #3 |
Well trained by Cats
Posts: 29,803
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
His TOC will look like mush.
There needs to be a space before the break to prevent: ONETitle of Chapter |
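To illustrate the concern, here is a minimal Python sketch. It assumes a TOC generator that builds its label by simply dropping every tag from the heading; `heading_text` is a hypothetical helper, not Sigil's actual code, and real tools may treat `<br />` differently.

```python
import re

def heading_text(html: str) -> str:
    """Hypothetical TOC-label extractor: drop every tag, keep the text."""
    return re.sub(r"<[^>]+>", "", html)

# No whitespace around the break: the two parts run together.
tight = "<h1>ONE<br />Title of Chapter</h1>"
# A space before the break survives tag stripping.
spaced = "<h1>ONE <br />Title of Chapter</h1>"

print(heading_text(tight))   # ONETitle of Chapter
print(heading_text(spaced))  # ONE Title of Chapter
```

Whether a given editor or reader actually behaves like this depends on how it handles whitespace and line breaks inside headings.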
01-20-2013, 01:05 AM | #4 |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
Actually <h1>Two<br /> Title Of Chapter</h1> will likely work just fine.
|
01-20-2013, 03:40 AM | #5 |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
The following quick and dirty regex will replace:
Code:
<h1>TWO</h1> <p>TITLE OF CHAPTER</p>
with:
Code:
<h1 title="TITLE OF CHAPTER">TWO<br /> TITLE OF CHAPTER</h1>
Replace: <h1 title="\2">\1<br />\n\2</h1>
It's not perfect, but it should save you some editing. (Sigil uses the title attribute for the TOC.) |
01-20-2013, 06:07 AM | #6 |
Addict
Posts: 206
Karma: 547516
Join Date: Mar 2008
Location: Berlin, Germany
Device: KObo Clara, Kobo Aura, PRS-T1, PB602, CyBook Gen3
|
I have a solution that is nearly the same as Doitsu's, but it also does the upper case to lower case conversion.
Search for:
Code:
<h1>(.)(.*?)</h1>\s+<p>(.)(.*?)</p>
Replace with:
Code:
<h1>\1\L\2\E<br /> \3\L\4\E</h1>
This turns:
Code:
<h1>TWO</h1> <p>TITLE OF CHAPTER</p>
into:
Code:
<h1>Two<br /> Title of chapter</h1> |
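For anyone who wants to try the same transformation outside Sigil, here is a rough Python sketch. Python's `re` module does not accept `\L` and `\E` in replacement strings, so the lowercasing is done in a replacement function instead; `lower_tail` is a name made up for this example.

```python
import re

# Same search pattern as in the post above.
PATTERN = re.compile(r"<h1>(.)(.*?)</h1>\s+<p>(.)(.*?)</p>")

def lower_tail(match: re.Match) -> str:
    """Keep the first character of each part, lowercase the rest."""
    first_num, rest_num, first_title, rest_title = match.groups()
    return (f"<h1>{first_num}{rest_num.lower()}<br /> "
            f"{first_title}{rest_title.lower()}</h1>")

source = "<h1>TWO</h1> <p>TITLE OF CHAPTER</p>"
result = PATTERN.sub(lower_tail, source)
print(result)  # <h1>Two<br /> Title of chapter</h1>
```

Like the Sigil version, this only capitalizes the first letter of the title, so "Title Of Chapter" style capitalization would need a different replacement function.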
01-20-2013, 07:49 AM | #7 |
Well trained by Cats
Posts: 29,803
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
|
01-20-2013, 11:25 AM | #8 |
Addict
Posts: 320
Karma: 56788
Join Date: Jun 2011
Device: Kindle
|
Just for the record:
Code:
<h1>TWO<br /> Chapter Title</h1>
Last edited by ElMiko; 01-20-2013 at 12:17 PM. |
01-20-2013, 12:47 PM | #9 | |
Well trained by Cats
Posts: 29,803
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
|
|
01-20-2013, 01:08 PM | #10 |
Wizard
Posts: 1,071
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
Thanks to everyone for taking the time to offer comments and make suggestions.
Based on what I learned, I did change what I thought I wanted to do. I know it's not perfect, but this is what I ended up with.
Find:
Code:
<h1>(.)(.*?)</h1>\s+<p>(.)(.*?)</p>
Replace:
Code:
<h1 title="\1\2 – \3\4">\1\2<br />\3\4</h1>
Thanks again
Paul |
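As a quick check of this find/replace pair outside Sigil, here is a small Python sketch. The plain backreferences work unchanged with `re.sub`; the sample markup, including the extra body paragraph, is made up for illustration.

```python
import re

# Find and replace strings copied from the post above.
FIND = r"<h1>(.)(.*?)</h1>\s+<p>(.)(.*?)</p>"
REPLACE = r'<h1 title="\1\2 – \3\4">\1\2<br />\3\4</h1>'

# Hypothetical chapter opening; the second <p> is ordinary body text.
sample = (
    "<body>\n"
    "<h1>TWO</h1>\n"
    "<p>TITLE OF CHAPTER</p>\n"
    "<p>First paragraph of the chapter.</p>\n"
    "</body>"
)

fixed = re.sub(FIND, REPLACE, sample)
print(fixed)
```

Only the paragraph immediately after the `<h1>` is merged into the heading. The pattern cannot tell a chapter title from normal text, so a chapter with no title paragraph would have its first body paragraph pulled into the heading, which may be part of why it is "not perfect".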
01-20-2013, 01:09 PM | #11 | ||
Grand Sorcerer
Posts: 27,549
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
It probably all depends on your Clean Source settings. If it's turned completely off and your code starts life as one line:
Code:
<h2>Chapter One<br />The Title</h2>
Quote:
Code:
<h2>Chapter One<br />The Title</h2>
Code:
<h2>Chapter One<br /> The Title</h2>
Quote:
Last edited by DiapDealer; 01-20-2013 at 01:22 PM. |
||
01-20-2013, 01:10 PM | #12 |
Addict
Posts: 320
Karma: 56788
Join Date: Jun 2011
Device: Kindle
|
EDIT:
Refer to DD's post.
Last edited by ElMiko; 01-20-2013 at 01:14 PM. |
01-20-2013, 02:37 PM | #13 | |
Well trained by Cats
Posts: 29,803
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
|
|
|