Hi, I'm not a programmer (I only know basic HTML), but I've gone through the basic RegEx tutorial on Calibre's website, and for years it has helped me edit garbage out of OCR documents, etc. However, a common need I have is to quickly add entries to a table of contents (especially because the table of contents is a primary tool I use for many books on my e-reader). Often books follow a consistent format for their chapter/section headings, but the headings aren't tagged as headers. I can manually turn them into (for example) <h3> tags, and then go to the book editor tool and "generate ToC from all headings" or "from major headings". When a book only has a dozen or so ToC entries I want, I just do that manually. However, I would love a more automatic way to do that with RegEx find and replace, especially when a book has a hundred or more entries I want to put in the ToC (for example, a cookbook).
So let me give an example from what I'm currently working on. There's a cookbook that has each section as a ToC entry, but each section contains dozens or even hundreds of recipes. I want each recipe to be a ToC sub-entry without having to do it all manually. I found the RegEx for finding each recipe header, but I don't know how to write the "replace" portion so that it reuses the text that's already there and just puts an <h3> tag around it. Here's some example code of what I want to turn into ToC entries. This is how two of the recipes are already coded in the book:
Code:
<p class="left-bullet">••••••••••••••••••••••••••••••••••••••</p>
<p class="left"><strong>Apple-Walnut Pancakes</strong></p>
<p class="left-bullet">••••••••••••••••••••••••••••••••••</p>
<p class="left"><strong>Whole Wheat Orange Bread</strong></p>
The regex I use to identify this string has so far worked for every entry I've tested it on, and it hasn't been greedy enough to match anything else. It appears to find literally every recipe, both in this category and in the other categories. So, here's the simple regex I've used in "find" that appears successful:
Code:
<p class="left-bullet">•••*</p>
<p class="left"><strong>.*</strong></p>
What I would like to see that code (and each example of it) be "replaced" with is something like this:
Code:
<p class="left-bullet">(however many bullets it found)</p>
<h3>(Whatever the "find" text was for the recipe title)</h3>
The thing is, I've never really learned how to write regex that auto-populates the replacement with something it found. I've only ever learned to make direct substitutions (such as changing bold to italics, or changing some OCR junk to a blank, or a commonly misspelled word to the word I want instead). But I'm sure you guys know how to get it to reference what it found in a common pattern (such as the "find" pattern I have listed above), keep the actual found text, and just replace/add some of the tags around it. So how does that work? Seems like it should be simple, but I just don't know how. If I can get that done, then it should be as simple as going to the ToC editor and having it "Generate ToC from major headings" and I'll be good to go! It'll help me with quite a few other books in the future too.
I appreciate any help that any of you can offer! Thanks a lot!
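To make the request above concrete, here is a rough sketch of the transformation written with Python's `re` module. It is only an illustration under some assumptions: the bullet runs in `html` are shortened stand-ins for the real ones, the variable names are made up for this example, and it has not been run inside Calibre itself. The idea is that parentheses in the "find" pattern mark the pieces to keep, and `\1` and `\2` in the "replace" text refer back to those pieces.

```python
import re

# Sample markup modelled on the two recipes above.
# The bullet runs are shortened here for readability (assumption).
html = (
    '<p class="left-bullet">••••••••••</p>\n'
    '<p class="left"><strong>Apple-Walnut Pancakes</strong></p>\n'
    '<p class="left-bullet">••••••••</p>\n'
    '<p class="left"><strong>Whole Wheat Orange Bread</strong></p>\n'
)

# Group 1: the whole bullet paragraph plus the line break after it
#          (two bullets followed by any number more, as in the find pattern).
# Group 2: only the recipe title; .*? stops at the first </strong>.
pattern = re.compile(
    r'(<p class="left-bullet">•••*</p>\s*)'
    r'<p class="left"><strong>(.*?)</strong></p>'
)

# \1 puts the bullet paragraph back unchanged; \2 drops the title into <h3>.
replacement = r'\1<h3>\2</h3>'

result = pattern.sub(replacement, html)
print(result)
```

In an editor's find/replace dialog, the equivalent would presumably be the pattern in the Find box and `\1<h3>\2</h3>` in the Replace box with regex mode switched on, though that mapping is an assumption worth trying on a copy of the book first.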