Old 08-12-2018, 04:35 AM   #1
madagascaradam
Enthusiast
madagascaradam began at the beginning.
 
Posts: 36
Karma: 10
Join Date: Jul 2015
Device: Kobo H2O Edition 2
Need simple RegEx function help for adding tags to be found in ToC

Hi, I'm not a programmer (I only know basic HTML) and I've gone over the RegEx basic tutorial on Calibre's website - and it's helped me for years on editing garbage out of OCR documents, etc. However, a common need I have is to quickly add entries to a table of contents (especially because table of contents is a primary tool I use for many books on my e-reader). Often books will follow a format for their chapter/section headings, but they won't be tagged as headers. I can manually turn them into (for example) <h3> tags, and then go to the book editor tool and "generate ToC from all headings" or "from major headings". When a book only has a dozen or so ToC entries I want, then I just do that manually. However, I would love a more automatic way to do that with RegEx find and replace, especially when a book has a hundred or more entries I want to put in the ToC (for example, a cookbook).

So let me give an example from what I'm currently working on. There's a cookbook that has each section as a ToC entry, but each section has within it dozens or even hundreds of recipes. I want each one to be a ToC sub-entry without having to do it all manually. I found the RegEx for finding each recipe header, but I don't know how to create the "replace" portion that will use the text that's already there and just add an <h3> tag around it. Here's some example code of what I want to turn into ToC entries - this is how two of the different examples are already coded in the book:

Code:
<p class="left-bullet">••••••••••••••••••••••••••••••••••••••</p>
<p class="left"><strong>Apple-Walnut Pancakes</strong></p>

<p class="left-bullet">••••••••••••••••••••••••••••••••••</p>
<p class="left"><strong>Whole Wheat Orange Bread</strong></p>
The regex I use to identify this string has so far worked for every entry I've tested it on and it hasn't been too greedy to find anything else. It appears to find literally every recipe both in this category and in the other categories. So, here's the simple modification (regex code) I've used to "find" that appears successful:

Code:
<p class="left-bullet">•••*</p>
<p class="left"><strong>.*</strong></p>
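For anyone who wants to check a find pattern like this outside the editor, here is a minimal Python sketch. It assumes Python's standard `re` module treats this simple pattern the same way calibre's search does; the `sample` text mirrors the two snippets above (the bullet counts are arbitrary), and the variable names are made up for illustration.

```python
import re

# Sample markup modelled on the book excerpt; bullet counts are arbitrary.
sample = (
    '<p class="left-bullet">' + "•" * 38 + '</p>\n'
    '<p class="left"><strong>Apple-Walnut Pancakes</strong></p>\n'
    '\n'
    '<p class="left-bullet">' + "•" * 34 + '</p>\n'
    '<p class="left"><strong>Whole Wheat Orange Bread</strong></p>\n'
)

# "•••*" means two bullets followed by zero or more further bullets.
# "." does not match a newline by default, so ".*" stays on one line.
find = (
    r'<p class="left-bullet">•••*</p>\n'
    r'<p class="left"><strong>.*</strong></p>'
)

# With no groups in the pattern, findall returns each full match as a string.
matches = re.findall(find, sample)
print(len(matches))
for m in matches:
    print(m)
```

If a line ever held two `<strong>` spans, the greedy `.*` would run to the last `</strong></p>` on that line; a non-greedy `.*?` is the usual way to tighten that.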
What I would like to see that code (and each example of it) be "replaced" with is something like this:

Code:
<p class="left-bullet">(however many bullets it found)</p>
<h3>(Whatever the "find" text was for the recipe title)</h3>
The thing is, I've never really learned how to create the regex code that auto-populates a section with something it found - I've only ever learned to make code that directly substitutes (such as changing bold to italics, or changing some OCR junk to a blank, or a commonly misspelled word to the word I want instead). But I'm sure you guys know how to get it to reference what it found in a common pattern (such as the "find" pattern I have listed above), maintain the actual found text but just replace/add some of the tags around it. So how does that work? Seems like it should be simple, but I just don't know how. Seems like if I can get that done, then it'll be as simple as going to the ToC editor and having it "Generate ToC from major headings" and I'll be good to go! It'll help me with quite a few other books in the future too.

I appreciate any help that any of you can offer! Thanks a lot!
Old 08-12-2018, 06:49 AM   #2
madagascaradam
OK, I did more searching in the forum and read some examples that happened to show what I needed. Putting parentheses around my expressions essentially numbers what's found within them for later reference (such as in the "replace" section):

Code:
<p class="left-bullet">(•••*)</p>
<p class="left"><strong>(.*)</strong></p>
Then, if I make my new "replace" code this, it does what I want:

Code:
<p class="left-bullet">\1</p>
<h3>\2</h3>
Then I go in and add those entries as auto-generated from "major headers" in the edit ToC section.
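The same find and replace can be sketched in Python for anyone who wants to see the group references at work. This is only an illustration under the assumption that `re.sub` handles `\1` and `\2` the way the editor's replace box does; `page`, `find`, and `replace` are hypothetical names, and the bullet count is arbitrary.

```python
import re

# One recipe header as it appears in the book (bullet count is arbitrary).
page = (
    '<p class="left-bullet">' + "•" * 10 + '</p>\n'
    '<p class="left"><strong>Apple-Walnut Pancakes</strong></p>\n'
)

# Group 1 captures the run of bullets, group 2 captures the recipe title.
find = (
    r'<p class="left-bullet">(•••*)</p>\n'
    r'<p class="left"><strong>(.*)</strong></p>'
)

# \1 and \2 put the captured text back; only the tags around the title change.
replace = (
    r'<p class="left-bullet">\1</p>' '\n'
    r'<h3>\2</h3>'
)

result = re.sub(find, replace, page)
print(result)
```

The bullets line comes back unchanged and the title line becomes an `<h3>`, which is what the ToC tool needs to pick it up as a heading.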

Now that I understand that, I went back into the RegEx FAQ and the Quick reference page to see if it was explained there... and now that I understand what it means, I can see it's referred to there. However, it certainly wasn't clear and it would be clarified a lot if an example (such as mine) were given. Actually, the real example that helped me figure it out was taken from here: https://www.mobileread.com/forums/sh...=replace+regex

The person's question and the other's response helped me see what was different about it, experiment a little, and figure it out. Still, I think more "grouping" explanation would help a LOT for those new to RegEx (like me) who are reading the FAQ. They should put an extra example in there, and talk about how surrounding your expression with parentheses makes a group of it that can be recalled later by its number, in the order in which it was grouped (and give examples so readers can see what you're talking about). Anyway, that's my suggestion. Just glad I found my answer because now I have several hundred entries quickly added to my ToC without manually doing it all!
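As a rough illustration of that numbering, here is a small Python sketch. Groups are numbered left to right by their opening parenthesis, and group 0 is the whole match. The pattern with `(\w+)` for the class name and the `title` attribute in the replacement are made up for this example, not something from the book or the FAQ.

```python
import re

line = '<p class="left"><strong>Whole Wheat Orange Bread</strong></p>'
pattern = r'<p class="(\w+)"><strong>(.*)</strong></p>'

# Group 1 is the first parenthesised part (the class name),
# group 2 is the second (the recipe title), group 0 is the whole match.
m = re.search(pattern, line)
whole, css_class, title = m.group(0), m.group(1), m.group(2)
print(css_class)
print(title)

# In the replacement, groups can be used in any order and more than once.
swapped = re.sub(pattern, r'<h3 class="\1" title="\2">\2</h3>', line)
print(swapped)
```

Anything outside the parentheses is matched but not captured, so it only survives if you retype it in the replace text.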
Old 08-12-2018, 10:29 AM   #3
gbm
Wizard
gbm ought to be getting tired of karma fortunes by now.
 
Posts: 2,082
Karma: 8796704
Join Date: Jun 2010
Device: Kobo Clara HD,Hisence Sero 7 Pro RIP, Nook STR, jetbook lite
I highly recommend you install Diap's Editing Toolbag plugin; it will do a lot of what you want.

bernie
[Attached screenshot: Screenshot from 2018-08-12 10-28-25.png]
Old 08-13-2018, 10:15 AM   #4
madagascaradam
Quote:
Originally Posted by gbm View Post
I highly recommend you install Diap's Editing Toolbag plugin; it will do a lot of what you want.

bernie
Thanks for the suggestion. I looked that up and installed it just now. I saw the tool... but honestly it looks like that tool might be more complicated for me now. Now that I've figured out how to "group" and refer to the groups in the replace box, that is. The tool seems like it would change too much of the book formatting, *unless* I specified it pretty closely. But now that I know how the regex grouping works, it seems like specifying pretty closely that way is easier/more clear to me than it would be to use those tools. Anyway, I do appreciate the suggestion though!