Quote:
Originally Posted by deback
This is how I automate it:
Do a regex search for the following (be sure to change the mode to Regex in the dropdown box):
>(\d+) (You might have to add a space before the (\d+), depending on the original coding.)
This might find all the chapter numbers, when the word "chapter" is not included anywhere. Then you can do a find and replace to replace the class with the "chapter" class.
Example (after you've found the class that was used; there could be inconsistent classes used by the creator, which is common):
Find the following: <p class="calibre_3">(\d+)</p>
Replace it with this: <p class="chapter">\1</p>
-or, if you prefer, replace it with the following:
<p class="chapter">Chapter \1</p>
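For anyone who would rather test the pattern outside the editor first, here is a minimal Python sketch of the same find and replace. The sample HTML and the tag_chapters helper are made up for illustration; only the "calibre_3" class and the two replacement strings come from the example above.

```python
import re

# Hypothetical sample text; "calibre_3" is the class from the example above.
html = (
    '<p class="calibre_3">1</p>\n'
    '<p class="calibre_3">Some body text.</p>\n'
    '<p class="calibre_3">12</p>\n'
)

# Matches only paragraphs whose entire content is a run of digits.
pattern = re.compile(r'<p class="calibre_3">(\d+)</p>')


def tag_chapters(text, with_label=False):
    """Re-class bare chapter numbers, optionally prefixing the word 'Chapter'."""
    if with_label:
        replacement = r'<p class="chapter">Chapter \1</p>'
    else:
        replacement = r'<p class="chapter">\1</p>'
    return pattern.sub(replacement, text)


print(tag_chapters(html))
print(tag_chapters(html, with_label=True))
```

The body paragraph is left alone because it contains more than digits, which is the point of anchoring the pattern on the closing </p>.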
Then go into the ToC editor, click on Generate ToC from XPath. Set up a macro to insert the following on the top Level 1 ToC line (mine is ctrl-shift-T -- or you can type it or you can fill out the lines on the next screen after you click on the wand at the right):
//*[re:test(@class, "chapter", "i")]
Then the ToC editor will create entries for each chapter.
Create a CSS class for "chapter" to look the way you want it to look.
Here's mine:
.chapter {
display: block;
font-size: 1.4em; /* this could change depending on length of the chapter title */
font-weight: bold;
text-align: center;
margin-bottom: 2em;
margin-left: 0;
margin-right: 0;
margin-top: 3em;
}
Convert the file again to have all the chapters start on a new page. You don't have to do this manually. You don't even need the line page-break-before: always; in your "chapter" class, because Convert will do it automatically.
Excellent information, thank you!
For consistent chapter headings that are numeric, have an actual unique style, or even are Roman numerals...yes, I search much as you suggest. The ones I find most aggravating are these:
<p class="calibrex">end of a chapter text...</p>
<p class="calibrex">Mary Goes to Market</p>
<p class="calibrex">On Monday Mary walked to the village...</p>
Where the middle line is actually a chapter break - un-numbered, un-identified. Sometimes it's all caps and [A-Z]{3} or something will help, but these just take time.
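A rough sketch of one heuristic that might shortlist these un-numbered headings: flag short paragraphs that have no closing punctuation and are written in title case or all caps. The word limit, the list of small words, and the function names are all assumptions for illustration, and the results would still need checking by eye.

```python
import re

# Captures the class and the text of each simple <p> paragraph.
PARA = re.compile(r'<p class="([^"]*)">(.*?)</p>', re.DOTALL)

# Small words that are usually left lowercase in titles (assumed list).
SMALL_WORDS = {"a", "an", "the", "of", "to", "in", "and", "on", "at", "for"}


def looks_like_heading(text, max_words=8):
    """Guess whether a paragraph is a chapter title rather than body text."""
    text = text.strip()
    if not text or len(text.split()) > max_words:
        return False
    # Body text normally ends with punctuation; titles usually do not.
    if text[-1] in '.,;:!?"\u201d\u2026':
        return False
    words = [w for w in text.split() if w[0].isalpha()]
    if not words:
        return False
    # Every word is capitalised (or all caps), apart from small words.
    return all(w[0].isupper() or w.lower() in SMALL_WORDS for w in words)


def candidate_headings(html):
    """Return the text of every paragraph that looks like a heading."""
    return [m.group(2) for m in PARA.finditer(html) if looks_like_heading(m.group(2))]


sample = (
    '<p class="calibrex">end of a chapter text...</p>\n'
    '<p class="calibrex">Mary Goes to Market</p>\n'
    '<p class="calibrex">On Monday Mary walked to the village...</p>\n'
)
print(candidate_headings(sample))
```

This will also pick up things like short lines of dialogue without quotes or a one-line dedication, so it narrows the search rather than replacing the read-through.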
On the other hand, spending time looking at the text will often find the odd goof, like a dozen random paragraphs in the middle of the book that are strangely styled, so it's not all bad!
Interesting about the "page-break..." lines. I always take them out, since they cause a blank screen, sometimes two, on the Kindle when reading. But then, I always have each chapter start a new file, which gives a clean break with no blank page. It sounds like a re-convert will do that file splitting automatically. But will it get rid of "orphan" files that are just pieces of chapters? (I'm a neatnik, I guess.)
Thanks for the XPath example. I've not yet explored this, and just looking at the pop-up help frankly scared me off. I'll try this on my next ToC fix-it job. It looks like the "i" picks up the \1 and increments? Is that right? Then I could take a book with 102 chapters in those **&@^%# Roman numerals and easily convert them to Arabic numerals?
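On the Roman numeral part: a plain find and replace cannot do the arithmetic by itself, so the conversion needs a small function behind the replace. A minimal Python sketch, assuming the headings have already been given the "chapter" class and contain nothing but the numeral (roman_to_int and arabic_headings are names made up for this example):

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral):
    """Convert a Roman numeral to an integer (no validation of odd forms)."""
    total = 0
    prev = 0
    # Walk right to left: a smaller value before a larger one is subtracted.
    for ch in reversed(numeral.upper()):
        value = ROMAN_VALUES[ch]
        if value < prev:
            total -= value
        else:
            total += value
            prev = value
    return total


# Assumes the whole paragraph is an uppercase Roman numeral.
HEADING = re.compile(r'<p class="chapter">([IVXLCDM]+)</p>')


def arabic_headings(html):
    """Rewrite Roman-numeral chapter headings as Arabic numerals."""
    return HEADING.sub(
        lambda m: '<p class="chapter">%d</p>' % roman_to_int(m.group(1)),
        html,
    )


print(arabic_headings('<p class="chapter">XIV</p>\n<p class="chapter">CII</p>'))
```

Because the pattern only matches paragraphs made up entirely of the letters I, V, X, L, C, D, and M, ordinary body text is left untouched, though a one-word heading such as "MIX" would be converted too and is worth watching for.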