02-22-2009, 12:50 AM | #1 |
hopeless n00b
Posts: 5,110
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
|
html2epub TOC and chapter detection help
I have some questions regarding html2epub's chapter detection and TOC generation.
I'm testing an HTML file with a level depth of 2. Code:
<html>
<head>
<title>Omnibus Collection</title>
</head>
<body>
<div class='header1'>Omnibus Collection</div>
<hr />
<div class='booklist'>
<a href='#book1'>Book One</a><br />
<a href='#book2'>Book Two</a><br />
</div>
<div class='book'>
<a name='book1' />
<div class='booktitle'>Book One</div><hr />
<div class='introduction'>
<p>The first book in the series.</p>
</div>
<div class='chapter'>
<div class='chaptertitle'>1. Chapter One</div>
<div class='chaptercontent'>
<p>This is a truly fascinating chapter.</p>
</div>
</div>
<div class='chapter'>
<div class='chaptertitle'>2. Chapter Two</div>
<div class='chaptercontent'>
<p>A worthy continuation of a fine tradition.</p>
</div>
</div>
</div>
<div class='book'>
<a name='book2' />
<div class='booktitle'>Book Two</div><hr />
<div class='introduction'>
<p>The second book in the series.</p>
</div>
<div class='chapter'>
<div class='chaptertitle'>1. Chapter One</div>
<div class='chaptercontent'>
<p>This is a truly fascinating chapter.</p>
</div>
</div>
<div class='chapter'>
<div class='chaptertitle'>2. Chapter Two</div>
<div class='chaptercontent'>
<p>A worthy continuation of a fine tradition.</p>
</div>
</div>
</div>
</body>
</html>
Level 2 TOC: //*[@class = 'chaptertitle']
The generated TOC looks like:
Book One
  1. Chapter One
  2. Chapter Two
Book Two
  1. Chapter One
  2. Chapter Two
which is the desired outcome. My problem is it doesn't insert a pagebreak or rule before the book entry. It does, however, insert both before the chapter entry. Help please?
Also, what's the command-line syntax for the above? I'm just using the GUI for testing right now but will be using the command-line utility for an automated script once I get the chapter detection working the way I want. Thanks! |
02-22-2009, 03:09 AM | #2 |
Enthusiast
Posts: 43
Karma: 24
Join Date: Feb 2009
Location: Australia
Device: Sony 505
|
My reply is a set of questions on the same topic, i.e., chapter detection.
I'm unable to get started with chapter detection because I'm unsure which code I should be reading. For example, whilst the regular expression tutorial (XPath) is clear and straightforward, I can't tell whether the code it refers to is from the input file (e.g., an HTML file generated from, say, OpenOffice) or is XHTML code within Calibre. The only code I seem to have access to is that of the HTML input file, which I generated from OpenOffice.
The Table of Contents generated by Calibre invariably lists the endnote reference numbers, with their links intact, even though the expression in the XPath box is the one that searches for a heading containing the string 'Chapter'. The endnotes themselves are correctly generated as endnotes with the correct hyperlinks, and that's wonderful. But they're muscling in on the Table of Contents' territory, and that's a shame.
I'd welcome any help. Calibre looks like it's going to be worth the effort to come to terms with, but I'm on the verge of infinite loop hysteria and would appreciate being set straight. |
02-22-2009, 03:52 AM | #3 |
hopeless n00b
Posts: 5,110
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
|
It's based on the input file, that much I can tell. By default, any <h1> or <h2> tag that contains any of the words 'chapter', 'section', 'book' or 'part', or any tag whose class is 'chapter' is recognized as a chapter.
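To make that default rule concrete, here is a rough Python sketch of it as described above. It is not calibre's actual implementation: the helper names, the sample markup and the case-insensitive keyword match are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Keywords from the rule described above; ignoring case is an assumption.
KEYWORDS = re.compile(r"chapter|section|book|part", re.IGNORECASE)


def is_default_chapter(elem):
    """Hypothetical check mirroring the default rule as described in this post."""
    text = "".join(elem.itertext())
    if elem.tag in ("h1", "h2") and KEYWORDS.search(text):
        return True
    # Any tag whose class is exactly 'chapter' also counts.
    return elem.get("class") == "chapter"


# Made-up sample markup, not the file from the first post.
SAMPLE = """<body>
<h1>Part One</h1>
<h2>Preface</h2>
<div class="chapter"><div class="chaptertitle">1. Chapter One</div></div>
<div class="booktitle">Book One</div>
</body>"""


def detect(markup):
    """Return (tag, class) for every element the rule would treat as a chapter."""
    root = ET.fromstring(markup)
    return [(e.tag, e.get("class")) for e in root.iter() if is_default_chapter(e)]


print(detect(SAMPLE))
```

Under this reading, a div with class 'booktitle' is not picked up as a chapter, because it is neither an h1/h2 heading nor a tag with class 'chapter'.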
|
02-22-2009, 09:35 AM | #4 |
creator of calibre
Posts: 44,380
Karma: 23764838
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
@ilovejedd
On the command line, the options are --level1-toc, --level2-toc and --level3-toc. You can use --help to see all the options.
As for page breaks, they are only inserted automatically before chapters, not before TOC items. If you want page breaks there, use override CSS with Code:
.booktitle { page-break-before: always }
.chaptertitle { page-break-before: always }
The XPath expressions refer to the source HTML file. |
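Since the XPath expressions are evaluated against the source HTML, one way to preview what a TOC expression will pick up is to run it over the input yourself. The sketch below uses Python's standard xml.etree.ElementTree on a cut-down version of the test file from the first post. Two assumptions: the helper name `titles` is made up, and ElementTree only supports a subset of XPath, so the expression is written as `.//*[@class='...']` rather than `//*[@class='...']`. The results may not match calibre's own parser on messier HTML.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the test file from the first post.
SOURCE = """<body>
<div class='book'>
  <div class='booktitle'>Book One</div>
  <div class='chapter'><div class='chaptertitle'>1. Chapter One</div></div>
  <div class='chapter'><div class='chaptertitle'>2. Chapter Two</div></div>
</div>
<div class='book'>
  <div class='booktitle'>Book Two</div>
  <div class='chapter'><div class='chaptertitle'>1. Chapter One</div></div>
</div>
</body>"""


def titles(markup, class_name):
    """Return the text of every element with the given class, in document order."""
    root = ET.fromstring(markup)
    return [e.text for e in root.findall(f".//*[@class='{class_name}']")]


print(titles(SOURCE, "booktitle"))     # candidates for the level 1 TOC
print(titles(SOURCE, "chaptertitle"))  # candidates for the level 2 TOC
```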
02-22-2009, 05:24 PM | #5 |
hopeless n00b
Posts: 5,110
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
|
Hmm... So basically, any of the TOC options would add a link to the table of contents, but the linked element wouldn't be recognized as a chapter unless it's also matched by the XPath for chapter detection?
I'm using Windows. Would the following command-line do the trick? Code:
html2epub input.html --level1-toc //*[@class='booktitle'] --level2-toc //*[@class='chaptertitle'] --chapter //*[@class='chapter']|//*[@class='book'] --chapter-mark both
|
02-22-2009, 05:31 PM | #6 |
creator of calibre
Posts: 44,380
Karma: 23764838
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Yes, and remember to enclose command-line arguments that contain special characters in quotes.
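For an automated script, one way to sidestep shell quoting altogether is to pass the arguments as a list from Python, so that characters such as `|` and `*` never go through a shell. This is only a sketch: the options and XPath values are the ones quoted in this thread, while `build_command` and `convert` are hypothetical helper names, and it assumes html2epub is on the PATH.

```python
import subprocess


def build_command(input_path):
    """Return the html2epub argument list (hypothetical helper)."""
    return [
        "html2epub",
        input_path,
        "--level1-toc", "//*[@class='booktitle']",
        "--level2-toc", "//*[@class='chaptertitle']",
        # The '|' stays inside a single argument, so it is never read as a pipe.
        "--chapter", "//*[@class='chapter']|//*[@class='book']",
        "--chapter-mark", "both",
    ]


def convert(input_path):
    """Run the conversion; assumes html2epub is installed and on the PATH."""
    # A list without shell=True is handed to the program without shell parsing.
    return subprocess.run(build_command(input_path), check=True)


if __name__ == "__main__":
    # Print the arguments instead of running the conversion.
    for arg in build_command("input.html"):
        print(arg)
```

When typing the command by hand at a prompt instead, the quoting advice above still applies.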
|
02-22-2009, 05:58 PM | #7 |
hopeless n00b
Posts: 5,110
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
|
Okay, thanks a bunch!
|