Old 02-22-2009, 12:50 AM   #1
ilovejedd
hopeless n00b
Posts: 5,111
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
html2epub TOC and chapter detection help

I have some questions regarding html2epub's chapter detection and TOC generation.

I'm testing an HTML file with a level depth of 2.
Code:
<html>
<head>
<title>Omnibus Collection</title>
</head>
<body>

	<div class='header1'>Omnibus Collection</div>
	<hr />

	<div class='booklist'>
		<a href='#book1'>Book One</a><br />
		<a href='#book2'>Book Two</a><br />
	</div>

	<div class='book'>
		<a name='book1' />
		<div class='booktitle'>Book One</div><hr />
		<div class='introduction'>
			<p>The first book in the series.</p>
		</div>
		<div class='chapter'>
			<div class='chaptertitle'>1. Chapter One</div>
			<div class='chaptercontent'>
			        <p>This is a truly fascinating chapter.</p>
			</div>
		</div>
		<div class='chapter'>
			<div class='chaptertitle'>2. Chapter Two</div>
			<div class='chaptercontent'>
			        <p>A worthy continuation of a fine tradition.</p>
			</div>
		</div>
	</div>

	<div class='book'>
		<a name='book2' />
		<div class='booktitle'>Book Two</div><hr />
		<div class='introduction'>
			<p>The second book in the series.</p>
		</div>
		<div class='chapter'>
			<div class='chaptertitle'>1. Chapter One</div>
			<div class='chaptercontent'>
			        <p>This is a truly fascinating chapter.</p>
			</div>
		</div>
		<div class='chapter'>
			<div class='chaptertitle'>2. Chapter Two</div>
			<div class='chaptercontent'>
			        <p>A worthy continuation of a fine tradition.</p>
			</div>
		</div>
	</div>

</body>
</html>
Level 1 TOC: //*[@class = 'booktitle']
Level 2 TOC: //*[@class = 'chaptertitle']

The generated TOC looks like:

Book One
1. Chapter One
2. Chapter Two
Book Two
1. Chapter One
2. Chapter Two
which is the desired outcome. My problem is it doesn't insert a pagebreak or rule before the book entry. It does, however, insert both before the chapter entry. Help please?

Also, what's the command-line syntax for the above? I'm just using the GUI for testing right now but will be using the command-line utility for an automated script once I get the chapter detection working the way I want.

Thanks!
Old 02-22-2009, 03:09 AM   #2
HenryP
Enthusiast
Posts: 43
Karma: 24
Join Date: Feb 2009
Location: Australia
Device: Sony 505
My reply is a set of questions on the same topic, i.e., chapter detection.
I'm unable to get started with chapter detection because I'm unsure how I can read the relevant code.
For example, whilst the regular expression tutorial (XPath) is clear and straightforward, I'm unable to determine whether the code it refers to comes from the input file (e.g., an HTML file generated from, say, Open Office) or is XHTML code within Calibre.
The only code I seem to have access to is that of the HTML input file, which I generated from Open Office.
The Table of Contents generated by Calibre invariably lists the endnote reference numbers, with their links intact, even though the expression in the XPath box is the one that searches for a heading containing the string 'Chapter'. The endnotes themselves are correctly generated as endnotes with the correct hyperlinks, and that's wonderful. But they're muscling in on the Table of Contents' territory, and that's a shame.
I'd welcome any help. Calibre looks like it's going to be worth the effort to come to terms with but I'm on the verge of infinite loop hysteria and would appreciate being set straight.
Old 02-22-2009, 03:52 AM   #3
ilovejedd
hopeless n00b
It's based on the input file, that much I can tell. By default, any <h1> or <h2> tag that contains any of the words 'chapter', 'section', 'book' or 'part', or any tag whose class is 'chapter' is recognized as a chapter.
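The default rule described above can be approximated in a few lines. This is only a sketch of the rule as stated in this post, not calibre's actual detection expression: the helper name `looks_like_chapter`, the case-insensitive substring matching, and the sample markup are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Keywords from the rule described above. Case-insensitive substring
# matching is an assumption, not confirmed calibre behaviour.
CHAPTER_WORDS = re.compile(r"chapter|section|book|part", re.IGNORECASE)


def looks_like_chapter(elem):
    """Hypothetical helper: True if elem would count as a chapter start."""
    # Any tag whose class is exactly 'chapter'.
    if elem.get("class") == "chapter":
        return True
    # Any <h1> or <h2> whose text contains one of the keywords.
    if elem.tag in ("h1", "h2"):
        return bool(CHAPTER_WORDS.search("".join(elem.itertext())))
    return False


# Made-up, well-formed sample input.
SAMPLE = """<body>
  <h1>Book One</h1>
  <h2>Introduction</h2>
  <h2>Chapter 1</h2>
  <div class="chapter">untitled</div>
  <p>The chapter begins here.</p>
</body>"""

root = ET.fromstring(SAMPLE)
matches = [
    (elem.tag, "".join(elem.itertext()).strip())
    for elem in root.iter()
    if looks_like_chapter(elem)
]
for tag, text in matches:
    print(tag, text)
```

Under these assumptions the `<h2>Introduction</h2>` heading and the plain paragraph are skipped, while the two keyword headings and the `class="chapter"` div are picked up.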
Old 02-22-2009, 09:35 AM   #4
kovidgoyal
creator of calibre
Posts: 43,844
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@ilovejedd
On the command line, the options are --level1-toc, --level2-toc and --level3-toc.

You can use --help to see all the options

As for page breaks, they are only inserted automatically before chapters, not TOC items. If you want page breaks, use override CSS with:

Code:
.booktitle { page-break-before: always }
.chaptertitle { page-break-before: always }
@HenryP
The XPath expressions refer to the source HTML file.
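Because the expressions are evaluated against the source file, they can be tried outside calibre before converting. The sketch below uses Python's standard-library `xml.etree.ElementTree`, which supports only a subset of XPath and needs a leading `.` (`.//*` rather than `//*`). It is not the engine calibre uses, and it only works here because the trimmed-down sample (same class names as the first post) is well-formed XML. The function name `toc_outline` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the sample file from the first post.
SOURCE = """<html><body>
  <div class='book'>
    <div class='booktitle'>Book One</div>
    <div class='chapter'>
      <div class='chaptertitle'>1. Chapter One</div>
    </div>
    <div class='chapter'>
      <div class='chaptertitle'>2. Chapter Two</div>
    </div>
  </div>
  <div class='book'>
    <div class='booktitle'>Book Two</div>
    <div class='chapter'>
      <div class='chaptertitle'>1. Chapter One</div>
    </div>
  </div>
</body></html>"""


def toc_outline(root):
    """Hypothetical helper: (level, title) pairs in document order."""
    level1 = set(root.findall(".//*[@class='booktitle']"))
    level2 = set(root.findall(".//*[@class='chaptertitle']"))
    outline = []
    for elem in root.iter():  # iter() walks the tree in document order
        if elem in level1:
            outline.append((1, elem.text))
        elif elem in level2:
            outline.append((2, elem.text))
    return outline


root = ET.fromstring(SOURCE)
outline = toc_outline(root)
for level, title in outline:
    print("  " * (level - 1) + title)
```

If an expression matches nothing here, or matches elements such as endnote links that should not be in the TOC, the same is likely to happen during conversion, so this is a cheap way to debug an expression against the input file.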
Old 02-22-2009, 05:24 PM   #5
ilovejedd
hopeless n00b
Hmm... So basically, any of the TOC options adds a link to the table of contents, but the matched element isn't treated as a chapter unless it's also covered by the XPath for chapter detection?

I'm using Windows. Would the following command line do the trick?
Code:
html2epub input.html --level1-toc //*[@class='booktitle'] --level2-toc //*[@class='chaptertitle'] --chapter //*[@class='chapter']|//*[@class='book'] --chapter-mark both
Thanks!
ilovejedd is offline   Reply With Quote
Advert
Old 02-22-2009, 05:31 PM   #6
kovidgoyal
creator of calibre
Yes, and remember to enclose command-line arguments that contain special characters in quotes.
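For an automated script, one way to sidestep shell quoting altogether is to pass the arguments as a list, so that characters such as `|`, `*` and `'` never reach a command interpreter. The following is a minimal sketch, assuming `html2epub` is an executable on the PATH; the option names are the ones mentioned in this thread, and the function name `build_html2epub_command` is hypothetical.

```python
import subprocess


def build_html2epub_command(input_file):
    """Hypothetical helper: argument list for the conversion discussed above."""
    return [
        "html2epub", input_file,
        "--level1-toc", "//*[@class='booktitle']",
        "--level2-toc", "//*[@class='chaptertitle']",
        # The '|' stays inside a single argument; without quoting, a shell
        # would treat it as a pipe.
        "--chapter", "//*[@class='chapter']|//*[@class='book']",
        "--chapter-mark", "both",
    ]


command = build_html2epub_command("input.html")
for arg in command:
    print(arg)

# Uncomment on a machine where html2epub is installed. With a list and
# no shell=True, no command interpreter parses the arguments.
# subprocess.run(command, check=True)
```

When typing the command by hand at the Windows prompt instead, wrapping each XPath expression in double quotes serves the same purpose.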
Old 02-22-2009, 05:58 PM   #7
ilovejedd
hopeless n00b
Okay, thanks a bunch!