Cannot create table of content when converting my ebooks

ghostyjack · 07-01-2009, 07:22 AM

I hope someone can help me on this as it's quite frustrating.

I've started converting my books to a common format to make my life easier for reading them.

I chose epub as it looks like it should be a good format for reading and for archival purposes.

When I convert a book from LIT to epub, I do not get any TOC generated in the finished epub.

I've tried LITs with and without a TOC included in them and have used the option to force Calibre to use its own, but no joy.

Anyone got any ideas?

ldolse · 07-01-2009, 08:17 AM

Lit files generally don't have a concept of a TOC, so you have to edit the html output before doing a final conversion to epub.

I generally do this (with .6):

Disable Splitting in the conversion dialog - disable split on page breaks and set the max split size to two megabytes
Convert to epub
Unzip the epub
Edit the HTML file and insert line breaks, since Calibre's output has no line feeds by default
Find all the chapters and surround them by H2 or H3 tags
Make any other adjustments merited by taste
Save the HTML
Zip the css file, HTML file, and any embedded images together
Go to the edit metadata dialog for the book, import the zip file as a new format
Open the convert dialog again
Use the zip as the source format
Edit the xpath to match the chapters in the book - change hx if needed to match what you used (h2, h3, h4, etc), edit the regex if needed to match the actual chapters.
Set the epub conversion option to split on page breaks
Convert
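The manual editing steps in the list above (inserting line breaks and wrapping chapter titles in heading tags) could be scripted roughly as below. This is only a sketch: it assumes each chapter title sits in its own <p> element and starts with the word "Chapter", which will not hold for every book, and the names tag_chapters and add_line_breaks are made up for illustration.

```python
import re

# Assumption: chapter titles are in their own <p> elements,
# e.g. <p class="c1">Chapter 1</p>. Adjust the pattern to the actual book.
CHAPTER_P = re.compile(
    r"<p\b[^>]*>\s*(chapter\s+\w+[^<]*?)\s*</p>",
    re.IGNORECASE,
)


def tag_chapters(html, tag="h2"):
    """Replace chapter-title paragraphs with heading elements."""
    return CHAPTER_P.sub(lambda m: "<%s>%s</%s>" % (tag, m.group(1), tag), html)


def add_line_breaks(html):
    """Put a newline after each closing block tag so the file is easier to edit."""
    return re.sub(r"(</(?:p|div|h[1-6])>)(?!\n)", r"\1\n", html)


sample = (
    '<p class="c1">Chapter 1</p><p>It was a dark night.</p>'
    "<p>CHAPTER 2: Dawn</p><p>Morning came.</p>"
)
result = add_line_breaks(tag_chapters(sample))
print(result)
```

The edited HTML would still need to be zipped together with the CSS and images and re-imported as described in the steps.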

user_none · 07-01-2009, 08:38 AM

In 0.6 you can specify the --use-auto-toc option ("Force use of auto-generated Table of Contents" in the GUI). If it still has trouble finding the chapters, you can specify the XPath it should use to identify them, e.g. --level1-toc=//*[((name()='h1' or name()='h2') and re:test(., 'chapter|book|section|part\s+', 'i')) or @class = 'chapter'] ("Detect Chapters" in the GUI).
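Because the XPath contains quotes, brackets and a pipe character, it is easy to mangle when typed into a shell. One way around that is to build the argument list in a script. The sketch below only uses the two options named in this post; the file names are placeholders and build_convert_command is a hypothetical helper, not part of calibre.

```python
import shlex
from typing import List

# XPath copied from the post above.
LEVEL1_TOC = (
    "//*[((name()='h1' or name()='h2') and "
    "re:test(., 'chapter|book|section|part\\s+', 'i')) or @class = 'chapter']"
)


def build_convert_command(src, dst, level1_toc=LEVEL1_TOC):
    # type: (str, str, str) -> List[str]
    """Return the ebook-convert argument list with the auto-TOC options set."""
    return [
        "ebook-convert",
        src,
        dst,
        "--use-auto-toc",
        "--level1-toc=" + level1_toc,
    ]


cmd = build_convert_command("book.lit", "book.epub")  # placeholder file names
print(shlex.join(cmd))  # shows how the command would need to be quoted in a shell
# To actually run it (requires calibre on the PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the list to subprocess.run without a shell means the XPath reaches ebook-convert exactly as written.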

ghostyjack · 07-01-2009, 08:50 AM

Thanks for the quick responses guys.

"Force use of auto-generated Table of Contents" is also available in the current non-beta version but as stated in the first post it didn't work.

I'll have a go when I get home from work but I'm not quite sure on this xpath stuff so might need a bit further explanation (or a slightly clearer example) on it.

user_none · 07-01-2009, 09:32 PM

Using the auto TOC needs to be coupled with an XPath expression that specifies what the TOC items are. The default expression works in most cases but not all, and in your case it isn't working, hence the need to create a custom expression for the document. XPath is basically a way to specify elements in an XML document that meet certain criteria.

Have a look at this XPath tutorial. It is a bit complicated but it allows for precise matching. ldolse gave a good way to go about figuring out what the XPath needs to match. The only change I would make is to surround the chapters with h1 or h2 tags instead of h2 and h3, because h1 and h2 are detected by the default XPath. That way you won't have to make your own.
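To make the criteria in that expression concrete, here is a rough plain-Python imitation of what it selects: any h1 or h2 whose text matches the chapter regex (case-insensitively), or any element whose class is exactly 'chapter'. It is an approximation using only the standard library, not how calibre evaluates the XPath, and it assumes a sample document without XML namespaces.

```python
import re
import xml.etree.ElementTree as ET

# Same alternatives as in the XPath's re:test(), with the 'i' flag.
TITLE_RE = re.compile(r"chapter|book|section|part\s+", re.IGNORECASE)


def find_toc_candidates(xhtml):
    """Return (tag, text) pairs for elements the expression would roughly select."""
    root = ET.fromstring(xhtml)
    hits = []
    for el in root.iter():
        text = "".join(el.itertext())  # text of the element and its children
        is_heading = el.tag in ("h1", "h2") and TITLE_RE.search(text) is not None
        if is_heading or el.get("class") == "chapter":
            hits.append((el.tag, text.strip()))
    return hits


doc = """<body>
<h1>The Long Road</h1>
<h2>Chapter One</h2><p>Text.</p>
<h3>Chapter Two</h3><p>More text.</p>
<p class="chapter">Interlude</p>
</body>"""
print(find_toc_candidates(doc))
```

In this sample the h3 heading is skipped, which is why wrapping chapters in h1 or h2 matters if you want to keep the default expression.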

ldolse · 07-01-2009, 11:42 PM

In my experience with Lit files the chapters are rarely surrounded by hx tags. I have seen the 'chapter' class, but it usually increments the class for each chapter - 'chapter1', 'chapter2', etc. Will the default xpath still match when the class name is changing like that?

Of course the above is probably more true of the bootleg lit files. Professionally produced lit files do seem to follow some stricter guidelines, but professionally produced ones don't seem that common.

Usually when I edit the document the first thing I'll check is whether the chapter delimiters are something that matches the default chapter regex. Then I'll make sure those are surrounded by H1 or H2 tags. A lot of books don't use Chapter xx though, they just use a word or a phrase. Usually it's pretty simple to write a regex to find all of those breaks though, then surround them by the hx tags. Then I'll just change the xpath to something like this:
--level1-toc=//*[((name()='h1' or name()='h2') and re:test(., '.*', 'i')) or @class = 'chapter']

The key to using that simpler regex is to make sure that you're not overlapping with other uses of h1 or h2 tags, which are very common in the title page. So I'll go with h3 or h4 instead. Probably easier just to specify chapter classes now that I think about it though....
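On the incrementing class names: in XPath, @class = 'chapter' is an exact string comparison, so 'chapter1' and 'chapter2' would not match it. XPath 1.0 does have a starts-with() function, so something like //*[starts-with(@class, 'chapter')] is the usual way to express a prefix match, though I have not verified it against calibre's --level1-toc option. The difference can be illustrated in plain Python (the helper names are made up for this sketch):

```python
import xml.etree.ElementTree as ET


def by_exact_class(root, name="chapter"):
    """Imitates @class = 'chapter': the whole attribute must equal the name."""
    return [el for el in root.iter() if el.get("class") == name]


def by_class_prefix(root, prefix="chapter"):
    """Imitates starts-with(@class, 'chapter'): only the beginning must match."""
    return [el for el in root.iter() if (el.get("class") or "").startswith(prefix)]


root = ET.fromstring(
    '<body><p class="chapter1">One</p><p class="chapter2">Two</p>'
    '<p class="body">Text</p></body>'
)
exact = [el.text for el in by_exact_class(root)]
prefixed = [el.text for el in by_class_prefix(root)]
print(exact, prefixed)
```

The exact comparison finds nothing in this sample, while the prefix comparison picks up both chapter paragraphs.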

That whole process I described is why I'd love to be able to use the GUI to convert to uncompressed OEB to better facilitate edits like this. I know Calibre doesn't support the idea of a folder as a type of book, but one option would be to just not add the OEB format to the library, just dump it to the filesystem. Think of it as an 'export to OEB' option instead of a 'convert to OEB' option.

kovidgoyal · 07-03-2009, 10:42 AM

Use the --debug-input switch to ebook-convert to "dump to OEB" rather than converting to OEB

lightkeeper54 · 07-05-2009, 11:33 AM

Kovid,

I'm confused. Where do I specify a command-line switch on Windows without running a command prompt?

WayneD · 07-05-2009, 07:09 PM

Quote:

Originally Posted by lightkeeper54

I'm confused. Where do I specify a command-line switch on Windows without running a command prompt?

Kovid's suggestion was to use the "ebook-convert" script, which is indeed run via the command line.

ldolse · 07-05-2009, 08:56 PM

Yes, that does require the command line. Before the betas using a debug command wasn't necessary though, I just needed to specify a .oeb extension with ebook-convert and I would get an uncompressed folder with OEB. Has that changed then?

My general problem with using the command line is I've adopted the GUI for my work, and having to navigate Calibre's folder structure using the command line makes it a bit difficult to switch between the two.

kovidgoyal · 07-05-2009, 09:28 PM

The difference between .oeb and --debug-input is that --debug-input aborts the conversion pipeline immediately after the input plugin is run, whereas .oeb outputs the end result of running the full conversion pipeline.
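For anyone scripting this, the two variants might be built as argument lists like the sketch below. Two things here are assumptions rather than anything stated in this thread: that --debug-input is followed by a directory to dump into, and that an output file name still has to be supplied even though the pipeline stops early. Check ebook-convert --help for your version before relying on either. The helper names and file names are placeholders.

```python
from typing import List


def dump_input_command(src, dump_dir):
    # type: (str, str) -> List[str]
    """Stop after the input plugin and dump its result.

    ASSUMPTION: --debug-input takes the target directory as its value, and
    an output name ("unused.epub", a placeholder) is still required.
    """
    return ["ebook-convert", src, "unused.epub", "--debug-input", dump_dir]


def full_pipeline_command(src, dst_oeb):
    # type: (str, str) -> List[str]
    """Run the whole conversion pipeline and write the result as OEB."""
    return ["ebook-convert", src, dst_oeb]


print(dump_input_command("book.lit", "dump"))
print(full_pipeline_command("book.lit", "book.oeb"))
# Either list can be passed to subprocess.run(..., check=True) once calibre
# is installed and on the PATH.
```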
