metadata TOC from html

Streadmob · 12-19-2011, 10:20 AM

I was so delighted to discover Calibre! I’d just recovered from wasting half an evening with eCub :-( .

I’ve managed to get NeoOffice to compile my 2-level “content” TOC.

It’s in this format:

1.1 Thing at the Start
1.2 Whatever Comes Next
1.2.1 First bit of second Section
1.2.2 And Afterwards Here
1.3 This is the Third
1.4 And So On

(except NeoOffice puts a hefty indent in front of the 1.2.1, 1.2.2 (i.e. level 3 in my scheme) headings.)

I soon found out .odt files did not have the TOC recognised by Calibre so I got NeoOffice to output as .html (actually xhtml, or possibly just html since the file appeared with the extension .html).

The TOC is recognised and a metadata TOC is set up, but it contains only the first heading. In other words:

1.1 Thing at the Start

When asking Calibre to convert from html to mobi, I do enter //h:h1 , //h:h2 , & //h:h3 in the TOC screen. I found this was necessary to get any metadata TOC to appear. (Actually I don’t use level 1 headings at the moment, only 2 & 3.) I’m using 50 & 6 as the top two parameters on that screen, and I don’t think I exceeded 6.
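One way to see what expressions such as //h:h1, //h:h2 and //h:h3 can match is to run the equivalent searches outside Calibre. The following is a minimal sketch using Python's standard library, not Calibre's own code. The `h:` prefix is bound to the XHTML namespace; the `SAMPLE` string is a hypothetical stand-in for the exported file, built from the headings listed above.

```python
import xml.etree.ElementTree as ET

# The "h" prefix stands for the XHTML namespace.
XHTML_NS = {"h": "http://www.w3.org/1999/xhtml"}

# Hypothetical stand-in for the exported file: only level 2 and 3 headings.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<h2>1.1 Thing at the Start</h2>
<h2>1.2 Whatever Comes Next</h2>
<h3>1.2.1 First bit of second Section</h3>
<h3>1.2.2 And Afterwards Here</h3>
<h2>1.3 This is the Third</h2>
<h2>1.4 And So On</h2>
</body></html>"""


def headings_by_level(xhtml_text):
    """Return the heading texts matched for h1, h2 and h3, in document order."""
    root = ET.fromstring(xhtml_text)
    found = {}
    for tag in ("h1", "h2", "h3"):
        # ElementTree writes the document-wide search as ".//h:h2".
        elements = root.findall(f".//h:{tag}", XHTML_NS)
        found[tag] = ["".join(e.itertext()).strip() for e in elements]
    return found


matches = headings_by_level(SAMPLE)
for tag, titles in matches.items():
    print(tag, len(titles), titles)
```

With this sample the h1 search finds nothing, because the document uses only level 2 and 3 headings.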

On the Structure Detection screen I’ve inserted or name()='h3' into the middle field to see if it helps but it makes no difference.

I can’t understand why it hasn’t been made easier to get Calibre to spot the “h2”s, “h3”s etc in the apt place in the html TOC and simply include them in the metadata TOC. What could be more elementary? All the conditions searching for content in the “Detect Chapters at (XPath expression)” on the Structure Detection screen could then be dropped.

If anyone has any idea how to get this to work I’d appreciate the advice.

Also, if anyone knows how to get NeoOffice to remove the stupid “1.” from the number at the start of each heading when using automatic numbering, so that it numbers them as:

1 Thing at the Start
2 Whatever Comes Next
2.1 First bit of second Section
2.2 And Afterwards Here
3 This is the Third
4 And So On

...I’d also appreciate it! This seems the sensible basic thing people would normally want but it seems necessary to code the numbers by hand to get it.

GMcG · 12-22-2011, 03:24 PM

I have read your post several times, and it now seems to me that you have a ToC, but one without links to the chapters of your book. Calibre needs links to analyze and build the structure of the eBook.

1. Check the links in your toc.html (either to separate files or to anchors)
2. 'Force use of auto-generated Table of Contents'. See 'preferences.jpg' below.
3. Start building your eBook with double click on your toc.html file.
It will be part of the book. If you click on another file you will only get the metadata toc of Calibre.

If
- the toc.html has no links,
- or you don't have a toc.html file,
- or the toc.html file is listed in the folder with your files at the wrong place, i.e. before the file you click on, then your eBook will consist of that file only.
One of these settings may be the reason for the result when clicking on '1.1 Thing at the Start' in your eBook.

Calibre works well with <h2> as toc.txt (html code) and toc2.jpg (Calibre viewer) show.
The chapter titles in Calibre's metadata toc are the <h2> chapter headlines of the html files (see 10.txt).
You may rename '.txt' to '.htm'.
I have made each chapter a separate file, because I don't like anchors.

Note:
- I use <p><br> to avoid the automatically indented first line in paragraphs in the toc and in first paragraphs in chapters.
- The headline of the Table of Contents is in <p> instead of <h2>, because I didn't want it selected by Calibre for the auto-generated toc again. 'Inhaltsverzeichnis' already means Table of Contents, so it would be doubled.
In stylesheet.css I have added a new <p> class for it with font-size: 1.34em for the headline.
It's the same size I have used for <h2> in the chapters.
- The links are styled with: color: black; and: text-decoration: none; because I don't like the browser look in an eBook.

NeoOffice: I don't know it, but perhaps you could remove 1. with:
search (1.) and replace (nothing, leave empty).
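A plain search for "1." would also hit any later "1." inside a number, so an anchored pattern may be safer. Here is a minimal sketch in Python, assuming the heading texts are available as plain strings; the function name `strip_top_level` is made up for illustration. It removes a leading "1." only when a digit follows it.

```python
import re

# Match "1." at the very start of the heading, only if a digit follows.
LEADING_ONE = re.compile(r"^1\.(?=\d)")


def strip_top_level(heading):
    """Drop the leading '1.' from an automatically numbered heading."""
    return LEADING_ONE.sub("", heading, count=1)


numbered = [
    "1.1 Thing at the Start",
    "1.2 Whatever Comes Next",
    "1.2.1 First bit of second Section",
    "1.2.2 And Afterwards Here",
    "1.3 This is the Third",
    "1.4 And So On",
]
renumbered = [strip_top_level(h) for h in numbered]
for line in renumbered:
    print(line)
```

A heading such as "1. Introduction" is left alone, because no digit follows the dot.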

Good luck!

George

Streadmob · 12-23-2011, 04:51 PM

Hi George –

Thanks for getting back!

Looking through your reply, I started on:
“1. Check the links in your toc.html (either to separate files or to anchors)”

I don’t have a toc.html file. I didn’t have one when I was working a couple of days ago, but I still at least managed to get a single line table of contents. As I understood it, all you needed was a single html file exported from your wordprocessor. However, today I checked for h2 and h3 in the html file that I used before. h2 was there (and this became the single line of the table of contents) but h3 wasn’t (except for an introductory mention covering about the top six levels).

I suspected it was because I had been experimenting with heading numbering etc, so I restarted from fresh, and this time the html file DID include h3 mentions, e.g.:

<h3 class="Heading_20_3"><a name="2.1 First bit of second Section"><span/></a>2.1 First bit of second Section</h3>

I then carefully logged all my actions throughout the Calibre process:

Structure Detection window:

Detect chapters at:
I added mentions for h3 in case it made any difference:

//*[((name()='h1' or name()='h2' or name()='h3') and re:test(., '\s*((chapter|book|section|part)\s+)|((prolog|prologue|epilogue)(\s+|$))', 'i')) or @class = 'chapter']
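The expression above only selects a heading when its text also passes the regular expression inside re:test. Below is a rough check of that condition in Python. It assumes that Python's `re.search` with `re.IGNORECASE` behaves like re:test with the 'i' flag for this particular pattern; the heading list is taken from the first post.

```python
import re

# Pattern copied from the "Detect chapters at" expression.
CHAPTER_PATTERN = re.compile(
    r"\s*((chapter|book|section|part)\s+)|((prolog|prologue|epilogue)(\s+|$))",
    re.IGNORECASE,
)


def looks_like_chapter(heading_text):
    """True if the pattern is found anywhere in the heading text."""
    return CHAPTER_PATTERN.search(heading_text) is not None


headings = [
    "1.1 Thing at the Start",
    "1.2 Whatever Comes Next",
    "1.2.1 First bit of second Section",
    "1.2.2 And Afterwards Here",
    "1.3 This is the Third",
    "1.4 And So On",
]
results = {h: looks_like_chapter(h) for h in headings}
for heading, hit in results.items():
    print(hit, heading)
```

None of these headings pass the test: the pattern wants a word such as "chapter" or "section" followed by whitespace, or "prologue"/"epilogue", and "Section" at the very end of a heading has nothing after it. So adding name()='h3' to the first part of the expression cannot change the outcome for headings like these.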

Insert page breaks before (XPath expression):
Again, I added mentions for h3 in case it made any difference:

//*[name()='h1' or name()='h2' or name()='h3']

Table of Contents:

Force use of auto-generated table of contents box checked.

Level 1 TOC (XPath expression)
//h:h1

Level 2 TOC (XPath expression)
//h:h2

Level 3 TOC (XPath expression)
//h:h3
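The Calibre documentation describes each level-two entry as being added under the previous level-one entry, and level three under level two in the same way. The sketch below models only that nesting rule in Python; it is not Calibre's code, and the names `TocEntry` and `build_toc` are invented for illustration. What Calibre actually does with an entry that has no parent is not modelled; the sketch merely reports such entries.

```python
from dataclasses import dataclass, field


@dataclass
class TocEntry:
    title: str
    children: list = field(default_factory=list)


def build_toc(headings, level_tags=("h1", "h2", "h3")):
    """Nest (tag, title) pairs, given in document order, by TOC level.

    Returns (top-level entries, titles that had no parent to attach to).
    """
    roots, orphans = [], []
    last_at_level = {}  # level index -> most recent entry at that level
    for tag, title in headings:
        if tag not in level_tags:
            continue
        level = level_tags.index(tag)
        entry = TocEntry(title)
        if level == 0:
            roots.append(entry)
        else:
            parent = last_at_level.get(level - 1)
            if parent is None:
                orphans.append(title)
                continue
            parent.children.append(entry)
        last_at_level[level] = entry
        # A new entry ends any deeper "most recent" entries.
        for deeper in [k for k in last_at_level if k > level]:
            del last_at_level[deeper]
    return roots, orphans


headings = [
    ("h2", "1.1 Thing at the Start"),
    ("h2", "1.2 Whatever Comes Next"),
    ("h3", "1.2.1 First bit of second Section"),
    ("h3", "1.2.2 And Afterwards Here"),
    ("h2", "1.3 This is the Third"),
    ("h2", "1.4 And So On"),
]

# Levels mapped to h1/h2/h3, as in the settings logged above.
roots_default, orphans_default = build_toc(headings)
# Levels mapped to h2/h3 instead, for comparison.
roots_shifted, orphans_shifted = build_toc(headings, level_tags=("h2", "h3"))

print(len(roots_default), len(orphans_default))
print(len(roots_shifted), len(orphans_shifted))
```

Under this rule, a document with no h1 headings gives the h2 entries nothing to attach to when level 1 is //h:h1, whereas mapping level 1 to h2 and level 2 to h3 yields four top-level entries with two children under the second.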

Result:
Although the correct Table of Contents is inserted at the start of the text of the book, as displayed by Kindle Reader for Mac and Kindle Previewer, no table of contents appears now (a couple of days before, at least I got the one-line table in both programs, so I’ve gone backwards).

(What gets me is that the production of the TOC as a separate entity should be the DEFAULT. And when you have requested a separate TOC, the program should DETECT the point at which its production failed, REPORT to the user where and why it failed, and suggest changes that would be SUFFICIENT to produce a table, according to the specific conditions :-( .)

Thanks for your later advice, George. The first part of it refers to toc.[...] files, which I didn’t think I needed to input to Calibre, and which weren’t needed a couple of days ago for a (one line) TOC. I’ll study the other bits when I’ve got the TOC working.

If you or anyone else can tell from my log what else was needed, I’d be very appreciative, and probably also produce a detailed guide as to how to get the thing to work from NeoOffice.

Cheers!

GMcG · 12-24-2011, 02:21 PM

There was a misunderstanding when you said in your first post that you had exported the toc as an html file.
And you didn't say that you were working on mobi. I prefer epub because I can tweak it afterwards. I don't know how to do that with mobi.

This time I have made all files one and tried it without any links or anchors and it works too.

The problem seems to be that in the preferences there is an interdependency between:
a) Table of contents / 'Force use of auto-generated Table of contents'
and
b) Mobi Output / 'Do not add Table of Contents to book' as default.
You chose MOBI at the upper right corner.

The latter must be deactivated (unchecked), and then you get
1. the auto-generated Table of Contents of Calibre and
2. a Table of Contents added at the end of the book with links to the chapters.
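The same two settings can also be expressed on the command line. The sketch below builds an ebook-convert call in Python without running it. The option names (--use-auto-toc, --level1-toc to --level3-toc, and --no-inline-toc for 'Do not add Table of Contents to book') are the ones I understand ebook-convert to use; check `ebook-convert --help` for your version. The file names are hypothetical.

```python
import shlex


def build_convert_command(source, target, inline_toc=True):
    """Assemble an ebook-convert argument list; nothing is executed."""
    cmd = [
        "ebook-convert", source, target,
        "--use-auto-toc",  # force use of the auto-generated TOC
        "--level1-toc", "//h:h1",
        "--level2-toc", "//h:h2",
        "--level3-toc", "//h:h3",
    ]
    if not inline_toc:
        # MOBI output option: do not add a Table of Contents to the book.
        cmd.append("--no-inline-toc")
    return cmd


# Hypothetical file names; inline TOC left enabled, as described above.
command = build_convert_command("book.html", "book.mobi")
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would start the conversion, assuming ebook-convert is on the PATH.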

It looks horrible but at least it works.
Now you may try it with your odt files.
Maybe you will get more help if you ask at another place in the forums.

Merry Christmas!

George

Streadmob · 12-24-2011, 03:42 PM

Thanks again George!

Sorry for the misunderstandings. It looks like I should get Calibre to output the first time in some format other than MOBI; then perhaps, reconvert to MOBI.

I've run the .mobi file you sent me and it does work, as you say.

Earlier in the day I tried another approach. I found sealwyf's website which offered a complete book, in a zipped folder of an html file, a cover, and a couple of simple pictures:

http://sealwyf.wikispaces.com/Kindle...3b8c376529f254

I downloaded the td3.zip file from the Presentation Files section of the page (on about the 2nd 'screen' down) and converted each piece as required into the details of my book. It looks like that system could be used to set the book up, but I will not be able to resist trying out your method, so I can use .doc files directly, and compare which is best. I strongly recommend this to anyone trying to set up a mobi file as their input to Calibre, as I think it is the easiest to understand, and avoids the complications of

Thanks all the same and Happy Christmas! (whenever it is for you :-) )

Cheers. (Very U-boatish avatar! :-) )
