MobileRead Forums > E-Book Software > Calibre > Conversion

12-18-2019, 05:07 PM #1
aap, Member
Table Of Contents Beyond TOC3

I am using Calibre 4.6.0 to convert a Word .docx file to .html for a web-based user manual.

The conversion path is DOCX > HTMLZ > ZIP.
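The DOCX > HTMLZ step can also be run from the command line with calibre's `ebook-convert` tool, which picks the input and output formats from the file extensions (for example `ebook-convert test.docx test.htmlz`). Because HTMLZ is itself a ZIP archive, its HTML, stylesheet and images can also be unpacked directly. A minimal Python sketch, assuming the archive holds the main HTML file alongside its CSS and images; the function name `extract_htmlz` and the file names are hypothetical:

```python
import zipfile
from pathlib import Path


def extract_htmlz(htmlz_path, out_dir):
    """Unpack an HTMLZ archive into out_dir and return the HTML members it contains."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(htmlz_path) as zf:
        # HTMLZ is a plain ZIP archive, so extractall() restores the HTML,
        # stylesheet and image files with their relative paths intact.
        zf.extractall(out)
        names = zf.namelist()
    # Report only the HTML documents; CSS, images and metadata are skipped.
    return sorted(n for n in names if n.lower().endswith((".html", ".htm")))
```

For example, `extract_htmlz("manual.htmlz", "manual_html")` would unpack the archive into `manual_html` and return the names of the HTML files, ready to be folded into page templates.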

The source .docx file has a table of contents built by Word, with four levels taken from Word's Heading 1 to Heading 4 styles, e.g.:

1 This is a First Level Chapter Title
1.1 This is a Second Level Chapter Title
1.1.1 This is a Third Level Chapter Title
1.1.1.1 This is a Fourth Level Chapter Title
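Numbering of this kind behaves like one counter per heading level: each heading advances the counter for its own level and resets every deeper level. A minimal sketch of that rule, as an illustration only and not how Word or calibre actually compute it (`number_headings` is a hypothetical name):

```python
def number_headings(levels):
    """Return outline numbers such as '1.1.1.1' for a sequence of heading levels (1 = top)."""
    counters = []
    numbers = []
    for level in levels:
        if level < 1 or level > len(counters) + 1:
            raise ValueError(f"heading level {level} skips a parent level")
        # Forget deeper levels, then start or advance the counter for this level.
        del counters[level:]
        if len(counters) < level:
            counters.append(1)
        else:
            counters[level - 1] += 1
        numbers.append(".".join(str(c) for c in counters))
    return numbers
```

For the four sample headings above, `number_headings([1, 2, 3, 4])` returns `['1', '1.1', '1.1.1', '1.1.1.1']`.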

Calibre does a good job of performing the conversion using the TOC from the Word document source: it correctly extracts the four levels into the TOC and hyperlinks them to the target headings in the text body. A couple of minor .css tweaks correct some text misalignment issues at the target headings.

One issue: at the fourth-level target heading in the body text, the chapter number is not shown. Example:

1.1.1.1 This is a Fourth Level Chapter Title

appears correctly in the TOC, but the chapter number 1.1.1.1 is missing from the target heading in the body text.

TOC levels 1 to 3 show the chapter number correctly, both in the TOC and at the target text.

How can I configure Calibre to show the chapter number for TOC level 4?

12-18-2019, 10:54 PM #2
kovidgoyal, creator of calibre
ToC entry text comes from whatever is included in the ToC entry in the docx file; it has nothing to do with levels.

12-19-2019, 09:23 AM #3
aap, Member
OK.

Attached are screenshots from a simplified four-level docx file.

The first shows the Table of Contents that Word generates from the body text that follows. The body-text headings use the "Heading 1" to "Heading 4" styles.

The result files show the .html and EPUB output from Calibre. In the body text, the numbering on the level 4 heading is missing and the margin offset of the Level 3 title is incorrect.

The .html shows that the first three body-text headings are generated as div elements, but the level 4 heading is generated as an H4 tag.

So the Table of Contents generated by Calibre is correct, but the body text does not follow the docx source.

How can I resolve this?
Attached thumbnails: word source.png, calibre html output.png, calibre epub output.png

12-19-2019, 11:41 AM #4
kovidgoyal, creator of calibre
https://www.mobileread.com/forums/sh...d.php?t=186697

12-19-2019, 01:31 PM #5
aap, Member
As requested, attached is a zip archive with the source docx file, the Calibre log files from the DOCX > HTMLZ, DOCX > EPUB and HTMLZ > ZIP conversions, and the Calibre conversion result files.
Attached file: test.zip (94.8 KB)

12-19-2019, 02:52 PM #6
aap, Member
As a point of reference, here is the HTML generated by Word for the test.docx file.

Screenshot from Firefox and the generated HTML.
Attached thumbnail: ExcelWeb.png
Attached file: ExcelWeb.zip (23.5 KB)

12-20-2019, 04:10 AM #7
kovidgoyal, creator of calibre
That file has really weird markup, how are you generating it?

Open the docx file in Word, right-click the Heading 3 style, go to Numbering and then Change List Level, and you will see that there is a huge indent on that style. That is why you see the indent in the conversion output. Fix that and you will be fine.
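The list-level indent Kovid points to is stored in the document's `word/numbering.xml` part, in the `w:ind` element of each `w:lvl`, measured in twentieths of a point (twips). A minimal sketch for spotting an oversized indent without opening Word, assuming the standard WordprocessingML namespace; the function names are hypothetical, and the sketch ignores level overrides (`w:lvlOverride`) and indents set directly on styles or paragraphs:

```python
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def level_indents(numbering_xml):
    """Map (abstractNumId, ilvl) to the left indent, in points, of each list level."""
    root = ET.fromstring(numbering_xml)
    indents = {}
    for abstract in root.iter(W + "abstractNum"):
        abstract_id = abstract.get(W + "abstractNumId")
        for lvl in abstract.findall(W + "lvl"):
            ind = lvl.find(W + "pPr/" + W + "ind")
            if ind is None:
                continue
            # Transitional files use w:left, Strict ones w:start; both are in twips (1/20 pt).
            twips = ind.get(W + "left") or ind.get(W + "start")
            if twips is not None:
                indents[(abstract_id, int(lvl.get(W + "ilvl")))] = int(twips) / 20
    return indents


def docx_level_indents(docx_path):
    """Read word/numbering.xml from a .docx file (the part exists only if lists are used)."""
    with zipfile.ZipFile(docx_path) as zf:
        return level_indents(zf.read("word/numbering.xml"))
```

Level 3 headings correspond to `ilvl` 2, so an unusually large value for that key would match the indent seen in the conversion output.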

As for Heading 4 not getting numbering: that is because its numbering style inherits the lvlid from a parent style. That is something I have never seen and didn't know was possible, but in any case the next release of calibre will handle it.
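In WordprocessingML, a paragraph style can take its numbering properties (`w:numPr`, which holds `w:numId` and `w:ilvl`) from the style it is based on (`w:basedOn`). The sketch below walks that chain in `word/styles.xml` to show which list and level a style such as Heading 4 ends up with. It assumes each `w:numPr` child is inherited independently from the nearest ancestor that sets it; that is a simplifying assumption, not a description of calibre's actual fix, and `resolve_numbering` is a hypothetical name:

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def resolve_numbering(styles_xml, style_id):
    """Return (numId, ilvl, basedOn chain) for a paragraph style in styles.xml.

    Assumption: w:numId and w:ilvl are each inherited independently from the
    nearest style in the basedOn chain that defines them.
    """
    root = ET.fromstring(styles_xml)
    styles = {s.get(W + "styleId"): s for s in root.iter(W + "style")}
    num_id = ilvl = None
    chain = []
    current = style_id
    # Follow w:basedOn upwards; checking the chain guards against cycles.
    while current is not None and current in styles and current not in chain:
        chain.append(current)
        style = styles[current]
        num_pr = style.find(W + "pPr/" + W + "numPr")
        if num_pr is not None:
            num_el = num_pr.find(W + "numId")
            lvl_el = num_pr.find(W + "ilvl")
            if num_id is None and num_el is not None:
                num_id = num_el.get(W + "val")
            if ilvl is None and lvl_el is not None:
                ilvl = int(lvl_el.get(W + "val"))
        based_on = style.find(W + "basedOn")
        current = based_on.get(W + "val") if based_on is not None else None
    return num_id, ilvl, chain
```

The XML can be read from a document with `zipfile.ZipFile("test.docx").read("word/styles.xml")`; in English-language documents the heading style IDs are usually `Heading1` to `Heading4`.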

12-20-2019, 08:10 AM #8
aap, Member
Thanks for the clarifications. I will look into this.

The HTML files contained in ExcelWeb.zip were generated via Word's "Save as HTML". There are two formats: Unfiltered and Filtered. The one I sent was the Unfiltered format, which has all the funky conditionals encoded in it. Really horrendous markup, and over 200 KB!

Attached is the generated file in the "Filtered" format and a screenshot of the results via Firefox.

The end result is the same as Unfiltered with all headings in the body area displayed correctly.
Attached thumbnail: testfiltered.png
Attached file: testfiltered.htm.zip (4.6 KB)

12-20-2019, 08:42 AM #9
aap, Member
Thanks, I will look into the Heading3 style changes.

The HTML was generated via Word's "Save as HTML". There are two variants:

Unfiltered: This has the funky mso markup. This is the format that was sent with the horrendous markup.

Filtered: This is a relatively clean HTML version.

Attached is the output in the filtered format and a screenshot of it in Firefox.

The results are the same with the body headings rendered correctly.
Attached thumbnail: testfiltered.png
Attached file: testfiltered.htm.zip (4.6 KB)

12-20-2019, 10:57 AM #10
kovidgoyal, creator of calibre
No, I meant: how are you generating the docx file?

12-20-2019, 11:34 AM #11
aap, Member
The docx file was composed in Microsoft Word 2019 (Office 2019, version 16.0.12228.20100, 32-bit) on Windows 10.

12-20-2019, 04:05 PM #12
BetterRed, null operator (he/him)
@aap: the zips you posted contain macOS artefacts. Are you using calibre on macOS or Windows? Not that it should make any difference.

Can you post the DOCX as saved by MS Word on Windows? You'll need to put it in a zip to post it here, or upload it to Dropbox/OneDrive/wherever and post the link.

BR

12-20-2019, 04:23 PM #13
aap, Member
BetterRed, sorry about that.

I work mostly in an OS X environment (High Sierra), but with various MS Windows and MS Office versions running under Parallels Desktop.

The Word and Calibre work is done under Windows 10. I might have used the OSX compress utility to create the .zip file in some of the previous postings.

Attached is the .docx file compressed under Windows 10.
Attached file: test.zip (69.4 KB)

12-20-2019, 05:10 PM #14
BetterRed, null operator (he/him)
I printed all the styles, see attached PDF. It's a very long list because of all those xnnn styles that I suspect come from Excel.

One thing I notice is that Heading styles 1, 2 and 3 are based on Normal, whilst Heading style 4 is based on Heading style 3. The TOC styles are similar: some are based on TOC 1 and others on Normal.

Which leads me to ask: why?

BR
Attached file: styles.pdf (241.3 KB)
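BetterRed's observation (Heading 4 based on Heading 3, and the TOC styles split between TOC 1 and Normal) can be checked without printing every style in the document. A minimal sketch, assuming built-in style display names such as `heading 4` and `toc 2` in `word/styles.xml`; the function names are hypothetical, and the name filter also catches styles like "TOC Heading":

```python
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def heading_and_toc_parents(styles_xml):
    """Map each heading/TOC style's display name to the styleId it is based on."""
    root = ET.fromstring(styles_xml)
    parents = {}
    for style in root.iter(W + "style"):
        name_el = style.find(W + "name")
        name = name_el.get(W + "val") if name_el is not None else style.get(W + "styleId")
        # Built-in names are usually stored lower-case, e.g. "heading 4", "toc 2".
        if not name or not name.lower().startswith(("heading", "toc")):
            continue
        based_on = style.find(W + "basedOn")
        parents[name] = based_on.get(W + "val") if based_on is not None else None
    return parents


def docx_style_parents(docx_path):
    """Read word/styles.xml from a .docx file and report heading/TOC parents."""
    with zipfile.ZipFile(docx_path) as zf:
        return heading_and_toc_parents(zf.read("word/styles.xml"))
```

A style that maps to a different parent than its siblings (for instance `heading 4` mapping to `Heading3` while the others map to `Normal`) stands out immediately in the result.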

12-20-2019, 06:04 PM #15
aap, Member
I'm not sure I can explain definitively why that is the case.

The origins of this particular file go back several years, and it has been edited in several versions of MS Word, on both Mac and PC. The basic chapter framework and automatic TOC have remained the same at four numbered levels. The file is now over 150 pages.

I recall that the original chapter heading and TOC styles were "standard" Word styles. However, it is possible that the styles got modified/corrupted over time.

That said, the Heading and TOC elements do get displayed correctly in Word on-screen, printed, and via Word's HTML generation capability.

I prefer the HTML output from Calibre (as opposed to Word) as it is much cleaner and easier to fold into my HTML5 and CSS3 page templates. It's a trivial hack of the Calibre CSS to get the headings to align properly, but adding the missing Heading 4 numbering by hand is a formidable editing task.

Any ideas on how to modify the source Word document and/or its styles for a four-level chapter and TOC structure would be appreciated.

Tags
"table of contents", docx input, html conversion




All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.