HTML Entities placed in ToC break Kobo Aura

trekky0623 · 12-11-2016, 02:03 PM

This took me a couple hours to figure out, but it seems to be reproducible and fixable now.

I have a certain ePub with filenames with commas in them. For example, TaleofTwoCities,A_split_000.html, something like that. If I transfer it to my Kobo Aura without converting in Calibre, everything works as expected regarding the Table of Contents and chapters and it displays the chapter title at the bottom of the screen.

However, if I convert the ePub > ePub with Calibre, the toc.ncx file lists these files as:

Code:

<content src="TaleofTwoCities%2cA_split_000.html"/>

It replaces the comma with an HTML entity in the toc.ncx file but not the filename itself. This is not the case in the original ePub. This seems to break chapter handling on the Kobo in some ways, including not showing the chapter title at the bottom of the screen as well as flipping to the wrong position when finishing a chapter.

Is there a way to prevent Calibre from putting in these HTML entities?

kovidgoyal · 12-11-2016, 02:46 PM

That isn't an HTML entity; it is URL encoding and is, IIRC, perfectly legal in ncx. You can always prevent it from happening by renaming the html file in the calibre editor to remove the comma from the name. If you open a bug report and attach the original epub file, I'll look into getting the conversion to unquote URLs in the ncx.
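To make the distinction concrete, here is a minimal sketch using only Python's standard library. It is an illustration, not calibre's conversion code, and the `&#44;` form is just what an HTML entity for a comma would look like; it does not appear in the file discussed here.

```python
import html
from urllib.parse import quote, unquote

name = "TaleofTwoCities,A_split_000.html"

# Percent-encoding (URL encoding): the form seen in the converted toc.ncx.
encoded = quote(name)  # the comma is written as a percent sign plus hex digits

# Decoding works regardless of the case of the hex digits (%2c or %2C).
decoded = unquote("TaleofTwoCities%2cA_split_000.html")

# An HTML entity for the same character is a different mechanism entirely.
entity_form = "TaleofTwoCities&#44;A_split_000.html"
from_entity = html.unescape(entity_form)

print(encoded)
print(decoded == name, from_entity == name)
```

A reader that resolves the `src` value as a URL should decode `%2c` back to a comma before looking up the file.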

BetterRed · 12-11-2016, 03:09 PM

@trekky0623 - see ==>> How do I report a bug?

BR

trekky0623 · 12-11-2016, 07:56 PM

I don't want to post copyrighted material, so I made a test file to demonstrate the issue. It has html files labeled "TestFile,A-01.html", etc.

When loaded onto the Kobo Aura as is, it should show Chapter One, Chapter Two, and Chapter Three at the bottom of the page for the chapter title.

If you then remove the book from the Kobo, convert it with Calibre ePub > ePub, then add the book back, it will no longer show the chapter titles at the bottom of the screen.

Remove the file again, and edit the toc.ncx file to replace %2c with , and add the file back to the Kobo. The chapter titles will reappear.
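The manual edit above could be sketched in Python roughly as follows. This is only an assumption about how one might script it: `decode_commas_in_ncx` is a hypothetical helper, it works on the text of toc.ncx only, and unpacking and repacking the ePub is left out.

```python
import re

# Matches the src attribute of <content> elements in an NCX file.
SRC_ATTR = re.compile(r'(<content\b[^>]*\bsrc=")([^"]*)(")')


def decode_commas_in_ncx(ncx_text: str) -> str:
    """Replace %2c / %2C with a literal comma inside <content src="..."> only."""

    def fix(match: re.Match) -> str:
        prefix, src, suffix = match.groups()
        return prefix + re.sub(r"%2c", ",", src, flags=re.IGNORECASE) + suffix

    return SRC_ATTR.sub(fix, ncx_text)


sample = '<content src="TestFile%2cA-01.html"/>'
print(decode_commas_in_ncx(sample))
```

Text outside the `src` attributes is left untouched, so a literal "%2c" in a chapter title would not be changed.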

Now, to be fair, ePub validator does complain about these commas, and the book should probably not have commas in the filenames. However, I feel that Calibre is still breaking functionality here by replacing characters with URL codes in filenames listed in the toc.ncx file. If the publisher is putting commas in their filenames, maybe it should be left alone?

kovidgoyal · 12-11-2016, 11:01 PM

Once again, URL encoding is perfectly legal in NCX files. That the Kobo does not support it is a bug in the Kobo. Despite that, being the nice guy that I am, I am willing to investigate changing calibre to work around the bug in the Kobo -- there are already dozens of workarounds for device-specific bugs in calibre's conversion pipeline. But let's not mistake where the bug is.

trekky0623 · 12-12-2016, 10:26 AM

Quote:

Originally Posted by kovidgoyal

Once again, URL encoding is perfectly legal in NCX files. That the Kobo does not support it is a bug in the Kobo. Despite that, being the nice guy that I am, I am willing to investigate changing calibre to work around the bug in the Kobo -- there are already dozens of workarounds for device-specific bugs in calibre's conversion pipeline. But let's not mistake where the bug is.

That's alright. I understand, and it is a pretty nasty bug in the Kobo. I called them to report the flaw.

PeterT · 12-12-2016, 11:32 AM

I thought it was recommended not to use any special characters in the file names; just a-z A-Z and 0-9

BetterRed · 12-12-2016, 05:13 PM

Quote:

Originally Posted by PeterT

I thought it was recommended not to use any special characters in the file names; just a-z A-Z and 0-9

But as some would have it, "... it is a custom more honour'd in the breach than the observance."

Code:

D:\CalibreLibraries\_Test\William Shakespeare\Romeo et Juliette (401)\Romeo et Juliette - William Shakespeare.pdf

I've been wondering what a Kobo device does if a URI within an ncx file has an encoded space (%20) in it. Does it deal with them OK? If so, then why not an encoded comma (%2C)? They're both specified as encoding candidates in RFCs dated as long ago as August 1998.

FWIW: - according to the IETF, underscore, hyphen/minus, full stop, and tilde are acceptable in URI names.
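A quick way to see which filenames would come through percent-encoding unchanged, as a rough sketch with Python's standard library (`survives_encoding` is a made-up helper name, and treating `~` as safe assumes Python 3.7 or later):

```python
from urllib.parse import quote


def survives_encoding(filename: str) -> bool:
    """True if percent-encoding leaves the filename exactly as it was."""
    # safe="" means no extra characters are exempted from encoding.
    return quote(filename, safe="") == filename


for candidate in ("TestFile-A_01.html", "TestFile,A-01.html", "Test File.html"):
    print(candidate, survives_encoding(candidate))
```

Names built only from letters, digits, underscore, hyphen, full stop, and tilde pass; a comma or a space does not.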

BR

<rant>Why is it that in the content.opf, the manifest and spine refer to the XHTML files by their 'physical' file names, whereas in the toc.ncx the same files are referred to by their 'percent encoded' URI names? Inconsistencies such as this drive those of us not steeped in the intricacies of 'current technology' nuts. I sometimes wonder if TPTB do it to feed their love of obscurantism.</rant>

davidfor · 12-12-2016, 06:00 PM

Quote:

Originally Posted by trekky0623

That's alright. I understand, and it is a pretty nasty bug in the Kobo. I called them to report the flaw.

Did you mention that you are sending the books as kepubs? From your first post, you are either converting to kepub or using the KoboTouchExtended driver. I'm guessing the latter. If I do an epub-to-epub conversion and send the book as an epub, the book worked OK. When I converted to kepub and sent that, I see some of the problems you reported.

JSWolf · 12-12-2016, 06:49 PM

I too converted the attached ePub and, using RMSDK (ADE) on my H2O, it worked. I did not try as kepub via Access.

I agree you need to give more details. This issue would be best addressed during the conversion from ePub to kepub, which means it is not a Calibre issue.

trekky0623 · 12-16-2016, 04:11 PM

Quote:

Originally Posted by davidfor

Did you mention that you are sending the books as kepubs? From your first post, you are either converting to kepub or using the KoboTouchExtended driver. I'm guessing the latter. If I do an epub-to-epub conversion and send the book as an epub, the book worked OK. When I converted to kepub and sent that, I see some of the problems you reported.

Well, the chapter information at the bottom doesn't work with ePubs anyway. I'm not sure about the navigation problems. But yes, I did tell them this was for kepubs. I currently have an open ticket and just E-mailed them some more info. I'll update this thread if anything comes of it, but as is, I think this is important knowledge to have in case anyone else runs into this issue with their Kobo.

GeoffR · 12-16-2016, 04:22 PM

Quote:

Originally Posted by trekky0623

Well, the chapter information at the bottom doesn't work with ePubs anyway. I'm not sure about the navigation problems.

The chapter title is normally displayed in the Adobe ePub reader when you open the <-> menu, provided the "Display progress for:" option is set to "Current chapter" and not "Whole book" (same as for the KePub reader).

Edit: If the problem affects ePubs then it is something Kobo would have to fix in the device firmware, but if it only affects Kobo's proprietary KePub format then they might fix it by adding a requirement to their publishing guidelines that the NCX toc must not contain those html entities, or by removing the html entities when they convert the publisher's ePub into KePub format.
