MobileRead Forums


trekky0623 12-11-2016 03:03 PM

HTML Entities placed in ToC break Kobo Aura
 
This took me a couple hours to figure out, but it seems to be reproducible and fixable now.

I have an ePub whose filenames contain commas, for example TaleofTwoCities,A_split_000.html or something like that. If I transfer it to my Kobo Aura without converting it in Calibre, the Table of Contents and chapters work as expected, and the chapter title is displayed at the bottom of the screen.

However, if I convert the ePub > ePub with Calibre, the toc.ncx file lists these files as:

Code:

<content src="TaleofTwoCities%2cA_split_000.html"/>
It replaces the comma with an HTML entity in the toc.ncx file, but the filename itself still contains the comma. The original ePub does not do this. This seems to break chapter handling on the Kobo in several ways, including not showing the chapter title at the bottom of the screen and flipping to the wrong position when finishing a chapter.

Is there a way to prevent Calibre from putting in these HTML entities?

kovidgoyal 12-11-2016 03:46 PM

That isn't an HTML entity; it is URL encoding, and IIRC it is perfectly legal in NCX. You can always prevent it by renaming the HTML file in the calibre editor to remove the comma from the name. If you open a bug report and attach the original epub file, I'll look into getting the conversion to unquote URLs in the NCX.
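To make the distinction concrete, here is a small Python illustration (Python only because calibre itself is written in it; this is not calibre's conversion code). It shows that `%2c` is undone by percent-decoding rather than by HTML entity decoding, and what an actual HTML character reference for a comma looks like instead.

```python
from html import unescape
from urllib.parse import quote, unquote

ncx_src = "TaleofTwoCities%2cA_split_000.html"  # value seen in the converted toc.ncx

# Percent (URL) decoding recovers the real filename stored in the EPUB.
print(unquote(ncx_src))   # TaleofTwoCities,A_split_000.html

# HTML entity decoding leaves it untouched, because %2c is not an entity.
print(unescape(ncx_src))  # TaleofTwoCities%2cA_split_000.html

# A genuine HTML character reference for a comma looks like this instead.
print(unescape("&#44;"))  # ,

# Encoding the filename gives the same result; Python emits uppercase hex
# (%2C), and RFC 3986 treats %2c and %2C as equivalent.
print(quote("TaleofTwoCities,A_split_000.html"))  # TaleofTwoCities%2CA_split_000.html
```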

BetterRed 12-11-2016 04:09 PM

@trekky0623 - see ==>> How do I report a bug?

BR

trekky0623 12-11-2016 08:56 PM

1 Attachment(s)
I don't want to post copyrighted material, so I made a test file to demonstrate the issue. It has HTML files named "TestFile,A-01.html", etc.

When loaded onto the Kobo Aura as is, it should show Chapter One, Chapter Two, and Chapter Three at the bottom of the page for the chapter title.

If you then remove the book from the Kobo, convert it with Calibre ePub > ePub, then add the book back, it will no longer show the chapter titles at the bottom of the screen.

Remove the file again, edit the toc.ncx file to replace %2c with a comma (,), and add the file back to the Kobo. The chapter titles will reappear.
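The manual edit above can be scripted. This is a minimal sketch, not something calibre or Kobo ships: `unquote_ncx_src` is a hypothetical helper that assumes you have already extracted toc.ncx as text and that its `src` attributes are double-quoted. It decodes only the path part of each `<content src>`, keeps any `#fragment`, leaves names alone if decoding would produce a `#` or `?`, and re-escapes XML special characters so the file stays well-formed.

```python
import re
from html import escape, unescape
from urllib.parse import unquote

# Matches the src attribute of an NCX <content> element (double quotes only).
CONTENT_SRC = re.compile(r'(<content\b[^>]*?\bsrc=")([^"]*)(")')


def unquote_ncx_src(ncx_text):
    """Return ncx_text with percent-encoding removed from <content src> paths."""

    def fix(match):
        # Undo XML escaping first (e.g. &amp;), then split off the #fragment.
        path, hash_sign, fragment = unescape(match.group(2)).partition("#")
        decoded = unquote(path)
        if "#" in decoded or "?" in decoded:
            # %23 or %3F in a file name: decoding would change the URL's meaning.
            return match.group(0)
        # Re-escape &, <, > and quotes so the attribute stays well-formed XML.
        return match.group(1) + escape(decoded) + hash_sign + escape(fragment) + match.group(3)

    return CONTENT_SRC.sub(fix, ncx_text)


sample = '<content src="TestFile%2cA-01.html"/>'
print(unquote_ncx_src(sample))  # <content src="TestFile,A-01.html"/>
```

After writing the result back, the EPUB has to be repacked with the `mimetype` file first and stored uncompressed, as the OCF container format requires.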

Now, to be fair, ePub validator does complain about these commas, and the book should probably not have commas in the filenames. However, I feel that Calibre is still breaking functionality here by replacing characters with URL codes in filenames listed in the toc.ncx file. If the publisher is putting commas in their filenames, maybe it should be left alone?

kovidgoyal 12-12-2016 12:01 AM

Once again, URL encoding is perfectly legal in NCX files. That the Kobo does not support it is a bug in the Kobo. Despite that, being the nice guy that I am, I am willing to investigate changing calibre to work around the bug in the Kobo -- there are already dozens of workarounds for device specific bugs in calibre's conversion pipeline. But, let's not mistake where the bug is.

trekky0623 12-12-2016 11:26 AM

Quote:

Originally Posted by kovidgoyal (Post 3441583)
Once again, URL encoding is perfectly legal in NCX files. That the Kobo does not support it is a bug in the Kobo. Despite that, being the nice guy that I am, I am willing to investigate changing calibre to work around the bug in the Kobo -- there are already dozens of workarounds for device specific bugs in calibre's conversion pipeline. But, let's not mistake where the bug is.

That's alright. I understand, and it is a pretty nasty bug in the Kobo. I called them to report the flaw.

PeterT 12-12-2016 12:32 PM

I thought it was recommended not to use any special characters in the file names; just a-z A-Z and 0-9

BetterRed 12-12-2016 06:13 PM

Quote:

Originally Posted by PeterT (Post 3441850)
I thought it was recommended not to use any special characters in the file names; just a-z A-Z and 0-9

But as some would have it, "... it is a custom more honour'd in the breach than the observance."

Code:

D:\CalibreLibraries\_Test\William Shakespeare\Romeo et Juliette (401)\Romeo et Juliette - William Shakespeare.pdf
I've been wondering what a Kobo device does if a URI within an ncx file has an encoded space (%20) in it. Does it deal with them OK? If so then why not an encoded comma (%2C). They're both specified as encoding candidates in RFCs dated as long ago as August 1998.

FWIW, according to the IETF, underscore, hyphen/minus, full stop, and tilde are acceptable in URI names.
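A quick way to apply PeterT's rule of thumb before converting is to flag any file whose name would need percent-encoding. This sketch uses the RFC 3986 unreserved set (letters, digits, hyphen, full stop, underscore, tilde); `needs_percent_encoding` is a hypothetical name, it is meant for bare file names rather than paths, and it treats accented letters such as é as unsafe too, since those also end up encoded.

```python
import re

# Unreserved characters per RFC 3986: a-z, A-Z, 0-9 plus - . _ ~
UNRESERVED = re.compile(r"[A-Za-z0-9\-._~]+")


def needs_percent_encoding(filename: str) -> bool:
    """True if filename contains any character outside the unreserved set."""
    return UNRESERVED.fullmatch(filename) is None


for name in ["TestFile,A-01.html", "TaleofTwoCities_split_000.html", "Romeo et Juliette.pdf"]:
    print(name, needs_percent_encoding(name))
# TestFile,A-01.html True
# TaleofTwoCities_split_000.html False
# Romeo et Juliette.pdf True
```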

BR

<rant>Why is it that in the content.opf, the manifest and spine refer to the XHTML files by their 'physical' file names, whereas in the toc.ncx the same files are referred to by their 'percent encoded' URI names. Inconsistencies such as this drive those of us not steeped in the intricacies of 'current technology' nuts. I sometimes wonder if the TPTB do it to feed their love of obscurantism.</rant>
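For what it's worth, the two spellings are easy to reconcile on the reading side: decode both before comparing. The sketch below is a guess at the kind of lookup a reader needs, not Kobo's actual code, and `find_manifest_item` is hypothetical. It assumes toc.ncx and content.opf sit in the same folder, so their relative paths can be compared directly. A reader that compared the raw strings instead would fail exactly when the NCX is encoded and the manifest is not, which would fit trekky0623's test, though nothing in this thread confirms that is what the Kobo does.

```python
from typing import List, Optional
from urllib.parse import unquote


def find_manifest_item(ncx_src: str, manifest_hrefs: List[str]) -> Optional[str]:
    """Return the manifest href an NCX src refers to, or None if there is none.

    Assumes toc.ncx and content.opf are in the same directory, so relative
    paths can be compared without resolving them first.
    """
    # Drop any #fragment, then percent-decode (%2c and %2C both become ",").
    target = unquote(ncx_src.partition("#")[0])
    for href in manifest_hrefs:
        # Decode the manifest side as well, in case it is the encoded one.
        if unquote(href) == target:
            return href
    return None


manifest = ["TestFile,A-01.html", "TestFile,A-02.html"]
print(find_manifest_item("TestFile%2cA-01.html", manifest))      # TestFile,A-01.html
print(find_manifest_item("TestFile%2CA-02.html#ch2", manifest))  # TestFile,A-02.html
```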

davidfor 12-12-2016 07:00 PM

Quote:

Originally Posted by trekky0623 (Post 3441820)
That's alright. I understand, and it is a pretty nasty bug in the Kobo. I called them to report the flaw.

Did you mention that you are sending the books as kepubs? From your first post, you are either converting to kepub or using the KoboTouchExtended driver. I'm guessing the latter. When I did an epub-to-epub conversion and sent the book as an epub, the book worked OK. When I converted to kepub and sent that, I saw some of the problems you reported.

JSWolf 12-12-2016 07:49 PM

I too converted the attached ePub, and it worked using RMSDK (ADE) on my H2O. I did not try it as a kepub via Access.

I agree you need to give more details. This issue would be best addressed during the conversion from ePub to kepub, which means it is not a Calibre issue.

trekky0623 12-16-2016 05:11 PM

Quote:

Originally Posted by davidfor (Post 3442045)
Did you mention that you are sending the books as kepubs? From your first post, you are either converting to kepub or using the KoboTouchExtended driver. I'm guessing the latter. When I did an epub-to-epub conversion and sent the book as an epub, the book worked OK. When I converted to kepub and sent that, I saw some of the problems you reported.

Well, the chapter information at the bottom doesn't work with ePubs anyway. I'm not sure about the navigation problems. But yes, I did tell them this was for kepubs. I currently have an open ticket and just E-mailed them some more info. I'll update this thread if anything comes of it, but as is, I think this is important knowledge to have in case anyone else runs into this issue with their Kobo.

GeoffR 12-16-2016 05:22 PM

Quote:

Originally Posted by trekky0623 (Post 3444120)
Well, the chapter information at the bottom doesn't work with ePubs anyway. I'm not sure about the navigation problems.

The chapter title is normally displayed in the Adobe ePub reader when you open the <-> menu, provided the "Display progress for:" option is set to "Current chapter" and not "Whole book" (same as for the KePub reader).

Edit: If the problem affects ePubs, then it is something Kobo would have to fix in the device firmware, but if it only affects Kobo's proprietary KePub format, then they might fix it by adding a requirement to their publishing guidelines that the NCX toc must not contain those html entities, or by removing the html entities when they convert the publisher's ePub into KePub format.


All times are GMT -4.
