MobileRead Forums - View Single Post

kovidgoyal · 04-29-2015, 04:56 AM

Links in an azw3 file are stored as byte offsets into the raw html. In the past, these byte offsets have always pointed to tags that have an id attribute, so calibre would simply use that id as the anchor when converting the byte-offset-based link into a normal html link. Your problem file had byte offsets that point to tags with no id attribute. In that case calibre would simply point to the file, with no anchor.

The assumption that tags pointed to by byte offsets will always have ids is reasonable, since azw3 files are typically created from epub/html, where links always use ids. Your file, however, was presumably created in a way that did not require ids. The links could have been created, for example, using XPath expressions, or some such. So with my fix, calibre now generates a unique id for the tag when one is missing.
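
To make the idea concrete, here is a minimal sketch in Python. It is not calibre's actual code: the names `anchor_for_offset`, `href_for`, and the `gen_id_N` scheme are made up for illustration, and it assumes the byte offset lands exactly on the `<` of an opening tag.

```python
import re

# Hypothetical helpers for illustration only; not calibre's implementation.
# An opening tag: '<', a tag name, then anything up to the closing '>'.
TAG_RE = re.compile(rb'<([A-Za-z][A-Za-z0-9]*)([^<>]*)>')
# An id attribute inside the tag's attribute text.
ID_RE = re.compile(rb'\sid\s*=\s*["\']([^"\']+)["\']')


def anchor_for_offset(raw, offset, used_ids):
    """Return (raw, anchor) for the opening tag starting at byte `offset`.

    If the tag has an id, it is reused. If not, a unique id is generated
    and inserted into the tag. If no opening tag starts at `offset`, the
    anchor is None (the old fallback: link to the file only).
    """
    m = TAG_RE.match(raw, offset)
    if m is None:
        return raw, None
    id_match = ID_RE.search(m.group(2))
    if id_match:
        return raw, id_match.group(1).decode('utf-8')
    # No id on the tag: pick one that is not already in use.
    n = 1
    while f'gen_id_{n}' in used_ids:
        n += 1
    new_id = f'gen_id_{n}'
    used_ids.add(new_id)
    insert_at = m.end(1)  # just after the tag name
    patched = (raw[:insert_at] + b' id="' + new_id.encode('ascii') + b'"'
               + raw[insert_at:])
    return patched, new_id


def href_for(filename, anchor):
    """Build a normal html link target from a file name and optional anchor."""
    return filename if anchor is None else f'{filename}#{anchor}'
```

Note that inserting an id changes the length of the html, so every byte offset after the patched tag shifts. A real converter would have to resolve all offsets before patching anything, or patch from the end of the file backwards.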

Oh, and I should mention that the azw3 input plugin in calibre is largely based on KevinH's original work reverse engineering the azw3 format.
