MobileRead Forums

pauloney 06-21-2014 08:35 AM

HTMLZ Specs
 
Can someone here point me to the HTMLZ specification documents? I have been looking around and have not been able to find anything.

Paulo Ney

aleyx 06-21-2014 09:21 AM

Aren't those just zipped .html files?

kovidgoyal 06-21-2014 10:30 AM

HTMLZ is just zipped up html (essentially HTML with all its referenced images, stylesheets, etc in one bundle).

pauloney 06-21-2014 11:50 AM

Kovid,

I understand that, but these answers seem a bit on the simplistic side! :)

No metadata files?

What about covers? Can they be anything? Are they stand-alone, or do they have to be referenced by the HTML?

Paulo Ney

pauloney 06-21-2014 12:33 PM

I talked to John Schember and it seems that the best description of the format is:

Required:
- A single Zip archive containing exactly one HTML file at the top level. The file can have any name.

Optional:
- A single OPF metadata file at the top level, with any name. Its structure is almost the same as in ePub2.
* The OPF may contain a "metadata" section.
* The OPF may contain a "guide" section, but with only one cover reference.
* No other guide features are supported.
- Cover:
* A cover image is allowed and, if present, must be referenced by the OPF.
* It does not have to be referenced by the HTML file.
* It can be in any location. The OPF points to where it is located.
* It can be in any image format (jpg is recommended).
* It can have any filename (cover is recommended).
- CSS, images and any other support files are allowed, and can be in
any location (top level or in subdirectories). The HTML just has
to reference them by relative path if they are in a subdirectory.
- Class-based CSS can also be placed inside the head element of the HTML
itself, or the CSS can be written inline on each element.
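
For anyone writing a reader, the rules above can be checked mechanically. This is only a rough sketch: the function name inspect_htmlz is made up for illustration, and the set of extensions treated as "HTML" is my assumption, since the description only says the file may have any name.

```python
import zipfile
from pathlib import PurePosixPath

# Assumption: which extensions count as "the HTML file" is not stated anywhere.
HTML_SUFFIXES = {".html", ".htm", ".xhtml"}


def inspect_htmlz(path):
    """Return (html_name, opf_name_or_None) for the top-level files of an HTMLZ.

    Hypothetical helper: it checks for exactly one top-level HTML file and
    at most one top-level OPF file, as described in the post above.
    """
    with zipfile.ZipFile(path) as zf:
        # Entries inside sub directories (and directory entries) contain "/".
        top_level = [name for name in zf.namelist() if "/" not in name]

    html = [n for n in top_level if PurePosixPath(n).suffix.lower() in HTML_SUFFIXES]
    opf = [n for n in top_level if PurePosixPath(n).suffix.lower() == ".opf"]

    if len(html) != 1:
        raise ValueError(f"expected exactly one top-level HTML file, found {html}")
    if len(opf) > 1:
        raise ValueError(f"expected at most one top-level OPF file, found {opf}")
    return html[0], (opf[0] if opf else None)
```

Calling inspect_htmlz("book.htmlz") on an archive laid out like Calibre's default output would be expected to return the names of the HTML and OPF files.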


Default structure for Calibre is:
index.html
metadata.opf
cover.jpg
style.css
images/
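
Going the other way, a file with that layout can be produced with nothing but a Zip library. A minimal sketch, assuming the file names from the default structure above; write_htmlz and its parameters are hypothetical names, not anything taken from Calibre.

```python
import zipfile


def write_htmlz(path, html, css, opf=None, cover=None, images=None):
    """Write an HTMLZ using the default layout listed above.

    Hypothetical helper. html, css and opf are text; cover is the bytes of a
    JPEG; images maps file names to bytes. Only index.html is mandatory in the
    format, but this sketch always writes style.css as well.
    """
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("index.html", html)
        zf.writestr("style.css", css)
        if opf is not None:
            zf.writestr("metadata.opf", opf)
        if cover is not None:
            zf.writestr("cover.jpg", cover)
        for name, data in (images or {}).items():
            # Support files may live in sub directories; the HTML refers to
            # them by relative path, e.g. <img src="images/fig1.png">.
            zf.writestr(f"images/{name}", data)
```

For example, write_htmlz("book.htmlz", "<html>...</html>", "p { margin: 0 }") writes an archive with just index.html and style.css at the top level.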

If anything here is not in accord with what you think, please let me know.

Paulo Ney

kovidgoyal 06-21-2014 02:20 PM

HTMLZ, unlike EPUB, is not pointlessly fussy. It will accept a far wider variety of OPFs than your typical non-calibre EPUB-consuming application. So if you are familiar with the EPUB spec, use the OPF part of it to guide yourself in creating the OPF for HTMLZ. Any OPF that works in EPUB 2 will work in HTMLZ. Name the OPF anything you like and put it in the root. And I think jpeg, png, gif and bmp will all work for covers.
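
As an illustration of the cover reference, here is a rough sketch that pulls the cover path out of an EPUB 2 style OPF guide section. The sample OPF and the name cover_href_from_opf are made up for the example, and the sketch assumes the OPF uses the standard OPF namespace; a looser OPF that calibre would still accept may not match.

```python
import xml.etree.ElementTree as ET

# Namespace used by EPUB 2 package documents.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Illustrative OPF only: a metadata section plus a guide with one cover reference.
SAMPLE_OPF = """\
<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Example</dc:title>
  </metadata>
  <guide>
    <reference type="cover" href="cover.jpg" title="Cover"/>
  </guide>
</package>
"""


def cover_href_from_opf(opf_text):
    """Return the href of the guide's cover reference, or None if there is none."""
    root = ET.fromstring(opf_text)
    for ref in root.findall("opf:guide/opf:reference", OPF_NS):
        if ref.get("type") == "cover":
            # The href is relative to the OPF, which sits in the archive root.
            return ref.get("href")
    return None
```

With the sample above, cover_href_from_opf(SAMPLE_OPF) gives "cover.jpg", which is then the name to look up inside the Zip archive.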

Although I am a little confused by your use case. As far as I know calibre is the only application that consumes HTMLZ, so why would you want to create an HTMLZ outside it?

pauloney 06-21-2014 06:11 PM

Right now the format is extremely important because it is the only path to go from CHM to LaTeX... going from CHM to HTMLZ with Calibre and then using Pandoc to get from HTML (single file) to LaTeX.

At the moment one has to unpack the produced HTMLZ by hand and then proceed with the Pandoc translation because Pandoc only understands HTML, but we are in the process of writing an HTMLZ reader for it and then the translation process should be more automatic.
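
Until such a reader exists, the manual unpack-then-convert step can be scripted. A rough sketch, assuming pandoc is installed and on the PATH; the function names are hypothetical, and the list of HTML extensions is an assumption. The archive is unpacked into a working directory that is kept, so that images referenced from the generated LaTeX stay available.

```python
import subprocess
import zipfile
from pathlib import Path

# Assumption: extensions that identify the single top-level HTML file.
HTML_SUFFIXES = {".html", ".htm", ".xhtml"}


def unpack_htmlz(htmlz_path, workdir):
    """Extract the HTMLZ into workdir and return the path of its top-level HTML file."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(htmlz_path) as zf:
        zf.extractall(workdir)
    candidates = [p for p in workdir.iterdir()
                  if p.is_file() and p.suffix.lower() in HTML_SUFFIXES]
    if len(candidates) != 1:
        raise ValueError(f"expected exactly one top-level HTML file, found {candidates}")
    return candidates[0]


def pandoc_command(html_name, tex_name):
    """Build the pandoc call: HTML in, standalone LaTeX out."""
    return ["pandoc", html_name, "-f", "html", "-t", "latex", "-s", "-o", tex_name]


def htmlz_to_latex(htmlz_path, workdir):
    """Unpack the HTMLZ and run pandoc on the HTML; return the path of the .tex file."""
    html = unpack_htmlz(htmlz_path, workdir)
    tex = html.with_suffix(".tex")
    # Run inside workdir so relative references (images/, style.css) resolve.
    subprocess.run(pandoc_command(html.name, tex.name), cwd=html.parent, check=True)
    return tex
```

Something like htmlz_to_latex("book.htmlz", "book_unpacked") would then leave the LaTeX file next to the extracted HTML.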

That is why it is so important to get the "specs" set down.

The format is starting to spread, check:

https://cloudconvert.org/htmlz-to-epub

or maybe this is powered by Calibre ...

Paulo Ney

kovidgoyal 06-22-2014 12:52 AM

It is powered by calibre.

And note that EPUB is also just zipped HTML. Not to mention that calibre can convert to an "exploded EPUB", like this:

ebook-convert file.chm oeb

oeb will then be a directory containing HTML + OPF

The only advantage of HTMLZ is that the HTML is all concatenated into a single file.

nqk 02-17-2021 01:18 AM

It seems that the browser viewer can read htmlz, but I don't know how I can prepare a table of contents. The toc.ncx file is available and referred to in the content.opf but the browser viewer doesn't show it. All other things are ok (styles, fonts, resources, etc.)

Edit: Maybe I will try nav.xhtml

kovidgoyal 02-17-2021 06:01 AM

Does the ToC work for HTMLZ with the desktop viewer? As far as I recall, HTMLZ doesn't have support for ToCs, though I may be misremembering.

nqk 02-17-2021 06:14 AM

Quote:

Originally Posted by kovidgoyal (Post 4094261)
Does the ToC work for HTMLZ with the desktop viewer? As far as I recall, HTMLZ doesn't have support for ToCs, though I may be misremembering.

No, it doesn't work. I tried converting to HTMLZ to see how Calibre handles the ToC, but it's not included in the output. Whether that's how HTMLZ is meant to work or a bug, I can't say.

kovidgoyal 02-17-2021 06:32 AM

How HTMLZ works is pretty much defined by how it works in calibre, since it has no existence outside of calibre.

The_book 02-17-2021 08:05 AM

Quote:

Originally Posted by pauloney (Post 2856582)
Right now the format is extremely important because it is the only path to go from CHM to LaTeX... going from CHM to HTMLZ with Calibre and then using Pandoc to get from HTML (single file) to LaTeX.

At the moment one has to unpack the produced HTMLZ by hand and then proceed with the Pandoc translation because Pandoc only understands HTML, but we are in the process of writing an HTMLZ reader for it and then the translation process should be more automatic.

That is why it is so important to get the "specs" set down.

The format is starting to spread, check:

https://cloudconvert.org/htmlz-to-epub

or maybe this is powered by Calibre ...

Paulo Ney

If my understanding of CHM files is not wrong, why not just unpack the CHM file with an application like 7zip, or even just hh.exe, and then deal with the files inside it?

nqk 02-17-2021 08:38 AM

Quote:

Originally Posted by kovidgoyal (Post 4094274)
How HTMLZ works is pretty much defined by how it works in calibre, since it has no existence outside of calibre.

If that is the case, please support a table of contents. I think it is a great format as a source to convert to other formats if needed. Editing one single file is much easier than digging through a bunch of them, especially if you are not using a desktop environment.

What I tried was to zip up the input folder (debug mode) and rename it to HTMLZ. I included the ncx and nav.html files too (even the whole opf file). Everything worked except the ToC.
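
For reference, the zip-and-rename step can be done in one go from a script. A small sketch; zip_folder_as_htmlz is a hypothetical name, and the output file should be written outside the folder being zipped so it does not end up inside itself.

```python
import zipfile
from pathlib import Path


def zip_folder_as_htmlz(folder, htmlz_path):
    """Zip every file under folder into htmlz_path, keeping relative paths."""
    folder = Path(folder)
    with zipfile.ZipFile(htmlz_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in sorted(folder.rglob("*")):
            if p.is_file():
                # Archive names are relative to the folder, with "/" separators,
                # so files at the folder root land at the top level of the Zip.
                zf.write(p, p.relative_to(folder).as_posix())
```

For example, zip_folder_as_htmlz("input", "book.htmlz") packs the contents of the input folder directly, without a wrapping directory.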

And it is great for landscape paged mode (multiple columns), as chapters are not separated by blank pages most of the time.

kovidgoyal 02-17-2021 09:52 AM

I'm afraid I don't know anything about the HTMLZ format. It was contributed to calibre by user_none many years ago and since then has basically just sat there.


All times are GMT -4.