My first epub: Cover is on page 2 (?)


omk3
01-21-2010, 10:57 AM
I have now more or less completed my first-ever hand-crafted epub. I tried to do it properly, with clean code and manually editing everything that was needed. I used Jellby's "The Prince and the Pauper" as a guide for the parts I found most confusing (thanks Jellby!), and in the end I got it in a format I am happy with, with embedded fonts for titles, with margins and different indents for the first paragraph of a chapter and so on. Pretty simple stuff for all you experienced folks, but it was my first time with xhtml and css and I'm quite happy with my progress. I got my epub validated and it works and looks fine.

Except for the cover. The strange thing about the cover is, it is on page 2-3(!). I try to navigate to page 1, but it seems there is no such thing.

I shamelessly copied the script for the cover page, and I don't really understand it. It is like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

<head>
<title> cover </title>
</head>

<body>

<svg version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"
width="100%" height="100%" viewBox="0 0 600 800" preserveAspectRatio="xMidYMid meet">
<image width="600" height="800" xlink:href="images/cover.jpg" />
</svg>

</body>
</html>

As I understand it, it's supposed to resize the image according to screen size, keeping the aspect ratio. I removed the link to the style sheet from the cover page, as it was not needed and it added margins which I didn't want.
The cover appears with a small margin everywhere but on the bottom side, and as I said before, is on page 2-3 instead of page 1.

As everything works and looks okay it's not really a problem, it's just weird. Am I making a really obvious mistake that I can't see?

The image I used for the cover is 600x800 and has no white margins of its own.

Thanks for any comments!

zelda_pinwheel
01-21-2010, 11:03 AM
hm, very odd !

when you say it's on page 2-3, do you mean there is a blank page before the cover, when you open the book ? or simply that the page number displayed is 2-3 ?

in the toc.ncx, is the cover given the first playorder slot ? is there anything before it ?

omk3
01-21-2010, 11:17 AM
There is no page before that, it's just that the number displayed is 2-3 (or just 2 in Adobe Digital Editions)

I have not included the cover in the toc.ncx, but in content.opf it is first in the <spine toc="ncx">. Hmm...maybe I should have included it in the toc then? I'll try it now.

[EDIT] I added the cover to the toc, now my toc is more complete, but other than that nothing changed. It is still on page 2 (or 2-3 on the reader) and it still has margins top left and right but not bottom...

zelda_pinwheel
01-21-2010, 11:37 AM
add some css (you can add it directly in the page in the <head> element, instead of in an external stylesheet) to remove the margins around the cover :


<style type="text/css">

body {margin : 0;
padding : 0;}

</style>



i don't know why it's marked page 2, unless the margins are causing it to be larger than the page, and therefore it gets "bumped" to a second page... in which case, margin 0 might fix that.

omk3
01-21-2010, 11:56 AM
Brilliant about the margins, thanks! :thanks:

I should have thought of it myself instead of relying on the margins disappearing by themselves...
My excuse is that I thought the svg script took care of it (I don't even know what svg is :D) and that I have been looking at this specific epub for too long now... Maybe I need to clear my head a little.

Anyway, cover now looks beautiful, but is still on page 2-3 on the reader, and on Adobe Digital Editions it was page 2, then after some scrolling up and down it is on page 3! Nothing before it of course, it just refuses to be on the first page... On the reader I specifically ask "go to page 1" and it displays page 2-3 instead.

zelda_pinwheel
01-21-2010, 11:58 AM
:inquisiti weird. i'll think about it some more, and hopefully one of the epub Geniuses we've got around here will have an idea to solve this. i was hoping it was just a question of ADE thinking the image was too big and didn't fit on the first page... :rolleyes: at least you got the margins sorted. :)

(svg = Scalable Vector Graphic (http://en.wikipedia.org/wiki/Scalable_Vector_Graphics) ;))

omk3
01-21-2010, 07:06 PM
(svg = Scalable Vector Graphic (http://en.wikipedia.org/wiki/Scalable_Vector_Graphics) ;))

Oooh, right, now I get it! :2thumbsup

Cleared my head a little (with a few pints, not very effective) and looked at everything again, well at my cover.xhtml and the content.opf, but nothing stands out. Every other book I've opened starts on page 1. I guess I have to accept that my ebook will be unique :rolleyes:

omk3
01-21-2010, 08:09 PM
Hmmmm, I think I'm on to something...

(For what's following, I'm using the word "page" for the page numbers displayed on the bottom of the reader, and the word "screen" for each page turn on my reader, to avoid confusion)

Scrolling up and down my book in ADE in frustration, I noticed that every screen had two or three little page numbers on the side. That's a lot!

On my reader, starting from the beginning and turning the pages, I get:

page 2-3 for the cover
page 4 for the title page
Then comes the first short story with pages 5-6, 6-9, 9-11, 11.
So one screen spans 2 pages, the next 3.5, the next 2.5, and then only one (half actually) because it's end of chapter.
Another short story is so small it is actually only one paragraph, and is a little more than half a screen (small font). This is on pages 131-132.

So what I think is, this all has something to do with the way Adobe calculates pages. My ebook is a collection of 32 short stories spanning 280 pages, and while some of them are normal short story length, some can be as short as a paragraph or half a page. I have every story in its own xhtml file, and most of them don't reach 10kb in size. I believe this results (in a way I don't quite understand) in pages of shorter length than usual, so that each screen displays 2-3 pages by default. That would explain my cover being 2-3 (though I would expect it to be 1-3 actually) and it would mean that there isn't much I can do about it.

Do you think I have come to the right conclusion? Has anyone had the same experience? I would guess that, if I am correct, poetry collections with small poems could have exactly the same behaviour.

Jellby
01-22-2010, 07:27 AM
I get usually around one and a half screens per page, and that's with a fairly small font size. I believe ADE calculates pages just based on the byte length of the file, so if you have lots of XHTML code, or use multi-byte unicode characters, you could get shorter pages... Can we have a look at your full book?

pdurrant
01-22-2010, 07:38 AM
So what I think is, this all has something to do with the way adobe calculates pages.

Exactly. I think Adobe use 1KB (or perhaps 2KB) per 'page'. There is a non-standard way to include a mapping between page numbers and the contents, so that you could specify that the cover was page 1, and the title page page 2, and the first story started on page 3. But I'm not sure it's worth doing.

omk3
01-22-2010, 07:44 AM
I get usually around one and a half screens per page, and that's with a fairly small font size. I believe ADE calculates pages just based on the byte length of the file, so if you have lots of XHTML code, or use multi-byte unicode characters, you could get shorter pages... Can we have a look at your full book?

It is actually a copyrighted book I had as a paper book and converted to ebook, so I could not really post it here. (I could send it to you personally if you really wanted to see it though, I guess)

The code in the xhtml files is the absolute minimum, link to css, then <h2>, <div> and <p> tags, and some tags for italics or specific formatting for poems (one or two), so there's not really a lot of code, no. I tried to keep it as clean as possible.

The book is in english, with only a few accented characters in the whole book and nothing else extraordinary, so no multi-byte unicode characters either. As I said, actually the xhtml files are really small, nine of them are in the 2-5 kb range (not including the title page which is smaller) , and the largest is 35kb. Actually only five of them are over 10kb.

omk3
01-22-2010, 07:56 AM
Exactly. I think Adobe use 1KB (or perhaps 2KB) per 'page'. There is a non-standard way to include a mapping between page numbers and the contents, so that you could specify that the cover was page 1, and the title page page 2, and the first story started on page 3. But I'm not sure it's worth doing.

I'm sure it's not worth it at all, it's just that I would expect the first page to at least include the number one. I wouldn't mind my first page being 1-3 (much), but 2-3 is just weird...

Pages actually do seem to change every 1kb. But "The Prince and the Pauper"'s cover is 3kb and is on page 1. My cover is 3kb and is on page 2-3. :blink:

Jellby
01-22-2010, 08:05 AM
Pages actually do seem to change every 1kb. But "The Prince and the Pauper"'s cover is 3kb and is on page 1. My cover is 3kb and is on page 2-3. :blink:

The size of the cover image should not matter, it's only the XHTML code and text that counts, not linked elements.

Could you create another sample file replacing copyrighted text with "lorem ipsum (http://www.lipsum.com/)" or any other dummy text? Just a couple of "chapters" would be enough.

omk3
01-22-2010, 08:24 AM
The size of the cover image should not matter, it's only the XHTML code and text that counts, not linked elements.
Yes, I was referring to the cover.xhtml. The cover.jpg is a lot larger than that.


Could you create another sample file replacing copyrighted text with "lorem ipsum (http://www.lipsum.com/)" or any other dummy text? Just a couple of "chapters" would be enough.
I was thinking about something like that. I need to find out how it works, and I'll get back to you :)

pdurrant
01-22-2010, 08:35 AM
Yes, I was referring to the cover.xhtml. The cover.jpg is a lot larger than that.

I've just taken a look at your code. Take out all the commented-out code -- it still gets counted in the sizes.

When I take all the comments out of your cover.xhtml, I see the cover on page 1 and the title page on page 2.

omk3
01-22-2010, 08:47 AM
You are so right, thank you!!!! I removed the comments and it is now on page one (and it is only 1kb of size)
:thanks:

(also any other comments on my code or structure would be much appreciated - without taking up much of your time of course)

However, the mystery is not solved yet for me, because all the comments were actually copied from Jellby's cover of "The Prince and the Pauper", and this cover was on page one all right.... I really don't understand why they would behave differently. I had kept the commented code because I thought I might need it if I wanted to edit the cover, but I can see now that it is best to keep the cover.xhtml as small as possible. But...again...why did Jellby's cover work fine and mine didn't, when it was actually only a copy of his? Oh well...

omk3
01-22-2010, 10:23 AM
Okay, here is a dummy version of the book, with every small character replaced with x's (not very beautiful, but the best I could come up with to keep size and everything else as true to the original as possible). Scrambled the cover, and only included the first 4 stories.

I kept the comments on the cover page, so that you see it on page 2 or 3. If I remove the comments, it goes to page 1 as intended, as pdurrant said. But the question remains, why does this really happen?

So if you want to read a lot of x's, here it is :)

Comments on my code and structure are very welcome.

[EDIT] It was validated, and ADE displays it okay, but epubreader seems to have a problem with things like &xxxxx; which are remnants of apostrophes I would guess... This problem exists only for the dummy book of course, not the original one.

Jellby
01-22-2010, 11:58 AM
Stupidly enough (on the part of Adobe), it seems pagination depends on the compression level of the files! I've just uncompressed your book and recreated it in two versions:

A) No compression (level 0). The cover is in pages 2-3, and the book is 56 pages long.

B) Maximum compression (level 9). The cover is in page 1, and the book is 14 pages long.

This makes the "page" numbers even more arbitrary than before :rolleyes:
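For anyone who wants to reproduce the effect without rezipping a whole book, here is a rough sketch. It assumes the 1 KB per page figure pdurrant mentioned is right, and it uses a made-up cover file padded with a commented-out block; the names `deflated_size` and `estimated_pages` are only for illustration, not anything ADE exposes.

```python
import math
import zlib

PAGE_BYTES = 1024  # assumed page size (the 1 KB guess from earlier in the thread)

def deflated_size(data: bytes, level: int) -> int:
    """Size of data as a zip tool would store it; level 0 is treated as 'stored'."""
    if level == 0:
        return len(data)
    # wbits=-15 asks zlib for raw DEFLATE, the format used inside zip archives
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return len(compressor.compress(data) + compressor.flush())

def estimated_pages(data: bytes, level: int) -> int:
    """Estimated page count for one file, rounding up to whole pages."""
    return math.ceil(deflated_size(data, level) / PAGE_BYTES)

# Hypothetical cover page: about 3 KB, most of it a commented-out block
cover = b"<html><body><!-- " + b"old markup " * 280 + b"--></body></html>"

for level in (0, 9):
    print("level", level, "->", estimated_pages(cover, level), "page(s)")
```

The same file gets several pages when stored and a single page when deflated, which would match what I saw with the two versions of the book.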

omk3
01-22-2010, 12:18 PM
Unbelievable! Thank you so much Jellby! I would never have guessed.

So Adobe uses the compressed size instead of the actual size of the files to determine pages.... What can I say...

What do you use for zipping, by the way? I just used UltimateZip with no compression, so that I would have no problems with the mimetype file. It does not seem to offer many levels of compression, just "fast", "normal", etc. I tried with normal compression and got the cover on page 1, and 56 pages of book...

So to anyone else having the same problem: Either keep your cover page <=1kb or compress! :eek:

:thanks: for answering what to me was a great mystery!

Was the rest of my code okay? I'd like to hear any corrections from you ePub pros! :)

zelda_pinwheel
01-22-2010, 12:22 PM
Stupidly enough (on the part of Adobe), it seems pagination depends on the compression level of the files! I've just uncompressed your book and recreated it in two versions:

A) No compression (level 0). The cover is in pages 2-3, and the book is 56 pages long.

B) Maximum compression (level 9). The cover is in page 1, and the book is 14 pages long.

This makes the "page" numbers even more arbitrary than before :rolleyes:

wow, that's... pretty crazy. :rolleyes: i'm glad it's all sorted out now, and i've learned a thing or two as well in this thread. the epub kings save the day again ! nicely done, jellby and pdurrant. :)

Jellby
01-22-2010, 12:39 PM
What do you use for zipping, by the way?

I use just the standard "zip" command in Linux, which is Info-ZIP (http://www.info-zip.org/), it seems.
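If you would rather script it than depend on a particular zip program, something along these lines should work with Python's standard zipfile module. This is only a sketch: `build_epub` is a hypothetical helper, the file names are placeholders, and the one rule it encodes is the one mentioned above, that the mimetype file goes in first and uncompressed while everything else is deflated.

```python
import io
import zipfile
from typing import Mapping

def build_epub(files: Mapping[str, bytes]) -> bytes:
    """Pack files into an epub-style zip: mimetype first and stored, the rest deflated."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry must be the first one and must not be compressed
        zf.writestr("mimetype", b"application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for name, data in files.items():
            # Level 9 is the maximum DEFLATE level (needs Python 3.7 or later)
            zf.writestr(name, data,
                        compress_type=zipfile.ZIP_DEFLATED, compresslevel=9)
    return buf.getvalue()

# Placeholder content; a real book would also need container.xml, content.opf, etc.
epub_bytes = build_epub({"OEBPS/cover.xhtml": b"<html/>" * 50})
print(len(epub_bytes), "bytes")
```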

Was the rest of my code okay? I'd like to hear any corrections from you ePub pros! :)

It seems you have your first paragraph wrapped in a <div> and the rest of the chapter in another <div>:

<div class="firstp">
<p>First paragraph of the story</p>
</div>
<div class="body">
<p>Rest of the story</p>
<p>...</p>
</div>

I don't think that's really useful, you could get the same with:

<p class="firstp">First paragraph of the story</p>
<p>Rest of the story</p>
<p>...</p>

which looks cleaner to me. And then, are you sure you meant "text-indent: 10%" in the first paragraph? Books usually have no indent in the first paragraph. Anyway, you could simply write:

p
{
margin-bottom: 0;
margin-top: 0;
text-indent: 1em;
}

p.firstp
{
text-indent: 10%;
}

omk3
01-22-2010, 12:52 PM
Thank you and you're right, the <div> tags were not really useful. I had doubts about them myself, but thought I might need them later in ways I could not predict. If p.anything overrules simple p, then obviously I don't need them at all. I also had some small poems in later stories, which need different handling too. I had them in <div class="poem"> tags, but I guess <p class="poem"> would suffice. What would I need a <div> tag for, then, I wonder... I'll probably find out with experience.

About the indents, I was not at all sure how to format the book, so to avoid experimenting well into next year, I just copied the formatting of the original paper book, which actually had deeper indents for the first paragraphs, along with the titles. It surprised me too, but it looked okay.

Jellby, Pdurrant and Zelda, have I told you you have been fantastic? I'm so happy to be here among you :2thumbsup

Jellby
01-22-2010, 01:20 PM
If p.anything overrules simple p, then obviously I don't need them at all.

It does. More specific selectors override less specific ones, much like "div.firstp p" overrides a bare "p".

I also had some small poems in later stories, which need different handling too. I had them in <div class="poem"> tags, but I guess <p class="poem"> would suffice. What would I need a <div> tag for, then, I wonder... I'll probably find out with experience.

Poems are a nasty dusty beast :D

For simple poems, you don't need much, indeed. But when you start having different indents and long lines (which you'd like to wrap nicely), things start to get complicated. My personal way of dealing with poems is something like:

<div class="poetry">
<div class="stanza">
<p>How doth the little crocodile</p>
<p class="indented">Improve his shining tail,</p>
<p>And pour the waters of the Nile</p>
<p class="indented">On every golden scale!</p>
</div>

<div class="stanza">
<p>How cheerfully he seems to grin,</p>
<p class="indented">How neatly spread his claws,</p>
<p>And welcome little fishes in</p>
<p class="indented">With gently smiling jaws!</p>
</div>
</div>

with:

div.poetry {
margin: 1em 0 1em 2em;
font-style: italic;
}
div.stanza {
margin: 0.5em 0;
page-break-inside: avoid;
}
div.poetry p {
margin: 0;
text-align: left;
padding-left: 5em;
text-indent: -5em;
}
div.poetry p.indented {
margin-left: 1.5em;
}

The combination of margins, paddings, and text-indent allows me to indent the whole poetry block, have additional indent for the "indented" class (I have also "indented2", "indented3", etc. if needed), and let the wrapped portions of a line (if it gets wrapped) be further indented (ideally they would be right-aligned, with a "[" before them, but that's simply not possible with CSS). Sometimes I think I should use <div> instead of <p> in poetry, but <p> is shorter :D

Peter Sorotokin
01-22-2010, 01:56 PM
So Adobe uses the compressed size instead of the actual size of the files to determine pages.... What can I say...

This is all documented in EPUB Best Practices Guide (http://www.adobe.com/devnet/digitalpublishing/):

When page map is not available in the document, Adobe Digital Editions will synthesize a page-map based on the document content. The approach used is the following:
1. Determine a compressed byte length of each resource which is referenced in the spine, subtracting any known encryption overhead (IV size)
2. Assume that there is a page for each 1024 bytes in each resource, rounding up to the nearest whole number of pages for each resource
3. To map page breaks into a resource, use the number of pages for the resource as determined in step 2, count the number of Unicode characters in the resource; distribute synthetic page breaks in the resource evenly between the characters by dividing the number of characters by the number of pages; if the number of characters don’t divide evenly among the pages, round the number of characters per page up and let the last “page” contain less characters than the rest.

It may be "obvious" to you that uncompressed size should be used, but it would not work reliably. Compressed size is better because it is the only reliable number about the resource without expensive decompression and sometimes decryption (and decryption may not even be possible in some cases).
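As a rough illustration of the three steps quoted above (a sketch only, not the actual ADE code): the function names are made up for this example, encryption overhead is ignored, and clamping a break to the end of the text is my own assumption for very short resources.

```python
import math
from typing import List

PAGE_BYTES = 1024  # bytes of compressed data per synthetic page (step 2)

def page_count(compressed_size: int) -> int:
    """Step 2: one page per 1024 compressed bytes, rounded up."""
    return math.ceil(compressed_size / PAGE_BYTES)

def first_pages(compressed_sizes: List[int]) -> List[int]:
    """1-based page number on which each spine resource starts."""
    starts, next_page = [], 1
    for size in compressed_sizes:
        starts.append(next_page)
        next_page += page_count(size)
    return starts

def page_break_offsets(compressed_size: int, text: str) -> List[int]:
    """Step 3: character offsets where each synthetic page of a resource begins."""
    pages = page_count(compressed_size)
    if pages == 0:
        return []
    per_page = math.ceil(len(text) / pages)  # round up; the last page may be shorter
    # Clamping to len(text) is an assumption, not something the guide specifies
    return [min(i * per_page, len(text)) for i in range(pages)]

print(first_pages([3000, 900, 5000]))
print(page_break_offsets(3000, "x" * 10))
```

With a stored cover of about 3000 bytes followed by a short title page, the cover is allotted pages 1 to 3 and the title page starts on page 4, which is consistent with the title page number reported earlier in the thread.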

DaleDe
01-22-2010, 02:10 PM
There is a discrepancy between step 2 and 3. In 2 you say bytes and in 3 you say unicode characters. I believe 3 is right but they are not always the same number. I think ADE looks at the coding to figure out what to do and probably uses a formula for UTF-8.

Jellby
01-22-2010, 02:30 PM
This is all documented in EPUB Best Practices Guide (http://www.adobe.com/devnet/digitalpublishing/):

Of course, I had read it some time ago, but it didn't register in my head :D

It may be "obvious" to you that uncompressed size should be used, but it would not work reliably.

What is obvious to me is that page numbers are not reliable anyway ;) Just like for paper books, page numbers are only useful if you refer to a specific particular edition of a book, or if you keep them for your own private use, like estimating how long it will take you to read a book or finish a chapter.

But... doesn't the zip file store (or isn't it "easy" to get) the uncompressed size of the files? I think I usually see the uncompressed size in GUI zip apps, and it doesn't seem like they really uncompress the files. Or maybe is it because I strip all permissions and such when zipping?

There is a discrepancy between step 2 and 3. In 2 you say bytes and in 3 you say unicode characters.

But step 2 refers to the compressed size, and there are no characters there, only bytes. With the compressed size you calculate the number of pages. Then, after you uncompress the resource (file), you can place the pagebreaks by distributing the characters.

DaleDe
01-22-2010, 02:34 PM
Of course, I had read it some time ago, but it didn't register in my head :D

But step 2 refers to the compressed size, and there are no characters there, only bytes. With the compressed size you calculate the number of pages. Then, after you uncompress the resource (file), you can place the pagebreaks by distributing the characters.

How can that work? If you already used step two to figure the total, then how can you use the uncompressed file to get to the same number? The zipped file includes all of the images as well, which are not counted in the page sizes. This needs fixing I think.

Dale

Peter Sorotokin
01-22-2010, 02:50 PM
There is a discrepancy between step 2 and 3. In 2 you say bytes and in 3 you say unicode characters. I believe 3 is right but they are not always the same number. I think ADE looks at the coding to figure out what to do and probably uses a formula for UTF-8.

There is no discrepancy. To determine number of pages allocated to a resource, its compressed size in bytes is used. To place page boundaries inside the resource Unicode characters are used.

Peter Sorotokin
01-22-2010, 02:56 PM
What is obvious to me is that page numbers are not reliable anyway ;) Just like for paper books, page numbers are only useful if you refer to a specific particular edition of a book, or if you keep them for your own private use, like estimating how long it will take you to read a book or finish a chapter.

That's true, of course. And in addition to compression ratio there are many other variables, for instance UTF-8 vs UTF-16 encoding, entity references, DOS/UNIX linebreaking, whitespace.

But... doesn't the zip file store (or isn't it "easy" to get) the uncompressed size of the files?

It does, except when it does not. Sometimes there is just zero there. And if a resource is encrypted, it is not stored anywhere.

I think I usually see the uncompressed size in GUI zip apps, and it doesn't seem like they really uncompress the files. Or maybe is it because I strip all permissions and such when zipping?

It is there most of the time. But some common tools don't put it there (e.g. I think Java Zip implementation sometimes omits it, but I may be mistaken).
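To see which numbers a given archive actually records, a small sketch like the following can help. It uses Python's standard zipfile module and reads only the zip directory, without decompressing anything; `size_report` and the two-file archive are invented for the example, and it says nothing about how ADE itself reads these fields.

```python
import io
import zipfile

def size_report(epub_bytes: bytes) -> dict:
    """Map each archive member to (uncompressed size, compressed size) as recorded in the zip."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        return {info.filename: (info.file_size, info.compress_size)
                for info in zf.infolist()}

# Hypothetical two-file archive, built in memory just to have something to inspect
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mimetype", b"application/epub+zip", compress_type=zipfile.ZIP_STORED)
    zf.writestr("cover.xhtml", b"<p>x</p>" * 400, compress_type=zipfile.ZIP_DEFLATED)

report = size_report(buf.getvalue())
for name, (raw, packed) in report.items():
    print(name, "uncompressed:", raw, "compressed:", packed)
```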

Jellby
01-23-2010, 03:35 AM
How can that work? If you already used step two to figure the total, then how can you use the uncompressed file to get to the same number?

Simple: by forcing it ;)

First, with the compressed size, you calculate the number of pages. Now the number of pages is fixed, and you place the pagebreaks in the uncompressed file so that you get the fixed number. There's no specification on the page length here, they'll be as long as needed to get the number of pages you calculated before.

DaleDe
01-23-2010, 10:33 AM
Simple: by forcing it ;)

First, with the compressed size, you calculate the number of pages. Now the number of pages is fixed, and you place the pagebreaks in the uncompressed file so that you get the fixed number. There's no specification on the page length here, they'll be as long as needed to get the number of pages you calculated before.

Thanks for that. So they don't really use the count of actual characters at all. Only the byte count of the compressed file. Seems like they are fooling themselves.

Dale

zelda_pinwheel
01-23-2010, 10:37 AM
thanks peter for the explanation. it's great to understand exactly how it works.

omk3
01-23-2010, 02:53 PM
All very intriguing! :)
I still don't understand why any book, regardless of level of compression, would start at page two instead of one.
But this thread has been more revealing than I ever expected it to, and I certainly learned many new and interesting things! :D

DaleDe
01-23-2010, 07:15 PM
All very intriguing! :)
I still don't understand why any book, regardless of level of compression, would start at page two instead of one.
But this thread has been more revealing than I ever expected it to, and I certainly learned many new and interesting things! :D

The answer has to do with the way it thinks. The algorithm says the image doesn't fit on page one so I will force it to page 2. This is pretty typical treatment of images, but wasn't intended to happen on the very first page.

Dale

ninni
01-30-2010, 08:44 AM
Hi!

Same thing happened to me. I think you need to save the svg as a separate file and then link it to your xhtml file with an object tag. Like this:

<object data="nameofthefile.svg" type="image/svg+xml" width="100%" >
<img src="nameoffallbackimage.png" alt="something" />
</object>

Works for me at least!

Vengadesan
11-02-2010, 05:44 PM
I've opened my mobi file in Kindle and it works fine. But the book does not start with the cover page in Kindle.

Thanks for your quick comments.