MobileRead Forums

[KDPValidator] Validate epubs for KDP upload only (https://www.mobileread.com/forums/showthread.php?t=310046)

slowsmile 08-19-2018 09:15 AM

[KDPValidator] Validate epubs for KDP upload only
 
Validate epubs for Amazon Kindle upload only


Requirements
Plugin Type: Validation
Licence: MIT (OSI)
Minimum Sigil requirement: v0.9.3 or higher
Python Requirements: Python 3.4+ (Bundled or External)
OS Requirements: Windows, Linux or OSX
*** Tested on Windows 7, 8 & 10 only ***
Current Version: "0.1.3"

Installation

* Select Manage Plugins from the Plugins menu. In the dialog box, select either the Bundled Python or the External Python (Python 3.4+ must be installed on your computer to run this plugin externally).

* Click Add Plugin and select KDPValidator_vXXX.zip. This will load and install the plugin into Sigil, which you can then run by selecting Plugins > Validation > KDPValidator

Description
This plugin validates epubs for Amazon Kindle upload only. It should be used for epub 2.0 ebooks that are going to be viewed on KF8 or KF7 devices, and is intended as a last quick check before upload to KDP.

The plugin checks for the following:

* Unsupported html tags.
* Unsupported html attributes.
* Unsupported style attributes in the css, html <style> elements and html inline styling (very basic check).
* Cover pages not allowed warning (added in v0.1.2).
* Bad ebook image format.
* Bad internal links.
* SVG image warning.
* Flags ebook images smaller than max page size that are not dual formatted.
* Non-use of heading styles.
* Inappropriate use of absolute values in the css.
* Missing TOC file.
* Logical TOC does not contain the same toc items as the epub TOC file.
* Missing or too many opf guide references.
* Look Inside formatting issues.

User Suppressed Warnings (added in v0.1.2)
The plugin user can now turn off or suppress any plugin warning by accessing the KDPValidator.json file and changing any of the listed warning values to "false". Initial default warning values are all set to "true".
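
As a rough sketch of the edit described above, the following Python switches one warning off in a JSON settings file. The key names (`cover_page_warning`, `svg_image_warning`) and the flat key/value layout are assumptions made up for illustration; the real keys and the location of KDPValidator.json are not listed in this post, so check the file itself. The helper keeps whichever value style the file already uses (JSON booleans or the strings "true"/"false").

```python
import json
import os
import tempfile

def suppress_warning(json_path, warning_key):
    """Set one warning to false in a flat JSON settings file (hypothetical layout)."""
    with open(json_path, encoding="utf-8") as f:
        prefs = json.load(f)
    if warning_key not in prefs:
        raise KeyError("no such warning: " + warning_key)
    # Keep the value type the file already uses: the string "false" or JSON false.
    prefs[warning_key] = "false" if isinstance(prefs[warning_key], str) else False
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(prefs, f, indent=4)
    return prefs

# Demo on a throwaway file; these keys are invented, not the plugin's real ones.
demo_path = os.path.join(tempfile.mkdtemp(), "KDPValidator.json")
with open(demo_path, "w", encoding="utf-8") as f:
    json.dump({"cover_page_warning": "true", "svg_image_warning": "true"}, f)

updated = suppress_warning(demo_path, "cover_page_warning")
print(updated)
```

Editing the file by hand in a text editor achieves the same thing; a script is only worthwhile if the same warnings need to be suppressed on several machines.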

Caveat
Be aware that if you've used an epub converter that generates indexed style class names (such as calibre23, scrivener15, etc.), then only a few of the plugin's css checks can be run. However, all other checks on the epub's content.opf, toc.ncx and xhtml files should run without any problems.

Plugin Run

* First load your epub into Sigil, run Epubcheck and ensure there are no errors.

* Run this plugin. All errors/warnings will be displayed in the validation pane.

Changes

Spoiler:

v0.1.3
-- Fixed a bug in cover page detection. The plugin now detects the epub cover page by searching for the presence of the guide cover reference in the opf. Thanks to DiapDealer.
v0.1.2
-- The plugin will now give a warning if there is a cover image file in the epub. There should be no cover image on upload to KDP as per the Kindle Guidelines. Thanks to st_albert and Hitch.
-- The user can now turn off or suppress any plugin warning by accessing the KDPValidator.json file and changing any of the listed warning values to "false". The initial default warning values are all set to "true". Thanks to DiapDealer.
v0.1.1
-- Changed from using standard cover file names to using the actual cover file name derived from the guide cover href for searches in the cover file checks. Thanks to KevinH for the suggested change.
v0.1.0
-- Initial release

odamizu 08-20-2018 01:58 AM

Quote:

Originally Posted by slowsmile (Post 3738421)
... should be used for epub 2.0 ebooks ...

Intriguing new plug-in, but why is it designed for epub 2.0 rather than 3.0 or both?

st_albert 08-20-2018 12:06 PM

I've just tried it on one of our production files, and it looks like it's going to be helpful.

One question, though. I'm getting the "[GENERAL WARNING]: Cover image file not found" message. Where does it expect the image file to be? It's in the Images directory, and is correctly pointed to in the "cover" guide entry, and in the manifest and spine, and more importantly looks fine at Amazon, including "Look Inside" so I'm guessing Amazon is OK with it.

Other warnings about tweaks to the body style and the guide section were reasonable.

Thanks for your work!

Albert

slowsmile 08-20-2018 12:12 PM

@st_albert...The plugin searches for cover.xhtml or titlepage.xhtml (the latter if you converted to epub using Calibre). It also searches for cover.html and titlepage.html. Only these cover file names will work with this plugin, so if your epub uses any other cover file name, you will get the "Cover file not found" error.

I've also just tested the plugin again using several different epubs that use cover.xhtml as the cover file name and I didn't get the "Cover file not found" error with any of them.

st_albert 08-20-2018 12:17 PM

Quote:

Originally Posted by slowsmile (Post 3738800)
@st_albert...The standard name normally used for the cover file for Sigil is cover.xhtml or titlepage.xhtml(if you converted to epub using Calibre). And cover.html and titlepage.html are also allowed. Only these cover file names will work with this plugin.

Ahh, my file names are derived from the ISBN of the book. So I can safely ignore this message, then. No problem.

Albert

KevinH 08-20-2018 02:02 PM

Perhaps parsing the opf guide and/or the nav landmarks to get the name of the xhtml file that contains the cover image might make things more general.

st_albert 08-20-2018 02:17 PM

Quote:

Originally Posted by KevinH (Post 3738841)
perhaps parsing the opf guide and/or the nav landmarks to get the name of the xhtml file that contains the cover image might make things more general

Actually, in the epubs that I intend to convert to mobi with kindlegen, I don't have an xhtml cover page at all. This is because, back in the day, that would cause there to be "two" cover images in the mobi file -- one the actual cover, and the other the first page of the interior of the book. I think kindlegen is smarter than that now, and ignores the cover .xhtml file.

But why tempt fate? The guide element for "cover" points to the Image/cover.jpg file itself. Kindlegen likes this, but it will cause epubcheck to bark.

I guess one could parse the cover guide element and then check to see if the .jpg file is where the guide says it is, and if so consider that the cover image to check for the other properties (size, whatever else is checked).
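
A minimal sketch of that idea, using only the Python standard library rather than the Sigil plugin API: read the guide reference with type="cover" and see whether its href matches an item in the manifest. The function names and the sample OPF are illustrative assumptions, and a manifest match only stands in for "the file is where the guide says it is"; it does not confirm the file exists inside the epub archive.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

def guide_cover_href(opf_xml):
    """Return the href of the guide reference with type="cover", or None."""
    root = ET.fromstring(opf_xml)
    for ref in root.findall("opf:guide/opf:reference", OPF_NS):
        if ref.get("type") == "cover":
            return ref.get("href")
    return None

def cover_is_listed(opf_xml):
    """True if the guide cover href matches the href of some manifest item."""
    href = guide_cover_href(opf_xml)
    if href is None:
        return False
    href = href.split("#")[0]  # drop any fragment identifier
    root = ET.fromstring(opf_xml)
    items = root.findall("opf:manifest/opf:item", OPF_NS)
    return href in {item.get("href") for item in items}

# Illustrative OPF fragment: the guide points straight at the image file.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="cover-img" href="Images/cover.jpg" media-type="image/jpeg"/>
  </manifest>
  <guide>
    <reference type="cover" title="Cover" href="Images/cover.jpg"/>
  </guide>
</package>"""

print(guide_cover_href(SAMPLE_OPF), cover_is_listed(SAMPLE_OPF))
```

Because the lookup keys on type="cover" and not on a file name, it works the same whether the guide points at an xhtml cover page or directly at an image.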

Albert

Doitsu 08-20-2018 02:45 PM

Quote:

Originally Posted by st_albert (Post 3738850)
But why tempt fate? The guide element for "cover" points to the Image/cover.jpg file itself. Kindlegen likes this, but it will cause epubcheck to bark.

Amazon actually recommends using a metadata entry with a name="cover" attribute to mark the cover image.

For example:

Code:

<meta content="cover.jpg" name="cover" />
(The content attribute value is the cover image manifest id.)

If you create epub3 books, you'll need to mark cover images in the OPF manifest section with a properties="cover-image" attribute.

For more information see section 4.2 of the Kindle Publishing Guidelines.

IIRC, Sigil will automatically add the proper attribute(s) if you use the cover image semantics option.

slowsmile 08-20-2018 08:01 PM

@odamizu...I didn't include checks for epub 3 for several reasons. First, standard epub 3 is essentially html5 and can be checked using Epubcheck. For Kindle epub 3 you would have to include checks for KF8 and KFX, which is Kindle's own brand of fixed format. I also found that the differences between KFX, KF8 and standard epub 3 were badly documented and unreliable. That's really why my plugin doesn't check Kindle epub 3 ebooks.

st_albert 08-21-2018 12:17 PM

Quote:

Originally Posted by Doitsu (Post 3738855)
IIRC, Sigil will automatically add the proper attribute(s) if you use the cover image semantics option.

It does, and I use it. In my cases, the manifest ID is an x prepended to the file name (perhaps because my filenames start with a number?). So you could go that route, but if you needed the path to the image, you'd need to look up the ID in the manifest or the guide element.
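
The ID lookup described above can be sketched with the standard library alone. This is an illustration under assumptions, not how any plugin in this thread does it: the function name and sample OPF are made up, and the epub3 fallback simply looks for a manifest item whose properties include cover-image.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

def cover_image_href(opf_xml):
    """Resolve the cover image href from OPF metadata/manifest, or return None."""
    root = ET.fromstring(opf_xml)
    items = root.findall("opf:manifest/opf:item", OPF_NS)
    # epub2 / Kindle style: <meta name="cover" content="MANIFEST-ID"/>
    for meta in root.findall("opf:metadata/opf:meta", OPF_NS):
        if meta.get("name") == "cover":
            cover_id = meta.get("content")
            for item in items:
                if item.get("id") == cover_id:
                    return item.get("href")
    # epub3 style: a manifest item carrying properties="cover-image"
    for item in items:
        if "cover-image" in (item.get("properties") or "").split():
            return item.get("href")
    return None

# Illustrative OPF: the manifest id is the file name with an "x" prepended.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <meta name="cover" content="x9781606192863.jpg"/>
  </metadata>
  <manifest>
    <item id="x9781606192863.jpg" href="Images/9781606192863.jpg" media-type="image/jpeg"/>
  </manifest>
</package>"""

print(cover_image_href(SAMPLE_OPF))
```

The content value is treated purely as a manifest id, so it does not matter whether the id resembles the file name.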

slowsmile 08-22-2018 06:40 AM

Update: Changes in v0.1.1:

* Changed from using standard cover file names to using the proper cover name derived from the guide cover href for searches in the cover file checks. With thanks to KevinH for the suggested change.

st_albert 08-22-2018 11:34 AM

Quote:

Originally Posted by slowsmile (Post 3739449)
Update: Changes in v0.1.1:

* Changed from using standard cover file names to using the proper cover name derived from the guide cover href for searches in the cover file checks. With thanks to KevinH for the suggested change.

OK. Definitely on the right track... I no longer get the "Cover image file not found" error, but I now get
Code:

[ERROR]: Bad cover position.  The cover image file should always be the first file in Sigil's Book Browser file list.
which is misleading, because according to the KDP publishing guidelines, version 2018.2, section 4.2, pp 14-15, there should not be a cover.xhtml file at all:
Quote:

Do not add an HTML cover page to the content in addition to the cover image. This may result in the cover appearing twice in the book or cause the book to fail conversion.
(my emphasis)

I also mentioned this in a previous post (#7) in this thread.

Edited to add:
My apologies. It seems I may have misled you when I said I was using
Code:

  <guide>
    <reference type="cover" title="Cover" href="Images/9781606192863.jpg"/>
        ...
  </guide>

to specify the location of the cover file. That method used to work by itself, but it is not the preferred KDP method. Ever since Sigil started adding cover metadata to the opf, I have ALSO been using
Code:

  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
        ...
    <meta name="cover" content="x9781606192863.jpg" />
  </metadata>

  <manifest>
        ...
    <item id="x9781606192863.jpg" href="Images/9781606192863.jpg" media-type="image/jpeg"/>
  </manifest>

as well.

Since the latter is what Amazon requires (cf. p. 15 of the Guidelines), I would recommend you check for it instead of the guide reference.

Sorry about that! :o

Albert

slowsmile 08-22-2018 10:35 PM

@st_albert...You've raised several issues and I have to say that I don't agree with you on most of your suggestions. This is going to take some explaining, so bear with me.

Quote:

Code:
[ERROR]: Bad cover position. The cover image file should always be the first file in Sigil's Book Browser file list.

which is misleading, because according to the KDP publishing guidelines, version 2018.2, section 4.2, pp 14-15, there should not be a cover.xhtml file at all:
Quote:
Do not add an HTML cover page to the content in addition to the cover image. This may result in the cover appearing twice in the book or cause the book to fail conversion.

First, your suggestion that you should never put a cover image into an epub for KDP uploads. What you say might be true for epub vendors like iBooks, Nook, etc., but it is not true for epub uploads to KDP. You must also never upload a Word doc with a cover image to KDP, otherwise you will get a double cover in the ebook. But whenever you upload an epub to KDP, it's well known that during conversion KDP will always either add the Amazon Kindle product image, which is uploaded separately, or replace any epub cover image with it. So, for KDP uploads only, there is absolutely no need to remove the epub cover image, because it will always be replaced by the product image on conversion to mobi. Also, if you are testing your epub before upload, it makes sense to test it with the cover image in place.

Now to all the reasons why I still consider using the opf guide cover reference as the best way to get the cover file name.

Your suggestion about using the metadata cover reference would fail because it does not take into account all instances of epubs being produced from different doc-to-epub converters. To illustrate this, here is the metadata cover ref I get in the opf metadata when I load an epub created by Scrivener into Sigil:

<meta content="cover-image" name="cover" />

How can I get the cover file name from that line? What I really want is the href, which is not there. Also, you cannot rely on 'name="cover"' being the name of the cover file. You really need the href to be sure.

You also cannot use the landmark nav ref in the toc.ncx, because not every indie author puts a cover reference in the Logical TOC for their KDP ebooks. I've seen plenty of ebooks with a Logical TOC that does not have a cover entry. So this method is also not a reliable way to get the cover file name.

The only two searchable references that you should be able to use to get the cover file name are the cover "id" in the manifest or the cover "type" in the opf guide because they both contain hrefs.

Manifest cover ref:
<item id="cover" href="Text/cover.xhtml" media-type="application/xhtml+xml"/>

OPF guide cover ref:
<reference type="cover" title="Cover" href="Text/cover.xhtml"/>

My own preference is to use the opf guide ref because 'type="cover"' is usually the same across all doc-to-epub converters whereas I'm not so sure about how the manifest cover id name might vary across different doc-to-epub converter outputs.

Doitsu 08-23-2018 03:24 AM

Quote:

Originally Posted by slowsmile (Post 3739709)
To illustrate this, here is the metadata cover ref I get in the opf metadata when I load an epub created by Scrivener into Sigil:

<meta content="cover-image" name="cover" />

How can I get the cover file name from that line?

That's actually fairly easy, because you can use bk.id_to_href() to get the href. For example:

Code:

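from sigil_bs4 import BeautifulSoup  # BeautifulSoup as bundled with Sigil (assumed here; stock bs4 also works)

# bk is the book container object that Sigil passes to the plugin's run() function
metadata_soup = BeautifulSoup(bk.getmetadataxml(), 'lxml')
cover_image = metadata_soup.find('meta', {'name' : 'cover'})
cover_href = None
if cover_image:
    cover_id = cover_image['content']     # a manifest id, not a file name
    cover_href = bk.id_to_href(cover_id)  # resolve the id to the manifest href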

For more information, see my KindleGen plugin which'll check for the presence of all recommended guide/landmarks items and cover image identifiers in epub2 and epub3 books.

slowsmile 08-23-2018 03:54 AM

@Doitsu...Yes, I use bk.id_to_href() all over the place in my plugin to get file names.

But I tried an experiment. In a valid epub, I changed the cover file refs from "cover" to "noddy" in the metadata, manifest and spine. When I ran Epubcheck, the epub passed without any problems at all, which surprised me. The point I'm making is that those cover ids can really be anything you like. They don't have to be "cover": as long as the same cover id, whatever a given doc-to-epub converter chooses, is used consistently in the metadata, manifest and spine, your epub will be valid. This also means that epub converters can use cover ids other than "cover" if they want. Like I said, this surprised me, because searching for the metadata cover ref using "cover" does not seem to satisfy the maxim "in all possible test instances".


All times are GMT -4.