yep... Note that all this advice applies to the HTML file you use to create the Mobi; no-one (so far) has suggested a validator that understands the actual Mobi file.
Basically, Mobipocket didn't bother with this issue, and Amazon didn't fix it; they made it worse by continuing to develop the software without updating the documentation.
You need to make your own checklist. Don't rely on documentation to tell you what works; check each feature on your target device... which is going to be the Kindle :-).
I can show you what my list would look like. There's one thing I don't think anyone has mentioned: CSS validation (if you're using any CSS).
http://jigsaw.w3.org/css-validator/
Other checklist suggestions:
- list each individual HTML tag and CSS style you're using. Test that all of them actually work as intended on the Kindle. If you've only got a Kindle desktop app, remember that it may work slightly differently. I think they left the "align=poetry" code in the desktop app from Mobipocket days, but they didn't implement it on the Kindle. I can't think of anything else you'd be likely to hit. (At least not at the markup level. Checking whether images are appropriate for the target device is not a traditional "validation" step, but you may need to do that anyway.)
- make sure it's well-formed XML. I use "xmllint" on Linux. Or you could install HTML Tidy and run "tidy -errors -xml chapter.html" for each HTML file. I suspect you can get away with missing end-tags etc, so it's not critical, but you should want to have a nice clean file. It may also help you avoid making mistakes as you edit the file.
- try validating it as XHTML transitional - but it will complain about mobi-specific tags, and possibly other "false positives", due to the presentational html you have to use. Still, it may pick up on some stuff. E.g. if you mis-spell "align" as "aling".
- I would add one requirement of XHTML Strict - don't use bare text; almost everything should be nested inside a P tag. E.g. if you BLOCKQUOTE a single paragraph, you should have a P inside the BLOCKQUOTE. Obvious exception: headings. But you're unlikely to pass XHTML Strict, so you may not be able to validate this automatically without being swamped in "false positives".
- You should also be able to check for "special characters" being used in your book. Same game as the tags really - make a list, then check they display as intended. On unix, you can use
cat chapter1.html chapter2.html | sed -e 's/./&\n/g' | sort | uniq -c > characters.txt
or use the character analysis in the "word frequency" dialog of GuiGuts. (And check - I think it says this in the title bar - ctrl+x or ctrl+s should let you save the analysis to a text file, which may be easier to use).
In fact, you could probably just enclose that textual list of characters in a <pre> (that works, right?)... or whatever you have to do to make it into workable HTML... and there's your test document for viewing on the Kindle.
That won't quite help for whitespace characters, but at most you'll have four of them to account for (newline, carriage return, normal space, non-breaking space).
That's assuming you don't use character references (except for & / < / > , which you _have_ to escape that way). If you do use them, GuiGuts won't help much, but on Linux you can still list them with
cat chapter1.html chapter2.html | grep -o '&[^;]*;' | sort | uniq -c
There are online tools that will help you do this (search for a character _histogram_ or _frequency analysis_). Basically, if you can't check what characters you're using, you've no business publishing long-form texts. (Except for free releases which are effectively works-in-progress or requests-for-feedback).
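The idea above of wrapping the character list in a <pre> to get a test document could be sketched roughly like this. It is only a sketch: the function name char_test_page is made up for illustration, it assumes GNU sed (for "\n" in the replacement), UTF-8 source files and a UTF-8 locale, and I haven't confirmed how the Kindle treats the charset declaration.

```shell
#!/bin/sh
# char_test_page FILE...
# Hypothetical helper: writes a small HTML page to stdout that lists every
# distinct character found in the given files, one per line, inside <pre>.
char_test_page() {
    # Assumption: the source files are UTF-8, so the page declares UTF-8 too.
    printf '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><title>Character test</title></head><body><pre>\n'
    # Split into one character per line (GNU sed), then de-duplicate.
    # LC_ALL=C makes sort compare raw bytes, so locale collation rules
    # cannot merge two different characters into one.
    # The last sed drops empty lines and escapes the three characters
    # that must be written as character references in HTML.
    cat "$@" \
        | sed -e 's/./&\n/g' \
        | LC_ALL=C sort -u \
        | sed -e '/^$/d' -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
    printf '</pre></body></html>\n'
}
```

Usage would be something like "char_test_page chapter1.html chapter2.html > chartest.html", then build chartest.html the same way you build the book and look at it on the device.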
- compare how it works on the Kindle to a high quality commercial book. Do you have a working table of contents; does it look good and make sense? Does the cover image work well? Does your metadata - title and author name - show correctly when you're selecting the book?
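For the first checklist item (listing each HTML tag you use), something along these lines may save some manual work. Treat it as a rough sketch: the name tag_inventory is made up, and it is a plain regex scan rather than a real HTML parser, so a stray "<" followed by a letter in the text would be counted as a tag.

```shell
#!/bin/sh
# tag_inventory FILE...
# Hypothetical helper: prints each distinct opening tag name (lower-cased)
# with the number of times it occurs, most frequent first.
# The colon is allowed in names so that Mobi-specific tags such as
# <mbp:pagebreak/> show up in the list as well.
tag_inventory() {
    cat "$@" \
        | grep -o '<[A-Za-z][A-Za-z0-9:]*' \
        | tr -d '<' \
        | tr '[:upper:]' '[:lower:]' \
        | sort | uniq -c | sort -rn
}
```

Running "tag_inventory chapter1.html chapter2.html > tags.txt" gives you the list to tick off against the device. Closing tags and comments are skipped because the character after "<" has to be a letter.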