Quote:
Originally Posted by turygo
Is there any step I did wrong?
No, it looks like my tests were incomplete. Three hours and a custom append-only update script later, I know a lot more.
I can now state with confidence that the Kindle metadata parser is just a mess. If a non-linearized PDF file was created with metadata in a self-contained Info dictionary that is not part of a compressed object stream, then the author will show up on the main listing page, and the title and author will show up on the detail page. I've only gotten appends to work with Adobe Acrobat Pro 9.x, and only when starting with a v1.3 file, and I'll be damned if I can figure out why I can't replicate that behavior in my script.
A good, working document will have an Info dictionary that looks like this: "<</Title(My Book) /Author(Gary Stu)>>".
The MacOS X built-in print-to-PDF saves as v1.3, but splits the metadata across multiple objects, like this: "17 0 obj (My Book) endobj 18 0 obj (Gary Stu) endobj <</Title 17 0 R /Author 18 0 R>>". This is invisible to a Kindle.
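To tell the two layouts apart, a rough Python sketch along these lines may help. It is not the Perl script mentioned in this post, and the name classify_info_dict is made up for illustration. It assumes you have already pulled the Info dictionary text out of the file, and it only recognizes literal "(...)" strings and "N G R" references, not hex strings.

```python
import re

# Direct value: the key is followed by a literal string, e.g. /Title(My Book)
DIRECT_VALUE = re.compile(rb"/(Title|Author)\s*\(")
# Indirect value: the key is followed by an object reference, e.g. /Title 17 0 R
INDIRECT_VALUE = re.compile(rb"/(Title|Author)\s*(\d+)\s+(\d+)\s+R\b")


def classify_info_dict(info: bytes) -> dict:
    """Report whether /Title and /Author are stored directly or by reference.

    Hypothetical helper: `info` is the raw text of an Info dictionary,
    not a whole PDF file. Keys that are absent are left out of the result.
    """
    result = {}
    for match in DIRECT_VALUE.finditer(info):
        result[match.group(1).decode("ascii")] = "direct"
    for match in INDIRECT_VALUE.finditer(info):
        result[match.group(1).decode("ascii")] = "indirect"
    return result
```

Feeding it the two example dictionaries above should label the first as direct for both keys and the second as indirect for both keys.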
I hadn't hacked PDF files by hand in 17 years, but I was sufficiently annoyed by this mess that I wrote a tool that correctly implements the incremental update spec (PDF Reference v1.7, sections 3.4.3-3.4.5, with examples in appendix G.6) to add self-contained metadata objects. Acrobat, MacOS X Preview, and pdftk all see my updated metadata, but the Kindle does not.
Unless I save as v1.3, add the metadata with my script, and then pass the file through pdftk to convert my update to a rewrite! At this point I declared both victory and failure.
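For readers who have not seen an incremental update before, here is a minimal Python sketch of the general shape: append one new Info object, a one-entry cross-reference section, and a new trailer whose /Prev points at the old cross-reference table. This is an illustration under several assumptions, not the Perl script from this post. The names pdf_literal and append_info are hypothetical, the file is assumed to use a classic "trailer" section (no cross-reference streams), it is assumed to be unencrypted, and /ID is not carried over.

```python
import re


def pdf_literal(text: str) -> bytes:
    """Encode text as a PDF literal string, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return b"(" + escaped.encode("latin-1") + b")"


def append_info(pdf: bytes, title: str, author: str) -> bytes:
    """Return pdf plus an appended update that adds a self-contained Info object."""
    # Assumption: the last "trailer" keyword starts the most recent trailer.
    trailer_at = pdf.rfind(b"trailer")
    if trailer_at < 0:
        raise ValueError("no classic trailer found (cross-reference stream?)")
    tail = pdf[trailer_at:]
    size = int(re.search(rb"/Size\s+(\d+)", tail).group(1))
    root = re.search(rb"/Root\s+(\d+\s+\d+\s+R)", tail).group(1)
    prev = int(re.search(rb"startxref\s+(\d+)", tail).group(1))

    # The original bytes are never modified, only extended.
    out = pdf if pdf.endswith(b"\n") else pdf + b"\n"

    # The new object takes the first unused object number, which equals /Size.
    obj_offset = len(out)
    out += b"%d 0 obj\n<</Title" % size + pdf_literal(title)
    out += b" /Author" + pdf_literal(author) + b">>\nendobj\n"

    # Cross-reference section with a single 20-byte entry for the new object.
    xref_offset = len(out)
    out += b"xref\n%d 1\n%010d 00000 n \n" % (size, obj_offset)

    # New trailer: larger /Size, same /Root, new /Info, /Prev chains to the old table.
    out += b"trailer\n<</Size %d /Root %s /Info %d 0 R /Prev %d>>\n" % (
        size + 1, root, size, prev)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return out
```

Something like `open("out.pdf", "wb").write(append_info(open("in.pdf", "rb").read(), "My Book", "Gary Stu"))` would apply it to a file. The result is still an append, so by the findings above it would presumably need the same pdftk pass before a Kindle picks it up.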
My metadata-appending Perl script is available here for anyone who finds it useful.
-j