PDF Authors

R22 · 06-24-2011, 01:19 AM

I am trying to figure out a way to edit what the Kindle 3 displays as the "Author" for PDF files that I have emailed to my Kindle. PDF editing programs, such as PDF-Xchange Viewer, allow you to modify what is described as the "Author Metadata Field". HOWEVER, this is NOT what the Kindle displays as the Author. There is another field, NOT listed under the Metadata, that appears to be read by Kindle.

Allow me to elaborate. By using a Binary File Editor I was able to locate the "Properties | Author Metadata Field". In every PDF file there is a section of data that LOOKS like XML markup text entitled "x:xmpmeta". I must assume that is the 'usual metadata' section for the PDF file. Within this is a section entitled "dc:creator", and within this is "rdf:Seq", and inside that is "rdf:li" -- which holds letter-for-letter what is displayed in the "Properties | Author" field when you open a PDF. This is the usual Author Metadata Field. Of this I am very certain!

The problem occurs because the Kindle has decided to NOT use the String that appears there to display as the "Author". I do not understand why, I just know it to be true!

Instead, Kindle takes data that is NOT enclosed inside the usual metadata section -- that is, NOT nested in "x:xmpmeta". Outside of the expected XML markup metadata is another set of fields that list: Author, CreationDate, Creator (this time NOT referring to the Author but instead to the program that created the PDF file), ModDate, Producer, and Title. I have no idea what to call this section, but the Kindle seems to grab the string from the Author field here and it displays that as the "Author".
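For anyone who wants to check both places without a binary editor, here is a rough Python sketch (standard library only). The function name `find_author_fields` and the sample bytes are made up for illustration. It assumes the metadata is stored uncompressed and unencrypted and that the Author is a plain literal string; a PDF that compresses its objects or stores the string as hex or UTF-16 would need a real PDF library instead.

```python
import re

def find_author_fields(pdf_bytes: bytes) -> dict:
    """Return the Author strings found in the two metadata locations."""
    result = {"xmp_dc_creator": None, "info_author": None}

    # XML-style metadata: <dc:creator><rdf:Seq><rdf:li>NAME</rdf:li>...
    xmp = re.search(
        rb"<dc:creator>\s*<rdf:Seq>\s*<rdf:li[^>]*>(.*?)</rdf:li>",
        pdf_bytes, re.DOTALL)
    if xmp:
        result["xmp_dc_creator"] = xmp.group(1).decode("utf-8", "replace")

    # The "other" set of fields: /Author (NAME), escaped parens allowed
    info = re.search(rb"/Author\s*\(((?:\\.|[^\\()])*)\)", pdf_bytes)
    if info:
        result["info_author"] = info.group(1).decode("latin-1")
    return result

# Hypothetical fragment with only the XML-style field present
sample = (b"<x:xmpmeta><rdf:RDF><dc:creator><rdf:Seq>"
          b"<rdf:li>California DMV</rdf:li></rdf:Seq></dc:creator>"
          b"</rdf:RDF></x:xmpmeta>")
print(find_author_fields(sample))
```

To try it on a real file, read it with `open(path, "rb").read()` and pass the bytes in. A result with `xmp_dc_creator` filled and `info_author` empty would match the situation described above.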

Now, I am at a loss to find a program that allows me to INSERT this secondary yet-unnamed "collection of fields" into a PDF file. If I use a Binary file editor and brute-force stick that inside the PDF file, the file becomes "corrupted" per Adobe Acrobat. Now, the file MAY still work on the Kindle (yet to be seen -- until I get home), but it is clear that Acrobat does not like that (see my link below).

Has anyone figured out this mess before me?? If so, I congratulate you. If not, someone should -- or we should push Amazon to use the USUAL METADATA AUTHOR field to display as the Author!!!! I would love to be able to Edit the Author field that the Kindle displays for PDF files. There must be a way! Thanks for reading.

For more information into my trials and failures:
http://www.dslreports.com/forum/r260...Sums-#26013953

Alissa · 06-24-2011, 08:18 AM

The XML data block you found is called a metadata stream. The data block you called "another set of fields" is called the document information dictionary. Both are documented in the PDF spec. (E.g., this.)

The metadata stream is richer and more flexible than the document information dictionary, and Adobe recommends use of the former, but the latter is older and easier to parse, so many PDF handling programs still rely on the document information dictionary. I'm not surprised to learn that the Kindle only supports the latter.

IMHO, the problem is not in the Kindle but in the program you used to edit the PDF (i.e., PDF-Xchange). It should update the author entries in both the metadata stream and the document information dictionary simultaneously. Adobe Acrobat (the expensive program, not the free-of-charge Adobe Reader) does so.

R22 · 06-25-2011, 01:35 PM

Excellent. I am not an educated PDF user and I have never looked at the specs. Thank you for the extremely helpful information! I appreciate your time.

Elfwreck · 06-25-2011, 02:20 PM

The free program BeCyPDFMetaEdit will let you edit PDF metadata. (Obviously, the designers never expected it to be used by the general public, or they'd've named it something with less than 9 syllables.)

That should fix the fields the Kindle uses; it works on the Sony's PDF meta-fields.

anamardoll · 06-27-2011, 10:14 AM

Quote:

Originally Posted by Elfwreck

The free program BeCyPDFMetaEdit will let you edit PDF metadata. (Obviously, the designers never expected it to be used by the general public, or they'd've named it something with less than 9 syllables.)

That should fix the fields the Kindle uses; it works on the Sony's PDF meta-fields.

I'm also fond of "Quick PDF Tools" and they let you edit Author/Title/Etc., in a way that I think the Kindle will recognize. (Works for my PB, anyway.)

R22 · 06-28-2011, 02:46 PM

Thanks to all. I will give them a try and see what works!

R22 · 07-01-2011, 11:40 PM

OK, because it seems to have a small footprint and to be useful mostly for MetaData editing only, I chose to try the BeCyPDFMetaEdit program. Is it possible to make a worse name?

It seems to do the trick, with a little caveat. There is a tabbed interface. The first tab is "MetaData" and the Second tab is "MetaData XMP". On the surface the program appears to be calling the "Document Information Dictionary" the MetaData and the "MetaData Stream" is the MetaData XMP. This is not exactly how I would have named these tabs -- isn't the terminology confusing enough??

You can edit the MetaData (eh, Document Information Dictionary) on the first tab, and this does work, but see below for details. The second tab is not nearly as helpful. Instead of allowing you to edit even more MetaData (since the MetaData Stream is the more robust of the two data fields) , the only option you are given on this tab is to DELETE the XMP MetaData! That seems a little weird for a program that is supposed to be a MetaData Editor...

Regardless, I ignored the second tab and worked mostly with the first tab. This is another place where it was a slight bit weird. When you first open the PDF file, the fields are already filled in -- even if the file has no Document Information Dictionary (DID). OK, no biggy -- the program extracted the data from the MetaData Stream (MDS). I got it.

Since the Author field now APPEARS to be filled in correctly one might get the impression that if you SAVED the file now the two MetaData fields would be in sync. NOPE. If you save the file now, no DID is created.

But, if you delete the Author field and Save the file, and then re-Open the file and manually re-enter the Author data, and then Save the file once more -- then and only then will an appropriate DID be created in the PDF file. Seems like an inelegant solution.

However, to shortcut this a little, you can use the Save As function instead of Save. If you select Save As just after you open the PDF file, the file is saved with a "_1" appended to the file name -- and surprisingly enough, the appropriate DID is created!

This does not apparently mess with the XML MetaData Stream (since I did not check the box on the second tab) and creates an Author field that Kindle CAN read!

Thanks to all.

R22 · 07-02-2011, 01:13 PM

Urrgggh! I spoke too soon! I had not yet looked at the files on my Kindle! The BeCyPDFMetaEdit program DOES NOT edit the MetaData in a manner that Kindle finds acceptable!

Yes, it does create and edit the "Document Information Dictionary" - but not in a structure that Kindle likes or can use!

BeCyPDFMetaEdit creates the DID exactly in this manner:

<<
/Author (California DMV)
/CreationDate (D:20110103090206-07'00')
/Creator (Adobe InDesign CS4 \(6.0.4\))
/ModDate (D:20110701201212-08'00')...

However, Kindle seems to be only able to Parse the DID if it is created like this:

<</Author(California DMV)/CreationDate(D:20110113094150-07'00')/Creator(Adobe InDesign CS4 \(6.0.4\))/ModDate(D:20110624002212-07'00')...

That is to say, the Kindle will only parse the data if it is one continuous line of data, but BeCyPDFMetaEdit inserts line breaks between each of the pairs! This makes the DID unreadable for Kindle!
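For comparison, the PDF format itself treats whitespace between tokens as insignificant, so a tolerant reader should see the two layouts above as the same dictionary. The sketch below is only an illustration of that idea in Python: `PAIR` and `parse_info` are hypothetical names, and the regular expression handles simple literal strings with escaped parentheses, not the full PDF string syntax. It says nothing about how the Kindle's own parser is written.

```python
import re

# Matches "/Key (literal string)" with any amount of whitespace, or none,
# between the key and the string. Escaped parentheses such as \( are allowed.
PAIR = re.compile(rb"/(\w+)\s*\(((?:\\.|[^\\()])*)\)")

def parse_info(raw: bytes) -> dict:
    """Collect the key/string pairs from a raw info-dictionary fragment."""
    return {k.decode("ascii"): v.decode("latin-1")
            for k, v in PAIR.findall(raw)}

# Layout with line breaks and spaces (shortened from the example above)
multi_line = b"""<<
/Author (California DMV)
/Creator (Adobe InDesign CS4 \\(6.0.4\\))
>>"""

# The same entries as one continuous line
one_line = b"<</Author(California DMV)/Creator(Adobe InDesign CS4 \\(6.0.4\\))>>"

# Both layouts yield the same keys and values
print(parse_info(multi_line) == parse_info(one_line))
```

If the Kindle really does reject the multi-line form, that would point to a stricter pattern match on its side rather than to anything invalid in the file.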

Back to the drawing board. I'll see if the Binary Editor can fix this by removing the line breaks -- although I doubt it, as it was unable to create a viable PDF file if I manually inserted my own DID...

I'll get back to you in case someone else decided this is of interest!

_____________________________________

Answer is NO! The binary editor will remove the extra line breaks and spaces -- but then the file becomes corrupt and cannot be opened by any program -- not even BeCyPDFMetaEdit! So that is not a solution.
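A likely explanation for the corruption, offered as an assumption rather than something verified in this thread: a classic PDF ends with a cross-reference table that records the byte offset of every object in the file. Deleting even a few bytes in the middle moves everything after that point, so the recorded offsets no longer match. The toy Python sketch below (hypothetical data, not a complete PDF) shows the shift.

```python
import re

def object_offsets(data: bytes) -> dict:
    """Map each 'N G obj' header to the byte offset where it starts."""
    return {m.group(1).decode(): m.start()
            for m in re.finditer(rb"(\d+ \d+ obj)", data)}

# Toy fragment: an info dictionary followed by a second object
original = (b"%PDF-1.4\n"
            b"1 0 obj\n<<\n/Author (California DMV)\n>>\nendobj\n"
            b"2 0 obj\n<< /Type /Catalog >>\nendobj\n")

# Simulate the hand edit: collapse the dictionary onto one line
edited = original.replace(b"<<\n/Author (California DMV)\n>>",
                          b"<</Author(California DMV)>>")

before, after = object_offsets(original), object_offsets(edited)
for name in before:
    # Objects located after the edit start at a smaller offset than before
    print(name, before[name], "->", after[name])
```

A hand edit that keeps the file the same length, for example by overwriting removed bytes with spaces elsewhere inside the same object, would avoid the shift, but whether that satisfies the Kindle is untested here.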

Next I will try PDF Tools and see if it can create/edit the DID in a manner that is acceptable to Kindle...

R22 · 07-02-2011, 07:28 PM

Urrgggh! AGAIN. It is like the fricking nightmare out of the American Werewolf in London.

PDF Tools does the EXACT SAME THING that BeCyPDFMetaEdit does. If you use PDF Tools to even LOOK at the Properties, it creates a Document Information Dictionary data field that has the same structure as the one created by BeCyPDFMetaEdit -- that is, with the line breaks and the extra spaces.

The KINDLE does not like this structure! It cannot read it. It will only parse the DID if it is a "run on sentence" with no spaces and no line breaks.

There must be a program that creates a "legacy" DID structure. It appears that Adobe Acrobat DOES create the correct structure, but I would love to get by WITHOUT having to buy that!

Any other options? Thanks.

Jeff L · 07-03-2011, 06:44 AM

I use PDFinfo.

http://www.bureausoft.com/download.html

R22 · 07-03-2011, 10:50 AM

Jeff L - Thanks! Well, that seems to work -- but, as always, I remain a bit confused! And now I think that MAYBE the BeCyPDFMetaEdit program will work too...

When I opened a 'virgin' PDF file that did not have any Author MetaData, PDFInfo seems to ONLY create the Document Information Dictionary (DID). It did not create the XML MetaData Stream (MDS). Strangely, it again creates the same pattern that the above programs create -- that is to say, it does NOT create the linear, run-on version of the DID. It creates the version with the line breaks separating each pair from one another, and the spaces before the data entry (see above). Of note, Acrobat itself creates the linear run-on version!

But for some reason, after using PDFInfo, the Kindle IS able to parse and display the Author correctly! I sent myself two files and both worked perfectly.

So... I am at a loss to figure out completely WHY or HOW this works. This is how it seems to be functioning for me:
________________

1) If you have NO MetaData Stream (MDS) field in your document, then a multiple lined DID (with line breaks and spaces) will be correctly parsed by Kindle and it will display the Author.

2) If you have a MDS field, then Kindle will only parse a linear, run-on DID. Otherwise it will not display the Author.
_________________
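The two observations above can be written down as a small check. This is only a Python sketch of the working hypothesis, not documented Kindle behavior. The function name is made up, it looks at raw uncompressed bytes only, and it assumes that a run-on DID with no MDS present also works, which has not been tested here.

```python
import re

def kindle_should_show_author(pdf_bytes: bytes) -> bool:
    """Apply the two observed rules to the raw bytes of a PDF."""
    has_mds = b"<x:xmpmeta" in pdf_bytes           # XML MetaData Stream present?
    m = re.search(rb"/Author(\s*)\(", pdf_bytes)   # DID Author entry
    if m is None:
        return False                               # no DID Author at all
    run_on = m.group(1) == b""                     # no space between key and value
    # Rule 1: without an MDS, the multi-line DID is parsed.
    # Rule 2: with an MDS, only the run-on DID is parsed.
    return run_on or not has_mds

# Multi-line DID, no MDS (rule 1)
print(kindle_should_show_author(b"<<\n/Author (California DMV)\n>>"))
# Multi-line DID plus an MDS (rule 2)
print(kindle_should_show_author(
    b"<x:xmpmeta></x:xmpmeta><<\n/Author (California DMV)\n>>"))
```

Running it over a folder of PDFs before emailing them would at least flag the files expected to show up without an Author.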

I know this sounds weird and likely is more than anyone ever wants to know, but this has been driving me crazy! So I decided I had to figure it out as best I could!

So... my next test is to use the BeCyPDFMetaEdit program to Edit the Author -- but this time I will check the box to DELETE the "XMP MetaData". That should create a document that meets criterion #1 listed above and should allow Kindle to display the Author.

I'll get back to you. I know you can't wait and are riveted to your computer screens!
