Most PDF authors don't bother to set the file's metadata correctly.
Guessing the metadata from the content might work for well-formatted fields like the ISBN, but to get the rest, you'd have to "read" the text semantically. Care to write an AI for that? I have trouble myself finding the real copyright date in Gutenberg books.
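
To illustrate the "well-formatted" case, here is a rough sketch of how an ISBN could be pulled out of text that has already been extracted from the PDF. The pattern and the function names (`find_isbns`, `is_valid_isbn10`, `is_valid_isbn13`) are my own for illustration, not part of any library, and it assumes the number is preceded by the literal label "ISBN". The check-digit test is what keeps random digit runs from being mistaken for an ISBN.

```python
import re

# Assumption: the number follows an "ISBN", "ISBN-10" or "ISBN-13" label
# and is written with digits, hyphens, or spaces.
ISBN_CANDIDATE = re.compile(
    r"ISBN(?:-1[03])?:?\s*([0-9Xx][0-9Xx\- ]{8,15}[0-9Xx])"
)


def is_valid_isbn10(digits: str) -> bool:
    """Check the ISBN-10 check digit (weights 10..1, sum divisible by 11)."""
    if not re.fullmatch(r"\d{9}[\dX]", digits):
        return False
    total = sum(
        (10 - i) * (10 if ch == "X" else int(ch))
        for i, ch in enumerate(digits)
    )
    return total % 11 == 0


def is_valid_isbn13(digits: str) -> bool:
    """Check the ISBN-13 check digit (weights 1,3,1,3,..., sum divisible by 10)."""
    if not re.fullmatch(r"\d{13}", digits):
        return False
    total = sum(
        int(ch) * (1 if i % 2 == 0 else 3)
        for i, ch in enumerate(digits)
    )
    return total % 10 == 0


def find_isbns(text: str) -> list[str]:
    """Return the distinct, checksum-valid ISBNs found in text, without separators."""
    found = []
    for match in ISBN_CANDIDATE.finditer(text):
        digits = re.sub(r"[\s-]", "", match.group(1)).upper()
        if (is_valid_isbn10(digits) or is_valid_isbn13(digits)) and digits not in found:
            found.append(digits)
    return found


if __name__ == "__main__":
    sample = "First published 1980. ISBN 0-306-40615-2. All rights reserved."
    print(find_isbns(sample))
```

This only works because an ISBN has a fixed shape and a checksum. Titles, authors, and copyright dates have neither, which is why they can't be recovered the same way. The pattern is also deliberately loose, so a number followed directly by more digits (a year on the same line, say) may be missed.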