MobileRead Forums - Metadata and PDFs

mdroberts · 12-27-2009, 06:54 AM

I started using Calibre to organize my PDF ebooks, but quickly got bogged down entering titles, author names, and ISBNs by hand and then fetching metadata from isbndb.com. It works, but it's a slow process with a lot of manual verification. Something automated would be much easier.

This issue was explored in a previous thread, the upshot being that code was added to Calibre to read ISBNs from the PDF content (and thank you, Calibre gurus, for that). Unfortunately, the code doesn't work for any of my PDFs. I'm not sure it works at all, actually. I opened ticket #4113 to track this issue, but it seems to be low-priority.

Since it might be a long while before this gets fixed, I'm trying to figure out other solutions.

For example, I see that Calibre can fetch some of the metadata from the file name, so I'm wondering if there's some other application I could use to scan a bunch of PDFs, extract ISBNs from their content, and then rename the PDF files before I feed them into Calibre. Acrobat supports JavaScript, and that might be a solution (i.e., write a script that scans the PDF, gets the ISBN, etc.), but it would mean figuring out the Acrobat JS API, and I'm pretty much a noob at that. FWIW, I tried this one: <http://www.evermap.com/JavaScript/ExtractISBN.txt> but it doesn't work as advertised.
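
For what it's worth, the scan-extract-rename idea could be sketched outside Acrobat along these lines. This is only a rough sketch under several assumptions: it uses Python, it assumes Poppler's `pdftotext` command-line tool is installed and on the PATH, it assumes the PDFs have a real text layer (scanned images would need OCR first), and it assumes the ISBN is printed with an "ISBN" label in the first few pages. The function names and the `<isbn>.pdf` naming scheme are made up for illustration; Calibre's file name pattern would have to be set up to match whatever scheme is actually used.

```python
import re
import subprocess
from pathlib import Path

# An "ISBN" label followed by a run of digits, hyphens, and spaces.
ISBN_RE = re.compile(r"ISBN(?:-1[03])?:?\s*([0-9][0-9Xx\- ]{8,16}[0-9Xx])")


def is_valid_isbn(isbn):
    """Check the ISBN-10 or ISBN-13 check digit of a bare (no hyphens) string."""
    if len(isbn) == 10 and re.fullmatch(r"\d{9}[\dX]", isbn):
        # ISBN-10: weights 10..1, 'X' counts as 10, sum must divide by 11.
        total = sum((10 - i) * (10 if c == "X" else int(c))
                    for i, c in enumerate(isbn))
        return total % 11 == 0
    if len(isbn) == 13 and isbn.isdigit() and isbn[:3] in ("978", "979"):
        # ISBN-13: alternating weights 1 and 3, sum must divide by 10.
        total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(isbn))
        return total % 10 == 0
    return False


def find_isbn(text):
    """Return the first labeled ISBN in text that passes its checksum, or None."""
    for match in ISBN_RE.finditer(text):
        digits = re.sub(r"[^0-9Xx]", "", match.group(1)).upper()
        # The pattern may grab trailing digits, so try the 13- and 10-char prefixes.
        for length in (13, 10):
            if is_valid_isbn(digits[:length]):
                return digits[:length]
    return None


def pdf_text(path, last_page=10):
    """Extract text from the first pages (assumes Poppler's pdftotext is installed)."""
    result = subprocess.run(
        ["pdftotext", "-l", str(last_page), str(path), "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def target_name(path, isbn):
    """Hypothetical naming scheme: <isbn>.pdf in the same folder."""
    return path.with_name(f"{isbn}.pdf")


def rename_all(folder, dry_run=True):
    """Rename every PDF in folder whose ISBN can be found; only print if dry_run."""
    for path in sorted(Path(folder).glob("*.pdf")):
        isbn = find_isbn(pdf_text(path))
        if isbn is None:
            print(f"no ISBN found: {path.name}")
            continue
        new_path = target_name(path, isbn)
        print(f"{path.name} -> {new_path.name}")
        if not dry_run and not new_path.exists():
            path.rename(new_path)
```

Calling `rename_all("/path/to/pdfs")` would only print the proposed renames; passing `dry_run=False` would actually rename the files. The checksum test is there to cut down on false hits, since other long numbers on a copyright page can look like ISBNs.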

Anybody have any other ideas about automated ways to get metadata from PDFs?
