04-13-2010, 02:45 PM   #9
zenocon
Quote:
Originally Posted by Starson17
I'm not sure your process is optimal, but any improvements I'd suggest would still be manual. I'm not a PDF expert, but if PDFs have an ISBN field in their internal metadata, Calibre should read it on import. Why not just process your books before import to make sure they have the right ISBN? (I don't really know whether PDFs have an ISBN field or whether Calibre reads it, but that's what I'd check first.)
There isn't a dedicated internal metadata field for this in the PDF spec, AFAIK, and even if there were, you couldn't count on it being populated or valid. I could batch process all the PDFs, scan the text for an ISBN, and insert a metadata field, but...if I already have the ISBN right there, why not use it to fetch the rest of the metadata from web services?

And other formats may not have metadata at all. CHM, for example, is just compressed HTML. While you can obviously put metadata in HTML, you can't count on the files themselves to do so. This is why I scan the text myself. Bad as that sounds, the ISBN almost always appears within the first 10 or so pages of the document, so you can quit early.
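A minimal Java sketch of that early-exit scan, assuming the page text has already been extracted by some PDF/CHM library (not shown). The names `IsbnScanner` and `findIsbn` and the 10-page cutoff are hypothetical. The regex is a loose heuristic that only accepts unbroken or hyphen-separated digit runs of 10 to 13 digits; the ISBN-10/ISBN-13 check digit then weeds out random numbers that happen to have the right length.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class IsbnScanner {
    // 10 to 13 digits, optionally hyphen-separated; only the last char may be X (ISBN-10 check digit).
    private static final Pattern CANDIDATE =
        Pattern.compile("\\b(?:\\d-?){9,12}[\\dXx]\\b");

    /** Returns the first checksum-valid ISBN in the first maxPages pages, or empty. */
    public static Optional<String> findIsbn(List<String> pages, int maxPages) {
        int limit = Math.min(pages.size(), maxPages);
        for (int p = 0; p < limit; p++) {          // stop after maxPages: the "quit early" part
            Matcher m = CANDIDATE.matcher(pages.get(p));
            while (m.find()) {
                String digits = m.group().replace("-", "").toUpperCase();
                if ((digits.length() == 10 || digits.length() == 13) && checksumOk(digits)) {
                    return Optional.of(digits);
                }
            }
        }
        return Optional.empty();
    }

    // ISBN-10: weights 10..1, sum divisible by 11 ('X' = 10, last position only).
    // ISBN-13: weights alternate 1,3, sum divisible by 10.
    private static boolean checksumOk(String s) {
        int sum = 0;
        if (s.length() == 10) {
            for (int i = 0; i < 10; i++) {
                char c = s.charAt(i);
                if (c == 'X' && i != 9) return false;
                sum += ((c == 'X') ? 10 : c - '0') * (10 - i);
            }
            return sum % 11 == 0;
        }
        for (int i = 0; i < 13; i++) {
            char c = s.charAt(i);
            if (c == 'X') return false;
            sum += (i % 2 == 0 ? 1 : 3) * (c - '0');
        }
        return sum % 10 == 0;
    }

    public static void main(String[] args) {
        List<String> pages = List.of(
            "Title page",
            "Copyright 1990. Call 555-1234. ISBN 0-306-40615-2");
        System.out.println(findIsbn(pages, 10).orElse("none")); // 0306406152
    }
}
```

Validating the check digit is what makes the loose regex tolerable: a phone number or page count rarely passes both the length test and the checksum. `List.of` needs Java 9 or later.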


Quote:
Kovid is very good about adding code you provide when you want to enhance Calibre. I've seen other requests for duplicate location, so this is a useful feature that others want. Personally, I'd like to see it added as an SQL-based search that I could store, rather than a dedicated duplicate location button. You should be aware that Kovid has made it very easy to set up a development environment for Calibre if you want to get your feet wet in Python.
Cool, I may look into it. I think I'll build a quick prototype on the Air platform for a simple UI that can scan files. There is now a bridge between Air and Java (http://www.merapiproject.net/), so I can use it to find the ISBN with my Java lib, look up the book's info with AWS, and display the results in a simple, editable grid view. I'm just curious to see how it would play out.
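For the duplicate-location idea, once each file has an ISBN the grouping step is simple. A sketch in Java, assuming a map from file name to a normalized ISBN (digits only, as found by scanning the text); `DuplicateFinder`, `toIsbn13`, and `findDuplicates` are hypothetical names. ISBN-10s are converted to ISBN-13 ("978" prefix plus a recomputed check digit) so the same book recorded in both forms still collides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public final class DuplicateFinder {
    // ISBN-10 -> ISBN-13: "978" + first 9 digits + new check digit; ISBN-13 passes through.
    public static String toIsbn13(String isbn) {
        if (isbn.length() == 13) return isbn;
        String core = "978" + isbn.substring(0, 9);   // old check digit (possibly 'X') is dropped
        int sum = 0;
        for (int i = 0; i < 12; i++) {
            sum += (i % 2 == 0 ? 1 : 3) * (core.charAt(i) - '0');
        }
        return core + ((10 - sum % 10) % 10);
    }

    /** Groups file names by ISBN-13, keeping only ISBNs shared by two or more files. */
    public static Map<String, List<String>> findDuplicates(Map<String, String> isbnByFile) {
        Map<String, List<String>> byIsbn = new TreeMap<>();
        for (Map.Entry<String, String> e : isbnByFile.entrySet()) {
            byIsbn.computeIfAbsent(toIsbn13(e.getValue()), k -> new ArrayList<>()).add(e.getKey());
        }
        byIsbn.values().removeIf(files -> files.size() < 2);
        return byIsbn;
    }

    public static void main(String[] args) {
        Map<String, String> found = new TreeMap<>();
        found.put("book.pdf", "0306406152");
        found.put("book.chm", "9780306406157");
        found.put("other.pdf", "9780131103627");
        System.out.println(findDuplicates(found)); // {9780306406157=[book.chm, book.pdf]}
    }
}
```

The same grouping could just as well be an SQL `GROUP BY ... HAVING COUNT(*) > 1` over a stored ISBN column, which is closer to the saved-search approach suggested above.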