MobileRead Forums - View Single Post - [Old Thread] Auto Extract ISBN-Feature request

myle00 · 07-16-2009, 12:28 AM

I finally finished redoing it. The attached zip contains the Javadoc as well as three classes. The main class is BookNames, and you run everything from its main method. The other two are helper classes.

First, I tried to make it OS-independent, so before you use it you'll have to make sure the class variables in all three classes are set for your specific situation. That said, I only tried it on Windows, so you should test it on a few files before you run it on a whole batch.

As I said, you'll need to install pdftk and then run the genPDFTKcat() method over the folder with the PDFs. The method saves a text file with ready-made commands; since pdftk is OS-independent, they should run on your OS. You'll have to make the text file into a batch file or paste it into your command terminal. Once pdftk has finished running, you'll have to run the OCR software on the extracted PDF files.
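For anyone who wants to see the idea before opening the zip, here is a minimal sketch of generating such a command file. It is not the actual genPDFTKcat() code: the class name, the method names, and the page range "1-10" are all made up for illustration. The only thing taken from pdftk itself is the `cat <range> output <file>` form.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not the genPDFTKcat() method from the attached zip.
class PdftkCommandSketch {

    // Builds one pdftk command that copies a page range of input into output.
    // Paths are quoted so file names with spaces survive the shell.
    static String catCommand(Path input, Path output, String pageRange) {
        return "pdftk \"" + input + "\" cat " + pageRange
                + " output \"" + output + "\"";
    }

    // Writes one command per PDF found in inputDir to commandFile
    // and returns the same commands, sorted for a stable order.
    static List<String> writeCommands(Path inputDir, Path outputDir,
                                      String pageRange, Path commandFile)
            throws IOException {
        List<String> commands = new ArrayList<>();
        // Whether "*.pdf" also matches "*.PDF" depends on the platform.
        try (DirectoryStream<Path> pdfs =
                     Files.newDirectoryStream(inputDir, "*.pdf")) {
            for (Path pdf : pdfs) {
                Path extracted = outputDir.resolve(pdf.getFileName());
                commands.add(catCommand(pdf, extracted, pageRange));
            }
        }
        Collections.sort(commands);
        Files.write(commandFile, commands); // one command per line, UTF-8
        return commands;
    }
}
```

The page range is a guess at where the copyright page usually sits. pdftk may reject a range that runs past the last page of a short PDF, so check a few commands by hand first.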

When that is done, you'll have to run isbnDriver(), which will do the rest. However, in order to run it you'll need to get an Amazon associate number as well as an isbndb key, since it has to download info from Amazon and isbndb. Set the corresponding variables in copyURL to these two keys and it should work.
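How isbnDriver() picks ISBNs out of the OCR text is in the zip, not in this post. As a rough idea of what such a step can look like, here is a sketch that scans text for digit runs and keeps only those that pass the standard ISBN-10 or ISBN-13 check digit test. The class and method names are made up, and this is an assumption about the approach, not the author's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the real isbnDriver() may work differently.
class IsbnCandidateSketch {

    // A digit, then 8 to 15 digits, spaces or hyphens, then a digit or X.
    private static final Pattern CANDIDATE =
            Pattern.compile("[0-9][0-9 -]{8,15}[0-9Xx]");

    // ISBN-10: weights 10 down to 1, 'X' counts as 10, sum divisible by 11.
    static boolean isValidIsbn10(String s) {
        if (!s.matches("[0-9]{9}[0-9X]")) return false;
        int sum = 0;
        for (int i = 0; i < 10; i++) {
            char c = s.charAt(i);
            int value = (c == 'X') ? 10 : c - '0';
            sum += (10 - i) * value;
        }
        return sum % 11 == 0;
    }

    // ISBN-13: weights alternate 1 and 3, sum divisible by 10.
    static boolean isValidIsbn13(String s) {
        if (!s.matches("[0-9]{13}")) return false;
        int sum = 0;
        for (int i = 0; i < 13; i++) {
            int digit = s.charAt(i) - '0';
            sum += (i % 2 == 0) ? digit : 3 * digit;
        }
        return sum % 10 == 0;
    }

    // Returns each distinct valid ISBN found in the text, separators removed,
    // in order of first appearance.
    static List<String> findIsbns(String ocrText) {
        List<String> found = new ArrayList<>();
        Matcher m = CANDIDATE.matcher(ocrText);
        while (m.find()) {
            String compact = m.group().replaceAll("[ -]", "").toUpperCase();
            boolean valid = isValidIsbn10(compact) || isValidIsbn13(compact);
            if (valid && !found.contains(compact)) {
                found.add(compact);
            }
        }
        return found;
    }
}
```

For example, `findIsbns("ISBN 0-306-40615-2 and ISBN-13: 978-0-306-40615-7")` gives both numbers without hyphens. One known weakness: if another number follows the ISBN after just a space (say a year), the pattern swallows both and the check fails, so real OCR output would need more careful splitting.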

I just tested it on 850 files and it got ~680 of them. On 640 of them the program automatically selected the correct ISBN; I had to select the correct one on only 20. The rest, I suspect, either don't have ISBNs or the OCR wasn't good enough. Also, sometimes it cannot move and rename a file because a file with that ISBN already exists (it's a duplicate), so it saves a text file listing all the files that failed and what they should have been renamed to. It also saves a backup list with all the old and new file names. But you'll see all that in the Javadoc and comments.
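The rename-with-logs behaviour described above can be sketched roughly like this. Again, this is not the code from the zip: the class name, the tab-separated log format, and the ".pdf" extension are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of renaming files to their ISBN while logging results.
class RenameSketch {

    // "old<TAB>new" for every file that was moved (the backup list).
    final List<String> renamed = new ArrayList<>();
    // "old<TAB>intended new" for every file whose target already existed.
    final List<String> failed = new ArrayList<>();

    // Moves each file to targetDir/<isbn>.pdf without overwriting anything.
    void renameAll(Map<Path, String> isbnByFile, Path targetDir)
            throws IOException {
        Files.createDirectories(targetDir);
        for (Map.Entry<Path, String> entry : isbnByFile.entrySet()) {
            Path source = entry.getKey();
            Path target = targetDir.resolve(entry.getValue() + ".pdf");
            try {
                // No REPLACE_EXISTING option, so an existing target
                // is reported as an exception instead of being overwritten.
                Files.move(source, target);
                renamed.add(source + "\t" + target);
            } catch (FileAlreadyExistsException duplicate) {
                failed.add(source + "\t" + target);
            }
        }
    }

    // Saves both lists as plain text, one entry per line.
    void writeLogs(Path backupList, Path failureList) throws IOException {
        Files.write(backupList, renamed);
        Files.write(failureList, failed);
    }
}
```

A duplicate is left where it was, so nothing is lost; the failure list tells you which files to look at by hand.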

There are a bunch of other methods in there. Most of them I use to manipulate book files and folders, so I left them in since I think they'll be useful.

As always, I don't take any responsibility, blah blah blah. But it should work because it works for me. If there are any questions, just PM me or post.

Good luck,
Matt

BTW, I would recommend using the print book ISBN, since Amazon doesn't have many ebooks, so it'll be harder to get metadata on the ebook. Even isbndb may not have all the ebooks, so you're better off with the print ISBN.