Djvu: Extracting ISBN numbers from a large number of books?


MelBr
12-04-2013, 07:50 PM
Hi all,

I have a large number of old math books, all in djvu format, and the filenames are somewhat messy. I'd like to import them all into Calibre so I can categorize them, fix the names, etc.

Is there an easy(ier) way to extract ISBN numbers other than converting them all to PDF and then running OCR on all of them? I'd really like to avoid that, since it would take a very long time, would produce large files, and there's still no guarantee that it would OCR all of the ISBNs properly. The option of last resort is of course to manually type all the ISBN numbers. I'd like to avoid that one of course :)

Thanks for any tips!

Mel

DaleDe
12-05-2013, 01:33 PM
I guess it would depend on where the ISBN is stored in the file. Typically it would be in the metadata. What viewer are you using to look at the file? Can it view the metadata? See DJVU in our wiki for some viewers you might try.

Dale

MelBr
12-06-2013, 05:33 PM
Thank you for the comment, Dale!

Unfortunately, these files don't seem to have any metadata. I've checked with a few viewers. For example, DjView shows this:

http://i.imgur.com/KIT0bi0.jpg


I'm starting to believe that some type of manual process is the only thing that will work. I'm looking at various OCR software packages to see if any of them can scan only the first N pages, but then I'll have to extract the ISBN manually anyway… or maybe let a Calibre plugin extract it somehow. I'm still thinking about the best way to go about this.
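
Once the first N pages are available as plain text (from OCR or from a hidden text layer), picking out the ISBN does not have to be manual. Here is a minimal Python sketch, not anything a poster in this thread used: it assumes the page text contains the literal label "ISBN" before the number, and the names `ISBN_RE` and `find_isbns` are made up for illustration. The check-digit rules are the standard ISBN-10 (mod 11) and ISBN-13 (mod 10) ones, which helps to reject numbers that OCR got wrong.

```python
import re

# Assumption: the number is preceded by the label "ISBN" (optionally "ISBN-10"/"ISBN-13").
ISBN_RE = re.compile(r"ISBN(?:-1[03])?[:\s]*([0-9][0-9Xx\- ]{8,18})", re.IGNORECASE)


def is_valid_isbn10(s):
    """ISBN-10: weights 10..1, sum divisible by 11; 'X' means 10 in the last place."""
    if not re.fullmatch(r"\d{9}[\dX]", s):
        return False
    total = sum((10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(s))
    return total % 11 == 0


def is_valid_isbn13(s):
    """ISBN-13: prefix 978/979, alternating weights 1 and 3, sum divisible by 10."""
    if not re.fullmatch(r"97[89]\d{10}", s):
        return False
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(s))
    return total % 10 == 0


def find_isbns(text):
    """Return checksum-valid ISBNs found in text, without separators, in order."""
    found = []
    for match in ISBN_RE.finditer(text):
        digits = re.sub(r"[\- ]", "", match.group(1)).upper()
        # The pattern may swallow trailing digits (e.g. a year), so test prefixes.
        for candidate in (digits[:13], digits[:10]):
            if is_valid_isbn13(candidate) or is_valid_isbn10(candidate):
                if candidate not in found:
                    found.append(candidate)
                break
    return found
```

A candidate that fails its check digit is dropped, so a book for which this returns an empty list is one that still needs a manual look.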

DaleDe
12-06-2013, 09:34 PM
Where do you see the ISBN data? On a title page, near the beginning of the book? I suspect you will need to open the book to see it.

Dale

DaleDe
12-06-2013, 09:48 PM
You might try the tools at http://djvu.sourceforge.net/index.html. They include djvused, a tool somewhat like Unix sed that can extract data from a file.
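
As a rough sketch of how those tools could be scripted (an illustration, not something tested on the files in question): the same DjVuLibre package ships `djvutxt`, which prints a file's hidden text layer. The Python below assumes `djvutxt` is on the PATH and that it accepts a `-page=` page range as described in its man page, so check that against your installed version. The function names `hidden_text` and `isbn_lines` are made up here. It only helps if the files actually carry a text layer; purely scanned images will give empty output and still need OCR.

```python
import subprocess


def hidden_text(djvu_path, pages="1-10"):
    """Return the hidden text of the given pages (empty if there is no text layer).

    Assumption: `djvutxt` from DjVuLibre is installed and supports -page=<range>.
    """
    result = subprocess.run(
        ["djvutxt", f"-page={pages}", djvu_path],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout


def isbn_lines(text):
    """Return the stripped lines of text that mention 'ISBN' (case-insensitive)."""
    return [line.strip() for line in text.splitlines() if "isbn" in line.lower()]
```

Usage would be along the lines of `isbn_lines(hidden_text("book.djvu"))`, looped over every file in the folder, with the ISBN usually sitting on the copyright page within the first few pages.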

Dale

BobC
12-09-2013, 02:27 PM
I've also found DJVUTOY a useful piece of software for manipulating DJVUs:

http://www.comicer.com/stronghorse/software/exe/DjVuToy_eng.zip

This is an English version; unfortunately the main site is in Chinese.

With it you can split and merge DJVUs, insert bookmarks, and manipulate hidden text by exporting it, editing it, and then re-importing it. I used an early version to create what is effectively a clickable TOC for a number of DJVUs. Documentation is sparse, so a bit of experimentation is needed to get the best out of it.

BobC

willus
12-21-2013, 08:11 PM
@MelBr -- Any chance a sample .djvu file could be posted?

Noobish
04-13-2014, 04:35 AM
I had the same problem. I like my books in PDF since it is compatible with many OSes and devices and can be compressed, but for some reason the metadata is lost after conversion to PDF. I use free software for that, by the way.

Check the Extract ISBN plugin thread: http://www.mobileread.com/forums/showthread.php?t=237519&highlight=[GUI+Plugin]+Extract+ISBN