MobileRead Forums - View Single Post - Performance hit due to Calibre control of file/folder names

Strange Quark · 09-01-2017, 10:05 AM

Hello,

I am a book lover who has spent a lot of money on paper books but who now has a large e-book collection. I have been building that collection for many years (almost 15!) and have been looking for good e-book management software. But before I even tried to find one, I asked myself which essential features I would insist on.

After a lot of thinking I concluded that, because the official standards for the popular e-book file formats (PDF, CHM, DJV(U), EPUB) lack a dedicated ISBN field in their metadata, I am unfortunately forced to use the file name as ISBN storage. If these formats had a dedicated ISBN field in their official specifications, I would insert the ISBN into each book file once I had found it, either manually or automatically/semi-automatically with the help of software, and any e-book management software such as Calibre could then extract it easily. In that case I wouldn't care if Calibre or any other e-book management software insisted on its own naming scheme, because I would know that the ISBNs are safely stored in dedicated fields inside the files themselves. But unfortunately this is not the case, so I am forced to use the file name as the most reliable ISBN storage. All my e-book files have the following name structure: (ISBN 978xxxxxxxx).ext and they are all in one folder named Book Library.
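To make the naming scheme concrete, here is a minimal Python sketch of reading the ISBN back out of such a file name and checking it. The exact pattern (the literal text "(ISBN ", 13 digits, optional hyphens, then ")") is my assumption about how the names look, and the function names are hypothetical. The check digit test is the standard ISBN-13 rule: alternating weights 1 and 3, with the sum divisible by 10.

```python
import re
from pathlib import PurePath

# Assumed naming pattern: "(ISBN 978...)" followed by the extension.
# Hyphens inside the number are tolerated and stripped.
ISBN_IN_NAME = re.compile(r"\(ISBN\s+([0-9-]+)\)")


def isbn13_is_valid(isbn: str) -> bool:
    """Return True if the string is a 13-digit ISBN with a correct check digit."""
    digits = isbn.replace("-", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    # Standard ISBN-13 rule: weights alternate 1, 3, 1, 3, ...
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def isbn_from_filename(name: str):
    """Extract a valid ISBN-13 from a file name, or return None."""
    match = ISBN_IN_NAME.search(PurePath(name).stem)
    if match is None:
        return None
    isbn = match.group(1).replace("-", "")
    return isbn if isbn13_is_valid(isbn) else None
```

Calling `isbn_from_filename("(ISBN 9780306406157).pdf")` returns the 13-digit string, while a name with a wrong check digit or without the "(ISBN ...)" part returns `None`, so a mistyped name is caught before it reaches any database.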

I am fully aware that any e-book management software keeps a connection between ISBNs and file names for all files in its library. However, because I spent MONTHS manually and semi-automatically checking that the ISBN detected for each of my e-book files is reliable, I insist on having these file names "cemented". Many e-book management applications (including Calibre) can export a book collection in any file name format, but I cannot trust any software (especially not a buggy one) to control the connection between file names and ISBNs for my large e-book collection, since its database might become corrupted, and especially not after I spent so much time finding and checking these ISBNs for each file separately.

Many years ago I found a good e-book management application (let's call it "Unknown") that lets me keep whatever file names and folders I like for my e-books. It also has tremendously efficient and reliable semi-automatic ISBN extraction from file content (from the text). It stores all its information in one large database file that is completely separate from my e-book files, which makes archiving both the database file and the e-book files very easy. But then I realized it has extremely poor search capabilities. What good is e-book management software if you cannot search your collection??? It is also not open source but a commercial application, and new features are accepted extremely slowly. You can beg them to add a good search mechanism and get no result.

Then, some years ago, I found Calibre. Its open source nature and extremely good search capabilities delighted me. But then I realized it insists on its own file/folder naming scheme, which collides with my principle above.

I also found that Calibre pays a performance penalty for using the operating system's file system to control file/folder names. The "Unknown" application, which I currently use, relies on the fact that the e-book files it imports into its database already exist somewhere, and it does not care where they are or how they are named. Calibre, by contrast, has to create files and folders separately, and since the OS file system has to check for collisions, this is a slow process when a large number of books is imported, especially when the existing collection is already large. When I import 100 books into Unknown, it takes a few seconds. When I import them into Calibre, it takes 20 minutes, even when there are no actual files to import (copy) but only ISBNs! Unlike Calibre, Unknown does not have to ask the OS file system whether there are any collisions and whether the files/folders can be created. Downloading metadata from the Internet is also a lot quicker in Unknown than in Calibre.
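A minimal Python sketch of what I mean by a reference-only import, assuming my "(ISBN ...).ext" naming scheme and a single flat folder. This is not how Calibre or Unknown is actually implemented; the table layout and the function names `build_index` and `find_by_isbn` are hypothetical. The point is only that the index records where each file already is, in one SQLite database kept apart from the books, and never creates, renames, or moves anything in the library folder.

```python
import re
import sqlite3
from pathlib import Path

# Assumed naming pattern: "(ISBN 978...)" followed by the extension.
ISBN_IN_NAME = re.compile(r"\(ISBN\s+([0-9-]+)\)")


def build_index(library_dir, db_path=":memory:"):
    """Record ISBN -> existing path for every matching file; files are not touched."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books (isbn TEXT NOT NULL, path TEXT PRIMARY KEY)"
    )
    rows = []
    for path in sorted(Path(library_dir).iterdir()):
        match = ISBN_IN_NAME.search(path.stem)
        if path.is_file() and match:
            rows.append((match.group(1).replace("-", ""), str(path)))
    # One transaction for the whole batch; the only writes go to the database.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO books (isbn, path) VALUES (?, ?)", rows
        )
    return conn


def find_by_isbn(conn, isbn):
    """Return the stored paths for one ISBN (several formats may share it)."""
    cursor = conn.execute(
        "SELECT path FROM books WHERE isbn = ? ORDER BY path", (isbn,)
    )
    return [row[0] for row in cursor]
```

For example, `conn = build_index("Book Library", "index.db")` followed by `find_by_isbn(conn, "9780306406157")` would return the original paths. Because the database is a single file that lives apart from the books, both can be archived independently.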

For some time I even used both Unknown and Calibre simultaneously! Calibre held no files, only the imported ISBNs and the downloaded metadata, and I used it for its good search capabilities.

Now I am wondering whether it is possible to get rid of Calibre's reliance on its own file/folder naming with a new plug-in. That plug-in would basically allow Calibre to import e-book files with their existing names and keep them in their existing folders. Is Calibre's existing architecture flexible enough to allow such a plug-in? Does Calibre's plug-in API allow it? If the existing architecture does not allow such a plug-in today, is there a chance it could be accommodated in the future? I am afraid the answer is no, but I want to check with the Calibre experts anyway.

Thank you for your consideration, and sorry for the long post.

Reader from Croatia
