MobileRead Forums - View Single Post

zenocon · 04-13-2010, 02:07 PM

Hi, I've been dabbling with building my own OSS ebook manager. I have a fairly large collection of PDF/CHM files that I use for reference material.

Considering that building your own application like this is a fairly large undertaking, I've also scanned the interwebs for other solutions out there.

Calibre has some great features, and is a great project -- kudos to the author(s). However, if I were building it, I'd do a couple of things slightly differently, and I'd add a few things. I'd be curious to get the contributors' feedback on whether each item on this wishlist is:

a) something that may be planned for a future release
b) something that can be contributed as a plugin
c) not an option

#1 Automatically finding the ISBN from the text itself. I have software that does this with fairly high accuracy. Right now it handles only PDF and CHM, but technically nothing prevents adding other formats. If the ISBN is in the text, my code will find it. It deals with malformed ISBNs, multiple ISBNs and dupes. I use AWS to look up the candidates on Amazon, and when a text contains more than one ISBN, an algorithm fuzzy-matches the results against the file name to pick the one that is close.
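To make the idea in #1 concrete, here is a minimal sketch in Python (calibre's language). It is not the Java code described above, just one plausible approach: scan already-extracted text for ISBN-shaped runs, drop anything that fails the standard ISBN-10/ISBN-13 check digit, and collapse duplicates by normalizing to ISBN-13. The function names and the regular expression are assumptions made for illustration, and extracting text from PDF/CHM is out of scope here.

```python
import re

# ISBN-shaped candidate: optional 978/979 prefix, then 9 digits and a final
# digit or X, with optional single hyphens or spaces between digits.
CANDIDATE = re.compile(r"\b(?:97[89][- ]?)?(?:\d[- ]?){9}[\dXx]\b")


def is_valid_isbn10(s):
    """Check the mod-11 checksum of a bare 10-character ISBN."""
    if len(s) != 10 or not s[:9].isdigit() or s[9] not in "0123456789X":
        return False
    total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(s))
    return total % 11 == 0


def is_valid_isbn13(s):
    """Check the mod-10 checksum (weights 1,3,1,3,...) of a bare ISBN-13."""
    if len(s) != 13 or not s.isdigit():
        return False
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(s))
    return total % 10 == 0


def to_isbn13(isbn10):
    """Convert a valid ISBN-10 to ISBN-13 so both forms compare equal."""
    body = "978" + isbn10[:9]
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(body))
    return body + str((10 - total % 10) % 10)


def find_isbns(text):
    """Return unique, checksum-valid ISBNs (as ISBN-13) in order of appearance."""
    found = []
    for match in CANDIDATE.finditer(text):
        raw = re.sub(r"[- ]", "", match.group(0)).upper()
        if len(raw) == 10 and is_valid_isbn10(raw):
            isbn = to_isbn13(raw)
        elif len(raw) == 13 and is_valid_isbn13(raw):
            isbn = raw
        else:
            continue  # malformed candidate: wrong length or bad check digit
        if isbn not in found:
            found.append(isbn)
    return found
```

Called on a copyright page that lists the same edition as both ISBN-10 and ISBN-13, `find_isbns` returns a single ISBN-13, which is the "dupes" case above. A candidate with a wrong check digit is simply skipped.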

This feature is key for me. Using calibre now, I have to do this manually for every file, which is just going to take me waaaayy too long. I know the code is in Python/C++. My code is in Java. You can bridge via JNI. I would love to get this in there somehow. It is my number one showstopper.
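The file-name matching step for texts with more than one ISBN could look roughly like the following. This is only a sketch under assumptions: it takes a dictionary of ISBN to title that some lookup service has already returned (the Amazon lookup itself is not shown), and it swaps in Python's standard `difflib.SequenceMatcher` as the similarity measure. The `best_match` name and the 0.6 threshold are made up for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(s):
    """Lowercase and reduce punctuation/underscores to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", s.lower()).strip()


def best_match(file_name, candidates, threshold=0.6):
    """Pick the candidate whose title is closest to the file name.

    candidates maps ISBN -> title (already looked up elsewhere).
    Returns (isbn, score), or None if nothing is close enough.
    """
    # Drop the extension before comparing.
    stem = normalize(re.sub(r"\.[^.]+$", "", file_name))
    best = None
    for isbn, title in candidates.items():
        score = SequenceMatcher(None, stem, normalize(title)).ratio()
        if best is None or score > best[1]:
            best = (isbn, score)
    if best is not None and best[1] >= threshold:
        return best
    return None  # leave the file for manual tagging
```

Returning `None` below the threshold keeps a badly named file (say, `scan0001.pdf`) from being tagged with whichever ISBN happens to score highest.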

#2 Separate the library db itself from a scratch/import area. The reason for this is fairly straightforward. I don't really want to import into my main library until after I have tagged everything correctly, and am certain I want to import the books into my library. The view, as it is now...shows everything. If I drag some new books into the library and then sort, it is hard to find them again, and they are in an inconsistent state without proper meta-info. So, I'm looking for a scratch area, where I can drag files in...look up meta-info, and then select them or select-all, and import into the main library, which essentially just inserts them into SQLite, and moves the files for me.
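A rough sketch of the scratch-area idea in #2, using Python's built-in `sqlite3`. The `staging` and `library` tables and all function names here are hypothetical and are not calibre's actual schema. The point is only that untagged files live in a separate table and are promoted in one transaction once they have a title and an ISBN.

```python
import sqlite3


def open_db(path=":memory:"):
    """Create the two hypothetical tables: a scratch area and the library."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS staging (
            id INTEGER PRIMARY KEY, path TEXT NOT NULL, title TEXT, isbn TEXT);
        CREATE TABLE IF NOT EXISTS library (
            id INTEGER PRIMARY KEY, path TEXT NOT NULL,
            title TEXT NOT NULL, isbn TEXT NOT NULL);
    """)
    return conn


def stage(conn, path):
    """Add a dropped file to the scratch area with no metadata yet."""
    cur = conn.execute("INSERT INTO staging (path) VALUES (?)", (path,))
    conn.commit()
    return cur.lastrowid


def set_metadata(conn, staging_id, title, isbn):
    """Record looked-up metadata for a staged file."""
    conn.execute("UPDATE staging SET title = ?, isbn = ? WHERE id = ?",
                 (title, isbn, staging_id))
    conn.commit()


def promote(conn, staging_ids):
    """Move fully tagged rows into the library; leave incomplete ones staged."""
    promoted = []
    with conn:  # one transaction: commit on success, roll back on error
        for sid in staging_ids:
            row = conn.execute(
                "SELECT path, title, isbn FROM staging WHERE id = ?", (sid,)
            ).fetchone()
            if row is None or not row[1] or not row[2]:
                continue  # missing title or ISBN: stays in the scratch area
            # A real importer would also move the file on disk at this point.
            conn.execute(
                "INSERT INTO library (path, title, isbn) VALUES (?, ?, ?)", row)
            conn.execute("DELETE FROM staging WHERE id = ?", (sid,))
            promoted.append(sid)
    return promoted
```

With this split, "select-all and import" is just `promote` over every staged id, and anything still missing metadata stays behind instead of polluting the main view.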

#3 Different UI for Edit MetaInformation in Bulk. It would be easier/faster to have this form be on the side of the UI. If you select an individual file, the form is populated. If you select multiple files, the form is populated with any fields that are the same and for variance it can use something like <varies>. The user can simply type in a new string and tab or enter, and it commits it. This is much faster than having to right-click and navigate a menu to get to the form. For an example of how this works with MP3 files, take a look at

In the bottom-left pane, there is a tag form. If you have multiple items selected in the main pane, and you edit that form, it commits them to all items selected. This is what I'm looking for.
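The form logic behind #3 is small enough to sketch without any UI toolkit. Assuming each book is a plain dictionary of string fields (an assumption for illustration, not how calibre stores records), the form shows a value when every selected book agrees and `<varies>` otherwise, and committing a field writes it to every selected book.

```python
VARIES = "<varies>"


def form_values(selected, fields):
    """Compute what the side form should display for the current selection."""
    values = {}
    for field in fields:
        distinct = {book.get(field) for book in selected}
        if not distinct:
            values[field] = ""          # nothing selected
        elif len(distinct) == 1:
            values[field] = distinct.pop()  # all selected books agree
        else:
            values[field] = VARIES
    return values


def commit_field(selected, field, new_value):
    """Apply an edited field to every selected book (called on tab/enter)."""
    if new_value == VARIES:
        return  # user left the placeholder untouched; change nothing
    for book in selected:
        book[field] = new_value
```

Treating an untouched `<varies>` as "no change" is one design choice among several. It means tabbing through the form never overwrites fields the user did not actually edit.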

#4 Duplicate detection. There is none currently, as far as I can tell. Since all my books do have an ISBN, I'd like to be able to find/detect dupes and easily remove them.
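If every book has an ISBN, duplicate detection as asked for in #4 can be as simple as grouping by a normalized ISBN. The sketch below assumes books arrive as `(book_id, isbn)` pairs, which is a made-up shape for illustration, and it converts 10-character ISBNs to ISBN-13 so the two forms of the same edition land in one group. It does not validate check digits.

```python
from collections import defaultdict


def normalize_isbn(isbn):
    """Strip separators and convert ISBN-10 to ISBN-13 for comparison."""
    raw = isbn.replace("-", "").replace(" ", "").upper()
    if len(raw) == 10:
        body = "978" + raw[:9]
        total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(body))
        raw = body + str((10 - total % 10) % 10)
    return raw


def find_duplicates(books):
    """Group book ids by normalized ISBN; keep only groups with 2+ books."""
    groups = defaultdict(list)
    for book_id, isbn in books:
        groups[normalize_isbn(isbn)].append(book_id)
    return {isbn: ids for isbn, ids in groups.items() if len(ids) > 1}
```

Each returned group lists the ids that share an edition, so a UI could show them side by side and let the user pick which copy to keep.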

#5 Minor UI gripe: the main pane that shows the list of books should be draggable for re-size, so you can view the meta-info pane below it.

Regards,
Davis
