View Full Version : Indexeses


NetSlut
09-04-2008, 07:05 PM
An index of eBooks! An index of eBooks! My kingdom for an index of eBooks!

Unless there already is one or someone makes one, in which case Half a pack of crisps and a chocolate hob nob for an index of eBooks!

As someone new to the site, trying to see which books are here is proving to be a bit laborious. What I could really do with is a simple page or text file that lists *once* each title available, and a simple, single letter code denoting which formats that title is available in.

Has anyone done this?

Patricia
09-04-2008, 07:09 PM
online books page
http://onlinebooks.library.upenn.edu/

digital book index
http://www.digitalbookindex.com/_search/search011t-rev.asp

EBDB: electronic book database
http://www.ebdb.net/

NetSlut
09-04-2008, 07:19 PM
Thanks Patricia, helpful as always!

Is there an index specifically for this site?

Patricia
09-04-2008, 07:36 PM
Click on Ebooks then on 'Browse latest uploads'. If you click on say, 'M' on the alphabetical list, then on 'author/ebook' on the pale bluish-grey bar, then all the authors whose surnames begin with M will appear in alphabetical order.

It is a bit cumbersome and a better search facility may appear when Alex (the site-owner) has some free time. He is very busy at the moment.

Sparrow
09-05-2008, 03:57 AM
Another method is to click the 'Full List' link for TXT or HTML (http://www.mobileread.com/forums/ebooks.php?do=getlist&type=html) in the right-hand corner.

Slite
09-05-2008, 04:06 AM
I would give my left nu... well, maybe not, but I would bear the baby (even tho I am male) of the person who could make a REALLY good e-book indexer that can search my harddrive for e-books in different formats and index them...

I tried one called My Ebook Library, which worked OK but could have used some more features. But that seems to have gone to /dev/null now, as the homepage is no longer available.

So, anyone know of such a beastie? Heck, I'm even willing to pay for it :)

NetSlut
09-05-2008, 04:55 AM
I might have a go at that, Slite; would you be willing to PM me with a list of features you'd like the thing to have?

Slite
09-05-2008, 04:58 AM
I might have a go at that, Slite; would you be willing to PM me with a list of features you'd like the thing to have?

Will do :)

zelda_pinwheel
09-05-2008, 12:39 PM
i would also be really really interested in such a program. and i am sure many other people around here would...

Slite
09-05-2008, 12:51 PM
i would also be really really interested in such a program. and i am sure many other people around here would...

Maybe a joint effort as far as wishlist is concerned? :)

I'll start:

Search indicated directories for e-books, index by format, option to download info/isbn from isbndb.com. Search database by title, isbn, genre and format.

For starters :)
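
The first item on that wishlist (search the indicated directories and index what is found by format) can be sketched in a few lines. This is only a minimal illustration, not anyone's actual tool: the function name `index_by_format` and the `EBOOK_EXTENSIONS` set are made up for the example, and plain-text files are left out by default because not every .txt file is a book.

```python
import os
from collections import defaultdict

# Hypothetical set of extensions to treat as e-books; adjust to taste.
EBOOK_EXTENSIONS = {".lrf", ".mobi", ".prc", ".epub", ".pdf", ".lit", ".imp", ".rtf"}


def index_by_format(roots, extensions=EBOOK_EXTENSIONS):
    """Walk each root directory and return {extension: [full paths]}."""
    index = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                # Compare extensions case-insensitively (.MOBI == .mobi).
                ext = os.path.splitext(name)[1].lower()
                if ext in extensions:
                    index[ext].append(os.path.join(dirpath, name))
    return {ext: sorted(paths) for ext, paths in index.items()}
```

Nothing is moved or renamed; the result only records where each file already lives, so it could feed a database or a plain report.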

zelda_pinwheel
09-05-2008, 01:04 PM
Maybe a joint effort as far as wishlist is concerned? :)
good idea !

i'll add :
- ability to sort by author or title (or other ?)
- complete meta-data support, including description / summary !! even cover image would be nice.
- possibility to edit meta-data, at least in the database, ideally in the file itself (but that might be harder).
- option to create lists / groups of books, and possibly even re-organise the files themselves accordingly (like : all of Dorothy L. Sayers together, regardless of format. or, all mobi format together in one directory).
- possibility to add tags / comments of your own (like : to read, read, first of series, etc.)
- possibility to add books manually to create a "wish-list" of books you don't have, with meta-data (isbn) completed automatically perhaps, and tags and comments ("recommended by...")

...for starters. :) if that's possible, of course !!!

(methinks NetSlut may end up regretting his proposal to make this program...)

NetSlut
09-05-2008, 04:49 PM
good idea !

...

(methinks NetSlut may end up regretting his proposal to make this program...)
That's an excellent idea.

Please, everyone, do pitch in with any features you're interested in. I'll let this part run for a couple of days, and then I'll take stock and actually design the thing. Then I'll write it whenever I've got time.

I've been looking for a personal project for some time, so this is ideal.

pilotbob
09-05-2008, 04:57 PM
As someone new to the site, trying to see which books are here is proving to be a bit laborious. What I could really do with is a simple page or text file that lists *once* each title available, and a simple, single letter code denoting which formats that title is available in.


There is a text file / download guide available at http://www.mobileread.com/mobiguide (http://www.mobileread.com/forums/../mobiguide); however, it isn't "all" books, just the mobi format ones. It is also a Mobipocket book itself. I wouldn't be surprised if Alex could use the same code that creates this and have it output to a CSV text file or a spreadsheet. Ah, it looks like you can get a text file listing from here: http://www.mobileread.com/forums/ebooks.php?do=getlist&type=txt . Once again it is ALL the books in all formats. However, I don't see anything like CSV, XML or another well-delineated format.

BOb

pilotbob
09-05-2008, 05:01 PM
good idea !

i'll add :
- ability to sort by author or title (or other ?)
- complete meta-data support, including description / summary !! even cover image would be nice.
- possibility to edit meta-data, at least in the database, ideally in the file itself (but that might be harder).
- option to create lists / groups of books, and possibly even re-organise the files themselves accordingly (like : all of Dorothy L. Sayers together, regardless of format. or, all mobi format together in one directory).
- possibility to add tags / comments of your own (like : to read, read, first of series, etc.)
- possibility to add books manually to create a "wish-list" of books you don't have, with meta-data (isbn) completed automatically perhaps, and tags and comments ("recommended by...")

...for starters. :) if that's possible, of course !!!

(methinks NetSlut may end up regretting his proposal to make this program...)

Calibre does most of this. However, it doesn't read the metadata of ALL formats. Perhaps rather than writing a whole new app you could work with Kovid. He does accept patches. I think someone right now is trying to add IMP support to it. Also, Kovid has mentioned setting up the ability to add format/conversion support by pointing to external programs... so, for example, all the mobiperl scripts could be supported.

BOb

wallcraft
09-05-2008, 05:23 PM
Ah, it looks like you can get a text file listing from here http://www.mobileread.com/forums/ebooks.php?do=getlist&type=txt . Once again it is ALL the books in all formats. However, I don't see anything like CSV, XML or other well delineated method.

Most of the lines have the form AUTHOR: TITLE (FORMAT). Only 259 of the 6003 entries have no ":" (although not all of the rest necessarily have the ":" after the author). The posting guidelines actually say that the message subject should be of the form:

Author_surname, Author_firstname (or initials): Book_title. Version. Posting Date

So this is where the text entry conformity comes from.
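
Given that most lines follow the AUTHOR: TITLE (FORMAT) shape, the listing can be folded into the kind of index asked for at the start of the thread: each title listed once, with the formats it comes in. The sketch below assumes exactly that line shape and nothing more; `build_index` and `LINE_RE` are names invented for the example, and lines that do not fit are collected rather than guessed at.

```python
import re
from collections import defaultdict

# Assumed line shape, per the description above: "AUTHOR: TITLE (FORMAT)".
# The author is everything before the first colon; the format is the last
# parenthesised group on the line.
LINE_RE = re.compile(
    r"^(?P<author>[^:]+):\s*(?P<title>.+?)\s*\((?P<format>[^()]+)\)\s*$"
)


def build_index(lines):
    """Return ({(author, title): set of formats}, [lines that did not match])."""
    index = defaultdict(set)
    unparsed = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match is None:
            # For example, entries with no ":" at all.
            unparsed.append(line)
            continue
        key = (match["author"].strip(), match["title"].strip())
        index[key].add(match["format"].strip().upper())
    return dict(index), unparsed
```

Entries whose ":" is not actually after the author will still match and be filed under the wrong key, so the unparsed list is a lower bound on what needs checking by hand.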

NetSlut
09-06-2008, 02:29 PM
Mmm, I think Calibre (a wonderful program that I've been using) does something different: it seems to me to be more focussed on device management and conversion of ebooks, rather than a pure indexing tool.

I think what I want is something more like the Apple iPod, doing one thing exceptionally well.

All of the features wished for so far shouldn't prove to be much of a problem, although I'm hesitant over the .txt file support since I think so many text files aren't ebooks that they will need special handling.
But everything sounds feasible, so as I say, let's give it a few days for ideas to percolate and then we'll take stock and see what can be done.

kovidgoyal
09-06-2008, 02:52 PM
Mmm, I think Calibre (a wonderful program that I've been using) does something different: it seems to me to be more focussed on device management and conversion of ebooks, rather than a pure indexing tool.

I think what I want is something more like the Apple iPod, doing one thing exceptionally well.

All of the features wished for so far shouldn't prove to be much of a problem, although I'm hesitant over the .txt file support since I think so many text files aren't ebooks that they will need special handling.
But everything sounds feasible, so as I say, let's give it a few days for ideas to percolate and then we'll take stock and see what can be done.

The calibre GUI was written originally to index books. What about indexing does it not do well?

NetSlut
09-07-2008, 07:31 AM
The calibre GUI was written originally to index books. What about indexing does it not do well?
Well, there are a number of things that I think would be done better if the program didn't have to pay attention to the needs of the conversion and device management; not enough attention is paid to indexing by format, for example, and personally I want to have an index that will actually tell me the full path and file name to each of my ebooks, or to gather them into a single place, much like iTunes does.

As I said before, Calibre is a very useful program that I will continue to use (even though it doesn't work properly on my mac, and tends to corrupt the prs-505 sometimes), but this is a different program with a different focus.

kovidgoyal
09-07-2008, 12:26 PM
Well, there are a number of things that I think would be done better if the program didn't have to pay attention to the needs of the conversion and device management; not enough attention is paid to indexing by format, for example, and personally I want to have an index that will actually tell me the full path and file name to each of my ebooks, or to gather them into a single place, much like iTunes does.

As I said before, Calibre is a very useful program that I will continue to use (even though it doesn't work properly on my mac, and tends to corrupt the prs-505 sometimes), but this is a different program with a different focus.

Feel free to write your own program; I'm just curious as to what features you think are missing.

calibre does gather all your books into a single place (a database in versions <= 0.4.83 and a user specified folder in higher versions). Not sure what you mean by indexing by format. You can filter the list of books by format, what else do you want to do?

pilotbob
09-07-2008, 01:34 PM
Mmm, I think Calibre (a wonderful program that I've been using) does something different: it seems to me to be more focussed on device management and conversion of ebooks, rather than a pure indexing tool.

You can add tags to books and search them by tag. You can sort the books by Title, Author, Publisher, Series. Book info does not show you the file name like iTunes does, but the newest version (still in beta) organizes the files on your hard drive by Author, Title so they are very easy to find if you want to transfer ebooks to a device using the OS Explorer/Finder.

Yes, I think it does need a bit more work. I would like to see a tag list / cloud so you can navigate through them without having to sort on them. Also, I think the list has to be paged rather than just listing all the books in the grid since I have seen reports of large (10k) collections really slowing down the UI.

You should download the beta to your Mac... it has come a long way. Not sure what version you are using. Then, you can add features you like to it rather than starting from scratch...

In addition to the tag list / cloud navigation, it would be nice to display the file path somewhere. This is actually new in the beta, where the files are stored in the file system rather than the database.

BOb

kovidgoyal
09-07-2008, 01:43 PM
Yes, I think it does need a bit more work. I would like to see a tag list / cloud so you can navigate through them without having to sort on them. Also, I think the list has to be paged rather than just listing all the books in the grid since I have seen reports of large (10k) collections really slowing down the UI.
BOb

Actually it's the sort/filter operations that become slow, not the UI. Paging won't help that, since for the results to be correct, a sort/filter has to run through the full database.

pilotbob
09-07-2008, 01:47 PM
Actually it's the sort/filter operations that become slow, not the UI. Paging won't help that, since for the results to be correct, a sort/filter has to run through the full database.

Ah... what db are you using? It seems slow in that case, because 10k records isn't a lot. Do you have the data indexed by title/author/tags so that the db can do an index scan instead of a table scan?

Do you create tag index tables or just sort through a tags field in every record to find stuff?

BOb

nekokami
09-07-2008, 01:53 PM
In a lot of my books currently, the metadata (title, author) is included in the file path. I'd like a utility to be able to be configured to parse this. How people have organized books tends to be individual, so it would need to be configurable. In my case, I have all my books in a "books" directory, and within that, I have directories named after each author, in Last, First format. Within those directories, things vary a bit more, but usually there is a level of format directories, and then within that, filenames may include the author's name as well as the name of the book.

So in my case, I'd like to be able to configure the indexing utility to check within a given directory and assume that the next directory level down will provide the author's name, then search recursively within those directories for all files and remove the author's name from the filename to get the book title, if appropriate.
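
A layout like this one can be parsed with very little code. The sketch below hard-codes the single convention just described (first directory under the library root is the author in "Last, First" form, the title is whatever remains of the file name once the author's name is removed); a real tool would need this to be configurable. `parse_book_path` is a made-up name, paths are treated as POSIX-style, and the name matching is case-sensitive.

```python
from pathlib import PurePosixPath


def parse_book_path(path, library_root):
    """Derive (author, title) from a path laid out as root/Author/.../file.

    The first directory below library_root is taken as the author, in
    "Last, First" form. The title is the file name without its extension,
    with the author's name stripped from it if present.
    """
    relative = PurePosixPath(path).relative_to(library_root)
    if len(relative.parts) < 2:
        raise ValueError(f"no author directory in {path!r}")
    author = relative.parts[0]
    title = relative.stem

    # Try both the directory spelling and the "First Last" spelling.
    last, _, first = author.partition(",")
    candidates = [author]
    if first.strip():
        candidates.append(f"{first.strip()} {last.strip()}")
    for name in candidates:
        title = title.replace(name, "")

    # Drop separators left behind after removing the name.
    return author, title.strip(" -_")
```

Intermediate directories (such as a per-format level) are simply ignored, which matches the "search recursively within those directories" part of the request.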

Once the author's name and book title are available, of course, it would be great to be able to get other info from an online database. :)

One other feature I'd like is the ability to enter or import my paper book collection, and tie my paper inventory to my electronic inventory. I'm gradually trying to replace all my paper books, especially fiction, with ebook versions. I have Readerware and a barcode scanner, so I plan to have a full online inventory of my paper books "soon." Readerware has some nice lookup features that might be able to provide me with an estimate of the resale value of my paper books, so it might also be useful to have a system be able to look up and store the lowest cost available (of a specified format or formats) to purchase an ebook replacement. I could then calculate how much it would cost, net, to sell my paper books and buy commercial versions of ebook replacements (when available).

kovidgoyal
09-07-2008, 01:57 PM
Ah... what db are you using? It seems slow in that case, because 10k records isn't a lot. Do you have the data indexed by title/author/tags so that the db can do an index scan instead of a table scan?

Do you create tag index tables or just sort through a tags field in every record to find stuff?

BOb

SQLite, and it is fully indexed on all sort fields. I haven't really looked into what is causing the slowdown, primarily because this isn't a problem for most users, but I suspect some flaw in the database design. I'm just not enough of an SQL guru to be able to figure it out easily.
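
For readers unfamiliar with the distinction being asked about (a separate, indexed tag table versus scanning a tags field in every record), here is a small SQLite sketch of the first approach. It is emphatically not calibre's schema; every table, index and function name below is an assumption made for illustration.

```python
import sqlite3

# Hypothetical schema: tags live in their own table and are linked to books
# through books_tags, so a tag lookup can use an index instead of scanning
# a free-text tags column on every book row.
SCHEMA = """
CREATE TABLE books (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    author TEXT NOT NULL
);
CREATE TABLE tags (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE books_tags (
    book_id INTEGER NOT NULL REFERENCES books(id),
    tag_id INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (book_id, tag_id)
);
CREATE INDEX idx_books_author ON books(author);
CREATE INDEX idx_books_title ON books(title);
CREATE INDEX idx_books_tags_tag ON books_tags(tag_id);
"""


def open_library():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_book(conn, title, author, tags=()):
    """Insert a book and link it to each tag, creating tags as needed."""
    book_id = conn.execute(
        "INSERT INTO books (title, author) VALUES (?, ?)", (title, author)
    ).lastrowid
    for tag in tags:
        conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (tag,))
        tag_id = conn.execute(
            "SELECT id FROM tags WHERE name = ?", (tag,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO books_tags (book_id, tag_id) VALUES (?, ?)",
            (book_id, tag_id),
        )
    return book_id


def books_with_tag(conn, tag):
    """Return the titles carrying the given tag, sorted by title."""
    rows = conn.execute(
        """
        SELECT b.title
        FROM books b
        JOIN books_tags bt ON bt.book_id = b.id
        JOIN tags t ON t.id = bt.tag_id
        WHERE t.name = ?
        ORDER BY b.title
        """,
        (tag,),
    )
    return [title for (title,) in rows]
```

Whether a given query actually uses an index can be checked by prefixing it with SQLite's `EXPLAIN QUERY PLAN`, which is usually the first step in tracking down a slowdown of the kind described here.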

pilotbob
09-07-2008, 02:03 PM
In my case, I have all my books in a "books" directory, and within that, I have directories named after each author, in Last, First format. Within those directories, things vary a bit more, but usually there is a level of format directories, and then within that, filenames may include the author's name as well as the name of the book.


That is very similar to how the calibre beta organizes files. You specify the top level folder. In that folder is a folder for each author. The folder is named based on the author's name in the calibre database. So, you can put the author as lastname firstname if you want. Within each author folder is a folder for the book title. Within that are the book files, a .obd file, and sometimes a cover image file. While it would be nice to be able to specify this, the current layout works perfectly well for me and it is easy to find an ebook file that I want to transfer to my device.


Once the author's name and book title are available, of course, it would be great to be able to get other info from an online database. :)


It does do that, generally getting the ISBN and summary info from isbndb.com.


One other feature I'd like is the ability to enter or import my paper book collection, and tie my paper inventory to my electronic inventory. I'm gradually trying to replace all my paper books, especially fiction, with ebook versions. I have Readerware and a barcode scanner, so I plan to have a full online inventory of my paper books "soon."

Hmm... I don't use readerware... I use librarything for that. Although I have chosen to enter "books I've read" in librarything rather than use it to index files/books that I have/own. Of course you could use it that way via tags. Then I use calibre to "index" my ebook files and also do conversions to LRF if needed. I do conversions to Mobi using other command line tools, then add those files to calibre. But I think there are future plans to be able to plug in other command line tools.

Hey, there is always room for more software in this world. But, since calibre is open and Kovid accepts patches it seems to make sense to add some of these features to it rather than starting with a clean slate.

BOb

Elsi
09-07-2008, 02:35 PM
I have yet to download and use the beta version of Calibre, but it's on my list of things that I really want to do. So, I'm not sure how Calibre manages the inventory of books in this release.

What *I* want is a program that will create an index of the books -- exactly where they are in the file system. I do not want a program that moves them into a new location on the hard drive(s). I do not want them filed in folders by author since I have chosen to organize them by source. But an index would allow me to view by author *across the existing directories*. And that's a very good thing.

NetSlut
09-07-2008, 03:39 PM
I think the flexibility of understanding and optionally organising directory paths is a must for what I wanted to see. I have a nasty habit of forgetting certain authors' names and just remembering the book titles, especially if they are one-offs, so I'd want to be able to have different paths set up but remain in one index.

I want something that would show my collection in ways that are meaningful to me at the time of searching, so I can see which books are missing an ebk format, or which Stainless Steel Rat books have yet to come out.

Maybe part of my problem in missing these things is that I haven't tried the beta version of Calibre :-) and I haven't had much time with the release version either (it's the latest release, by the way); but I think shoehorning what I want to do into Calibre would be more difficult.
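
The first of those queries, finding the books that lack a given format, needs nothing more than a mapping from title to the formats held. A minimal sketch, with `titles_missing_format` and the shape of `library` both assumed for the example:

```python
def titles_missing_format(library, wanted):
    """Return the sorted titles that have no file in the wanted format.

    library maps each title to the collection of formats held for it.
    """
    wanted = wanted.lower()
    return sorted(
        title
        for title, formats in library.items()
        if wanted not in {fmt.lower() for fmt in formats}
    )
```

The second query (books in a series that have yet to come out) cannot be answered from the files alone; it depends on series data being entered by hand or fetched from an outside source.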

kovidgoyal
09-07-2008, 05:32 PM
I want something that would show my collection in ways that are meaningful to me at the time of searching, so I can see which books are missing an ebk format, or which Stainless Steel Rat books have yet to come out.


In the search bar enter:


format!ebk


for all books that don't have ebk format. Assuming you've tagged the books that have yet to be released:

series:"Stainless Steel Rat" tag:unreleased


will do the trick. I doubt there's any collection querying feature that you can think of that calibre can't do.

And to address the concerns about folder organization. My perspective is: does it really matter? Why do you organize your books into folders in the first place? I'm guessing, so that you can find a particular book faster and keep the various files belonging to a "book" in one place. I claim that having the books managed by calibre allows you to find them much faster than in any conceivable folder layout. Not only that, you get automatic support for rich metadata (covers, summaries, series, ratings, etc) and the ability to seamlessly convert either individually or in batch any given subset of your books at the click of a button.

nekokami
09-07-2008, 05:40 PM
Hmm... I don't use readerware... I use librarything for that. Although I have chosen to enter "books I've read" in librarything rather than use it to index files/books that I have/own. Of course you could use it that way via tags. Then I use calibre to "index" my ebook files and also do conversions to LRF if needed. I do conversions to Mobi using other command line tools, then add those files to calibre. But I think there are future plans to be able to plug in other command line tools.

Hey, there is always room for more software in this world. But, since calibre is open and Kovid accepts patches it seems to make sense to add some of these features to it rather than starting with a clean slate.

BOb
Integration with Librarything would be fine, too (or instead). I have a lifetime account there (I found Librarything after I already had gotten Readerware). The point is to be able to manage both my ebook and pbook collections within one system and see how far out of sync they are. I have quite a few "format shifted" ebooks, and I want to make sure my ebook collection is accounted for in either paid-for commercial ebooks or matching pbooks. Also, as previously mentioned, I'm trying to duplicate/replace my paper library entirely with ebooks, as nearly all my pbooks are currently in storage due to lack of shelf space, and I'm going through withdrawal symptoms on some of them. ;)

BTW, does Calibre consider CBZ/CBR as supported filetypes? Due to space considerations, I've been format-shifting my manga collection, as well. Sometimes I convert the CBZ files to PDF using various utilities, but not always.

kovidgoyal
09-07-2008, 06:21 PM
BTW, does Calibre consider CBZ/CBR as supported filetypes? Due to space considerations, I've been format-shifting my manga collection, as well. Sometimes I convert the CBZ files to PDF using various utilities, but not always.

Yes it does.

zelda_pinwheel
09-07-2008, 06:37 PM
i will have to take a closer look at calibre. so far i have not because my original impression was that it was specifically for making sony books, and i don't have a sony. i realised quickly that it had other features as well but i'm now discovering it has far wider applications than i ever suspected. also i've heard talk of integrating .imp support along with the other filetypes it already supports ; it could end up being just what i need.

NetSlut
09-07-2008, 06:41 PM
In the search bar enter:


format!ebk


That's a pretty good reason not to use Calibre right there ;)

I doubt there's any collection querying feature that you can think of that calibre can't do.
Oh, I'm pretty sure I can.

And to address the concerns about folder organization.
Uhh, what concerns would those be? You seem to be taking this as a "look what calibre doesn't do" thread, although I think we've tried hard to make it not like that. This thread is not about concerns with Calibre, indeed, any further discussion of Calibre should be moved to an appropriate thread.

Again, this thread is about an indexing tool that will allow me to look at and examine my collection of books, virtual or otherwise, and to organise and interrogate that collection in a multitude of ways. Let's please try to stay on-track with that concept.

kovidgoyal
09-07-2008, 08:59 PM
This was an attempt to get you to do the sensible thing and not re-invent the wheel. Since you seem to want to re-invent the wheel, best of luck.

NetSlut
09-07-2008, 09:58 PM
This was an attempt to get you to do the sensible thing and not re-invent the wheel. Since you seem to want to re-invent the wheel, best of luck.
:rofl:

Thank you.

nekokami
09-08-2008, 11:53 AM
But if the database backend of Calibre does most of what you want, writing a new front end might still be a viable option to get the tool you are looking for, don't you think?

NetSlut
09-08-2008, 12:01 PM
Nekokami, I would have thought that was exactly the case, until Kovid said that he wasn't an SQL guru (therefore the database backend isn't going to be very well optimised), and various issues with Calibre have been mentioned; it seems to me that the database backend isn't going to do what I want.

Besides which, I *am* an SQL guru, as well as a guru of many other disciplines, so I'm pretty certain there won't be much standing in my way.

zelda_pinwheel
09-08-2008, 12:15 PM
not to mention the fact that our new SQL guru needs a new project to keep him busy.

god knows we wouldn't want him to have too much free time on his hands ; who knows what kind of trouble he could get into. ;)

NetSlut
09-08-2008, 12:17 PM
Well, exactly. If it wasn't for this, I'd have to resume my hobby of deciding the basis upon which I should choose victims...

daffy4u
09-08-2008, 12:29 PM
Well, exactly. If it wasn't for this, I'd have to resume my hobby of deciding the basis upon which I should choose victims...

:eek::eek::eek: Can I be your friend (because I fear you)? :p

igorsk
09-08-2008, 12:35 PM
Some competition can be nice.

kovidgoyal
09-08-2008, 12:45 PM
Nekokami, I would have thought that was exactly the case, until Kovid said that he wasn't an SQL guru (therefore the database backend isn't going to be very well optimised), and various issues with Calibre have been mentioned; it seems to me that the database backend isn't going to do what I want.

Besides which, I *am* an SQL guru, as well as a guru of many other disciplines, so I'm pretty certain there won't be much standing in my way.

At this rate you're heading straight into the category of people that make big announcements of what they're going to do and are never heard from again. Produce some working software, and then talk.

pilotbob
09-08-2008, 12:47 PM
Nekokami, I would have thought that was exactly the case, until Kovid said that he wasn't an SQL guru (therefore the database backend isn't going to be very well optimised), and various issues with Calibre have been mentioned; it seems to me that the database backend isn't going to do what I want.

Besides which, I *am* an SQL guru, as well as a guru of many other disciplines, so I'm pretty certain there won't be much standing in my way.

So, what dev tools, language and sql backend do you plan to use? Will this be a multi-platform tool that runs on Windows/Mac/Linux as calibre does?

BOb

NetSlut
09-08-2008, 12:57 PM
:eek::eek::eek: Can I be your friend (because I fear you)? :p
Sure! That's one of the criteria I have for my most frequent victims! :-D

NetSlut
09-08-2008, 01:02 PM
At this rate you're heading straight into the category of people that make big announcements of what they're going to do and are never heard from again. Produce some working software, and then talk.

<sigh> ...and the only reason things are proceeding at this rate is because you seem to be insisting that yours is the only worthwhile program out there, and can do no wrong.

I have a really simple premise here: I just want a tool to index things for me. I don't want it to convert stuff. I don't want it to corrupt my device. I don't want it to look like a blind three-year-old wrote the GUI, and I don't want it to perform like a mangy dog when it gets just 10k entries.

Honestly, if you can't do better than that, you should quit and go and become a bricklayer.

NetSlut
09-08-2008, 01:08 PM
So, what dev tools, language and sql backend do you plan to use? Will this be a multi-platform tool that runs on Windows/Mac/Linux as calibre does?

BOb

At the moment, I have no plans. As I said before, I'll take stock of what people want to see, and then I'll make my plans.

I thoroughly disapprove of people choosing to work with tools they are familiar with just for that reason alone; tools should be chosen because they are appropriate to the task at hand.

Obviously, if multi-platform is an issue, then Java is a big candidate, but I really dislike Java's UI abilities.

Taylor514ce
09-08-2008, 01:10 PM
Hmmm... after reading this entire thread, I'm with Kovid here. The thread began with a question about indexing the books on this site, then moved to "what about a tool to organize/index MY books", and one suggestion was, "use Calibre, that's what it does".

To claim that Calibre discussion is off-topic, then, is silly. I very much like Calibre and think it might well be my book indexing/organization tool of choice. I understand the urge to write software, being a coder myself, but have you considered offering your SQL skills to Kovid?

Frankly, I don't care where the books are stored, as long as I can back them up. I want to find particular titles, sort my collection various ways, have a nice search mechanism, a color scheme or tagging scheme I can customize (original format, original source, on the Reader or not, already read, paper version only, etc.). If the best way to do that is to copy all your books to a central location, that's fine with me. I think Kovid's question, to paraphrase, "why do you care about folders?" is cogent. If the software lets you interact with the books, who cares what folder organization is used?

I'll be very interested in what you come up with, but I'm equally interested in existing tools.

zelda_pinwheel
09-08-2008, 01:11 PM
hey now, let's both of you calm down. kovid, i think netslut has made it clear he appreciates the usefulness of calibre and in fact he uses it himself. no-one is questioning that. if he wants to write his own indexing tool because he has different priorities (and lots of time on his hands), i would think that would be a good thing. and as igorsk has said, a little competition can be a good thing. presumably he has specific needs which are not met by calibre. netslut's app might turn out to be complementary to calibre, or completely different. either way there's no reason he shouldn't try it out. let's try not to let this get personal, okay ? both of you. no-one is in the wrong here and there's no reason for tempers to flare up.

now shake hands, go back to your corners, both of you, and come out coding ! and preferably with a smile on your face ! :)

edit : and what taylor said is quite valid as well. thanks taylor.

daffy4u
09-08-2008, 01:12 PM
Hey kids, no need to fight. You're both coming from good places and trying to contribute to the community. There is room for both of you (even the scary one). :)

zelda_pinwheel
09-08-2008, 01:14 PM
Hey kids, no need to fight. You're both coming from good places and trying to contribute to the community. There is room for both of you (even the scary one). :)

exactly. that's what i was trying to say.

kovidgoyal
09-08-2008, 01:25 PM
<sigh> ...and the only reason things are proceeding at this rate is because you seem to be insisting that yours is the only worthwhile program out there, and can do no wrong.

I have a really simple premise here: I just want a tool to index things for me. I don't want it to convert stuff. I don't want it to corrupt my device. I don't want it to look like a blind three-year-old wrote the GUI, and I don't want it to perform like a mangy dog when it gets just 10k entries.

Honestly, if you can't do better than that, you should quit and go and become a bricklayer.

Try not to be rude, it gives people a higher opinion of your intelligence.

daffy4u
09-08-2008, 01:30 PM
exactly. that's what i was trying to say.

Maybe I should be a parrot instead of a duck. It's only because I want to be just like you if and when I grow up. :)

nekokami
09-08-2008, 01:38 PM
Try not to be rude, it gives people a higher opinion of your intelligence.
Kovid, with all due respect, your comments are coming across as somewhat rude as well. I'm with Zelda. I think this exchange could be more constructive if it calms down a bit.

kovidgoyal
09-08-2008, 01:47 PM
If I have been rude, it was unintentional, and I apologize. I wish NetSlut all the best in his endeavors to create an indexing program.

NetSlut
09-08-2008, 01:55 PM
Hmmm... after reading this entire thread, I'm with Kovid here. The thread began with a question about indexing the books on this site, then moved to "what about a tool to organize/index MY books", and one suggestion was, "use Calibre, that's what it does".

To claim that Calibre discussion is off-topic, then, is silly.
You're right.

I wasn't clear in my intentions, and I apologise to all for that.

Calibre doesn't do what I want. I know this is true, because I've used it and because of the discussion here.

I want a tool to do what I want. I also know this is true ;-)

Given those two statements, I'd like this thread to discuss any features other people would like to see in a tool of this sort.

I think that because this tool is more specialised than Calibre, it can co-exist peacefully. And, again, I want to say that I *like* Calibre. I like it converting stuff for me, and I plan to continue using it. I don't want anyone to think that I'm trying to replace it, because really, truly, I've no intention of doing that.

I understand the urge to write software, being a coder myself, but have you considered offering your SQL skills to Kovid?
I thought about it; I concluded that modifying someone else's code, especially when it's known to be sub-optimal, and trying to modify their architecture, and to introduce features that I consider to be important, just isn't feasible in this instance. If I write the tool myself, I can proceed at my own pace and prioritise things that I think are important.

Frankly, I don't care where the books are stored, as long as I can back them up. [...] If the best way to do that is to copy all your books to a central location, that's fine with me. I think Kovid's question, to paraphrase, "why do you care about folders?" is cogent. If the software lets you interact with the books, who cares what folder organization is used?
I think this is one of my fundamental points. I believe that some people *do* have a strong preference for the manner of their organisation. I have two separate methods myself; I know that some people prefer to use the Dewey Decimal System. We've heard how other people store their books in this thread. The reason we store them like this is frequently just because this makes sense to us, but sometimes, it's like an OCD thing; it *must* be put there, because that's where it belongs :-p

More than that, I believe users should have the ability to *choose* how they want it done. If they don't want to move things at all, great. If they want it organised by first letter of the third word, they should be allowed to.

I want to find particular titles, sort my collection various ways, have a nice search mechanism, a color scheme or tagging scheme I can customize (original format, original source, on the Reader or not, already read, paper version only, etc.).
Now that's exactly the kind of thing I want.

I'll be very interested in what you come up with, but I'm equally interested in existing tools.
I totally agree; the only reason for doing this, in the end, is to have more choice over how we want things to be done.

NetSlut
09-08-2008, 01:59 PM
Try not to be rude, it gives people a higher opinion of your intelligence.
Steal our girlfriends and there might be some muttering going on; but talk about our code and there's hell to pay!

:)

I'm sorry, Kovid, I genuinely don't mean to be rude about you or Calibre. Long may you continue to provide us all with such a useful program!

NetSlut
09-08-2008, 02:01 PM
(even the scary one). :)
It's the lampshade that does it, right?

I knew I should have gone for the pastel green one...

kovidgoyal
09-08-2008, 02:02 PM
Steal our girlfriends and there might be some muttering going on; but talk about our code and there's hell to pay!

:)

I'm sorry, Kovid, I genuinely don't mean to be rude about you or Calibre. Long may you continue to provide us all with such a useful program!

Cool, and best of luck with your work...

zelda_pinwheel
09-08-2008, 02:04 PM
aw, it just warms the cockles of me heart to see the two of you kiss and make up. that's the spirit ! bravo.

very well explained, netslut, i think your intentions are quite clear now and i'm sure everyone can understand.

and kovid, thanks too for not getting too overheated and calming down in time ! it's not always easy.

netslut, i can't wait to see what you come up with, and kovid, despite the glaring absence of a sony liseuse in my life i'm *still* going to take a look at calibre because it sounds like it has some features which could be really useful to me nonetheless.

Taylor514ce
09-08-2008, 02:18 PM
And don't think I don't see you sneaking in "liseuse", Zelda. I think we're calling them "Freedom Books" over here.

Sorry... let's now resume the actual discussion. I like the idea of "Views" or "Filters". Show me all books I haven't read. Now show me all books I've read and want to review. Show me all books currently on my Freedom Reader (heh). Show me all books I bought from Sony and haven't read.

The ability to create one's own categories, in a simple and straightforward manner, then, would be key.

zelda_pinwheel
09-08-2008, 02:29 PM
And don't think I don't see you sneaking in "liseuse", Zelda. I think we're calling them "Freedom Books" over here.
pffff !!! :p espèce d'énergumène !! don't you have an email you should be writing ?

Sorry... let's now resume the actual discussion. I like the idea of "Views" or "Filters". Show me all books I haven't read. Now show me all books I've read and want to review. Show me all books currently on my Freedom Reader (heh). Show me all books I bought from Sony and haven't read.

The ability to create one's own categories, in a simple and straightforward manner, then, would be key.
those are also features which would be really useful to me. netslut, are you writing this down ?

Taylor514ce
09-08-2008, 03:16 PM
Something like this, where the user could maintain the "Categories" and a many-to-many relationship, so that a book can fit as many categories as the user sees fit.

Same with Formats, so that a book can be listed under multiple formats.

I see "Genre" as something the user builds, since Publisher genres are either too loose or too specific. "SF" doesn't cut it for me.

pilotbob
09-08-2008, 03:29 PM
Sorry... let's now resume the actual discussion. I like the idea of "Views" or "Filters". Show me all books I haven't read. Now show me all books I've read and want to review. Show me all books currently on my Freedom Reader (heh). Show me all books I bought from Sony and haven't read.


For me, tags are the best way to do this. Have you ever used delicious? You can tag each link with as many tags as you want. You make up the tags you want to use. Then, along the right side of your links is a list of all your tags. If you click on one tag then you only see the links with that tag. In addition to that, another tag list pops up: this is all the other tags that the current links contain. So, you can click on a second tag to filter further.

In your case you can have tags like "read", "toreview", "ondevice". Of course, some tags could be auto-created by the software; "ondevice" would be a good example.

To me this is the most flexible way of indexing. Folders are "nice", but many times something fits in more than one folder... or the folder taxonomy is specific to something... say "genre", but you also want to see all the sf books you need to review... the folders fall down. But with tags you just choose tags sf and toreview and voilà.

Oh, btw, delicious doesn't have it, but you should be able to negate a tag too. For example, I want to see everything withOUT a read tag to know what I haven't read yet, rather than being forced to create both a read and a toread tag.
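
(A minimal sketch of that kind of tag filtering, in Python. The sample library and the function names filter_books and related_tags are made up for illustration; this is not how delicious or any existing tool does it.)

```python
from collections import Counter

# Hypothetical sample data: book title -> set of user-chosen tags.
LIBRARY = {
    "Dune": {"sf", "read"},
    "Neuromancer": {"sf", "toreview", "read"},
    "Emma": {"classic", "ondevice"},
    "Foundation": {"sf", "ondevice"},
}


def filter_books(library, include=(), exclude=()):
    """Return titles that carry every tag in include and no tag in exclude."""
    include, exclude = set(include), set(exclude)
    return sorted(
        title
        for title, tags in library.items()
        if include <= tags and not (exclude & tags)
    )


def related_tags(library, include=(), exclude=()):
    """Count the other tags found on the filtered books (the drill-down list)."""
    chosen = set(include)
    counts = Counter()
    for title in filter_books(library, include, exclude):
        counts.update(library[title] - chosen)
    return counts


print(filter_books(LIBRARY, include=["sf", "toreview"]))  # sf books to review
print(filter_books(LIBRARY, exclude=["read"]))            # everything unread
print(related_tags(LIBRARY, include=["sf"]))              # second-level tag list
```

The negated case needs no "toread" tag at all: excluding "read" is enough.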

BOb

nekokami
09-08-2008, 03:40 PM
Tags are good. And they're also what Librarything uses. Which reminds me that I'd like to be able to sync my list of books with Librarything. :)

Taylor514ce
09-08-2008, 03:42 PM
TAGs, whatever, have to be stored in a database. I'm suggesting a database structure.

NetSlut
09-08-2008, 03:49 PM
For me, tags are the best way to do this.
Ah, give the man a prize; that's precisely what I was thinking.

I'm currently launching a messaging service using avatars in my day job, and we've been implementing tags as a means for searching for avatars. It's fast, intuitive, and as long as you're allowing people to make their own and assign them as they will, they're devastatingly useful.

Reading the category/searching requirements, it seems to me that tags are definitely a big win.

I also like categorising groups of tags, so that they can be arranged in a column view, one tag category per column. Filter row results based on combinations of positive and negative occurrences of a set of tags, and virtually everything we've been talking about here can be done automatically.

nekokami
09-08-2008, 03:50 PM
I think tags are usually just concatenated in a single database field, freely, and selected on the fly. But I could be mistaken-- perhaps the software parses them and stores them in a separate table, as you suggest.

The main thing is to be able to have the many-to-many relationship, as you pointed out.

NetSlut
09-08-2008, 03:51 PM
netslut, are you writing this down ?
Absolument!

I'm getting a good solid set of features here.

Taylor514ce
09-08-2008, 03:55 PM
http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html

I guess I sketched out the "Toxi" solution referred to in the article. To me, it's basic database normalization. Regardless of the user interface (tag clouds), in the end you want a highly normalized database.

pilotbob
09-08-2008, 03:55 PM
I think tags are usually just concatenated in a single database field, freely, and selected on the fly. But I could be mistaken-- perhaps the software parses them and stores them in a separate table, as you suggest.

Yes and no. They are usually presented to the user in 1 field. However, to be able to do quick sorting on them they would be broken up in the database into a tags table and a tagsbooks table. The tags table will include each tag once, then the tagsbooks table will link the tags table to the books table.

If you just put them all in one field you have to scan through each record to pull out tags, etc. While indexed records can be retrieved so much faster.
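
(A rough sketch of that layout using Python's built-in sqlite3 module. The table names follow the post; the column names, the index name and the helper functions are assumptions for illustration only.)

```python
import sqlite3

# In-memory database for the sketch; a file path would work the same way.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE tags  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- Cross-reference table: one row per (book, tag) pair.
    CREATE TABLE tagsbooks (
        book_id INTEGER NOT NULL REFERENCES books(id),
        tag_id  INTEGER NOT NULL REFERENCES tags(id),
        PRIMARY KEY (book_id, tag_id)
    );
    -- Lets "all books with tag X" avoid scanning every book record.
    CREATE INDEX tagsbooks_by_tag ON tagsbooks (tag_id);
""")


def add_book(title, tag_names):
    """Insert a book and link it to each tag, creating tags on first use."""
    book_id = conn.execute(
        "INSERT INTO books (title) VALUES (?)", (title,)
    ).lastrowid
    for name in tag_names:
        conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (name,))
        tag_id = conn.execute(
            "SELECT id FROM tags WHERE name = ?", (name,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO tagsbooks (book_id, tag_id) VALUES (?, ?)",
            (book_id, tag_id),
        )
    return book_id


def books_with_tag(name):
    """Return the titles of all books carrying the given tag."""
    rows = conn.execute(
        """
        SELECT b.title FROM books b
        JOIN tagsbooks tb ON tb.book_id = b.id
        JOIN tags t ON t.id = tb.tag_id
        WHERE t.name = ?
        ORDER BY b.title
        """,
        (name,),
    )
    return [title for (title,) in rows]


add_book("Dune", ["sf", "read"])
add_book("Emma", ["classic"])
add_book("Foundation", ["sf"])
print(books_with_tag("sf"))
```

Each tag name is stored once, and a book can have as many rows in tagsbooks as it has tags.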

BOb

NetSlut
09-08-2008, 03:57 PM
I think tags are usually just concatenated in a single database field, freely, and selected on the fly. But I could be mistaken-- perhaps the software parses them and stores them in a separate table, as you suggest.
Slow systems will combine them in a text field; but if you want any kind of decent response, you have to have individual indexed entries.

This brings up a really good question, btw; what is the general view regarding having a database running to store this metadata, versus having a custom file format, emulating a database?

Advantages to the database are speed, ease of development, and no proprietary files to decode in case of trouble. But the custom files mean no separate program to run or configure.

Which would you all choose?

Kovid, I recall you mentioned moving from a database to a file; can I ask what your reasoning was behind that, and how you feel it's worked out for you?

Taylor514ce
09-08-2008, 04:00 PM
Right: Books, Cat (Tags), and the cross-reference table between them. Just like I sketched above. Grrr.

I've always been a corporate developer, so the question of "use a database or not" never comes up. We always use a database. For an end-user program, I see the dilemma.

nekokami
09-08-2008, 04:01 PM
If you just put them all in one field you have to scan through each record to pull out tags, etc. While indexed records can be retrieved so much faster.
Yes, of course. And the main system I administer that actually uses freetags (Drupal) does create a table, and CafePress does as well (I can tell, because sometimes my capitalization changes if someone has used the same tag before me with different capitalization). I suspect some of the older systems I've used really did just dump the tags into one field, but it would of course be much slower to search that way.

I remember trying to figure out how Otakuworld had done their "similar titles" search at one point and coming up with this convoluted solution involving bitwise comparisons (because they seemed to have a fixed number of tags). In retrospect, I'm sure they had something much simpler going on than that. :rolleyes:

kovidgoyal
09-08-2008, 04:02 PM
Keep in mind that while user specified tags are great, you also need automatically specified tags from book metadata. This is to cater to users that don't want to go to the trouble of manually tagging every book (title, author,series,isbn etc are examples of "automatic tags"). And in order to do that, you're going to have to write plugins to read metadata from all the ebook formats you want your app to support.

nekokami
09-08-2008, 04:06 PM
Hm... how about a "statistically improbable phrase" plugin? ;)

NetSlut
09-08-2008, 04:10 PM
Keep in mind that while user specified tags are great, you also need automatically specified tags from book metadata. This is to cater to users that don't want to go to the trouble of manually tagging every book (title, author,series,isbn etc are examples of "automatic tags"). And in order to do that, you're going to have to write plugins to read metadata from all the ebook formats you want your app to support.
One of the things that I think this approach would be useful for is in specifying the directory hierarchy of one's file system.

If you specify that your books are stored in directory structure platform/author/series/isbn/title, then I could create tag categories for each of those, and automatically populate the specific tags from the specific directories.
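
(A small sketch of that idea in Python. The function name tags_from_path, the sample path and the placeholder ISBN are all hypothetical, and it assumes the last category in the structure is the file name without its extension.)

```python
from pathlib import PurePosixPath


def tags_from_path(path, structure, root=""):
    """Map each directory level of path onto the tag category named in structure.

    structure is a slash-separated list of tag categories, for example
    "platform/author/series/isbn/title". The final category is taken from
    the file name with its extension removed (an assumption of this sketch).
    """
    categories = structure.split("/")
    relative = PurePosixPath(path)
    if root:
        relative = relative.relative_to(root)
    parts = list(relative.parts)
    parts[-1] = PurePosixPath(parts[-1]).stem  # drop the file extension
    if len(parts) != len(categories):
        raise ValueError(f"{path!r} does not match structure {structure!r}")
    return dict(zip(categories, parts))


# Hypothetical example path; the ISBN is a placeholder, not a real one.
example = "/library/Sony/Eddings, David/Belgariad/0000000000/Pawn of Prophecy.lrf"
print(tags_from_path(example, "platform/author/series/isbn/title", root="/library"))
```

Files that do not fit the declared structure raise an error here; a real tool would more likely fall back to reading metadata from the file itself.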

You're of course absolutely right that I'll need to read metadata from the ebooks themselves in order to more accurately populate certain categories (specifically title, I think, based on what I've seen so far), and of course for those ebooks that aren't in directory hierarchies at all.

I was thinking that maybe I should write the metadata readers as separate entities, and put those in the open source community directly; so then we might get to a point where they can be maintained and added to by many people for many different programs, as long as the interface is consistent.

kovidgoyal
09-08-2008, 04:41 PM
Kovid, I recall you mentioned moving from a database to a file; can I ask what your reasoning was behind that, and how you feel it's worked out for you?


Originally, calibre stored both book metadata and the ebook files themselves in an sqlite database. This was so I wouldn't have to worry about the various file system quirks of 3 OSes. It has now moved to storing only the metadata in the database and the book files on the file system itself. This was done primarily to allow easy sharing of the calibre database between different computers over a network (the sqlite database can now be kept in memory for faster reads).

IMO, asking your users to run a separate database process is too much. Look into using some file based database engine like sqlite or metakit. Also if you use a database abstraction layer then users can choose any database backend to suit their needs.

kovidgoyal
09-08-2008, 04:44 PM
I was thinking that maybe I should write the metadata readers as separate entities, and put those in the open source community directly; so then we might get to a point where they can be maintained and added to by many people for many different programs, as long as the interface is consistent.

Note that calibre already has metadata readers/writers available as command line tools that should not be too hard to interface with. You also need to give users a way of specifying how your program should interpret metadata from file names. This is because, contrary to the impression from this thread, most people actually don't use folder structures at all and just put metadata into structured file names.
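
(As an illustration of interpreting structured file names, here is a minimal Python sketch using a regular expression with named groups. The pattern, the naming convention it assumes and the function name are invented for this example; they are not calibre's.)

```python
import re

# Assumed naming convention: "Author - Series NN - Title.ext"
PATTERN = re.compile(
    r"^(?P<author>.+?) - (?P<series>.+?) (?P<volume>\d+) - "
    r"(?P<title>.+)\.(?P<format>\w+)$"
)


def metadata_from_filename(filename, pattern=PATTERN):
    """Return a dict of the named groups, or None if the name does not fit."""
    match = pattern.match(filename)
    return match.groupdict() if match else None


print(metadata_from_filename("Eddings, David - Belgariad 01 - Pawn of Prophecy.lrf"))
print(metadata_from_filename("notes.txt"))
```

Every group comes back as a string, so the volume would still need converting to a number before sorting.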

Slite
09-08-2008, 04:46 PM
One of the things that I think this approach would be useful for is in specifying the directory hierarchy of one's file system.

If you specify that your books are stored in directory structure platform/author/series/isbn/title, then I could create tag categories for each of those, and automatically populate the specific tags from the specific directories.

You're of course absolutely right that I'll need to read metadata from the ebooks themselves in order to more accurately populate certain categories (specifically title, I think, based on what I've seen so far), and of course for those ebooks that aren't in directory hierarchies at all.

I was thinking that maybe I should write the metadata readers as separate entities, and put those in the open source community directly; so then we might get to a point where they can be maintained and added to by many people for many different programs, as long as the interface is consistent.

The directory structure needs to be dynamic tho...

I tend to organize my books in /library/A/Last names beginning with A, FirstName/FORMAT and so on and so forth....

NetSlut
09-08-2008, 05:39 PM
It has now moved to storing only the metadata in the database and the book files on the file system itself.
Yeah, that makes sense.

IMO, asking your users to run a separate database process is too much. Look into using some file based database engine like sqlite or metakit.
That's what I'm feeling, although I don't like the performance of the file based ones. I might write something more tailored if the vote here is against a real database.

Also if you use a database abstraction layer then users can choose any database backend to suit their needs.
Absolutely.

NetSlut
09-08-2008, 05:45 PM
Note that calibre already has metadata readers/writers available as command line tools that should not be too hard to interface with.
Oh, that's fantastic. I'll definitely be using those, if you're ok with that?
Are they available as classes/libraries at all? That would be faster at run-time.

You also need to give users a way of specifying how your program should interpret metadata from file names. This is because, contrary to the impression from this thread, most people actually don't use folder structures at all and just put metadata into structured file names.
Yes, that's a very good point too. That *will* be complicated. I don't like having to get users to type in regex expressions or funny syntax, so that'll take some real effort to come up with a suitable GUI.
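
(One possible way to spare users the regex: let them write a simple template and turn it into a regex behind the scenes. A minimal Python sketch, assuming a made-up {name} placeholder syntax; template_to_regex is a hypothetical helper, not part of any existing tool.)

```python
import re


def template_to_regex(template):
    """Turn a template like "{author} - {title}.{format}" into a compiled regex.

    Text outside the braces is matched literally; each {name} becomes a
    named group that matches as little as possible.
    """
    pieces = []
    position = 0
    for placeholder in re.finditer(r"\{(\w+)\}", template):
        pieces.append(re.escape(template[position:placeholder.start()]))
        pieces.append(f"(?P<{placeholder.group(1)}>.+?)")
        position = placeholder.end()
    pieces.append(re.escape(template[position:]))
    return re.compile("^" + "".join(pieces) + "$")


pattern = template_to_regex("{author} - {title}.{format}")
match = pattern.match("Eddings, David - Pawn of Prophecy.lrf")
print(match.groupdict() if match else None)
```

A GUI could build the template from drop-down choices, so the user never sees either syntax. Names containing the separator text (a title with " - " or a dot in it) would still be ambiguous and need handling.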

NetSlut
09-08-2008, 05:47 PM
The directory structure needs to be dynamic tho...

I tend to organize my books in /library/A/Last names beginning with A, FirstName/FORMAT and so on and so forth....
Definitely. I'm anticipating users initiating an indexing operation by specifying a range of locations to examine, and then specifying the directory/filename structure. It should persist only for that operation (but be retrievable, of course).

kovidgoyal
09-08-2008, 05:54 PM
Oh, that's fantastic. I'll definitely be using those, if you're ok with that?
Are they available as classes/libraries at all? That would be faster at run-time.


Yes, that's a very good point too. That *will* be complicated. I don't like having to get users to type in regex expressions or funny syntax, so that'll take some real effort to come up with a suitable GUI.

They're (mostly) python code, so unless you write your app in python or embed a python interpreter, you'll have to use them as commandline utilities.
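
(A hedged sketch of the command-line route from a Python program. It assumes the reader is exposed as a command called ebook-meta that prints one "Key : Value" pair per line; both the command name and the output layout are assumptions here, so check them against the installed calibre version.)

```python
import subprocess


def parse_key_value_output(text):
    """Parse lines of the form "Key : Value" into a dict, skipping other lines."""
    metadata = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if separator and key.strip():
            metadata[key.strip()] = value.strip()
    return metadata


def read_metadata(path, command="ebook-meta"):
    """Run the external metadata reader on path and parse what it prints.

    The default command name is an assumption; pass another if needed.
    """
    result = subprocess.run(
        [command, path], capture_output=True, text=True, check=True
    )
    return parse_key_value_output(result.stdout)


# Parsing demonstrated on made-up output, so no external tool is needed here.
sample = "Title               : Pawn of Prophecy\nAuthor(s)           : David Eddings\n"
print(parse_key_value_output(sample))
```

Starting a process per book is slower than calling a library, which is the run-time cost raised in the question above.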

Great if you come up with a good GUI, maybe I'll steal it from you (at the moment calibre requires users to enter regexps).

NetSlut
09-11-2008, 10:43 PM
Thanks for all the feedback, fellows; I'm now designing away and figuring how to build this. I'll create a different thread when I have something to show you all, but I'll post a notice in here.

Slite
09-13-2008, 05:08 PM
Ohhh! Just figured out a function I'd really like to see. Have the program automatically tell me which parts of a series I am missing if I ask it.

Pwett pwease? With a cherry on top...

NetSlut
09-13-2008, 05:19 PM
That would be tricky. It's difficult to guarantee that I could automatically retrieve all parts of a series, so the only way I would be able to tell is if you entered the missing parts in the first place...

pilotbob
09-14-2008, 12:38 AM
Ohhh! Just figured out a function I'd really like to see. Have the program automatically tell me which parts of a series I am missing if I ask it.

Pwett pwease? With a cherry on top...

FYI: That's a nice feature of LibraryThing and one of the main reasons I joined up. I wanted to know which Trek books I was missing; with all the series they have, it's hard to keep track.

Although, as NetS says, there really isn't any place to get this data. At LibraryThing it is all member supplied data. Although it looks like http://www.fantasticfiction.co.uk/ does have this type of info. Not sure if they have an API though.

BOb

NetSlut
09-14-2008, 09:27 AM
Although, as NetS says, there really isn't any place to get this data. At LibraryThing it is all member supplied data.
Yeah. I could maybe track some things down if I used a combination of APIs (from isbndb) and page scraping (searches from amazon, librarything, etc.) but it just wouldn't be very reliable.

I've just finally managed to hunt down the full list of The Destroyer series, for example, since no book shop has even the list, let alone the books.

Slite
09-15-2008, 02:22 AM
Yeah. I could maybe track some things down if I used a combination of APIs (from isbndb) and page scraping (searches from amazon, librarything, etc.) but it just wouldn't be very reliable.

I've just finally managed to hunt down the full list of The Destroyer series, for example, since no book shop has even the list, let alone the books.

Too bad, that would have been a truly awesome function.... Hmmm, maybe a way to add that manually? Or at least a way to add information about a series to the database?

Let's say that the program finds Eddings, David - Pawn of Prophecy on my hd, and the software lets me enter Belgariad 01 as info for the book in question. That way one could at least sort on series, and possibly have a function that lets me add a "wishlist"?

NetSlut
09-15-2008, 06:17 AM
Hmmm, maybe a way to add that manually? Or at least a way to add information about a series to the database?
Oh, that's absolutely in there.
Some people wanted to enter paper copies, for example, so what I figure is if you can add new books without requiring a file, then you can also tag them with "own" or not, and since "series" and "volume" tags are already present, that would combine nicely and automatically to do at least part of what you want.
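
(A minimal sketch of how "series", "volume" and "own" tags could combine into a missing-parts report, in Python. The record layout and the function name series_report are assumptions for illustration, not Harvey's actual design.)

```python
from collections import defaultdict

# Hypothetical records: a book can be entered without a file, with own=False.
BOOKS = [
    {"title": "Pawn of Prophecy", "series": "Belgariad", "volume": 1, "own": True},
    {"title": "Queen of Sorcery", "series": "Belgariad", "volume": 2, "own": False},
    {"title": "Castle of Wizardry", "series": "Belgariad", "volume": 4, "own": True},
]


def series_report(books):
    """For each series, list volumes entered but not owned, and numbering gaps."""
    by_series = defaultdict(dict)
    for book in books:
        by_series[book["series"]][book["volume"]] = book
    report = {}
    for series, volumes in by_series.items():
        # Volumes the user has entered but marked as not owned.
        wishlist = sorted(v for v, b in volumes.items() if not b["own"])
        # Volume numbers below the highest known one that were never entered.
        gaps = [v for v in range(1, max(volumes) + 1) if v not in volumes]
        report[series] = {"wishlist": wishlist, "gaps": gaps}
    return report


print(series_report(BOOKS))
```

This only sees what has been entered: gaps below the highest known volume can be inferred, but anything beyond it cannot, which matches the "part of what you want" caveat.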

The program is currently indexing books according to a series of meta-data reader plugins, using a user-specified directory hierarchy, and all based on tags. I'm starting the search facilities now.
The big thing it's missing amongst what it does do, is the GUI.

Won't be too long before I can get an alpha out, hopefully.

Slite
09-15-2008, 06:28 AM
Oh, that's absolutely in there.
Some people wanted to enter paper copies, for example, so what I figure is if you can add new books without requiring a file, then you can also tag them with "own" or not, and since "series" and "volume" tags are already present, that would combine nicely and automatically to do at least part of what you want.

The program is currently indexing books according to a series of meta-data reader plugins, using a user-specified directory hierarchy, and all based on tags. I'm starting the search facilities now.
The big thing it's missing amongst what it does do, is the GUI.

Won't be too long before I can get an alpha out, hopefully.

SAWEET! I'll definitely look forward to taking this baby for a test run!

NetSlut
02-08-2009, 11:35 PM
Harvey Beta release imminent.

Fellows, after an interesting six months (oh good grief, is it that long already?), I've finally got to a point where I'm comfortable making a Beta release of Harvey, my ebook (and ok, anything else) indexing tool.

There are two big things it doesn't do, and one thing it only sort of does.

First, the organise files function isn't there yet, to take your files and copy or move them into a directory structure as it indexes them. I don't know many people that wanted this, but those that did, sorry, that'll come soon, I promise.

Second, the meta-data readers. I haven't been able to figure out how to run Calibre's meta-data readers from the command line yet, and I've been concentrating on writing the main functionality first, so there's only one reader at present that understands any ebook format meta-data. It'll still index a bunch of files, it just won't be able to get any meta-data about the file that isn't available from the file name and directory structure.

Don't worry, all will become clear.

In addition to this, the define search stuff is pretty weak: it assumes you want to "AND" every condition.

Apart from those issues, it's looking very useful in my opinion. For those of you with GUI design skills, I'd ask you to bear with me while I put something prettier in place: it's very bare bones functional-only right now.

Still, I think it looks sensible enough to let everyone see it.

I'm currently packaging up a distribution: writing a tutorial that'll hopefully make sense of it all. Once that's done (a matter of a day or two), I'll post it here for you all to play with as you want.

One question I'd love to have answered though, is how many of you would be comfortable installing a database yourselves? I have two versions of Harvey, one for people who are happy installing a database (like MySQL) themselves, and one with the database embedded, so you don't have to do anything. The embedded version uses MySQL, and is about 6MB larger, so I wondered if any/many of you would want that instead.

See you all in a day or two!

zelda_pinwheel
02-11-2009, 09:10 AM
i'd rather have a version with a database embedded. :) (exciting times !)