03-15-2012, 11:29 AM   #31
jorm
Member
jorm began at the beginning.
 
Posts: 17
Karma: 10
Join Date: Mar 2012
Device: nook
The idea is not bad. The only catch is how to reliably identify the same 3 words in all formats of the book. Is it reasonable to have a small dictionary of words that we use? I think this would be an interesting project, and it would be fun to see how large and accurate a database we could build.
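One way to find the same words in every format is to reduce each book to plain lowercase words before looking anything up. Below is a minimal Python sketch. It assumes the plain text has already been extracted from each format, and it uses a hypothetical anchor-word list. It hashes only the order in which the anchor words appear, so differences in line wrapping, punctuation and spacing drop out, while any change to the prose itself changes the fingerprint. Hyphenated line breaks in PDFs would still need separate handling.

```python
import hashlib
import re

# Hypothetical anchor dictionary: a few common 4-5 letter English words.
# The real list was never settled in this thread.
ANCHOR_WORDS = frozenset(
    {"about", "after", "again", "could", "would", "there", "their", "which", "when", "what"}
)


def normalize(text: str) -> list[str]:
    """Split extracted plain text into lowercase words.

    Punctuation, digits, whitespace and line breaks are discarded, so
    formatting differences between conversions do not change the tokens.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def fingerprint(text: str, anchors: frozenset = ANCHOR_WORDS) -> str:
    """Hash the order in which anchor words appear; every other word is ignored."""
    sequence = [word for word in normalize(text) if word in anchors]
    return hashlib.sha1(" ".join(sequence).encode("utf-8")).hexdigest()


# The same sentence, formatted differently, yields the same fingerprint.
a = "When they came, there was nothing which could help."
b = "when they came\nthere was   nothing -- which could help"
assert fingerprint(a) == fingerprint(b)
```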

It would also potentially save people thousands of man-hours of organizing if we could get this into wide use.

The key point is that even if we don't get a 1:1 match for every edition, it should not matter: eventually you will have fingerprints for the other editions, and those would be tagged as well.
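Collecting "fingerprints for the other editions" amounts to a many-to-one table, where several edition fingerprints point at one work. Below is a hypothetical sketch of such a registry. The class and method names are illustrative and do not describe an existing service. An edition is added once it has been identified by some other means, and from then on its fingerprint resolves directly.

```python
from __future__ import annotations

from collections import defaultdict


class FingerprintRegistry:
    """Hypothetical lookup table: many edition fingerprints -> one work ID."""

    def __init__(self) -> None:
        self._work_for: dict[str, str] = {}  # fingerprint -> work ID
        self._editions: defaultdict[str, set[str]] = defaultdict(set)  # work ID -> fingerprints

    def identify(self, fp: str) -> str | None:
        """Return the work ID for a known fingerprint, or None if it is unseen."""
        return self._work_for.get(fp)

    def add_edition(self, fp: str, work_id: str) -> None:
        """Tag a newly identified edition's fingerprint with its work."""
        current = self._work_for.get(fp)
        if current is not None and current != work_id:
            raise ValueError(f"fingerprint already tagged as {current!r}")
        self._work_for[fp] = work_id
        self._editions[work_id].add(fp)

    def editions(self, work_id: str) -> set[str]:
        """All fingerprints recorded for a work (a copy, safe to modify)."""
        return set(self._editions.get(work_id, ()))
```

Each new edition only has to be matched once. After that, every copy of that edition is recognized by a plain lookup.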

I am still looking for volunteers.
03-15-2012, 12:15 PM   #32
Fanas
Member
Fanas began at the beginning.
 
Posts: 21
Karma: 12
Join Date: Aug 2009
Device: none
A small dictionary of words is the way to go. There are words in any language that are used often enough to appear many times in any book. Obviously 1-3 letter words would appear too often, so those would be off limits, but 4-5 letter words would be both abundant and yet not so frequent as to make hashing too resource-consuming. Someone else with more know-how should decide on the algorithms to use for identification, though. I say, take the hash and scan through the book until you find 2 consecutive occurrences of the word with the same number of letters in between. When you find them, check whether the next occurrence is correct as well, and repeat until there's a mismatch. After you finish scanning the book, you've got a more or less accurate idea of how identical the two texts are.
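The scan described above can be sketched in Python. This sketch interprets the "hash" as the list of letter counts between consecutive occurrences of one dictionary word in the reference copy. It is one reading of the post, not a settled algorithm. The functions `letter_gaps` and `match_score`, and the `min_run` threshold (two matching gaps before a run counts), are assumptions added for illustration.

```python
import re
from collections import defaultdict


def letter_gaps(text: str, word: str) -> list[int]:
    """Letters between consecutive occurrences of `word` (a lowercase dictionary word).

    Only letters are counted, so spacing, punctuation and line-wrapping
    differences between formats do not affect the gaps.
    """
    gaps: list[int] = []
    since_last = None  # None until the first occurrence has been seen
    for token in re.findall(r"[^\W\d_]+", text.lower()):
        if token == word:
            if since_last is not None:
                gaps.append(since_last)
            since_last = 0
        elif since_last is not None:
            since_last += len(token)
    return gaps


def match_score(reference: list[int], candidate: list[int], min_run: int = 2) -> float:
    """Fraction of the reference gaps found in runs of identical consecutive gaps.

    At each candidate position, look up where the same gap occurs in the
    reference and extend the run until the first mismatch. Runs shorter
    than `min_run` (an assumed threshold) are treated as coincidence.
    """
    if not reference:
        return 0.0
    where = defaultdict(list)  # gap length -> positions in the reference
    for j, gap in enumerate(reference):
        where[gap].append(j)

    matched = 0
    i = 0
    while i < len(candidate):
        best = 0
        for j in where.get(candidate[i], []):
            run = 0
            while (i + run < len(candidate) and j + run < len(reference)
                   and candidate[i + run] == reference[j + run]):
                run += 1
            best = max(best, run)
        if best >= min_run:
            matched += best
            i += best  # skip past the matched run
        else:
            i += 1
    return min(1.0, matched / len(reference))


ref = letter_gaps("when the rain came when the wind blew when it stopped when", "when")
cand = letter_gaps("When the rain came. When the wind blew!\nWhen it stopped, when", "when")
assert match_score(ref, cand) == 1.0
```

A score near 1.0 means the candidate contains nearly all of the reference's gap sequence in order. Extra front matter in one copy mostly adds unmatched gaps rather than breaking the whole comparison. The inner loop is quadratic in the worst case, which is acceptable for a sketch.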
03-15-2012, 06:40 PM   #33
transmitthis
Addict
transmitthis ought to be getting tired of karma fortunes by now.

Posts: 288
Karma: 1003542
Join Date: May 2011
Device: Google Nexus 7 16GB
Took me 2 sec to identify
The Metamorphosis of Prime Intellect
A Novel by Roger Williams

Shame you can't use Google - wonder what the state of AI is these days?
03-15-2012, 11:29 PM   #34
Benji99
Member
Benji99 began at the beginning.
 
Posts: 12
Karma: 10
Join Date: May 2008
Device: None
Just want to add my 2 cents to this discussion...

I spend a lot of time manually editing my book collection, as I'm sure most of you guys do, and no one benefits but me.

I've also been thinking of how great a tagging service for books would be.
Instead of FreeDB, I was thinking of something closer to MusicBrainz, which I love for music. Ideally, the best option would be to get the MusicBrainz developers interested in this project, as a book database has a lot in common with a music database.
03-16-2012, 02:05 AM   #35
Fanas
Member
Fanas began at the beginning.
 
Posts: 21
Karma: 12
Join Date: Aug 2009
Device: none
Quote:
Originally Posted by transmitthis
Took me 2 sec to identify
The Metamorphosis of Prime Intellect
A Novel by Roger Williams

Shame you can't use Google - wonder what the state of AI is these days?
Well, you are human; you are good at this sort of thing, and it's a very good novel.
03-16-2012, 05:00 PM   #36
Backi
Connoisseur
Backi has become a pillar of the MobileRead community

Posts: 99
Karma: 15776
Join Date: Dec 2011
Device: PB912 Matt White
I took a look at the plug-ins for calibre and found this one:

[GUI Plugin] Find Similar Stories

I think Ian_Stott has already done some preparatory work.
03-19-2012, 12:27 PM   #37
transmitthis
Addict
transmitthis ought to be getting tired of karma fortunes by now.

Posts: 288
Karma: 1003542
Join Date: May 2011
Device: Google Nexus 7 16GB
Quote:
Originally Posted by Fanas
Well, you are human; you are good at this sort of thing, and it's a very good novel.
Will have to put it in my "toread" list.

The point I was making, though, is that it took a simple Google search on some random text, and then another search on a link, to positively identify the book. So if the state of AI has improved since I last looked, maybe there are some options to explore in that area?
Probably too expensive in processing, money, and time, but things are changing pretty quickly in tech.
03-20-2012, 03:37 PM   #38
Fanas
Member
Fanas began at the beginning.
 
Posts: 21
Karma: 12
Join Date: Aug 2009
Device: none
Quote:
Originally Posted by transmitthis
Will have to put it in my "toread" list.

The point I was making, though, is that it took a simple Google search on some random text, and then another search on a link, to positively identify the book. So if the state of AI has improved since I last looked, maybe there are some options to explore in that area?
Probably too expensive in processing, money, and time, but things are changing pretty quickly in tech.
Yes, because the full text of the novel is freely available on the internet. With other books it might be different; you can't exactly scan torrents and RARs of pirated books.



Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
An idea about technical and reparing service | paula-t | enTourage eDGe | 8 | 06-19-2011 06:55 PM
Ebook Idea - An Amazing Coincidence! | Diso | General Discussions | 21 | 09-14-2010 12:52 PM
Idea for a $50 ebook reader | ashkulz | News | 5 | 04-08-2007 11:08 AM
Site maintenance - first phase complete | Alexander Turcic | Announcements | 1 | 12-06-2004 11:39 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.