#31 |
Member
Posts: 17
Karma: 10
Join Date: Mar 2012
Device: nook
|
The idea is not bad. The only catch is how to reliably identify the same 3 words in all formats of the book. Is it fair to have a small dictionary of words that we use? I think this would be an interesting project, and it would be fun to see how large and accurate a database we could build.
It would also save people potentially thousands of man-hours of organization if we could get this into wide use. The key is that even if we don't get a 1-1 match for all editions, it should not matter: eventually you will have fingerprints for the other editions, and those would be tagged as well. I am still looking for volunteers. |
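To make the "same words in all formats" and "other editions get tagged too" points concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `normalize`, `fingerprint`, and `EditionRegistry` are hypothetical names, and the SHA-1 of the normalized text is only a stand-in for whatever word-based fingerprint the project would actually settle on.

```python
import hashlib
import re


def normalize(text):
    """Reduce format differences: drop tags, lowercase, keep only letter runs."""
    text = re.sub(r"<[^>]+>", " ", text)  # crude HTML/XML tag removal (assumption)
    return " ".join(re.findall(r"[a-z]+", text.lower()))


def fingerprint(text):
    """Stand-in fingerprint: hash of the normalized text (not the thread's scheme)."""
    return hashlib.sha1(normalize(text).encode("utf-8")).hexdigest()


class EditionRegistry:
    """Maps many edition fingerprints to one work identifier."""

    def __init__(self):
        self._by_fingerprint = {}

    def tag(self, text, work_id):
        # Each edition adds its own fingerprint; all point at the same work.
        self._by_fingerprint[fingerprint(text)] = work_id

    def lookup(self, text):
        # Returns None when no edition with this fingerprint has been tagged yet.
        return self._by_fingerprint.get(fingerprint(text))
```

An edition with a different text simply misses on lookup until someone tags it once; after that it resolves to the same work as the others.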
#32 |
Member
Posts: 21
Karma: 12
Join Date: Aug 2009
Device: none
|
A small dictionary of words is the way to go. Every language has words that are used often enough to appear many times in any book. Obviously 1-3 letter words would appear too often, so those would be off limits, but 4-5 letter words would be abundant without being so frequent as to make hashing too resource-intensive. Someone with more know-how should decide on the algorithms used for identification, though. I say: take the hash and scan through the book until you find 2 consecutive occurrences of the word with the same number of letters in between. When you find them, check whether the next occurrence matches as well, and repeat until there's a mismatch. After you finish scanning the book, you have more or less the right idea of how identical the two texts are.
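A rough sketch of the gap-matching idea in Python, under stated assumptions: `letter_gaps` and `similarity` are hypothetical names, the marker word is compared after lowercasing, and the "repeat until there's a mismatch" scan is approximated with the standard library's `difflib.SequenceMatcher`, which sums up all matching runs of gaps instead of walking them one by one.

```python
import difflib
import re


def letter_gaps(text, word):
    """Count the letters between each pair of consecutive occurrences of `word`."""
    gaps = []
    count = None  # letters seen since the last occurrence; None before the first
    for w in re.findall(r"[a-z]+", text.lower()):
        if w == word:
            if count is not None:
                gaps.append(count)
            count = 0
        elif count is not None:
            count += len(w)
    return gaps


def similarity(text_a, text_b, word):
    """Score in [0, 1] for how much of the two gap sequences lines up."""
    gaps_a = letter_gaps(text_a, word)
    gaps_b = letter_gaps(text_b, word)
    if not gaps_a or not gaps_b:
        return 0.0  # the word occurs fewer than twice in one of the texts
    matcher = difflib.SequenceMatcher(None, gaps_a, gaps_b, autojunk=False)
    return matcher.ratio()
```

Because only letters are counted, punctuation, whitespace, and capitalization differences between formats do not change the gaps. A single inserted or dropped word only disturbs one gap, so the rest of the sequence still matches.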
|
#33 |
Addict
Posts: 288
Karma: 1003542
Join Date: May 2011
Device: Google Nexus 7 16GB
|
Took me 2 sec to identify:
The Metamorphosis of Prime Intellect, a novel by Roger Williams. Shame you can't use Google. I wonder what the state of AI is these days? |
#34 |
Member
Posts: 12
Karma: 10
Join Date: May 2008
Device: None
|
Just want to add my 2 cents to this discussion...
I spend a lot of time manually editing my book collection, as I'm sure most of you do, and no one benefits but me. I've also been thinking about how great a tagging service for books would be. Instead of FreeDB, I was thinking of something closer to MusicBrainz, which I love for music. Ideally, the best would be to get the MusicBrainz developers interested in this project, as a book database has a lot in common with a music database. |
#35 |
Member
Posts: 21
Karma: 12
Join Date: Aug 2009
Device: none
|
|
#36 |
Connoisseur
Posts: 99
Karma: 15776
Join Date: Dec 2011
Device: PB912 Matt White
|
I took a look at the plug-ins for calibre and found this one:
[GUI Plugin] Find Similar Stories. I think Ian_Stott did some preparatory work. |
#37 | |
Addict
Posts: 288
Karma: 1003542
Join Date: May 2011
Device: Google Nexus 7 16GB
|
Quote:
The point I was making, though, is that it took a simple Google search on some random text, and then another search on a link, to positively identify the book. So if the state of AI has improved since I last looked, maybe there are some options to explore in that area? It is probably too expensive in processor, monetary, and time terms, but things are changing pretty quickly in tech. |
|
#38 | |
Member
Posts: 21
Karma: 12
Join Date: Aug 2009
Device: none
|
Quote:
|
|