03-14-2012, 09:51 AM   #29
jorm
I think the key is that we use 3-4 options at once; if one fails, we fall through to the next. We capture metrics in different ways and populate a database.

We try to find the first chapter by looking for a run of sentences of a certain length that end with punctuation. We then strip off the whitespace and punctuation.

We try to find the last chapter the same way, by looking for a run of sentences of a certain length that end with punctuation. We then strip off the whitespace and punctuation.
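To make the chapter idea concrete, here is a rough Python sketch of how it might look. It is only an illustration: the thresholds (MIN_WORDS, MAX_WORDS, RUN_LENGTH), the function names, the lowercasing, and the SHA-1 hash at the end are all assumptions of mine and not necessarily what a real system would use.

```python
import hashlib
import re
import string

# Hypothetical thresholds; tune them against real books.
MIN_WORDS, MAX_WORDS, RUN_LENGTH = 5, 60, 3

def split_sentences(text):
    # Naive split: a sentence ends at ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def looks_like_prose(sentence):
    # "Certain length with punctuation": word count in range, ends in . ! or ?
    words = sentence.split()
    return MIN_WORDS <= len(words) <= MAX_WORDS and sentence[-1] in ".!?"

def find_prose_run(sentences, from_end=False):
    """Return the first (or last) run of RUN_LENGTH consecutive prose sentences."""
    indices = range(len(sentences) - RUN_LENGTH + 1)
    if from_end:
        indices = reversed(indices)
    for i in indices:
        window = sentences[i:i + RUN_LENGTH]
        if all(looks_like_prose(s) for s in window):
            return window
    return None

def normalize(text):
    # Strip whitespace and punctuation; lowercasing is an extra assumption.
    drop = set(string.whitespace + string.punctuation)
    return "".join(ch for ch in text.lower() if ch not in drop)

def fingerprint(text, from_end=False):
    """Hash the normalized prose run, or return None if no run is found."""
    run = find_prose_run(split_sentences(text), from_end)
    if run is None:
        return None
    return hashlib.sha1(normalize(" ".join(run)).encode("utf-8")).hexdigest()
```

Calling fingerprint(text) would give a first-chapter key and fingerprint(text, from_end=True) a last-chapter key. Because whitespace and punctuation are stripped before hashing, two copies of the same book with different line wrapping should produce the same key.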

We take the proper noun set from the first 20 pages and compare it using a bit of fuzzy logic.
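One possible reading of the proper noun idea, sketched in Python. The capitalization heuristic, the function names, and the 0.85 threshold are assumptions for illustration only; the "fuzzy logic" here is just difflib's SequenceMatcher from the standard library, which may not be what you would want in practice.

```python
import re
from difflib import SequenceMatcher

def proper_nouns(text):
    """Naive heuristic: capitalized words that do not start a sentence."""
    nouns = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = re.findall(r"[A-Za-z']+", sentence)
        for word in words[1:]:  # skip the sentence-initial word
            if word[0].isupper() and len(word) > 1:
                nouns.add(word.lower())
    return nouns

def fuzzy_overlap(a, b, threshold=0.85):
    """Fraction of names in the smaller set with a close match in the other."""
    if not a or not b:
        return 0.0
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    hits = sum(
        1 for name in small
        if any(SequenceMatcher(None, name, other).ratio() >= threshold
               for other in large)
    )
    return hits / len(small)
```

The text passed to proper_nouns would be whatever counts as the first 20 pages. A high fuzzy_overlap between two books suggests the same cast of characters even if OCR or editing changed a few spellings.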


The idea is that we build a system that does all of these and, in the case of an add, captures the metrics for all of them.

When someone searches, we will try all of these until we get a success.
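The add and search flow could be sketched roughly like this in Python. A plain dict stands in for the database, and title_key and vocabulary_key are made-up placeholder metrics so the example runs on its own; real strategies would be things like the chapter and proper noun captures.

```python
def title_key(text):
    # Placeholder metric: the first non-empty line, lowercased.
    for line in text.splitlines():
        if line.strip():
            return line.strip().lower()
    return None

def vocabulary_key(text):
    # Placeholder metric: the sorted distinct words of six or more letters.
    words = {w for w in text.lower().split() if len(w) >= 6 and w.isalpha()}
    return " ".join(sorted(words)) or None

def add_book(index, book_id, text, strategies):
    """On add, record every strategy's metric that can be computed."""
    for name, compute in strategies:
        key = compute(text)
        if key is not None:
            index.setdefault(name, {})[key] = book_id

def find_book(index, text, strategies):
    """On search, try each strategy in order and stop at the first hit."""
    for name, compute in strategies:
        key = compute(text)
        if key is None:
            continue  # this strategy produced nothing; fall through
        book_id = index.get(name, {}).get(key)
        if book_id is not None:
            return book_id, name
    return None

strategies = [("title", title_key), ("vocabulary", vocabulary_key)]
```

The order of the strategies list decides which capture is tried first, so the cheapest or most reliable one would go at the front. find_book also returns the name of the strategy that matched, which would make it easy to log which captures actually earn their keep.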

As for playing part of the book, that would be interesting, but it might not be fast enough.