MobileRead Forums - View Single Post

chaley · 04-27-2011, 05:13 PM

Quote:

Originally Posted by kiwidude

Right, so there is a bit of a flaw it seems in the approach to storing the exemptions. It needs some kind of optimisation (or else a threshold limit above which users are not allowed to exempt a group).

At the moment if you exempt a group it creates a dictionary entry for every book id in the exemption group, listing every other book in that group as it's exemption pair match. So if you exempt a group of (1,2,3), you end up with:
{ 1: (2,3)
2: (1,3)
3: (1,2) }

Now that works really well when the groups are small, it is simple and flexible to maintain.

However, what do you do when someone does a fuzzy search, comes back with hundreds if not thousands of titles in the group, and then does mark as exempt on it?

I can tell you what Calibre does - memory climbs very rapidly and then it crashes

So - any suggestions as to what we do? The easy way out would be to put an exemption group size limit. After all if you have hundreds or more books coming back in a group it should be a trivial thing to skip past that group.

The alternative I really hate to think about how much more complicated it will make the code. As the user is allowed to remove any one or more of those individual pairings.

Any thoughts?

The algorithm that I gave you is perfectly happy to use exemption groups like (1,3,4,5,6,7). It merges groups as they are built. As such, what happens if you store exemptions as the user provides them? If the user selects a bunch of books to exempt, store that bunch of books as an exemption.

The size problem would still arise when building the exemption map. If the user selects all the books in a 40,000 book library and then says that they are exempt, things get dicey. The maximum size for a 40,000 book library would be 40,000*40,000, or 1,600,000,000 integers, which is too large to be practical. You can avoid this sort of silliness by limiting the size of any exemption group to something like 1,000 books. Even that number is silly-large. Perhaps 100 is sufficient.

Another possibility is to allow exemption processing only when using the least conservative algorithms. The groups would by definition be smaller.

You might consider storing the exemption map instead of pairs. In this case, removing exemptions would be done pair-wise. If I have (10,15), I would remove 15 from 10 and 10 from 15. If the user asks to remove exemptions for (10, 11, 12), then it would be 10-(11, 12), 11-(10, 12), 12-(10, 11).

Random thoughts.