You don't need to compound this with RFIDs, iPods and downloads... just combine it with highly directional speakers, the kind museums are using now, and only the person(s) within touching range of a particular image would hear its audio. Then just space the images a bit apart.