(with n distinct elements) S = x1 x2 x3 · · · x a straight sample [Vitter 85..] of size m (each xi taken with prob. ≈ m/ ) a x x x x b b x c d d d b h x x ... allows us to deduce ‘a’ repeated ≈ /m times in S, but impossible to say anything about rare elements, hidden in the mass = problem of needle in haystack a distinct sample (with counters) (a, 9) (x, 134) (b, 25) (c, 12) (d, 30) (g, 1) (h, 11) ( takes each element with probability 1/n = independently from its frequency of appearing Textbook example: sample 1 element of stream (1, 1, 1, 1, 2, 1, 1, . . . , 1), = 1000; with straight sampling, prob. 999/1000 of taking 1 and 1/1000 of taking 2; with distinct sampling, prob. 1/2 of taking 1 and 1/2 of taking 2. 22/22