As the scale of data our systems produce continues to increase, the techniques our systems use to process it must evolve. Kiran Bhattaram explains why sketches, a family of more sophisticated data structures, are a good option for working at that scale.
Sketching data structures are probabilistic structures that store a summary of the full dataset. They’re specialized to answer specific questions (e.g., how many unique values a large dataset contains or what the p95 of the dataset is). By leveraging some neat mathematical properties, sketching data structures trade accuracy for a significant increase in both computational and storage efficiency.
Kiran covers real-world use cases of a few basic sketching data structures and explores the statistical underpinnings that make them work.
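To make the accuracy-for-efficiency trade concrete, here is a minimal sketch of one such structure: a k-minimum-values (KMV) sketch that estimates how many unique values a stream contains. It is an illustration chosen for this summary, not necessarily one of the structures the talk covers. The class name `KMVSketch`, the parameter `k`, and the use of SHA-1 as the hash function are all assumptions made for the example.

```python
import hashlib
import heapq

HASH_SPACE = 2 ** 64  # hashes are treated as uniform integers in [0, 2**64)


class KMVSketch:
    """Hypothetical k-minimum-values sketch for estimating distinct counts.

    It remembers only the k smallest distinct hash values it has seen,
    so memory stays O(k) no matter how large the input stream is.
    """

    def __init__(self, k=1024):
        self.k = k
        self._heap = []         # negated hashes, so heap[0] is the largest kept hash
        self._members = set()   # hashes currently kept, used to skip duplicates

    @staticmethod
    def _hash(value):
        # Assumption: SHA-1 of the string form is "uniform enough" for a demo.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value):
        h = self._hash(value)
        if h in self._members:
            return  # the same value (same hash) never changes the summary
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the largest kept hash: swap them.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct hashes: exact
        kth_smallest = -self._heap[0]
        # If n distinct hashes are spread uniformly over the hash space, the
        # k-th smallest sits near k/n of the way in, which gives n ~ (k-1)/fraction.
        return (self.k - 1) * HASH_SPACE / kth_smallest


if __name__ == "__main__":
    sketch = KMVSketch(k=1024)
    for i in range(200_000):
        sketch.add(f"user-{i % 50_000}")  # 50,000 unique values, each seen 4 times
    print(round(sketch.estimate()))
```

The statistical underpinning here is that the relative error of the estimate shrinks roughly as one over the square root of `k`, so a larger `k` buys accuracy at the cost of memory, while the memory used never depends on the size of the dataset.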