Francesc Alted

bcolz, a data container that can beat memory speed by using extremely fast data compression

July 24, 2015

Transcript

  1. What is bcolz?
     • Provides a storage layer that is both chunked and compressible
     • It is meant for both in-memory and persistent (disk) storage
     • Main goal: to demonstrate that compression can accelerate data access (both on disk and in memory)
  2. Why Compression (II)?
     • Less data needs to be transmitted to the CPU
     • [Diagram labels: Disk or Memory (RAM); Disk or Memory Bus; Decompression; CPU Cache; Original Dataset; Compressed Dataset]
     • Transmission + decompression faster than direct transfer?
  3. Query Times: 3-year-old laptop (Ivy Bridge)
     • Compression leads to better performance, even for in-memory bcolz data containers
  4. Query Times: 5-year-old laptop (Core2)
     • Compression still makes things slower on old boxes, but not necessarily on newer ones
  5. Streaming Analytics With bcolz
     • bcolz container (disk or memory)
     • bcolz iterators/filters with blocking: iter(), iterblocks(), where(), whereblocks(), __getitem__()
     • itertools, PyToolz, Dask: map(), filter(), groupby(), sortby(), reduceby(), join()
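
The idea behind slides 1 and 5, a chunked and compressed container whose blocks are decompressed one at a time and streamed into ordinary iterator tools, can be sketched with the Python standard library alone. This is a minimal illustration and not the bcolz API: the class `ChunkedColumn` is hypothetical, `zlib` stands in for whatever fast compressor the real container uses, and the method names `iterblocks()` and `where()` are borrowed from the slide only to show the shape of the interface.

```python
import zlib
from array import array
from itertools import islice


class ChunkedColumn:
    """Toy chunked, compressed column of float64 values (hypothetical, not bcolz)."""

    def __init__(self, values, chunklen=1024, level=1):
        self.chunklen = chunklen
        self.length = 0
        self.chunks = []  # each entry is one independently compressed chunk
        it = iter(values)
        while True:
            # Pull at most `chunklen` values, so the input can be a stream.
            block = array("d", islice(it, chunklen))
            if not block:
                break
            self.chunks.append(zlib.compress(block.tobytes(), level))
            self.length += len(block)

    def __len__(self):
        return self.length

    @property
    def nbytes(self):
        """Size of the data if it were stored uncompressed."""
        return self.length * array("d").itemsize

    @property
    def cbytes(self):
        """Size of the compressed chunks actually held in memory."""
        return sum(len(chunk) for chunk in self.chunks)

    def iterblocks(self):
        """Yield one decompressed block at a time; only one is live at once."""
        for raw in self.chunks:
            block = array("d")
            block.frombytes(zlib.decompress(raw))
            yield block

    def __iter__(self):
        for block in self.iterblocks():
            yield from block

    def where(self, predicate):
        """Lazily yield the values for which `predicate` is true."""
        return (x for x in self if predicate(x))


# Build a column from a generator, then stream over it block by block.
col = ChunkedColumn((float(i % 10) for i in range(10_000)), chunklen=1000)
total = sum(sum(block) for block in col.iterblocks())
first_hits = list(islice(col.where(lambda x: x > 8.0), 3))
```

Because `iterblocks()` and `where()` are plain generators, their output can be passed directly to `itertools` functions such as `islice`, which is the composition slide 5 describes. Whether decompressing each block is cheaper than reading the uncompressed data depends on the hardware, as slides 3 and 4 note; this sketch does not measure that.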