Liz Heym Catching Waves With Time-Series Data, SF Bay Area Ruby Meetup July 18 2024

Catching Waves With Time-Series Data Liz Heym Cisco Meraki

We’ll cover:
- How to select a tool for managing time-series data
- How to organize, query, and aggregate time-series data
- How to translate your design to API constraints

But first! What is time-series data?
Time-series data is a collection of observations recorded over consistent intervals of time.

A surfer’s goal
Liz has just taken her first surf lesson, and she’s keen on learning how her surfing will improve over time. She’s decided to record this data in a time-series database and to access it via an API endpoint. But where does she start?

Selecting the right board for the conditions
1. Surf a board you already have: use a time-series DB already in your tech stack
2. Use the old board, but add a new set of fins: use an extension for a DB you already use
3. Buy a new board: adopt a new DB technology
4. Shape your own board: design your own DB

1. Surf a board you already have
If you already have a database that’s well-suited for time-series data, why change? Maybe you just need to adjust your techniques!

2. Keep the old board, but add a new set of fins
• Old board = Postgres
• New fins = Postgres extension
• A few options: pg_timeseries or TimescaleDB

3. Buy a new board
Sometimes, your existing tools don’t cut it, and you need to invest in something entirely new. ClickHouse is a fast, open-source analytical database, designed around time-series data.

4. Shape your own board
Sometimes, no available database seems suited to your highly specific needs. In 2008, the engineers at Meraki found themselves in this position, and LittleTable was born.

4. Shape your own board: LittleTable
• Relational database
• Optimized for time-series data
• Data clustered for continuous disk access
• SQL interface for querying
LT White Paper

The Perfect Technique
• Now that you have a board, you need to learn how to surf it!
• Much like in surfing, there are tried-and-true techniques for best handling time-series data.
• We’ll cover:
  1. Data arranged by time
  2. Hierarchically-delineated key
  3. Querying by index
  4. Aggregation and Compression

1. Data arranged by time
• Key feature of a time-series DB
• ClickHouse automatically generates an index on the ts column
• Performant when accessing a range of time
• LittleTable is append-only

2. Hierarchically-delineated key
• In addition to being grouped by time, data is organized according to this composite key.
• Crucial to understand how this data is going to be accessed: not every query will be efficient

2. Hierarchically-delineated key
In this example, the composite key is: Network Id, Device Id

2. Hierarchically-delineated key
• Organize by increasing specificity
• Cisco Meraki’s example from the previous slide: Network, Device
• For Liz’s surfing application: Surfer, Region, Break
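A minimal sketch of what "organize by increasing specificity" means in practice, assuming rows are plain tuples and that sorting stands in for how a database clusters data. The row layout, sample values, and the `sort_key` helper are hypothetical illustrations, not taken from the talk.

```python
from datetime import datetime

# Hypothetical rows: (surfer, region, surf_break, timestamp, distance_m)
rows = [
    ("liz", "LA", "Malibu", datetime(2024, 7, 1, 8), 40.0),
    ("sam", "LA", "Malibu", datetime(2024, 7, 1, 9), 55.0),
    ("liz", "Humboldt", "Moonstone", datetime(2024, 7, 2, 7), 30.0),
    ("liz", "LA", "Malibu", datetime(2024, 7, 3, 8), 62.0),
    ("liz", "LA", "Venice", datetime(2024, 7, 2, 10), 25.0),
]


def sort_key(row):
    """Composite key, least specific column first, timestamp last."""
    surfer, region, surf_break, ts, _distance = row
    return (surfer, region, surf_break, ts)


# After sorting, all of one surfer's rows sit together, and within them
# all rows for one region, then one break, ordered by time.
ordered = sorted(rows, key=sort_key)
keys = [row[:3] for row in ordered]

for row in ordered:
    print(row)
```

Because the least specific column comes first, every prefix of the key (Surfer; Surfer, Region; Surfer, Region, Break) maps to one contiguous run of rows.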

3. Querying by index: LittleTable
• LittleTable is organized across two axes: composite key and time
  ◦ Only need a prefix
• Performant query for LittleTable:
  ◦ Surfer
  ◦ Region
  ◦ Timestamp
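The two-axis idea can be sketched with a toy in-memory store. This is an illustration under stated assumptions, not LittleTable's actual implementation: the `ToyTwoAxisStore` class, the month-sized time buckets, and the sample data are all hypothetical. Rows are first grouped by time bucket, then kept sorted by composite key inside each bucket, so a key prefix plus a time range touches only a small slice.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import date


def month_bucket(day):
    """Time axis: group rows by (year, month). Bucket size is an assumption."""
    return (day.year, day.month)


class ToyTwoAxisStore:
    def __init__(self):
        # bucket -> list of (key, day, value), kept sorted by key then day
        self.buckets = defaultdict(list)

    def insert(self, key, day, value):
        rows = self.buckets[month_bucket(day)]
        entry = (key, day, value)
        rows.insert(bisect_left(rows, entry), entry)

    def query(self, prefix, start, end):
        """Return rows whose key starts with `prefix` and start <= day <= end."""
        lo_bucket, hi_bucket = month_bucket(start), month_bucket(end)
        out = []
        for bucket in sorted(self.buckets):
            if not (lo_bucket <= bucket <= hi_bucket):
                continue  # time axis: skip buckets outside the range
            rows = self.buckets[bucket]
            # key axis: jump to the first row at or after the prefix
            i = bisect_left(rows, (prefix,))
            while i < len(rows) and rows[i][0][:len(prefix)] == prefix:
                if start <= rows[i][1] <= end:
                    out.append(rows[i])
                i += 1
        return out


store = ToyTwoAxisStore()
store.insert(("liz", "LA", "Malibu"), date(2024, 6, 20), 40.0)
store.insert(("liz", "LA", "Venice"), date(2024, 7, 2), 25.0)
store.insert(("liz", "Humboldt", "Moonstone"), date(2024, 7, 5), 30.0)
store.insert(("sam", "LA", "Malibu"), date(2024, 7, 6), 55.0)
store.insert(("liz", "LA", "Malibu"), date(2024, 7, 10), 62.0)

# Prefix (Surfer, Region) plus a time range, with no Break given.
july_la = store.query(("liz", "LA"), date(2024, 7, 1), date(2024, 7, 31))
print(july_la)
```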

3. Querying by index: ClickHouse
• ClickHouse includes the timestamp at the end of the composite index
  ◦ So you must query with the full key
• Non-performant query
  ◦ Surfer, Timestamp
• Performant query
  ◦ Surfer, Region, Break, Timestamp
Examples:
Liz, LA, Malibu, over the past month
Liz, Humboldt, Moonstone, over the past month
Two weeks … Two weeks
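To see why the position of the timestamp matters, here is a toy model of a single sorted index with the timestamp as the last key column. It is a sketch only and does not reflect ClickHouse internals; the function names and sample rows are hypothetical. With the full key, the matching rows form one contiguous slice. With only Surfer and Timestamp, every row for that surfer has to be scanned and filtered.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def build_index(rows):
    """Sort rows by (surfer, region, surf_break, day): timestamp comes last."""
    return sorted(rows)


def query_full_key(index, surfer, region, surf_break, start, end):
    """Full key + time range: one contiguous slice found by binary search."""
    lo = bisect_left(index, (surfer, region, surf_break, start))
    hi = bisect_right(index, (surfer, region, surf_break, end))
    return index[lo:hi], hi - lo  # (matches, rows scanned)


def query_surfer_and_time(index, surfer, start, end):
    """Surfer + time range only: scan all of the surfer's rows and filter."""
    lo = bisect_left(index, (surfer,))
    matches, scanned = [], 0
    for row in index[lo:]:
        if row[0] != surfer:
            break
        scanned += 1
        if start <= row[3] <= end:
            matches.append(row)
    return matches, scanned


index = build_index([
    ("liz", "LA", "Malibu", date(2024, 6, 1)),
    ("liz", "LA", "Malibu", date(2024, 7, 10)),
    ("liz", "Humboldt", "Moonstone", date(2024, 5, 3)),
    ("liz", "Humboldt", "Moonstone", date(2024, 7, 12)),
    ("sam", "LA", "Malibu", date(2024, 7, 11)),
])

start, end = date(2024, 7, 1), date(2024, 7, 31)
full_matches, full_scanned = query_full_key(index, "liz", "LA", "Malibu", start, end)
loose_matches, loose_scanned = query_surfer_and_time(index, "liz", start, end)
print(full_scanned, loose_scanned)
```

In this toy data the full-key query reads only the rows it returns, while the Surfer, Timestamp query reads all of Liz's rows to return two of them.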

4. Aggregation and Compression
• Time-series data can pile up fast
• Two needs:
  ◦ Don’t have infinite storage
  ◦ Also want to show as much data as possible

4. Aggregation and Compression
• Don’t have infinite storage
  ◦ Data retention
  ◦ Time-to-live
• Also want to show as much data as possible
  ◦ Compression
  ◦ Aggregation

4. Compression: TimescaleDB

4. Aggregation: LittleTable
• Base table and aggregate table
• Base table (data per wave):
  ◦ Distance, Duration
• Aggregate table (data per interval of time):
  ◦ Total distance, total duration, max speed, wave count
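The base-to-aggregate step can be sketched as a daily rollup. This is a minimal illustration, assuming each base row is a `(timestamp, distance_m, duration_s)` tuple with a positive duration, and assuming speed is derived as distance divided by duration per wave. The field names and the `aggregate_by_day` helper are hypothetical.

```python
from datetime import datetime


def aggregate_by_day(waves):
    """Roll per-wave rows up into one aggregate row per calendar day."""
    daily = {}
    for ts, distance_m, duration_s in waves:
        agg = daily.setdefault(ts.date(), {
            "total_distance_m": 0.0,
            "total_duration_s": 0.0,
            "max_speed_mps": 0.0,
            "wave_count": 0,
        })
        agg["total_distance_m"] += distance_m
        agg["total_duration_s"] += duration_s
        # Assumption: speed for a wave is its distance over its duration.
        agg["max_speed_mps"] = max(agg["max_speed_mps"], distance_m / duration_s)
        agg["wave_count"] += 1
    return daily


waves = [
    (datetime(2024, 7, 1, 8, 0), 40.0, 10.0),
    (datetime(2024, 7, 1, 8, 30), 60.0, 12.0),
    (datetime(2024, 7, 2, 7, 0), 30.0, 10.0),
]

daily = aggregate_by_day(waves)
for day, agg in sorted(daily.items()):
    print(day, agg)
```

The aggregate row is much smaller than the waves it summarizes, which is what makes a longer retention period affordable.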

4. Aggregation: LittleTable
• We can aggregate the data over the following intervals:
  ◦ Base table, with a TTL of 1 month
  ◦ One day, with a TTL of 6 months
  ◦ One week, with a TTL of 1 year
  ◦ One month, with a TTL of 5 years
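A rough sketch of how per-table TTLs could be applied, assuming months and years are approximated as fixed day counts (30, 182, 365, and 5 × 365 days). The table names, the `TTLS` mapping, and the `prune` helper are hypothetical; a real database would normally expire data itself.

```python
from datetime import datetime, timedelta

# TTL per table, using the intervals listed above.
# Day counts are approximations chosen for this sketch.
TTLS = {
    "base": timedelta(days=30),        # 1 month
    "day": timedelta(days=182),        # 6 months
    "week": timedelta(days=365),       # 1 year
    "month": timedelta(days=5 * 365),  # 5 years
}


def is_expired(table, row_ts, now):
    """A row expires once it is older than its table's TTL."""
    return now - row_ts > TTLS[table]


def prune(table, rows, now):
    """Keep only rows still within the table's TTL. Each row starts with its timestamp."""
    return [row for row in rows if not is_expired(table, row[0], now)]


now = datetime(2024, 7, 18)
rows = [
    (datetime(2024, 7, 1), 40.0),
    (datetime(2024, 5, 1), 55.0),
]

print(prune("base", rows, now))  # the May row is older than 30 days
print(prune("day", rows, now))   # both rows are within 182 days
```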

Getting out there
We have our data:
• Stored
• Aggregated
• Easily accessible
Now we design an API endpoint that Liz can use to easily query her surf data.

Getting out there: Query params
• Required
  ◦ Surfer
  ◦ Timespan
• Optional
  ◦ Region
  ◦ Break
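One way the required and optional params could be checked, as a sketch. The param names mirror the list above, but the `validate_params` helper and its error messages are hypothetical. The rule that Break requires Region is an assumption inferred from the key order (Surfer, Region, Break), not something the slides state.

```python
REQUIRED = ("surfer", "timespan")
OPTIONAL = ("region", "break")


def validate_params(params):
    """Check query params and return them with missing optionals set to None."""
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError("missing required params: " + ", ".join(missing))

    unknown = sorted(set(params) - set(REQUIRED) - set(OPTIONAL))
    if unknown:
        raise ValueError("unknown params: " + ", ".join(unknown))

    # Assumption: a break only makes sense within a region, so the
    # request always maps to a prefix of (surfer, region, break).
    if "break" in params and "region" not in params:
        raise ValueError("break requires region")

    return {name: params.get(name) for name in REQUIRED + OPTIONAL}


print(validate_params({"surfer": "liz", "timespan": 86400, "region": "LA"}))
```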

Getting out there: Timespan and interval
• timespan = the full period of time over which we want data.
  ◦ Our longest TTL is 5 years: that’s the max timespan
• interval = the grain at which the data is aggregated
  ◦ Calculated based on the timespan
• The interval options are:
  ◦ One day (TTL 6 months)
  ◦ One week (TTL 1 year)
  ◦ One month (TTL 5 years)
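A sketch of how the interval could be calculated from the timespan. The selection rule here (pick the finest interval whose TTL still covers the whole timespan) is an assumption, as are the day-count approximations and the `interval_for` helper name.

```python
from datetime import timedelta

# Interval options, finest first, each with its TTL.
# Day counts approximate 6 months, 1 year, and 5 years.
INTERVALS = [
    ("day", timedelta(days=182)),
    ("week", timedelta(days=365)),
    ("month", timedelta(days=5 * 365)),
]
MAX_TIMESPAN = INTERVALS[-1][1]


def interval_for(timespan):
    """Pick the finest interval whose TTL covers the requested timespan."""
    if timespan <= timedelta(0):
        raise ValueError("timespan must be positive")
    for name, ttl in INTERVALS:
        if timespan <= ttl:
            return name
    raise ValueError(f"timespan exceeds the max of {MAX_TIMESPAN.days} days")


print(interval_for(timedelta(days=30)))   # day
print(interval_for(timedelta(days=200)))  # week
print(interval_for(timedelta(days=400)))  # month
```

Tying the interval to the TTLs means the endpoint never asks for data at a grain that has already expired for part of the requested timespan.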

Getting out there: Visualization

A surfer’s success Get Stoked!

Thank you! Would love to chat afterwards :)
Liz Heym, Cisco Meraki
