Waldemar Hummer - LocalStack Snowflake Emulator Intro

An intro to the newest emulator by LocalStack. Contains examples and a roadmap for the future of the Snowflake emulator.

Anca Ghenade

June 08, 2024

Transcript

  1. localstack localstack-cloud localstack localstack.cloud What is Snowflake?

    • A Cloud Data Platform that allows for scalable data processing
      ◦ Uploading data files (CSV/JSON/Parquet) to stages
      ◦ Running SQL statements to create databases/tables/views/…
      ◦ Running SELECT queries to query data from files and tables
      ◦ Running scheduled jobs to create ETL pipelines
      ◦ …
    • Lots of native integrations and SDKs/tools to interact with the platform
      ◦ Python Pandas dataframes, Snowpark libraries, JDBC driver, …
    • Some similarities to the Data/BigData services in AWS:
      ◦ Athena, Redshift, EMR, Managed Airflow, etc.
  2. Developing Data Pipelines Locally

    • Developing for Snowflake requires connectivity to the remote cloud at all times
      ◦ → How does that fit into the dev lifecycle?
      ◦ → Is there a local development story?
    • Often-requested feature, even in Snowflake forums, as well as on StackOverflow, Reddit, etc.
    • Similar challenges as for the AWS cloud
      ◦ Speed up development cycles; avoid resource conflicts; test reproducibility; costs …
  3. LocalStack for Snowflake

    • Available as a Docker image
      ◦ Can be easily installed locally
    • Emulates the actual Snowflake API surface
      ◦ → Integrates natively with all tooling
      ◦ JDBC, DB visualization tools, etc. work out of the box
    • Easy to extend from local into CI pipelines - running tests in CI
    • Recent announcement: https://blog.localstack.cloud/2024-05-22-introducing-localstack-for-snowflake/
  4. Supported Feature Set (Excerpt)

    • Some of the key features are already available today, including:
      ◦ Basic operations on warehouses, databases, schemas, and tables (e.g., Using the Python Connector)
      ◦ Storing files in user/table/named stages (Choosing an Internal Stage for Local Files)
      ◦ Snowpark libraries (e.g., Snowpark Developer Guide for Python)
      ◦ Snowpipe streaming with Kafka connector (Using Snowflake Connector for Kafka with Snowpipe Streaming)
      ◦ JavaScript and Python UDFs (Introduction to JavaScript UDFs)
      ◦ Tasks for scheduled execution
      ◦ Table streams for change data capture and audit logs
      ◦ … and quite a bit more!
  5. Seamless integration with DB viz tools (e.g., DBeaver)

    Source: https://www.youtube.com/watch?v=1l9i_755MlA
  6. Starting Up

    • Configure your auth token, then use the localstack CLI to start up:

        $ export LOCALSTACK_AUTH_TOKEN=<your-auth-token>
        $ IMAGE_NAME=localstack/snowflake localstack start

    • Configure your client app to connect to the local endpoint:

        import snowflake.connector as sf

        conn = sf.connect(
            user="test",
            password="test",
            account="test",
            database="test",
            host="snowflake.localhost.localstack.cloud",
        )
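
A minimal sketch of how a client app might switch between the local endpoint and a real account, which is one way to carry the same code from a laptop into CI. The local values are the ones from the slide; the `SNOWFLAKE_TARGET` switch and the `SNOWFLAKE_*` variable names are hypothetical and chosen only for this illustration.

```python
import os

# Connection values shown on the slide for the local emulator.
LOCAL_DEFAULTS = {
    "user": "test",
    "password": "test",
    "account": "test",
    "database": "test",
    "host": "snowflake.localhost.localstack.cloud",
}


def connection_params(env=None):
    """Return keyword arguments for connect(), local by default.

    SNOWFLAKE_TARGET and the SNOWFLAKE_* names are hypothetical
    environment variables, not something the emulator defines.
    """
    env = os.environ if env is None else env
    if env.get("SNOWFLAKE_TARGET", "local") == "local":
        return dict(LOCAL_DEFAULTS)
    # Remote: no host override, so the connector resolves the account itself.
    return {
        "user": env["SNOWFLAKE_USER"],
        "password": env["SNOWFLAKE_PASSWORD"],
        "account": env["SNOWFLAKE_ACCOUNT"],
        "database": env["SNOWFLAKE_DATABASE"],
    }


def connect(env=None):
    """Open a connection using the parameters selected above."""
    # Imported lazily so connection_params() works without the connector installed.
    import snowflake.connector as sf

    return sf.connect(**connection_params(env))
```

With no variables set, `connect()` targets the emulator, so tests default to local execution.
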
  7. Sample: Queries over Covid19 dataset

    • Taken from Snowflake “Getting Started” Guide
      ◦ https://quickstarts.snowflake.com/guide/data_science_with_dataiku/index.html#0
    • Data set contains a lot of different data points
      ◦ mobility data
      ◦ vaccination data
      ◦ …
    • For this sample, we’ll focus on:
      ◦ Putting CSV files into a local S3 stage
      ◦ Loading the CSV data into a table
      ◦ Running some simple SELECT queries
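
The three steps of the sample could look roughly like the sketch below. It is not the code from the talk: the stage, table, and bucket names are hypothetical, the target table is assumed to exist already, and the CSV files are assumed to have been uploaded to the local S3 bucket beforehand. The statements use standard Snowflake SQL (`CREATE STAGE`, `COPY INTO`); whether the emulator accepts every option shown here is an assumption.

```python
def build_load_statements(stage, table, s3_url):
    """Return SQL for: define an S3-backed stage, load CSV, run a query.

    Identifiers are interpolated directly, so pass only trusted names.
    """
    return [
        # External stage pointing at the (local) S3 location holding the CSVs.
        f"CREATE OR REPLACE STAGE {stage} URL = '{s3_url}'",
        # Load every staged CSV file into the table, skipping the header row.
        f"COPY INTO {table} FROM @{stage} "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)",
        # A simple query over the loaded data.
        f"SELECT COUNT(*) FROM {table}",
    ]


def run_sample(cursor, stage="covid_stage", table="covid_data",
               s3_url="s3://covid-sample/data/"):
    """Execute the statements on a DB-API cursor and return the row count row."""
    for sql in build_load_statements(stage, table, s3_url):
        cursor.execute(sql)
    return cursor.fetchone()
```

`run_sample` accepts any DB-API style cursor, for example one obtained from a Python connector connection via `conn.cursor()`.
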
  8. Sample App: NYC Citybike Trips

    • Taken from Snowflake “Getting Started” Guide
    • Contains trips and weather information over several years
    • Data available in a public S3 bucket
      ◦ → Can be integrated in a local Snowflake stage directly!
    • Web app displays the data in simple charts
  9. Table Streams

    • See https://docs.snowflake.com/en/user-guide/streams-intro
    • Enables Change Data Capture (CDC) for Snowflake tables
    • Stream = minimal set of changes from its current offset to the current version of the table
    • Streams can be “consumed” via DML queries, e.g.:

        INSERT INTO target … SELECT * FROM stream …
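
To make the "minimal set of changes from an offset" idea concrete, here is a toy model in plain Python. It is only an illustration of the semantics described above, not how Snowflake or the emulator implements streams: the table is a dict keyed by primary key, the stream remembers the table state at its offset, and an update is reported as a DELETE followed by an INSERT.

```python
_MISSING = object()  # marks "row did not exist" (distinct from a None value)


class ToyStream:
    """Toy change stream over a dict-backed table (primary key -> row)."""

    def __init__(self, table):
        self.table = table
        self._snapshot = dict(table)  # table state at the stream's offset

    def changes(self):
        """Net changes between the offset and the current table state."""
        before, after = self._snapshot, self.table
        out = []
        for key in sorted(before.keys() | after.keys()):
            old = before.get(key, _MISSING)
            new = after.get(key, _MISSING)
            if old == new:
                continue  # unchanged, or inserted and deleted since the offset
            if old is not _MISSING:
                out.append(("DELETE", key, old))
            if new is not _MISSING:
                out.append(("INSERT", key, new))
        return out

    def consume(self):
        """Return the pending changes and advance the offset to 'now'."""
        result = self.changes()
        self._snapshot = dict(self.table)
        return result
```

A row that is inserted and deleted again before the stream is consumed never shows up, which is what "minimal" means here; calling `consume()` plays the role of the DML statement that reads the stream and moves its offset forward.
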
  10. Streamlit Apps

    • Streamlit = Python UI Framework
      ◦ https://streamlit.io
    • Integrates natively with Snowflake
    • Lots of UI components available
      ◦ Charts
      ◦ Widgets
      ◦ Maps
      ◦ Graphs
      ◦ …
    • → Easy way to create Data Apps!
  11. Cloud Pods: Persistent Shareable Sandboxes

    Cloud Pods are a mechanism that allows you to take a snapshot of the state in your current LocalStack instance, persist it to a storage backend, and easily share it with your team members.
  12. Cloud Pods: Save & Load DB Snapshots

    • Cloud pods can be saved and loaded from the CLI

        $ localstack pod save my-pod-123
        $ localstack pod load my-pod-123

    • We’ve prepared a cloud pod with a table named “test” - load it like this:

        $ localstack pod load pod-snowflake
        $ snow sql -c local --query 'select * from test'
        +----------------------------------+
        | MESSAGE                          |
        |----------------------------------|
        | Hello from LocalStack Snowflake! |
        +----------------------------------+
  13. Implementation

    • High-Level Architecture
      ◦ Query Processors
      ◦ Core DB Engine
        ▪ Current: Postgres
        ▪ Alternative: DuckDB
      ◦ Auxiliary Services
        ▪ Streams, stages, …
    • Written in Python
    • Running in Docker

    https://blog.localstack.cloud/2024-05-22-introducing-localstack-for-snowflake
  14. Challenges: Query Transpilation, Data Types

    • Snowflake and Postgres SQL are similar, yet there are many subtle differences
    • Query parsing using sqlglot
      ◦ Allows us to create a query AST, and perform modifications on it
      ◦ Big shout-out to Tobiko Data for providing this library! 🙌
      ◦ We’ve also been able to contribute a few upstream PRs :) (#2989, #3510, #3519)
    • Challenge: high-fidelity support for Snowflake data types
      ◦ Often either no direct mapping, or different semantics in Postgres
      ◦ Example: timestamps (TIMESTAMP_LTZ, TIMESTAMP_NTZ, etc.)
      ◦ Advanced data types in Snowflake like the generic VARIANT type
      ◦ Needed to introduce a custom VARIANT data type in the core DB engine

    https://blog.localstack.cloud/2024-05-22-introducing-localstack-for-snowflake
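
The data type challenge can be pictured with a small lookup table. The mapping below is hypothetical and written only for illustration; it is not the emulator's actual table. It pairs a few Snowflake types with the closest built-in Postgres type and uses `None` where no built-in type carries the same semantics.

```python
# Hypothetical mapping for illustration only (not LocalStack's actual table).
# None means "no built-in Postgres type with matching semantics".
SNOWFLAKE_TO_POSTGRES = {
    # Wall-clock time without any time zone information.
    "TIMESTAMP_NTZ": "timestamp without time zone",
    # Stored as UTC, rendered in the session time zone: close to timestamptz.
    "TIMESTAMP_LTZ": "timestamp with time zone",
    # Keeps a per-value offset, which Postgres timestamps do not preserve.
    "TIMESTAMP_TZ": None,
    # Generic semi-structured type; the slide mentions a custom engine type.
    "VARIANT": None,
}


def map_type(snowflake_type):
    """Return the closest Postgres type name, or None if there is none."""
    name = snowflake_type.strip().upper()
    if name not in SNOWFLAKE_TO_POSTGRES:
        raise KeyError(f"unmapped Snowflake type: {name}")
    return SNOWFLAKE_TO_POSTGRES[name]


def needs_custom_handling(snowflake_type):
    """True when a type cannot be mapped to a built-in Postgres type."""
    return map_type(snowflake_type) is None
```

Even where a mapping exists, behavior such as rendering and arithmetic can still differ between the two systems, so a name-level table like this is only a starting point.
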
  15. Bridging Local & Remote: Connection Proxy

    • Easily flip the switch between local and remote execution
    • Can be configured with real Snowflake credentials (see screenshot below)
      ◦ Calls will be forwarded to the real cloud and returned to the client
    • Enables a lot of exciting use cases:
      ◦ Duplex mode - running queries against local AND remote (local mirror)
      ◦ Route only requests for certain tables to upstream: JOIN local & remote

    [Diagram labels: Local Machine; Client (e.g., Python connector); LocalStack Snowflake; Connection Proxy; Core Engine; Real Snowflake Cloud Account]
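
The routing modes listed above can be expressed as a small decision function. This is a hypothetical sketch of the idea, not the proxy's real logic or configuration: the function name, its parameters, and the returned labels are all invented for illustration.

```python
def route_query(referenced_tables, upstream_tables, duplex=False):
    """Decide where a query runs: 'local', 'remote', 'join', or 'both'.

    referenced_tables: table names the query touches.
    upstream_tables:   table names configured to live in the real account.
    duplex:            run against local AND remote (local mirror).
    """
    if duplex:
        return "both"
    # Unquoted Snowflake identifiers are case-insensitive, so compare uppercased.
    referenced = {name.upper() for name in referenced_tables}
    upstream = {name.upper() for name in upstream_tables}
    hits = referenced & upstream
    if not hits:
        return "local"   # nothing configured upstream is involved
    if hits == referenced:
        return "remote"  # every referenced table lives upstream
    return "join"        # mix of local and upstream tables
```

For example, with `WEATHER` configured as an upstream table, a query over `trips` and `weather` falls into the "join" case, matching the "JOIN local & remote" use case.
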
  16. Roadmap

    • LocalStack “vNext”: Expanding our focus into the data engineering space
      ◦ Based on our learnings and foundation of the LocalStack AWS emulator
      ◦ Turns out that there is a need for better local testing of data pipelines as well!
    • The Snowflake emulator is still early stage, but the direction looks very promising
    • Nicely integrates with our existing LocalStack AWS features
      ◦ Using local S3; soon: AWS<>Snowflake integrations (e.g., Kinesis Firehose, …)
      ◦ LS Cloud Platform: saving/loading of Cloud Pods to manage persistent state
    • Exploring interesting challenges related to data testing
      ◦ Test data management; testing data/ETL in CI pipelines; …
    • Most of all: We’d love to LEARN about YOUR use cases!
      ◦ Get in touch to participate in the LocalStack Snowflake preview!