
LakeFS by Treeverse - IT Press Tour #49 March 2023

The IT Press Tour

April 01, 2023

Transcript

  1. $23M in funding from Zeev Ventures, NVP and DTC. Founded January 2020. 25 team members globally.

    Einat Orr, Co-Founder & CEO; Oz Katz, Co-Founder & CTO.

    lakeFS is an open-source project that provides data engineers with versioning and branching capabilities on their data lakes through a git-like version control interface. Open-source technology with a SaaS go-to-market (GTM).
  2. Engineering Best Practices for Data Practitioners (Dev / Test, Stage, Prod)

    Develop on top of production data, in isolation, with no copy
      • Safely experiment and test on full production data
      • Easily collaborate on production data with your team
    Promote only high-quality data to production
      • Automate data quality checks within data pipelines
      • Bad data is not promoted, yet remains available in isolation for debugging
    Atomic rollback on bad data in production
      • Rapidly recover from data quality issues in production by reverting the entire data lake, not just a specific table
      • Troubleshoot on an isolated version of production data at the time of the failure
  3. Manage Data like Code With lakeFS In 20 Minutes

    Object store: s3://data-repo/collections/foo
    With lakeFS (branch main): s3://data-repo/main/collections/foo

    lakectl branch create \
      lakefs://repo@testing-spark-3 \
      --source lakefs://repo@main

    # output:
    # created branch 'testing-spark-3',
    # pointing to commit ID: 'd1e9adc71c10a'
  4. lakeFS Outcomes

    • 20%-80% storage cost reduction
    • Double data engineers' efficiency (x2)
    • Immediate recovery from production outages
  5. Data Versioning Landscape (partial list)

    Managed ML Pipelines, Data Lake Manageability, MLOps E2E Platforms, Data-centric AI, Lakehouse, Lifecycle Management, Databases
  6. In a perfect world, we would manage data pipelines from dev to production the way we manage code.
  7. Our Community

    • We grow a community of users (>4K members)
    • We own the project source code
      ◦ Under Apache 2.0 with transfer of copyrights
      ◦ Contributions are small and local
      ◦ No partners on project development
    • Not part of an OSS foundation

    Logos in our community
  8. Product GTM: OSS, Cloud, Support

    • Format-agnostic data version control
    • Zero-copy clone, no data duplication
    • Data stays in place
    • Configurable garbage collection
    • RBAC
    • Auditing
    • SAML integration
    • Auto scaling
    • Managed disaster recovery
    • Managed garbage collection
    • Support SLA

    Starting at $2,500 a month
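The workflow on slide 2 (develop on production data in isolation with no copy, promote only data that passes checks, roll back atomically) can be illustrated with a toy in-memory model. This is a minimal sketch under stated assumptions, not lakeFS's implementation or API: the `ToyLake` class, its methods, the `non_empty` check, and the fast-forward-only promotion rule are all hypothetical. Objects are stored once by content hash, so a branch is just a pointer to a commit, and both promotion and rollback are a single pointer update.

```python
import hashlib


class ToyLake:
    """Toy, in-memory model of branch / validate / promote / rollback.

    Hypothetical illustration only; not lakeFS's API or implementation.
    Data objects are stored once, keyed by content hash. Commits hold
    {path: hash} snapshots and branches hold commit IDs, so creating a
    branch copies a pointer, not data.
    """

    def __init__(self):
        self.objects = {}                  # content hash -> bytes
        self.commits = {"c0": (None, {})}  # commit id -> (parent id, snapshot)
        self.branches = {"main": "c0"}     # branch name -> head commit id
        self._next_id = 1

    def create_branch(self, name, source="main"):
        # Zero-copy: the new branch points at the source branch's head commit.
        self.branches[name] = self.branches[source]

    def snapshot(self, branch):
        return self.commits[self.branches[branch]][1]

    def commit(self, branch, writes):
        # Store each object once, then record a new snapshot on this branch only.
        snap = dict(self.snapshot(branch))
        for path, data in writes.items():
            key = hashlib.sha256(data).hexdigest()
            self.objects[key] = data
            snap[path] = key
        cid = f"c{self._next_id}"
        self._next_id += 1
        self.commits[cid] = (self.branches[branch], snap)
        self.branches[branch] = cid
        return cid

    def read(self, branch, path):
        return self.objects[self.snapshot(branch)[path]]

    def promote(self, branch, check, target="main"):
        # Promote only if the quality check passes on the branch's data.
        # Simplification: fast-forward only, i.e. the target must not have
        # moved since the branch was created.
        snap = {p: self.objects[k] for p, k in self.snapshot(branch).items()}
        if not check(snap):
            return False  # bad data stays isolated on `branch` for debugging
        if not self._is_ancestor(self.branches[target], self.branches[branch]):
            raise RuntimeError("target has diverged; a real merge is needed")
        self.branches[target] = self.branches[branch]  # single pointer update
        return True

    def rollback(self, branch, commit_id):
        # Roll back every path on `branch` at once: one pointer update,
        # regardless of how many tables changed.
        self.branches[branch] = commit_id

    def _is_ancestor(self, ancestor, cid):
        while cid is not None:
            if cid == ancestor:
                return True
            cid = self.commits[cid][0]
        return False


def non_empty(snapshot):
    # Example quality check: every file on the branch must contain data.
    return all(len(data) > 0 for data in snapshot.values())


lake = ToyLake()
good = lake.commit("main", {"tables/users.csv": b"id,name\n1,ada\n"})
lake.create_branch("etl-test")                      # zero-copy branch
lake.commit("etl-test", {"tables/users.csv": b""})  # a bad (empty) rewrite
print(lake.promote("etl-test", non_empty))          # False: main is untouched
lake.commit("main", {"tables/users.csv": b"oops"})  # bad data reaches main
lake.rollback("main", good)                         # one atomic pointer move
print(lake.read("main", "tables/users.csv"))        # b'id,name\n1,ada\n'
```

Because a branch here is only a commit ID, creating one costs the same however much data the lake holds. The trade-off in this toy is that promotion refuses a diverged target instead of performing a three-way merge.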
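Slide 3 shows the same object addressed with and without a branch segment: `s3://data-repo/collections/foo` in the object store versus `s3://data-repo/main/collections/foo` through lakeFS on branch `main`. The sketch below assumes the layout is exactly `s3://<repo>/<branch>/<key>`, as on the slide; `with_branch` and `split_branch` are hypothetical helpers, not part of lakeFS or any SDK.

```python
from urllib.parse import urlparse


def with_branch(uri: str, branch: str) -> str:
    """Rewrite s3://<repo>/<key> as s3://<repo>/<branch>/<key>.

    Hypothetical helper mirroring the path layout on the slide.
    """
    parsed = urlparse(uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"expected s3://<repo>/<key>: {uri!r}")
    return f"s3://{parsed.netloc}/{branch}/{key}"


def split_branch(uri: str) -> tuple[str, str, str]:
    """Split s3://<repo>/<branch>/<key> into (repo, branch, key)."""
    parsed = urlparse(uri)
    branch, _, key = parsed.path.lstrip("/").partition("/")
    if parsed.scheme != "s3" or not branch or not key:
        raise ValueError(f"expected s3://<repo>/<branch>/<key>: {uri!r}")
    return parsed.netloc, branch, key


src = "s3://data-repo/collections/foo"
print(with_branch(src, "main"))  # s3://data-repo/main/collections/foo
print(split_branch(with_branch(src, "testing-spark-3")))
# ('data-repo', 'testing-spark-3', 'collections/foo')
```

Under that layout, pointing a job at a test branch such as `testing-spark-3` changes only the path segment after the repository name; the object key stays the same.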
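Slide 8 lists configurable and managed garbage collection alongside zero-copy cloning. Because clones share objects, an object can only be deleted once no retained commit references it. A minimal mark-and-sweep sketch follows; the function `collect_garbage` and its retention rule (keep the last `keep_last` commits reachable from each branch head) are hypothetical simplifications for illustration, not lakeFS's actual retention policy format.

```python
def collect_garbage(objects, commits, branches, keep_last):
    """Mark-and-sweep GC over a toy commit graph (hypothetical policy).

    objects:   set of stored object keys
    commits:   commit id -> (parent id or None, set of object keys referenced)
    branches:  branch name -> head commit id
    keep_last: number of most recent commits retained per branch
    Returns the set of object keys no retained commit references.
    """
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    live = set()
    # Mark: walk back from every branch head, collecting referenced objects.
    for head in branches.values():
        cid, depth = head, 0
        while cid is not None and depth < keep_last:
            parent, refs = commits[cid]
            live |= refs
            cid, depth = parent, depth + 1
    # Sweep: everything unmarked is a deletion candidate.
    return objects - live


commits = {
    "c1": (None, {"a"}),
    "c2": ("c1", {"a", "b"}),
    "c3": ("c2", {"b", "c"}),
}
objects = {"a", "b", "c", "d"}  # "d" is referenced by no commit
print(sorted(collect_garbage(objects, commits, {"main": "c3"}, keep_last=2)))  # ['d']
print(sorted(collect_garbage(objects, commits, {"main": "c3"}, keep_last=1)))  # ['a', 'd']
```

Any branch whose retained history still references an object keeps it alive, which is why collection has to consider every branch rather than one table or path at a time.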