using Databricks Assistant to create notebooks, author complex queries, identify ways to join tables within data lakes, and resolve coding issues, saving our data teams development time." Bernie Graham, VP Data Engineering
tool that transforms coding productivity” Saptagiri Kintali, Morgan Stanley
“This AI-based companion is set to reshape the way we code and interact with our lakehouse.” Jeroen Roosen, Intellus Group
“You can’t take this away from me!” Mike Lavina, 84.51
“The introduction of the Databricks Assistant has made it easier for our user base to improve their skill set.” Nicholas Heier, General Motors
“I was able to code 200+ lines of robust code in a language I've never coded before.” Josue A. Bogran, Kythera Labs
“This cutting-edge AI companion has revolutionized my data analysis journey, simplifying complex tasks and accelerating productivity.” Byron Exaporriton, ABN AMRO
“The convergence of generative AI and data development.” Alaeddin Khader, Core42
“For someone that's competent, it's extremely good at accelerating development.” Luke Woolley, SSE Business Energy
“How many DBUs were there in Europe in the last quarter?”
• Well-documented Code and Queries: Are there samples we can use as examples?
• Popularity: What are the most used tables across my company?
• Favorites: Which tables have I used recently? Frequently?
• Documentation: Are there relevant wikis and docs for interacting with this data?
• Organization: Who accessed these tables? Do I frequently work with them?
• Lineage: When was this table last updated?
• Dashboards: What are the highly vetted queries powering frequently used charts?
• UC Descriptions and Tags: Which tables are certified by my data team?
Data + People + Activity
[Diagram labels: Lineage, Assets, Favorites, Modified By, Popular, Tags, Created By, Recent View, Tables, Notebooks, Dashboards, Similar Activity, User Activity, People, Terms, Org Chart, Metadata, Data, Documentation, Jargon]
is confidential, please do not share externally
DatabricksIQ: Personalized Gen-AI Experiences
• Databricks Assistant: Code gen, text-to-sql, fix my code
• Data Asset Discovery: Find me the right table
• AI Documentation: Enrich metadata
• Semantic Knowledge Graph (Data, People, Activity)
• Gen-AI Platform (LLMs, Vector Index, RAG)
Unity Catalog (Unified Governance): One security and governance model for all data and AI across the organization
Delta Lake: One platform to store and manage all structured, semi-structured, and unstructured data
Cloud Data Lake: All Raw Data (Logs, Texts, Audio, Video, Images)
Unity Catalog Data Documentation (Public Preview)
• Auto-generate concise and informative table and column comments for Unity Catalog
• Document your backlog of data assets with missing documentation in minutes
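A minimal sketch of working through a documentation backlog, assuming table metadata is already available as plain Python dictionaries. The `draft_comment` function is a hypothetical stand-in for the AI-generated description (it is not a Databricks API), and the table names are made up. `COMMENT ON TABLE ... IS` is Databricks SQL syntax; the backslash escaping assumes the default string-literal handling.

```python
from typing import List, Optional, TypedDict


class TableInfo(TypedDict):
    full_name: str            # e.g. catalog.schema.table (illustrative)
    comment: Optional[str]    # existing Unity Catalog comment, if any


def draft_comment(table: TableInfo) -> str:
    """Hypothetical placeholder for an AI-generated description."""
    return f"Draft description for {table['full_name']} (review before applying)."


def sql_quote(text: str) -> str:
    """Escape backslashes and single quotes for a SQL string literal.

    Assumption: backslash escapes are enabled in string literals.
    """
    return text.replace("\\", "\\\\").replace("'", "\\'")


def missing_comment_statements(tables: List[TableInfo]) -> List[str]:
    """Return one COMMENT ON TABLE statement per table that lacks a comment."""
    statements = []
    for table in tables:
        if not (table["comment"] or "").strip():
            text = sql_quote(draft_comment(table))
            statements.append(f"COMMENT ON TABLE {table['full_name']} IS '{text}'")
    return statements


backlog = [
    {"full_name": "main.sales.orders", "comment": None},
    {"full_name": "main.sales.customers", "comment": "Customer master data"},
]
statements = missing_comment_statements(backlog)
```

The statements are only generated here, not executed, so a person can review each draft before it is applied.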
Databricks Assistant (Public Preview)
• AI-powered Authoring Assistant integrated into the notebook, file and SQL editors.
• Generate, Fix, and Explain code and queries
Help and Support (Public Preview)
• LLM-powered Help Assistant that provides a conversational interface for documentation and support
• Designed to give accurate answers about Databricks products and technologies such as Spark, Delta Live Tables, and DBSQL
• Improved flow for creating support tickets
Intelligent Search (Public Preview)
Contextual and accurate search and knowledge card
• Accurate semantic search using enterprise knowledge graph
• Summary knowledge card with details to help quickly get to what you are looking for
• LLM-powered natural language understanding
AI Code Suggestions-As-You-Type (Coming Soon)
• “Ghost text” single and multi-line suggestions that automatically appear as you type.
• Completions available for Python, Scala, R and SQL.
Unity Catalog provides a secure, governed collaboration layer
Available Now
• Enable/Disable in account or per-workspace
• Assistant only uses table and comment descriptions; it doesn’t look at row-level data
• Uses user permissions (e.g., it does not send metadata relating to tables that the user does not have permission to see)
H1 2024
• Assistant integrates with audit logs so you can see usage in your workspace
• [Potential] Prioritize tables for the Assistant to use.
• [Potential] Human-in-the-loop curation/validation
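The permission rule above can be illustrated with a small sketch. It assumes table metadata is held in simple Python objects and that the set of tables a user may see is already known; the class, function, and table names are hypothetical and do not describe how the Assistant is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class TableMetadata:
    """Descriptive metadata only; no row-level data is modelled here."""
    name: str
    description: str
    columns: Dict[str, str] = field(default_factory=dict)  # column name -> description


def visible_metadata(catalog: List[TableMetadata], readable: Set[str]) -> List[TableMetadata]:
    """Keep only metadata for tables the current user has permission to see."""
    return [table for table in catalog if table.name in readable]


catalog = [
    TableMetadata("main.sales.orders", "Customer orders", {"order_id": "Primary key"}),
    TableMetadata("main.hr.salaries", "Employee salaries", {"amount": "Annual salary"}),
]
# Hypothetical user who may only see the sales table.
shared = visible_metadata(catalog, readable={"main.sales.orders"})
```

Filtering before any request is built means metadata for restricted tables never enters the context at all, rather than being redacted later.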
Data Intelligence Tools
Boost the productivity of your data and AI teams
$30 USD per user / month [1,2]
What’s Included? [3,4]
Databricks Assistant for:
• Lakeview Dashboards. Generate visuals using natural language.
• Notebooks. Create, explain, and fix SQL and Python code using natural language.
• SQL Editor. Create, explain, and fix SQL queries using natural language.
• Help. Learn, explore, find, troubleshoot, and get support.
[1] Trial Period: $0 for first 6 months after GA
[2] Active users only (Note: active = generates DBUs in a Workspace; excludes dashboard viewers)
[3] AI-generated comments in UC and in-product intelligent search provided at no additional cost
[4] Project Genie (aka Data Rooms) not included in DI Tools SKU (future pricing TBD)
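As a worked example of the pricing above, here is a rough sketch of the monthly charge. It assumes the trial covers the first six whole months after GA (months counted from 0) and that the count of active users is already known; the function name and the month indexing are illustrative assumptions, not billing rules from the slide.

```python
PRICE_PER_ACTIVE_USER_USD = 30   # from the slide: $30 USD per user / month
TRIAL_MONTHS_AFTER_GA = 6        # from footnote [1]: $0 for first 6 months after GA


def monthly_charge(active_users: int, months_since_ga: int) -> int:
    """Return the charge in USD for one month.

    active_users: users who generated DBUs in a workspace that month
                  (dashboard viewers excluded, per footnote [2]).
    months_since_ga: 0 for the first month after GA (assumed indexing).
    """
    if active_users < 0 or months_since_ga < 0:
        raise ValueError("active_users and months_since_ga must be non-negative")
    if months_since_ga < TRIAL_MONTHS_AFTER_GA:
        return 0
    return active_users * PRICE_PER_ACTIVE_USER_USD


trial_cost = monthly_charge(active_users=40, months_since_ga=3)
paid_cost = monthly_charge(active_users=40, months_since_ga=6)
```

With 40 active users, a month inside the trial costs $0 and the first month after it costs 40 × $30 = $1,200.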
Using the Databricks Assistant
1. User selects to diagnose an error or highlights a cell and types a question.
2. Databricks attaches some metadata to the request and sends it to Azure OpenAI.
3. The user optionally decides to execute any code.
[Diagram labels: Users Workload (with network isolation), Compute Plane, Dedicated Compute, Assume Role (sts:AssumeRole), Control Plane, Azure OpenAI]
• Databricks has opted into the exemption from abuse monitoring and human review program, under which Microsoft does not store any prompts and completions sent to the Azure OpenAI service.
• All traffic between the control plane and Azure OpenAI service is encrypted with TLS 1.2. All data is encrypted at rest. Customers can leverage CMK.
Data FAQs
• What data is being sent?
  • Code or queries in the current notebook cell or SQL tab
  • Table and column names and descriptions
  • Previous prompt questions
  • Favourite tables
  • The “diagnose error” feature also shares the stack trace from the error output
  • We do not send your query results
• Does Azure OpenAI collect my data?
  • No. Databricks has opted into the exemption from abuse monitoring and human review program, under which Microsoft does not store any prompts and completions sent to the Azure OpenAI service.
• Are there any data residency considerations I should be aware of?
  • We are currently using an Azure OpenAI service deployed in West Europe for all workspaces deployed within a European Geo and one in East US for everyone else. We will continue to evaluate support for other Azure OpenAI regions for future versions to meet latency and data residency requirements.
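The answers above can be summarised in a short sketch: which fields go into a request, and which Azure OpenAI region serves it. All function names, dictionary keys, and the `"europe"` geo label are hypothetical; only the list of fields and the two regions come from the FAQ.

```python
from typing import Dict, List, Optional

REGION_EUROPE = "West Europe"   # used for workspaces in a European Geo
REGION_DEFAULT = "East US"      # used for everyone else


def azure_openai_region(workspace_geo: str) -> str:
    """Pick the serving region; the 'europe' label is an assumed convention."""
    return REGION_EUROPE if workspace_geo.strip().lower() == "europe" else REGION_DEFAULT


def build_context(
    cell_code: str,
    table_metadata: Dict[str, str],      # table/column names -> descriptions
    previous_prompts: List[str],
    favourite_tables: List[str],
    stack_trace: Optional[str] = None,   # only for the "diagnose error" feature
) -> dict:
    """Assemble the context listed in the FAQ. Query results are never included."""
    context = {
        "code": cell_code,
        "table_metadata": dict(table_metadata),
        "previous_prompts": list(previous_prompts),
        "favourite_tables": list(favourite_tables),
    }
    if stack_trace is not None:
        context["stack_trace"] = stack_trace
    return context


ctx = build_context(
    cell_code="SELECT * FROM main.sales.orders",
    table_metadata={"main.sales.orders": "Customer orders"},
    previous_prompts=["Which table holds orders?"],
    favourite_tables=["main.sales.orders"],
)
region = azure_openai_region("Europe")
```

Note that the function has no parameter for query results, so they cannot be attached by accident.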
Model FAQs
• What models are you using?
  • The Databricks Assistant is currently using Azure OpenAI GPT-3.5 as a model. GPT-4 is also available to limited preview customers. We’re continually evaluating new models and services (including OpenAI) and may include these in future iterations of the Assistant.
• Do you plan to integrate with other models?
  • Azure OpenAI gave us the fastest path to iteration. However, we’re continually evaluating new models and services (including OpenAI) and may include these in future iterations of the Assistant.
• Is my data being used to train models?
  • No
• Will the Databricks Assistant execute dangerous code?
  • The Assistant will not automatically execute code on your behalf. AI models are error-prone and can make mistakes, misunderstand prompts, hallucinate answers, and introduce bias. You are fully responsible for the code you execute.
Lakehouse Target Architecture - AWS (Unified Data Intelligence Platform)
[Architecture diagram; recoverable labels:]
• Sources and ingestion: Internal & External Data Sources, Streaming Events, Batch Ingestion and Orchestration, Cloud Ingestion, Delta Ingestion (Optimized Spark, COPY INTO, Auto Loader)
• Data Intelligence Platform: Bronze (Raw Ingestion and History), Silver (Filtered, Cleaned, Augmented), Gold (Business-level Aggregates), Spark Structured Stream/Batch, Workflows/Jobs, DBSQL, Data Science/Machine Learning, Auto ML, Gen AI & LLMs, Model Serving
• Governance: Unity Catalog - Data, AI Governance & Lineage
• Consumers and integrations: BI/SQL Tools, Power BI, BI Serving, Data Science Tools, Web Applications, Enterprise Data Catalog, Optional
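The Bronze, Silver, and Gold layers in this architecture can be sketched in plain Python to show what each stage is responsible for. This is only an illustration on in-memory records with made-up field names (`region`, `amount`); on the platform these stages would run as Spark jobs over Delta tables.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def to_bronze(raw_events: Iterable[dict]) -> List[dict]:
    """Bronze: keep every raw record as ingested, so history is preserved."""
    return list(raw_events)


def to_silver(bronze: List[dict]) -> List[dict]:
    """Silver: filter out unusable rows, clean fields, and augment."""
    silver = []
    for row in bronze:
        if row.get("amount") is None or not row.get("region"):
            continue  # filtered: missing amount or region
        amount = float(row["amount"])             # cleaned: numeric type
        region = row["region"].strip().upper()    # cleaned: normalized label
        silver.append({"region": region, "amount": amount, "is_large": amount >= 100.0})
    return silver


def to_gold(silver: List[dict]) -> Dict[str, float]:
    """Gold: business-level aggregate, here total amount per region."""
    totals: Dict[str, float] = defaultdict(float)
    for row in silver:
        totals[row["region"]] += row["amount"]
    return dict(totals)


raw = [
    {"region": " eu ", "amount": "120.5"},
    {"region": "EU", "amount": 30},
    {"region": None, "amount": 5},
    {"region": "us", "amount": None},
]
bronze = to_bronze(raw)
silver = to_silver(bronze)
gold = to_gold(silver)
```

Keeping Bronze untouched means Silver and Gold can be rebuilt at any time if the cleaning or aggregation rules change.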
data intelligence engine that uniquely understands your business
Databricks Data Intelligence Platform: Mosaic AI, Delta Live Tables, Workflows, Databricks SQL, Unity Catalog, Delta Lake
• AI that understands your data. Intelligence from learning your data and usage patterns.
• Natural language interfaces for everyone. Democratizes productivity with data and AI for every employee.
• Predictive optimization of the platform. Automatically tunes and optimizes your workloads and infrastructure.
context-aware AI assistant.
• Automatically generates SQL and Python, explains complex code, and fixes issues.
• AI assistance in every user experience: Notebooks, SQL and File editors, Lakeview, Help, and more.
• Powered by DatabricksIQ to ensure highly relevant responses based on your data and usage.