MSBuild Lab 334 - Evaluate Your AI Apps For Quality & Safety

Sessions At Microsoft Build
https://build.microsoft.com/sessions/LAB334

GitHub Repo For Self-Guided Learning
https://github.com/microsoft/Build25-LAB334

Discussion Forum To #GetHelp
https://github.com/orgs/azure-ai-foundry/discussions

Abstract:
You’ve built a custom AI application grounded in your enterprise data. How do you ensure response quality and safety? Join us as we explore AI-assisted evaluation workflows with built-in and custom evaluators on Azure AI. Learn what each metric represents, then understand how to analyze the scores for your specific application. Learn why observability is key, and how the generated telemetry can be used both locally and in the cloud to help you assess and debug your application performance.

Nitya Narasimhan, PhD

May 20, 2025

Transcript

  1. Evaluate and improve the quality & safety of your AI

    applications  Minsoo Thigpen, Senior Product Manager, Microsoft  Nitya Narasimhan, PhD, Senior AI Advocate, Microsoft  #MSBuild 2025 | Lab 334
  2. Agenda  Welcome – Meet The Team  Overview –

    GenAIOps & Observability  Getting Started – Launch Lab & Codespaces  Lab Outline – What You’ll Learn  Wrap-up – Teardown, Survey & Next Steps
  3. Welcome – Introducing The Lab Team  INSTRUCTOR: Nitya Narasimhan, PhD,

    Senior AI Advocate, Microsoft  INSTRUCTOR: Minsoo Thigpen, Senior Product Manager, Microsoft  Our Amazing Proctors: Ankit Singhal, Chang Liu, Hanchi Wang, Sydney Lister
  4. Azure AI Foundry [architecture diagram: Security • Identity • Management;

    Foundry Models; Foundry Agent Service; Azure AI Search; Foundry Observability; Azure AI Services; Azure Machine Learning; Azure AI Content Safety; Copilot Studio; Visual Studio; GitHub; Foundry SDK; serverless and managed compute (Azure Kubernetes Service, Azure Container Apps, Azure App Service, Azure Functions); cloud (Azure), edge (Azure Arc), and local (Foundry Local)]  Observability Is The Foundation Of Trust For AI
  5. Evaluate Quality & Safety of AI Apps  Observability – Why

    do Evaluations Matter?  Development – How can I streamline usage?  Workshop – Simulator with Search Index  Workshop – Evaluations SDK Features Tour  Workshop – Evaluators: Quality & Custom  Workshop – Evaluators: Risk & Safety  Homework – Keep Exploring With Sandbox  Survey & Teardown
  6. Is my retail copilot grounded in my product data? Is

    it delivering relevant responses in a coherent way? Is it acting responsibly by filtering harmful or unsafe content? Is it able to detect and protect the system from adversarial attacks? How can we build user trust in AI applications?
  7. My chat application takes “queries” and generates “responses” – using

    “context” that is retrieved from my search index. The “ground truth” reflects my desired response. Evaluations help us “grade” chat application responses against various criteria (quality & safety), giving us metrics we can analyze to find areas for improvement (observability), with tooling to streamline the dev workflow. The Azure AI Evaluations SDK makes it seamless!
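
    A minimal sketch of one such “grade”, assuming the azure-ai-evaluation Python package (pip install azure-ai-evaluation); the endpoint, key, and deployment values are placeholders you supply:

        # One evaluation record pairs a query with the generated response and
        # the retrieved context; an AI-assisted evaluator grades the row.
        from azure.ai.evaluation import GroundednessEvaluator

        # model_config points the "LLM judge" at an Azure OpenAI deployment
        # (all values below are placeholders).
        model_config = {
            "azure_endpoint": "https://<your-resource>.openai.azure.com",
            "api_key": "<your-api-key>",
            "azure_deployment": "<chat-model-deployment>",
        }

        groundedness = GroundednessEvaluator(model_config)
        score = groundedness(
            query="What tents do you sell?",
            response="We carry the TrailMaster X4 and Alpine Explorer tents.",
            context="Product catalog: TrailMaster X4 Tent; Alpine Explorer Tent.",
        )
        print(score)  # e.g. a 1-5 groundedness rating plus the judge's reasoning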
  8. Setup / Codespaces Launch Lab & Configure Environment Your first

    look at the Azure AI Foundry portal – AI Project, Models, Evaluation
  9. Lab 0 / Simulator Synthesize test inputs from indexes Get

    test datasets generated from your data indexes – for best coverage
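
    A sketch of what the simulator step looks like in code, assuming the Simulator class from azure.ai.evaluation.simulator; the callback and document text are illustrative stand-ins for your app and your search index content:

        import asyncio
        from azure.ai.evaluation.simulator import Simulator

        # The simulator uses an LLM (model_config, as above) to synthesize
        # queries from raw text and plays them against your application.
        simulator = Simulator(model_config=model_config)

        async def app_callback(messages, stream=False, session_state=None, context=None):
            # Hypothetical stand-in for your chat app: append a canned answer.
            query = messages["messages"][-1]["content"]
            messages["messages"].append(
                {"role": "assistant", "content": f"(app response to: {query})"}
            )
            return {"messages": messages["messages"], "stream": stream,
                    "session_state": session_state, "context": context}

        outputs = asyncio.run(simulator(
            target=app_callback,
            text="<document text pulled from your search index>",
            num_queries=4,              # how many test inputs to synthesize
            max_conversation_turns=1,
        ))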
  10. Lab 1 / SDK Understand Evaluators & evaluate()  Azure AI

    Project – upload  Model Configuration – judge  Azure Credential – access  Evaluators – metrics  evaluate() – flow
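
    Putting those pieces together, a hedged sketch of the evaluate() flow (the JSONL file and project identifiers are placeholders):

        from azure.ai.evaluation import evaluate, GroundednessEvaluator, RelevanceEvaluator

        # evaluate() runs each evaluator over every row of a JSONL dataset
        # (rows with query/response/context/ground_truth fields) and can log
        # the run to your Azure AI project for review in the portal.
        azure_ai_project = {
            "subscription_id": "<subscription-id>",
            "resource_group_name": "<resource-group>",
            "project_name": "<project-name>",
        }

        results = evaluate(
            data="eval_data.jsonl",      # placeholder dataset path
            evaluators={
                "groundedness": GroundednessEvaluator(model_config),
                "relevance": RelevanceEvaluator(model_config),
            },
            azure_ai_project=azure_ai_project,  # optional: upload to portal
            output_path="./eval_results.json",
        )
        print(results["metrics"])        # aggregate scores across the dataset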
  11. Lab 2 / SDK Quality & Custom Evaluators Understand how

    AI-Assisted Evaluation works (“LLM as Judge”)
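
    A custom evaluator can be as simple as a Python callable that returns a dict of named metrics; a minimal code-based sketch (the metric name is illustrative):

        # A custom (code-based) evaluator: any callable accepting row fields
        # as keyword arguments and returning a dict of metrics.
        class AnswerLengthEvaluator:
            def __call__(self, *, response: str, **kwargs):
                return {"answer_length": len(response.split())}

        # It plugs into evaluate() alongside the built-in, AI-assisted ones:
        #   evaluators={"groundedness": GroundednessEvaluator(model_config),
        #               "answer_length": AnswerLengthEvaluator()}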
  12. Lab 3 / SDK Risk & Safety Evaluators Understand Azure’s

    built-in safety evaluators and automated evaluations flow
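
    A sketch of one built-in safety evaluator, assuming ViolenceEvaluator from azure-ai-evaluation; safety evaluators are backed by the Azure AI Foundry evaluation service, so they take a project and credential instead of a model_config (all identifiers are placeholders):

        from azure.identity import DefaultAzureCredential
        from azure.ai.evaluation import ViolenceEvaluator

        violence = ViolenceEvaluator(
            credential=DefaultAzureCredential(),
            azure_ai_project=azure_ai_project,  # the project dict shown earlier
        )
        result = violence(
            query="What tents do you sell?",
            response="We carry the TrailMaster X4 and Alpine Explorer tents.",
        )
        print(result)  # e.g. a severity label, numeric score, and reasoning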
  13. Bonus Lab 4 / SDK Evaluate Alternative Base Models Use

    evaluators to assess model choices during the selection phase – visualize results in the portal
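
    One way to sketch that comparison: run the same evaluators over a response dataset from each candidate model, then compare the logged runs side by side in the portal (deployment and file names are placeholders):

        # Re-run the same evaluation once per candidate base model.
        for deployment, data_file in [
            ("gpt-4o", "responses_gpt-4o.jsonl"),
            ("gpt-4o-mini", "responses_gpt-4o-mini.jsonl"),
        ]:
            evaluate(
                evaluation_name=f"base-model-{deployment}",  # run label in the portal
                data=data_file,
                evaluators={"relevance": RelevanceEvaluator(model_config)},
                azure_ai_project=azure_ai_project,
            )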
  14. Evaluate Quality & Safety of AI Apps  Observability – Why

    do Evaluations Matter?  Development – How can I streamline usage?  Workshop – Simulator with Search Index  Workshop – Evaluations SDK Features Tour  Workshop – Evaluators: Quality & Custom  Workshop – Evaluators: Risk & Safety  Homework – Keep Exploring With Sandbox  Survey & Teardown
  15. Call to Action Are You Ready To Create The Future

    Of AI?  Explore Azure AI Foundry: ai.azure.com  Download the SDK: aka.ms/aifoundrysdk  Review documentation: aka.ms/AzureAI  Take the Azure AI Learn Course: aka.ms/learnatbuild  Read More About What’s New In Azure AI Foundry: aka.ms/Build25/HeroBlog/Foundry  Join our developer community channels:  Discord: aka.ms/ai/discord  Discussions: aka.ms/azureaifoundry/forum