MSBuild Lab 334 - Evaluate Your AI Apps For Quality & Safety

Sessions At Microsoft Build
https://build.microsoft.com/sessions/LAB334

GitHub Repo For Self-Guided Learning
https://github.com/microsoft/Build25-LAB334

Discussion Forum To #GetHelp
https://github.com/orgs/azure-ai-foundry/discussions

Abstract:
You’ve built a custom AI application grounded in your enterprise data. How do you ensure response quality and safety? Join us as we explore AI-assisted evaluation workflows with built-in and custom evaluators on Azure AI. Learn what each metric represents, then understand how to analyze the scores for your specific application. Learn why observability is key, and how the generated telemetry can be used both locally and in the cloud to help you assess and debug your application performance.

Nitya Narasimhan, PhD

May 20, 2025

Transcript

  1. Evaluate and improve the quality & safety of your AI

    applications  Minsoo Thigpen, Senior Product Manager, Microsoft  Nitya Narasimhan, PhD, Senior AI Advocate, Microsoft  #MSBuild 2025 | Lab 334
  2. Agenda  Welcome – Meet The Team  Overview –

    GenAIOps & Observability  Getting Started – Launch Lab & Codespaces  Lab Outline – What You’ll Learn  Wrap-up – Teardown, Survey & Next Steps
  3. Welcome – Introducing The Lab Team  INSTRUCTOR: Nitya Narasimhan, PhD,

    Senior AI Advocate, Microsoft  INSTRUCTOR: Minsoo Thigpen, Senior Product Manager, Microsoft  Our Amazing Proctors: Ankit Singhal, Chang Liu, Hanchi Wang, Sydney Lister
  4. Azure AI Foundry [architecture diagram: Security • Identity • Management;

    Foundry Models; Foundry Agent Service; Azure AI Search; Foundry Observability; Azure AI Services; Azure Machine Learning; Azure AI Content Safety; Copilot Studio; Visual Studio; GitHub; Foundry SDK; serverless and managed compute (Azure Kubernetes Service, Azure Container Apps, Azure App Service, Azure Functions); cloud (Azure), edge (Azure Arc), and local (Foundry Local)]  Observability Is The Foundation Of Trust For AI
  5. Evaluate Quality & Safety of AI Apps  Observability – Why

    do Evaluations Matter?  Development – How can I streamline usage?  Workshop – Simulator with Search Index  Workshop – Evaluations SDK Features Tour  Workshop – Evaluators: Quality & Custom  Workshop – Evaluators: Risk & Safety  Homework – Keep Exploring With Sandbox  Survey & Teardown
  6. Is my retail copilot grounded in my product data? Is

    it delivering relevant responses in a coherent way? Is it acting responsibly by filtering harmful or unsafe content? Is it able to detect and protect the system from adversarial attacks? How can we build user trust in AI applications?
  7. My chat application takes “queries” and generates “responses” – using

    “context” that is retrieved from my search index. The “ground truth” reflects my desired response. Evaluations help us “grade” chat application responses against various criteria (quality & safety), giving us metrics we can analyze to find areas for improvement (observability), with tooling to streamline the dev workflow. The Azure AI Evaluations SDK makes it seamless!
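
    A minimal sketch of one such “grade”, assuming the azure-ai-evaluation Python package (pip install azure-ai-evaluation); the endpoint, key, and deployment values are placeholders you supply:

        # One evaluation record pairs a query with the generated response and
        # the retrieved context; an AI-assisted evaluator grades the row.
        from azure.ai.evaluation import GroundednessEvaluator

        # model_config points the "LLM judge" at an Azure OpenAI deployment
        # (all values below are placeholders).
        model_config = {
            "azure_endpoint": "https://<your-resource>.openai.azure.com",
            "api_key": "<your-api-key>",
            "azure_deployment": "<chat-model-deployment>",
        }

        groundedness = GroundednessEvaluator(model_config)
        score = groundedness(
            query="What tents do you sell?",
            response="We carry the TrailMaster X4 and Alpine Explorer tents.",
            context="Product catalog: TrailMaster X4 Tent; Alpine Explorer Tent.",
        )
        print(score)  # e.g. a 1-5 groundedness rating plus the judge's reasoning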
  8. Setup / Codespaces Launch Lab & Configure Environment Your first

    look at the Azure AI Foundry portal – AI Project, Models, Evaluation
  9. Lab 0 / Simulator Synthesize test inputs from indexes Get

    test datasets generated from your data indexes – for best coverage
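
    A sketch of what the simulator step looks like in code, assuming the Simulator class from azure.ai.evaluation.simulator; the callback and document text are illustrative stand-ins for your app and your search index content:

        import asyncio
        from azure.ai.evaluation.simulator import Simulator

        # The simulator uses an LLM (model_config, as above) to synthesize
        # queries from raw text and plays them against your application.
        simulator = Simulator(model_config=model_config)

        async def app_callback(messages, stream=False, session_state=None, context=None):
            # Hypothetical stand-in for your chat app: append a canned answer.
            query = messages["messages"][-1]["content"]
            messages["messages"].append(
                {"role": "assistant", "content": f"(app response to: {query})"}
            )
            return {"messages": messages["messages"], "stream": stream,
                    "session_state": session_state, "context": context}

        outputs = asyncio.run(simulator(
            target=app_callback,
            text="<document text pulled from your search index>",
            num_queries=4,              # how many test inputs to synthesize
            max_conversation_turns=1,
        ))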
  10. Lab 1 / SDK Understand Evaluators & evaluate()  Azure AI

    Project – upload  Model Configuration – judge  Azure Credential – access  Evaluators – metrics  evaluate() – flow
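
    Putting those pieces together, a hedged sketch of the evaluate() flow (the JSONL file and project identifiers are placeholders):

        from azure.ai.evaluation import evaluate, GroundednessEvaluator, RelevanceEvaluator

        # evaluate() runs each evaluator over every row of a JSONL dataset
        # (rows with query/response/context/ground_truth fields) and can log
        # the run to your Azure AI project for review in the portal.
        azure_ai_project = {
            "subscription_id": "<subscription-id>",
            "resource_group_name": "<resource-group>",
            "project_name": "<project-name>",
        }

        results = evaluate(
            data="eval_data.jsonl",      # placeholder dataset path
            evaluators={
                "groundedness": GroundednessEvaluator(model_config),
                "relevance": RelevanceEvaluator(model_config),
            },
            azure_ai_project=azure_ai_project,  # optional: upload to portal
            output_path="./eval_results.json",
        )
        print(results["metrics"])        # aggregate scores across the dataset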
  11. Lab 2 / SDK Quality & Custom Evaluators Understand how

    AI-Assisted Evaluation works (“LLM as Judge”)
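
    A custom evaluator can be as simple as a Python callable that returns a dict of named metrics; a minimal code-based sketch (the metric name is illustrative):

        # A custom (code-based) evaluator: any callable accepting row fields
        # as keyword arguments and returning a dict of metrics.
        class AnswerLengthEvaluator:
            def __call__(self, *, response: str, **kwargs):
                return {"answer_length": len(response.split())}

        # It plugs into evaluate() alongside the built-in, AI-assisted ones:
        #   evaluators={"groundedness": GroundednessEvaluator(model_config),
        #               "answer_length": AnswerLengthEvaluator()}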
  12. Lab 3 / SDK Risk & Safety Evaluators Understand Azure’s

    built-in safety evaluators and automated evaluations flow
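
    A sketch of one built-in safety evaluator, assuming ViolenceEvaluator from azure-ai-evaluation; safety evaluators are backed by the Azure AI Foundry evaluation service, so they take a project and credential instead of a model_config (all identifiers are placeholders):

        from azure.identity import DefaultAzureCredential
        from azure.ai.evaluation import ViolenceEvaluator

        violence = ViolenceEvaluator(
            credential=DefaultAzureCredential(),
            azure_ai_project=azure_ai_project,  # the project dict shown earlier
        )
        result = violence(
            query="What tents do you sell?",
            response="We carry the TrailMaster X4 and Alpine Explorer tents.",
        )
        print(result)  # e.g. a severity label, numeric score, and reasoning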
  13. Bonus Lab 4 / SDK Evaluate Alternative Base Models Use

    evaluators to assess model choices during the selection phase – visualize results in the portal
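
    One way to sketch that comparison: run the same evaluators over a response dataset from each candidate model, then compare the logged runs side by side in the portal (deployment and file names are placeholders):

        # Re-run the same evaluation once per candidate base model.
        for deployment, data_file in [
            ("gpt-4o", "responses_gpt-4o.jsonl"),
            ("gpt-4o-mini", "responses_gpt-4o-mini.jsonl"),
        ]:
            evaluate(
                evaluation_name=f"base-model-{deployment}",  # run label in the portal
                data=data_file,
                evaluators={"relevance": RelevanceEvaluator(model_config)},
                azure_ai_project=azure_ai_project,
            )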
  14. Evaluate Quality & Safety of AI Apps  Observability – Why

    do Evaluations Matter?  Development – How can I streamline usage?  Workshop – Simulator with Search Index  Workshop – Evaluations SDK Features Tour  Workshop – Evaluators: Quality & Custom  Workshop – Evaluators: Risk & Safety  Homework – Keep Exploring With Sandbox  Survey & Teardown
  15. Call to Action Are You Ready To Create The Future

    Of AI?  Explore Azure AI Foundry: ai.azure.com  Download the SDK: aka.ms/aifoundrysdk  Review documentation: aka.ms/AzureAI  Take the Azure AI Learn Course: aka.ms/learnatbuild  Read More About What’s New In Azure AI Foundry: aka.ms/Build25/HeroBlog/Foundry  Join our developer community channels:  Discord: aka.ms/ai/discord  Discussions: aka.ms/azureaifoundry/forum