Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Building Kinder and Cost-Effective Agents throu...

Sponsored · SiteGround - Reliable hosting with speed, security, and support you can count on.

Building Kinder and Cost-Effective Agents through fine tuning

Learn how to customize AI models for a retail customer service agent using Microsoft Foundry. In this session we will discuss how you can optimize your model through options such as SFT and distillation to reduce latency and token costs and improve the tone of your models. I will also cover how you can synthetically generate your data and create a custom grader to give our model proper guidance for fine tuning. Lastly, we will talk about production, how you can deploy and evaluate your retail customer service agent.

Avatar for Bethany Jepchumba

Bethany Jepchumba

May 19, 2026

More Decks by Bethany Jepchumba

Other Decks in Technology

Transcript

  1. Part of the AgentCon World Tour by the Global AI

    Community AI Agents Developer Conference #AgentConNairobi
  2. What we will cover today Unlock business value with fine-tuning

    Model customization is critical when you need to improve the quality, accuracy, and operation costs for your agentic AI apps Fine-tuning in Microsoft Foundry When is fine-tuning the right choice for me? Why is Microsoft Foundry the right platform to use? Fine-tuning for Agentic Applications Why do models struggle with multi-turn tool calling? How does Fine-Tuning improve agent performance? Summary Model Customization – unlocks business value for enterprise Microsoft Foundry - streamlines fine-tuning for AI developers Demo: Basic Fine-Tuning Understand supervised fine-tuning Demo: Custom Grader Write evaluators to test target criteria Demo: Agentic Fine-Tuning SFT + Distillation to improve tool calling Demo: Synthetic Data Get high-quality datasets with less effort
  3. Why does Zava need model customization? Bruno Zhao Zava Customer

    Help me find the right product for DIY project. Who can I talk to? Make it helpful Customize Cora’s tone & response format to be polite, helpful & conversational Robin Counts Retail Store Manager Help me build loyalty. Can we ensure its responses are valid? Make it accurate Improve Cora’s use of tool selection, parameter propagation, policy adherence Kian Lambert App Dev Manager Help me save on operating costs. Can I teach a cheaper model this task? Make it cost-effective Distill Cora’s task-specific behaviors into smaller models without losing accuracy
  4. The AI Engineer’s customization journey Fine-tune the LLM to: •

    Reduce the length of your prompt • Show not tell the model how to behave • Improve the accuracy when you look up information • Improve the model’s handling of retrieved data Prompt I’m painting my living room wall. What paint should I buy Tone and style Query Extraction Example responses Personalization Intent mapping Inventory retrieval User input Context engineering Desired Output Model adaptation Basic prompt engineering Retrieval/RAG Response I recommend our Eggshell Paint Would you like to know more about color choices
  5. What does fine-tuning mean? LLM Fine-tuned LLM Fine-tuning refers to

    customizing a pre-trained LLM with additional training on a specific task or new dataset for enhanced performance, new skills, or improved accuracy
  6. Why should Zava consider fine-tuning? Domain-specific optimization Task-specific optimization Reduced

    token consumption Efficient resource utilization Smaller models, faster response Shorter prompts, improve response Improve quality Reduce cost Reduce latency Example: Zava has a domain-specific focus (retail) and task-specific focus (question-answering). Let’s think about how fine-tuning can help optimization
  7. What are my fine-tuning options in Microsoft Foundry? Supervised fine-tuning

    Module learns from examples Ex: Content generation task Reinforcement fine-tuning Use grader to reinforce CoT Ex: Reasoning tasks Model distillation Transfer learning to cheaper model Optimize for COST Vision fine-tuning Preference fine-tuning Hybrid fine-tuning Improve image understanding Ex: ClassificationTask Provide good & bad examples Ex: Tone adaptation Improve model use of RAG context Optimize for PRECISION
  8. Demo: Supervised fine-tuning in Zava Bruno Zhao Zava Customer Make

    it helpful Customize Cora’s tone & response format to be polite, helpful & conversational
  9. How can I fine-tune my model to customize the tone?

    Decide vision and scope Choose base model Choose FT technique Pick enterprise-ready model options Dataset Fine-tuning Evaluation Deploy and monitor Regularly benchmark and iterate!
  10. Demo: Fine-tuning for Agentic Applications Make it accurate Improve Cora’s

    use of tool selection, parameter propagation, policy adherence Make it cost-effective Distill Cora’s task-specific behaviors into smaller models without losing accuracy
  11. New for Fine-tuning AI.Azure.com No Data? No problem! Synthesize training

    data from documents and code with synthetic data generation Public Preview Train for Less! 50% discount with Dev Training tier – and higher quota Public Preview Open Source Models Ministral, OSS-20B, Llama 3.3 70B, and Qwen3 32B – in the same UX Public Preview Agentic RFT Fine-tune for tool use in GPT-5 chain of thought reasoning Private Preview We’re making every aspect of fine-tuning better
  12. Synthetic data generation Pre-defined recipes like Q&A and tool calling

    (for agents) Upload PDFs, docs, or code Our multi-agent framework creates high quality training data Public preview
  13. Example: Distillation for Tool Calling Accuracy in Microsoft Foundry Objective

    Teach a best-in-class model to call the right tools to solve complex business problems Model: GPT 4.1-mini Technique: Supervised fine-tuning Data: Synthetic data generation Train: Foundry UI & Python SDK Evaluate: Foundry evals Deploy: DevTier Regularly benchmark and iterate!
  14. Example: Distillation for Tool Calling Accuracy in Microsoft Foundry Objective

    Teach a best-in-class model to call the right tools to solve complex business problems Model: GPT 4.1-mini Technique: Supervised fine-tuning Data: Synthetic data generation Train: Foundry UI & Python SDK Evaluate: Foundry evals Deploy: DevTier Regularly benchmark and iterate!
  15. Example: Distillation for Tool Calling Accuracy in Microsoft Foundry Objective

    Teach a best-in-class model to call the right tools to solve complex business problems Model: GPT 4.1-mini Technique: Supervised fine-tuning Data: Synthetic data generation Train: Foundry UI & Python SDK Evaluate: Foundry evals Deploy: DevTier Regularly benchmark and iterate!
  16. Developer Training Tier Training can be expensive— especially for RFT

    models! DevTier training offers a 50% discount Jobs execute on pre-emptible capacity—think of it like spot VMs for training Public preview
  17. Example: Distillation for Tool Calling Accuracy in Microsoft Foundry Objective

    Teach a best-in-class model to call the right tools to solve complex business problems Model: GPT 4.1-mini Technique: Supervised fine-tuning Data: Synthetic data generation Train: Foundry UI & Python SDK Evaluate: Foundry evals Deploy: DevTier Regularly benchmark and iterate!
  18. New for Fine-tuning AI.Azure.com No Data? No problem! Synthesize training

    data from documents and code with synthetic data generation Public Preview Train for Less! 50% discount with Dev Training tier – and higher quota Public Preview Open Source Models Ministral, OSS-20B, Llama 3.3 70B, and Qwen3 32B – in the same UX Public Preview Agentic RFT Fine-tune for tool use in GPT-5 chain of thought reasoning Private Preview We’re making every aspect of fine-tuning better
  19. Trainer 4. Trainer updates model weights to produce the best

    CoT Let me guess x = apple I need to subtract 5 x = -5 I need to subtract 1 x = 1 2. Model generates multiple samples 0 1 0.5 3. Grader assign samples a score between 0-1. E.g., 0.5 if output is a number, 0.5 if output is correct Grader What is x? x+5=0 1. Prompt sent to model Model Reinforcement Fine-Tuning (RFT) teaches the model to produce outputs that score highly on a learned reward metric. RFT can elicit more structured, goal-directed reasoning behaviors that are not directly obtainable through imitation alone Reinforcement Fine Tuning: Improve Model Reasoning
  20. In Preview: Agentic RFT with GPT-5 Objective Teach a best-in-class

    model to call the right tools to solve complex business problems Model: GPT-5 Technique: Reinforcement fine-tuning Data: 10 manually curated examples Train: Foundry UI & Python SDK Evaluate: Foundry evals Deploy: DevTier Regularly benchmark and iterate! Sign up at aka.ms/agentic-rft-preview
  21. “ Real world results from fine-tuning We consolidated three steps

    into one, response times that were previously five or six seconds came down to one and a half to two seconds on average. This approach made the system more efficient and the 50% reduction in latency made conversations with Discovery AI feel seamless. —Stuart Emslie, Head of Actuarial and Data Science
  22. Recap: Model customization unlocks business goals I’m painting my living

    room wall User input System prompt Few shot examples Add my data Grounded responses Prompt engineering + RAG Shorter prompts Supervised fine-tuning Smaller, cheaper models Distillation Better tool calling Good choice! I recommend our Eggshell Paint. Would you like to know more about color choices? LLM output SFT
  23. Recap: Microsft Foundry makes fine-tuning seamless Model choice The best

    models from the best providers Choose serverless or managed compute Reliability 99.9% availability for Azure OpenAI models Latency guarantees with PTU-M Foundry platform Everything you need in one place: models, training, evaluation, deployments, and metrics Scalability Start with low cost DevTier to experiment Scale up with PTU-M for production workloads