API Management in the AI Era

The slide deck from the "API Management in the AI Era" session at API Days Australia.

https://www.apidays.global/events/australia

Nilesh Gule

October 29, 2025

Transcript

  1. $whoami
    {
      "name": "Nilesh Gule",
      "role": "Senior Cloud Solutions Architect at Avanade",
      "website": "https://www.HandsOnArchitect.com",
      "github": "https://GitHub.com/NileshGule",
      "twitter": "@nileshgule",
      "linkedin": "https://www.linkedin.com/in/nileshgule",
      "YouTube": "https://www.YouTube.com/@nilesh-gule",
      "likes": "Technical Evangelism, Cricket"
    }
  2. Challenges in using GenAI APIs
    • Track token usage across multiple applications
    • Ensure a single app doesn't consume the whole TPM (tokens-per-minute) quota
    • Distribute load across multiple endpoints
    • Ensure committed capacity in PTUs is exhausted before falling back to a PAYG (pay-as-you-go) instance
  3. Monitor utilization of models
    • Sends token usage metrics to Application Insights
    • Provides an overview of model utilization across multiple applications or API consumers
    GenAI Gateway Capabilities in Azure API Management
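To make the per-consumer overview concrete, here is a minimal Python sketch of the bookkeeping involved: token counts are recorded per (consumer, model) pair and can then be summed along either dimension. This is not how Azure API Management or Application Insights implement it; `TokenUsage`, `TokenMetrics`, and the consumer and model names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Running token totals for one (consumer, model) pair."""
    prompt_tokens: int = 0
    completion_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


class TokenMetrics:
    """Hypothetical in-memory store; a gateway would emit these to a metrics backend."""

    def __init__(self) -> None:
        self._usage = defaultdict(TokenUsage)

    def record(self, consumer: str, model: str,
               prompt_tokens: int, completion_tokens: int) -> None:
        # Accumulate the counts reported for a single API call.
        usage = self._usage[(consumer, model)]
        usage.prompt_tokens += prompt_tokens
        usage.completion_tokens += completion_tokens

    def total_for_consumer(self, consumer: str) -> int:
        # Sum over every model this consumer has called.
        return sum(u.total_tokens
                   for (c, _), u in self._usage.items() if c == consumer)

    def total_for_model(self, model: str) -> int:
        # Sum over every consumer that has called this model.
        return sum(u.total_tokens
                   for (_, m), u in self._usage.items() if m == model)
```

Keying on both consumer and model lets one set of counters answer "who is using the most tokens" as well as "which model deployment is busiest".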
  4. Enforce limits per consumer
    • Manage and enforce limits per API consumer based on the usage of API tokens
    GenAI Gateway Capabilities in Azure API Management
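A per-consumer token limit can be sketched as a fixed one-minute window counter. The Python below is an illustration under stated assumptions, not the algorithm Azure API Management uses: `TokenRateLimiter`, `try_consume`, and the injectable `clock` parameter are hypothetical names, and the fixed-window strategy is a simplification.

```python
import time
from typing import Callable, Dict, Tuple


class TokenRateLimiter:
    """Fixed-window tokens-per-minute limit, tracked separately per consumer."""

    def __init__(self, tokens_per_minute: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.tokens_per_minute = tokens_per_minute
        self._clock = clock  # injectable so the window can be tested
        # consumer -> (window index, tokens used in that window)
        self._windows: Dict[str, Tuple[int, int]] = {}

    def try_consume(self, consumer: str, tokens: int) -> bool:
        """Return True and count the tokens if the consumer is under its limit."""
        window = int(self._clock() // 60)
        seen_window, used = self._windows.get(consumer, (window, 0))
        if seen_window != window:
            used = 0  # a new minute has started, so the budget resets
        if used + tokens > self.tokens_per_minute:
            self._windows[consumer] = (window, used)
            return False  # a gateway would typically answer HTTP 429 here
        self._windows[consumer] = (window, used + tokens)
        return True
```

Because each consumer has its own counter, one application exhausting its budget does not affect the others, which is the point of the "single app doesn't consume the whole TPM quota" requirement.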
  5. Provisioned Throughput Units (PTU)
    • Lets you specify the amount of throughput required in a model deployment
    • Granted to a subscription as quota
    • Quota is specific to a region and defines the maximum number of PTUs that can be assigned to deployments in that subscription and region
    • PTU provides
      • Predictable performance
      • Allocated processing capacity
      • Cost savings
    Understanding costs associated with provisioned throughput units (PTU)
  6. HA - Load Balanced Pool and Circuit Breaker
    • Helps to spread load across multiple Azure OpenAI endpoints
    • Round-robin, weighted, or priority-based load distribution strategy
    GenAI Gateway Capabilities in Azure API Management
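The combination of priority-based routing and a circuit breaker can be sketched as follows. This is a simplified Python model, not Azure API Management's implementation: `Backend`, `BackendPool`, `failure_threshold`, and `trip_seconds` are hypothetical names, weighted distribution is left out, and the breaker simply counts consecutive failures and reopens the backend after a fixed cool-down.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Backend:
    name: str
    priority: int                      # lower is tried first, e.g. 1 = PTU, 2 = PAYG
    failure_threshold: int = 3         # consecutive failures before the circuit trips
    trip_seconds: float = 30.0         # cool-down before traffic is allowed again
    consecutive_failures: int = 0
    tripped_at: Optional[float] = None

    def is_available(self, now: float) -> bool:
        if self.tripped_at is None:
            return True
        if now - self.tripped_at >= self.trip_seconds:
            # Cool-down elapsed: close the circuit and start counting afresh.
            self.tripped_at = None
            self.consecutive_failures = 0
            return True
        return False


class BackendPool:
    def __init__(self, backends: List[Backend],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._backends = list(backends)
        self._clock = clock
        self._counter = itertools.count()

    def pick(self) -> Optional[Backend]:
        """Round-robin within the best-priority group of available backends."""
        now = self._clock()
        available = [b for b in self._backends if b.is_available(now)]
        if not available:
            return None
        best = min(b.priority for b in available)
        group = [b for b in available if b.priority == best]
        return group[next(self._counter) % len(group)]

    def report(self, backend: Backend, success: bool) -> None:
        """Feed the outcome of a call back into the breaker."""
        if success:
            backend.consecutive_failures = 0
            return
        backend.consecutive_failures += 1
        if backend.consecutive_failures >= backend.failure_threshold:
            backend.tripped_at = self._clock()
```

With a PTU deployment at priority 1 and PAYG deployments at priority 2, traffic stays on the PTU backend until repeated failures (for example, throttling responses once its capacity is used up) trip the breaker, and it returns there after the cool-down.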
  7. AI Gateway capabilities of Azure API Management
    Security & safety
    • Keyless managed identities
    • AI Apps & Agents Authorizations (New)
    • Content Safety (GA)
    • Credential Manager
    Resiliency
    • Weighted load balancing
    • Priority routing to provisioned capacity models
    • Backend pools with circuit breaker
    • Session-aware load balancing (GA)
    Scalability
    • Token rate limits and token quotas
    • Semantic Caching (GA)
    • Model load balancing
    • Multi-regional deployments
    Traffic mediation & control
    • Azure AI Foundry & Azure OpenAI
    • OpenAI-compatible models (GA)
    • Responses API (GA)
    • WebSockets for Realtime APIs
    • MCP server pass-through (Soon)
    • Expose APIs as built-in MCP server (Preview)
    Developer velocity
    • Wizard policy configuration experience
    • Self-service with the Developer Portal
    • API Center Copilot Studio connector (Preview)
    • Policy Toolkit
    Observability
    • Token counting per consumer
    • Prompts and completions logging (GA)
    • Built-in reporting dashboard (GA)
    Governance
    • Policy engine with custom expressions
    • API Center MCP server registry (Preview)
    • Federated API Management
  8. Summary
    • Track token usage across multiple applications
      • Emit Token Metrics policy
    • Ensure a single app doesn't consume the whole TPM quota
      • Token Limit policy
    • Distribute load across multiple endpoints
      • Backend pool load balancing and circuit breaker
  9. Resources
    • Azure OpenAI Gateway topologies
    • Azure OpenAI Token Limit Policy
    • LLM Token Limit Policy
    • Azure OpenAI Emit Token Metric Policy
    • LLM Emit Token Metric Policy
    • Houssem Dellai YouTube videos
    • GenAI Labs
    • Designing and implementing GenAI gateway solution
  10. Nilesh Gule
    ARCHITECT | MICROSOFT MVP
    "Code with Passion and Strive for Excellence"
    nileshgule
    @nileshgule
    Nilesh Gule
    NileshGule
    www.handsonarchitect.com
    https://www.youtube.com/@nilesh-gule
  11. Source Code & slide deck
    • Nilesh Gule fork - GenAI Labs: https://github.com/NileshGule/AI-Gateway
    • GenAI Labs: https://aka.ms/apim/genai/labs
    • https://speakerdeck.com/nileshgule/
    • https://www.slideshare.net/nileshgule/
  12. Q&A