
How APIs can make - or break - trust in your AI
Shameek Kundu, Executive Director, AI Verify at Infocomm Media Development Authority (IMDA)

apidays Singapore 2025
Where APIs Meet AI: Building Tomorrow's Intelligent Ecosystems
Marina Bay Sands Expo & Convention Centre
April 15 & 16, 2025


Transcript

  1. How APIs can build – or break – trust in your GenAI systems
     Shameek Kundu, Executive Director
  2. This session
     • GenAI has a trust deficit
     • Building trust is (partially) a technical challenge
     • APIs play a critical, under-appreciated role
  3. Introduction
     What it is: an open-source foundation, a 100% subsidiary of a Singapore government authority
     Mission: reliable, accessible AI testing; a trusted AI ecosystem; AI adoption at scale, with confidence
     What it is NOT: a regulator/policy maker, a standard-setting body, an AI Safety Institute
  4. A thriving community, dedicated to AI testing
     AI Verify Foundation membership: 60 (June 2023) → 120 (May 2024) → 190+ (Apr 2025)
  5. As a personal productivity tool, GenAI systems today seem good enough for a lot of things…
     • Writing – emails, poems, stories, presentations…
     • Helping conduct research / acting as a sounding board
     • Reading through unstructured text/images/videos to extract information
     • Helping write and debug code
  6. …but would you trust them where it really matters?
     • Reading your medical scan results to decide treatment options?
     • Making automated investments with your life’s savings?
     • Deciding whether someone is guilty of a crime?
     • Running crucial physical infrastructure?
     • Driving you around?
     • Providing counselling services to those in need of mental health support?
  7. Not surprisingly, enterprises grapple with some very basic, non-SciFi questions of trust in using GenAI

     UTILITY – external chatbot based on the utility’s internal documents and data sources
     • Are the answers provided accurate/complete?
     • Is the app making incorrect promises?
     • Can the app be misused to produce embarrassing content?

     PHARMA – large-scale summarisation as part of clinical-trials document creation
     • Is the summary comprehensive and relevant?
     • Are the citations correct?

     BANK – investment advice aimed at bank staff (wealth advisors), produced based on bank docs
     • Is the advice staying within regulatory and compliance requirements?
     • Is it possible for a malicious user to force the app to provide inappropriate recommendations?

     HOSPITAL – LLM-enabled summarisation of physicians’ notes on a single patient
     • Is the summary accurate and complete?
     • Are there differences in performance based on patient demographics?

     FACT-CHECKER – LLM-enabled, multi-step evaluation of scam/fake-news referrals
     • Can a malicious actor force the app to “whitewash” a piece of misinformation? Or bring down the app by forcing uncontrolled consumption of compute?
  8. The AI Verify Foundation’s Global Assurance Pilot is helping guide a lot of our thinking here
     Industries: Banking, Insurance, Fintech, HR tech, Healthcare, Engineering, Pharma, Public sector, Charity
     Use cases: Translation, Summarisation, Chatbot, Internal knowledge assistant, Agentic automation
     Geographies: SG, TW, HK; US, CA; UK, CH, FR, DE
  9. Managing the GenAI trust deficit, technically
     1. Build better
     2. Put in guardrails
     3. Test, test, test
     4. Monitor in production (and debug/refine)
  10. 1. Building GenAI applications for Trust: what does it involve?
      Option 1: Monolithic, “lazy” architecture
      Example: an LLM-enabled fact checker (a GenAI application for fact-checking)
      • Image/text is referred to the service for an authenticity check
      • Prompt: a simple system prompt instructs the LLM to check whether the input provided is authentic, with few-shot examples
      • The LLM carries out the instruction in the prompt
      • The output is sent back to the original user
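To make Option 1 concrete, here is a minimal sketch of the monolithic fact checker: one system prompt with few-shot examples, one LLM call, no intermediate structure. `call_llm` and the prompt text are hypothetical stand-ins, not any specific vendor's API.

```python
# Hypothetical sketch of the monolithic, "lazy" fact checker:
# a single prompt and a single LLM call.

SYSTEM_PROMPT = """You are a fact-checking assistant.
Decide whether the claim below is likely authentic or likely false.

Example: "The sun rises in the west." -> likely false
Example: "Water boils at 100 C at sea level." -> likely authentic
"""

def call_llm(system_prompt: str, user_input: str) -> str:
    """Placeholder for whatever chat-completion client you use."""
    raise NotImplementedError("wire up your LLM provider here")

def check_claim(claim: str) -> str:
    # Retrieval, reasoning and verdict are all delegated to one prompt:
    # simple to build, but hard to test, monitor or trust.
    return call_llm(SYSTEM_PROMPT, f'Claim: "{claim}"\nVerdict:')
```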
  11. 1. Building GenAI applications for Trust: what does it involve?
      Option 2: Structured, “horses-for-courses”, multi-step workflow
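A minimal sketch of what Option 2 might look like in code, with each step (routing, tool calls, report, translation, summarisation) as a separately testable function. All function names and the tool list are illustrative assumptions drawn from the fact-checker example, not a prescribed design.

```python
# Illustrative sketch of a structured workflow: every step is a small,
# separately testable function, and state is passed explicitly.

def route_intent(user_input: str) -> list[str]:
    """Step A: interpret intent and choose tools (stubbed)."""
    return ["web_search", "url_check"]

def run_tool(name: str, query: str) -> dict:
    """Step B: one narrowly scoped tool call (stubbed)."""
    return {"tool": name, "verdict": "unknown"}

def compile_report(results: list[dict]) -> str:
    """Step D: final report built only from preceding steps' outputs."""
    return "; ".join(f"{r['tool']}: {r['verdict']}" for r in results)

def translate(report: str, lang: str) -> str:     # step F (stubbed)
    return report

def summarise(report: str) -> str:                # step E (stubbed)
    return report

def fact_check(user_input: str, lang: str = "en") -> str:
    tools = route_intent(user_input)
    results = [run_tool(t, user_input) for t in tools]  # step C: explicit state
    return summarise(translate(compile_report(results), lang))
```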
  12. 3. Testing thoroughly, at each step
      A. Did the initial “router” interpret user intent correctly and route to the appropriate tool(s)?
      B. Did the individual “tool calls” work as intended (Google search, website visits, malicious-URL checks)?
      C. Did the output from each step get transferred correctly to the next step? (memory/state retention)
      D. Is the final report accurate and reflective of the preceding steps’ outputs?
      E. Is the summarisation of the final report effective?
      F. Is the translation accurate/meaningful?
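One way to act on “test, test, test”: give each labelled question (A–F) its own automated test. The sketch below assumes the hypothetical workflow module from the previous example is importable as `workflow`; the assertions are illustrative, not a complete suite.

```python
# Pytest-style sketch: one test per question A-F from the slide.
# `workflow` is a hypothetical module containing the earlier sketch.

from workflow import route_intent, run_tool, compile_report, fact_check

def test_router_picks_url_checker():                 # A: intent -> tools
    assert "url_check" in route_intent("Is https://example.com a scam?")

def test_tool_call_returns_verdict():                # B: individual tool calls
    assert "verdict" in run_tool("url_check", "https://example.com")

def test_state_carried_between_steps():              # C: memory/state retention
    report = compile_report([{"tool": "url_check", "verdict": "safe"}])
    assert "safe" in report

def test_end_to_end_report_reflects_tools():         # D-F: report, summary,
    assert "url_check" in fact_check("check this")   # translation (coarse check)
```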
  13. 4. Monitoring on an ongoing basis (and debugging, refining along the way)
      [Screenshots: Tracing and Monitoring views. Source: Arize]
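Platforms such as Arize provide tracing and monitoring out of the box; as a vendor-neutral illustration, the sketch below shows the underlying idea using only the Python standard library: tag every step of a request with a trace ID so a misbehaving response can be replayed step by step later.

```python
# Vendor-neutral tracing sketch: log start/end/error records for every
# step of a request under one trace ID.

import logging
import time
import uuid
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("genai.trace")

@contextmanager
def traced_step(trace_id: str, step: str):
    start = time.perf_counter()
    log.info("trace=%s step=%s status=start", trace_id, step)
    try:
        yield
        log.info("trace=%s step=%s status=ok elapsed=%.3fs",
                 trace_id, step, time.perf_counter() - start)
    except Exception:
        log.exception("trace=%s step=%s status=error", trace_id, step)
        raise

trace_id = uuid.uuid4().hex
with traced_step(trace_id, "router"):
    pass  # route_intent(...) would run here
```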
  14. Labs create LLMs. APIs bring them to real-world enterprise applications
      Abstract infrastructure complexity
      • The app’s entry point to the LLM
      • Scalability, load balancing, caching, failover
      • Experimentation/alternatives
      Shape the developer experience
      • Prompt templates
      • System messages
      • Temperature
      • Few-shot examples
      Enable real-world integration
      • As part of a GenAI-native product
      • As part of an existing enterprise workflow
      • Enabling tool calls
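A minimal sketch of the “abstract infrastructure complexity” role: the application sees one entry point, while the API layer handles caching and failover across providers behind it. Provider names and the client call are assumptions for illustration.

```python
# Sketch of a thin API layer: one entry point for the app, with a naive
# cache and provider failover behind it.

from functools import lru_cache

PROVIDERS = ["primary-llm", "backup-llm"]   # hypothetical failover order

def call_provider(provider: str, prompt: str) -> str:
    """Placeholder for a provider-specific completion call."""
    raise NotImplementedError

@lru_cache(maxsize=1024)                    # naive response cache
def complete(prompt: str) -> str:
    last_error: Exception | None = None
    for provider in PROVIDERS:              # failover across alternatives
        try:
            return call_provider(provider, prompt)
        except Exception as exc:
            last_error = exc
    raise RuntimeError("all providers failed") from last_error
```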
  15. But an API is more than the integration point. It is also the control panel, the gatekeeper, the conduit to Trust.
  16. Transparency: can I see how the system got here?
      The Black Box HR assistant
      • A GenAI-based assistant recommends rejecting a CV with no rationale
      • HR teams and hiring managers lose confidence
      • The legal team panics!
      APIs to the rescue! Reasoning chains, interim tool-call results, citations
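A sketch of what a transparency-friendly response shape could look like: the answer travels with its reasoning chain, interim tool-call results, and citations. The field names are illustrative, not any particular vendor's schema.

```python
# Illustrative API response shape for transparency: the answer carries
# the evidence behind it.

from dataclasses import dataclass, field

@dataclass
class Citation:
    source: str
    excerpt: str

@dataclass
class AssistantResponse:
    answer: str
    reasoning_steps: list[str] = field(default_factory=list)
    tool_results: dict = field(default_factory=dict)
    citations: list[Citation] = field(default_factory=list)

response = AssistantResponse(
    answer="Recommend: do not shortlist",
    reasoning_steps=["Role requires 5+ years' experience; CV shows 2"],
    citations=[Citation("job_spec.pdf", "Minimum 5 years' experience")],
)
```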
  17. Observability: can I keep track of everything it’s doing?
      The mystery of the invisible drift
      • A GenAI-enabled customer-service agent suddenly changes tone after an upstream model update
      APIs to the rescue! Token-level logs, traceable metadata, roll-back options
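A sketch of how an API can guard against invisible drift: pin an explicit model version (rather than “latest”), log it with rough token counts on every call, and keep a known-good rollback target. The version strings and client stub are hypothetical.

```python
# Sketch of drift-resistant calls: pinned model version, per-call logs,
# and a rollback target.

import logging

log = logging.getLogger("genai.obs")

MODEL_VERSION = "chat-model-2025-03-01"      # pinned, never "latest"
ROLLBACK_VERSION = "chat-model-2024-12-15"   # known-good fallback

def call_llm(model: str, prompt: str) -> str:
    """Placeholder for a version-pinned provider call."""
    raise NotImplementedError

def generate(prompt: str, model: str = MODEL_VERSION) -> str:
    log.info("model=%s prompt_tokens~=%d", model, len(prompt.split()))
    reply = call_llm(model, prompt)
    log.info("model=%s completion_tokens~=%d", model, len(reply.split()))
    return reply
```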
  18. Security: can I trust it with sensitive data?
      The leaky assistant
      • A GenAI-based assistant provides a (human) employee with data on clients they are not meant to see
      APIs to the rescue! Input sanitisation, scoped memory, well-defined access, audit logs
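A sketch of scoped access at the API boundary: retrieval is checked against the human caller's entitlements, and every allowed or denied access lands in an audit log. The ACL table and `search_index` are illustrative stand-ins.

```python
# Sketch of scoped retrieval: the assistant can only query collections
# the human caller is entitled to, and every access hits an audit log.

AUDIT_LOG: list[tuple[str, str]] = []
ACL = {
    "alice": {"retail-clients"},
    "bob": {"retail-clients", "private-banking"},
}

def search_index(collection: str, query: str) -> list[str]:
    """Placeholder for a vector/keyword search over one collection."""
    return []

def retrieve(user: str, collection: str, query: str) -> list[str]:
    if collection not in ACL.get(user, set()):
        AUDIT_LOG.append((user, f"DENIED {collection}"))
        raise PermissionError(f"{user} may not read {collection}")
    AUDIT_LOG.append((user, f"READ {collection}: {query}"))
    return search_index(collection, query)
```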
  19. Safety: can I stop it from producing undesirable stuff?
      The unsafe chatterbox
      • A customer-facing chatbot cannot withstand malicious users’ attempts to make it produce violent, hateful or just embarrassing content
      APIs to the rescue! Access to state-of-the-art guardrails, with low latency/high confidence
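A sketch of API-level guardrails: screen both the user's input and the model's draft output before anything reaches the customer. `moderation_score` and `call_llm` stand in for a hosted, low-latency moderation endpoint and a completion call; the threshold is an arbitrary example.

```python
# Sketch of API-level guardrails: moderate input, then the draft output,
# before anything reaches the customer.

BLOCK_THRESHOLD = 0.8
REFUSAL = "Sorry, I can't help with that."

def moderation_score(text: str) -> float:
    """Placeholder for a low-latency moderation endpoint (0.0-1.0)."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call."""
    raise NotImplementedError

def safe_reply(user_input: str) -> str:
    if moderation_score(user_input) > BLOCK_THRESHOLD:   # screen input
        return REFUSAL
    draft = call_llm(user_input)
    if moderation_score(draft) > BLOCK_THRESHOLD:        # screen output
        return REFUSAL
    return draft
```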
  20. Governance: can I control who can do what?
      Prompt chaos
      • A new marketing intern changes the system prompt, in production!
      • Output goes off-brand
      APIs to the rescue! Scoped API keys, prompt approval workflows, version locks
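A sketch of the governance controls the slide lists: API keys carry scopes, system prompts live in a versioned store, and the only path to production is an approval-gated deployment. All names and values are illustrative.

```python
# Sketch of governance controls: scoped keys, versioned prompts, and an
# approval-gated deployment path -- no direct edits to production.

KEY_SCOPES = {
    "intern-key": {"read"},
    "release-key": {"read", "deploy_prompt"},
}
PROMPT_VERSIONS = {"support-bot": {"v3": "You are a helpful, on-brand..."}}
LIVE_PROMPT = {"support-bot": "v3"}                    # version lock
APPROVALS: list[tuple[str, str, str]] = []             # (app, version, approver)

def deploy_prompt(api_key: str, app: str, version: str, approved_by: str):
    if "deploy_prompt" not in KEY_SCOPES.get(api_key, set()):
        raise PermissionError("key not scoped for prompt deployment")
    if version not in PROMPT_VERSIONS.get(app, {}):
        raise ValueError("unknown prompt version")
    APPROVALS.append((app, version, approved_by))      # audit trail
    LIVE_PROMPT[app] = version                         # the only write path
```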
  21. Evaluation: can I trust my test results?
      The dodgy tester
      • A developer builds a crude DIY hallucination detector
      • The evals turn out even more unreliable than the original output!
      APIs to the rescue! Access to state-of-the-art external evaluators, with all the right parameters (e.g., ground truth, context)
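A sketch of delegating evaluation to an external service instead of a DIY detector: the call passes the question, the app's answer, the retrieval context, and ground truth where available. The endpoint URL and payload schema are hypothetical.

```python
# Sketch of calling an external evaluator with the right parameters
# (ground truth, context) rather than hand-rolling a detector.

import json
from urllib import request

EVAL_URL = "https://evaluator.example.com/v1/faithfulness"  # hypothetical

def evaluate_answer(question: str, answer: str, context: str,
                    ground_truth: str | None = None) -> dict:
    payload = {"question": question, "answer": answer,
               "context": context, "ground_truth": ground_truth}
    req = request.Request(EVAL_URL,
                          data=json.dumps(payload).encode("utf-8"),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:                  # network call
        return json.load(resp)
```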