
How APIs can make - or break - trust in your AI
Shameek Kundu, Executive Director, AI Verify at Infocomm Media Development Authority (IMDA)

apidays Singapore 2025
Where APIs Meet AI: Building Tomorrow's Intelligent Ecosystems
Marina Bay Sands Expo & Convention Centre
April 15 & 16, 2025


Transcript

  1. How APIs can build – or break – trust in your GenAI systems
     Shameek Kundu, Executive Director
  2. This session
     • GenAI has a trust deficit
     • Building trust is (partially) a technical challenge
     • APIs play a critical, under-appreciated role
  3. Introduction
     What it is: an open-source foundation, a 100% subsidiary of a Singapore government authority
     Mission: reliable, accessible AI testing; a trusted AI ecosystem; AI adoption at scale, with confidence
     What it is NOT: a regulator/policy maker, a standard-setting body, an AI Safety Institute
  4. A thriving community, dedicated to AI testing
     AI Verify Foundation membership: 60 (June 2023) → 120 (May 2024) → 190+ (Apr 2025)
  5. As a personal productivity tool, GenAI systems today seem good enough for a lot of things…
     • Writing – emails, poems, stories, presentations…
     • Helping conduct research / acting as a sounding board
     • Reading through unstructured text/images/videos to extract information
     • Helping write and debug code
  6. …but would you trust them where it really matters?
     • Reading your medical scan results to decide treatment options?
     • Making automated investments with your life’s savings?
     • Deciding whether someone is guilty of a crime?
     • Running crucial physical infrastructure?
     • Driving you around?
     • Providing counselling services to those in need of mental health support?
  7. Not surprisingly, enterprises grapple with some very basic, non-SciFi questions of trust in using GenAI

     UTILITY – external chatbot based on the utility’s internal documents and data sources
     • Are the answers provided accurate/complete?
     • Is the app making incorrect promises?
     • Can the app be misused to produce embarrassing content?

     PHARMA – large-scale summarisation as part of clinical-trials document creation
     • Is the summary comprehensive and relevant?
     • Are the citations correct?

     BANK – investment advice aimed at bank staff (wealth advisors), produced based on bank docs
     • Is the advice staying within regulatory and compliance requirements?
     • Is it possible for a malicious user to force the app to provide inappropriate recommendations?

     HOSPITAL – LLM-enabled summarisation of physicians’ notes on a single patient
     • Is the summary accurate and complete?
     • Are there differences in performance based on patient demographics?

     FACT-CHECKER – LLM-enabled, multi-step evaluation of scam/fake-news referrals
     • Can a malicious actor force the app to “whitewash” a piece of misinformation? Or bring down the app by forcing uncontrolled consumption of compute?
  8. The AI Verify Foundation’s Global Assurance Pilot is helping guide a lot of our thinking here
     Industries: Banking, Insurance, Fintech, HR tech, Healthcare, Engineering, Pharma, Public sector, Charity
     Use cases: Translation, Summarisation, Chatbot, Internal knowledge assistant, Agentic automation
     Geographies: SG, TW, HK; US, CA; UK, CH, FR, DE
  9. Managing the GenAI trust deficit, technically
     1. Build better
     2. Put in guardrails
     3. Test, test, test
     4. Monitor in production (and debug/refine)
  10. 1. Building GenAI applications for Trust: what does it involve?
      Option 1: Monolithic, “lazy” architecture
      Example: an LLM-enabled fact checker (a GenAI application for fact-checking)
      • Image/text is referred to the service for an authenticity check
      • Prompt: a simple system prompt instructs the LLM to check whether the input provided is authentic, with few-shot examples
      • The LLM carries out the instruction in the prompt
      • The output is sent back to the original user
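To make Option 1 concrete, here is a minimal sketch of the monolithic fact checker: one system prompt with few-shot examples, one LLM call, no intermediate structure. `call_llm` and the prompt text are hypothetical stand-ins, not any specific vendor's API.

```python
# Hypothetical sketch of the monolithic, "lazy" fact checker:
# a single prompt and a single LLM call.

SYSTEM_PROMPT = """You are a fact-checking assistant.
Decide whether the claim below is likely authentic or likely false.

Example: "The sun rises in the west." -> likely false
Example: "Water boils at 100 C at sea level." -> likely authentic
"""

def call_llm(system_prompt: str, user_input: str) -> str:
    """Placeholder for whatever chat-completion client you use."""
    raise NotImplementedError("wire up your LLM provider here")

def check_claim(claim: str) -> str:
    # Retrieval, reasoning and verdict are all delegated to one prompt:
    # simple to build, but hard to test, monitor or trust.
    return call_llm(SYSTEM_PROMPT, f'Claim: "{claim}"\nVerdict:')
```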
  11. 1. Building GenAI applications for Trust: what does it involve?
      Option 2: Structured, “horses-for-courses”, multi-step workflow
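A minimal sketch of what Option 2 might look like in code, with each step (routing, tool calls, report, translation, summarisation) as a separately testable function. All function names and the tool list are illustrative assumptions drawn from the fact-checker example, not a prescribed design.

```python
# Illustrative sketch of a structured workflow: every step is a small,
# separately testable function, and state is passed explicitly.

def route_intent(user_input: str) -> list[str]:
    """Step A: interpret intent and choose tools (stubbed)."""
    return ["web_search", "url_check"]

def run_tool(name: str, query: str) -> dict:
    """Step B: one narrowly scoped tool call (stubbed)."""
    return {"tool": name, "verdict": "unknown"}

def compile_report(results: list[dict]) -> str:
    """Step D: final report built only from preceding steps' outputs."""
    return "; ".join(f"{r['tool']}: {r['verdict']}" for r in results)

def translate(report: str, lang: str) -> str:     # step F (stubbed)
    return report

def summarise(report: str) -> str:                # step E (stubbed)
    return report

def fact_check(user_input: str, lang: str = "en") -> str:
    tools = route_intent(user_input)
    results = [run_tool(t, user_input) for t in tools]  # step C: explicit state
    return summarise(translate(compile_report(results), lang))
```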
  12. 3. Testing thoroughly, at each step
      A. Did the initial “router” interpret user intent correctly and route to the appropriate tool(s)?
      B. Did the individual “tool calls” work as intended (Google search, website visits, malicious-URL checks)?
      C. Did the output from each step get transferred correctly to the next step? (memory/state retention)
      D. Is the final report accurate and reflective of the preceding steps’ outputs?
      E. Is the summarisation of the final report effective?
      F. Is the translation accurate/meaningful?
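One way to act on “test, test, test”: give each labelled question (A–F) its own automated test. The sketch below assumes the hypothetical workflow module from the previous example is importable as `workflow`; the assertions are illustrative, not a complete suite.

```python
# Pytest-style sketch: one test per question A-F from the slide.
# `workflow` is a hypothetical module containing the earlier sketch.

from workflow import route_intent, run_tool, compile_report, fact_check

def test_router_picks_url_checker():                 # A: intent -> tools
    assert "url_check" in route_intent("Is https://example.com a scam?")

def test_tool_call_returns_verdict():                # B: individual tool calls
    assert "verdict" in run_tool("url_check", "https://example.com")

def test_state_carried_between_steps():              # C: memory/state retention
    report = compile_report([{"tool": "url_check", "verdict": "safe"}])
    assert "safe" in report

def test_end_to_end_report_reflects_tools():         # D-F: report, summary,
    assert "url_check" in fact_check("check this")   # translation (coarse check)
```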
  13. 4. Monitoring on an ongoing basis (and debugging, refining along the way)
      [Screenshots: Tracing and Monitoring views. Source: Arize]
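Platforms such as Arize provide tracing and monitoring out of the box; as a vendor-neutral illustration, the sketch below shows the underlying idea using only the Python standard library: tag every step of a request with a trace ID so a misbehaving response can be replayed step by step later.

```python
# Vendor-neutral tracing sketch: log start/end/error records for every
# step of a request under one trace ID.

import logging
import time
import uuid
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("genai.trace")

@contextmanager
def traced_step(trace_id: str, step: str):
    start = time.perf_counter()
    log.info("trace=%s step=%s status=start", trace_id, step)
    try:
        yield
        log.info("trace=%s step=%s status=ok elapsed=%.3fs",
                 trace_id, step, time.perf_counter() - start)
    except Exception:
        log.exception("trace=%s step=%s status=error", trace_id, step)
        raise

trace_id = uuid.uuid4().hex
with traced_step(trace_id, "router"):
    pass  # route_intent(...) would run here
```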
  14. Labs create LLMs. APIs bring them to real-world enterprise applications
      Abstract infrastructure complexity
      • The app’s entry point to the LLM
      • Scalability, load balancing, caching, failover
      • Experimentation/alternatives
      Shape the developer experience
      • Prompt templates
      • System messages
      • Temperature
      • Few-shot examples
      Enable real-world integration
      • As part of a GenAI-native product
      • As part of an existing enterprise workflow
      • Enabling tool calls
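A minimal sketch of the “abstract infrastructure complexity” role: the application sees one entry point, while the API layer handles caching and failover across providers behind it. Provider names and the client call are assumptions for illustration.

```python
# Sketch of a thin API layer: one entry point for the app, with a naive
# cache and provider failover behind it.

from functools import lru_cache

PROVIDERS = ["primary-llm", "backup-llm"]   # hypothetical failover order

def call_provider(provider: str, prompt: str) -> str:
    """Placeholder for a provider-specific completion call."""
    raise NotImplementedError

@lru_cache(maxsize=1024)                    # naive response cache
def complete(prompt: str) -> str:
    last_error: Exception | None = None
    for provider in PROVIDERS:              # failover across alternatives
        try:
            return call_provider(provider, prompt)
        except Exception as exc:
            last_error = exc
    raise RuntimeError("all providers failed") from last_error
```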
  15. But an API is more than the integration point. It is also the control panel, the gatekeeper, the conduit to Trust.
  16. Transparency: can I see how the system got here?
      The Black Box HR assistant
      • A GenAI-based assistant recommends rejecting a CV with no rationale
      • HR teams and hiring managers lose confidence
      • The legal team panics!
      APIs to the rescue! Reasoning chains, interim tool-call results, citations
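A sketch of what a transparency-friendly response shape could look like: the answer travels with its reasoning chain, interim tool-call results, and citations. The field names are illustrative, not any particular vendor's schema.

```python
# Illustrative API response shape for transparency: the answer carries
# the evidence behind it.

from dataclasses import dataclass, field

@dataclass
class Citation:
    source: str
    excerpt: str

@dataclass
class AssistantResponse:
    answer: str
    reasoning_steps: list[str] = field(default_factory=list)
    tool_results: dict = field(default_factory=dict)
    citations: list[Citation] = field(default_factory=list)

response = AssistantResponse(
    answer="Recommend: do not shortlist",
    reasoning_steps=["Role requires 5+ years' experience; CV shows 2"],
    citations=[Citation("job_spec.pdf", "Minimum 5 years' experience")],
)
```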
  17. Observability: can I keep track of everything it’s doing?
      The mystery of the invisible drift
      • A GenAI-enabled customer-service agent suddenly changes tone after an upstream model update
      APIs to the rescue! Token-level logs, traceable metadata, roll-back options
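A sketch of how an API can guard against invisible drift: pin an explicit model version (rather than “latest”), log it with rough token counts on every call, and keep a known-good rollback target. The version strings and client stub are hypothetical.

```python
# Sketch of drift-resistant calls: pinned model version, per-call logs,
# and a rollback target.

import logging

log = logging.getLogger("genai.obs")

MODEL_VERSION = "chat-model-2025-03-01"      # pinned, never "latest"
ROLLBACK_VERSION = "chat-model-2024-12-15"   # known-good fallback

def call_llm(model: str, prompt: str) -> str:
    """Placeholder for a version-pinned provider call."""
    raise NotImplementedError

def generate(prompt: str, model: str = MODEL_VERSION) -> str:
    log.info("model=%s prompt_tokens~=%d", model, len(prompt.split()))
    reply = call_llm(model, prompt)
    log.info("model=%s completion_tokens~=%d", model, len(reply.split()))
    return reply
```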
  18. Security: can I trust it with sensitive data?
      The leaky assistant
      • A GenAI-based assistant provides a (human) employee with data on clients they are not meant to see
      APIs to the rescue! Input sanitisation, scoped memory, well-defined access, audit logs
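A sketch of scoped access at the API boundary: retrieval is checked against the human caller's entitlements, and every allowed or denied access lands in an audit log. The ACL table and `search_index` are illustrative stand-ins.

```python
# Sketch of scoped retrieval: the assistant can only query collections
# the human caller is entitled to, and every access hits an audit log.

AUDIT_LOG: list[tuple[str, str]] = []
ACL = {
    "alice": {"retail-clients"},
    "bob": {"retail-clients", "private-banking"},
}

def search_index(collection: str, query: str) -> list[str]:
    """Placeholder for a vector/keyword search over one collection."""
    return []

def retrieve(user: str, collection: str, query: str) -> list[str]:
    if collection not in ACL.get(user, set()):
        AUDIT_LOG.append((user, f"DENIED {collection}"))
        raise PermissionError(f"{user} may not read {collection}")
    AUDIT_LOG.append((user, f"READ {collection}: {query}"))
    return search_index(collection, query)
```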
  19. Safety: can I stop it from producing undesirable stuff?
      The unsafe chatterbox
      • A customer-facing chatbot cannot withstand malicious users’ attempts to make it produce violent, hateful or just embarrassing content
      APIs to the rescue! Access to state-of-the-art guardrails, with low latency/high confidence
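A sketch of API-level guardrails: screen both the user's input and the model's draft output before anything reaches the customer. `moderation_score` and `call_llm` stand in for a hosted, low-latency moderation endpoint and a completion call; the threshold is an arbitrary example.

```python
# Sketch of API-level guardrails: moderate input, then the draft output,
# before anything reaches the customer.

BLOCK_THRESHOLD = 0.8
REFUSAL = "Sorry, I can't help with that."

def moderation_score(text: str) -> float:
    """Placeholder for a low-latency moderation endpoint (0.0-1.0)."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call."""
    raise NotImplementedError

def safe_reply(user_input: str) -> str:
    if moderation_score(user_input) > BLOCK_THRESHOLD:   # screen input
        return REFUSAL
    draft = call_llm(user_input)
    if moderation_score(draft) > BLOCK_THRESHOLD:        # screen output
        return REFUSAL
    return draft
```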
  20. Governance: can I control who can do what?
      Prompt chaos
      • A new marketing intern changes the system prompt, in production!
      • Output goes off-brand
      APIs to the rescue! Scoped API keys, prompt approval workflows, version locks
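A sketch of the governance controls the slide lists: API keys carry scopes, system prompts live in a versioned store, and the only path to production is an approval-gated deployment. All names and values are illustrative.

```python
# Sketch of governance controls: scoped keys, versioned prompts, and an
# approval-gated deployment path -- no direct edits to production.

KEY_SCOPES = {
    "intern-key": {"read"},
    "release-key": {"read", "deploy_prompt"},
}
PROMPT_VERSIONS = {"support-bot": {"v3": "You are a helpful, on-brand..."}}
LIVE_PROMPT = {"support-bot": "v3"}                    # version lock
APPROVALS: list[tuple[str, str, str]] = []             # (app, version, approver)

def deploy_prompt(api_key: str, app: str, version: str, approved_by: str):
    if "deploy_prompt" not in KEY_SCOPES.get(api_key, set()):
        raise PermissionError("key not scoped for prompt deployment")
    if version not in PROMPT_VERSIONS.get(app, {}):
        raise ValueError("unknown prompt version")
    APPROVALS.append((app, version, approved_by))      # audit trail
    LIVE_PROMPT[app] = version                         # the only write path
```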
  21. Evaluation: can I trust my test results?
      The dodgy tester
      • A developer builds a crude DIY hallucination detector
      • The evals turn out even more unreliable than the original output!
      APIs to the rescue! Access to state-of-the-art external evaluators, with all the right parameters (e.g., ground truth, context)
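A sketch of delegating evaluation to an external service instead of a DIY detector: the call passes the question, the app's answer, the retrieval context, and ground truth where available. The endpoint URL and payload schema are hypothetical.

```python
# Sketch of calling an external evaluator with the right parameters
# (ground truth, context) rather than hand-rolling a detector.

import json
from urllib import request

EVAL_URL = "https://evaluator.example.com/v1/faithfulness"  # hypothetical

def evaluate_answer(question: str, answer: str, context: str,
                    ground_truth: str | None = None) -> dict:
    payload = {"question": question, "answer": answer,
               "context": context, "ground_truth": ground_truth}
    req = request.Request(EVAL_URL,
                          data=json.dumps(payload).encode("utf-8"),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:                  # network call
        return json.load(resp)
```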