Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Trustworthy Generative AI: The Role of Observab...

Trustworthy Generative AI: The Role of Observability and Guardrails by Santanu Dey (OCBC)

Trustworthy Generative AI: The Role of Observability and Guardrails
Santanu Dey, VP - Emerging Technology at OCBC

apidays Singapore 2025
Where APIs Meet AI: Building Tomorrow's Intelligent Ecosystems
Marina Bay Sands Expo & Convention Centre
April 15 & 16, 2025

------

Check out our conferences at https://www.apidays.global/

Do you want to sponsor or talk at one of our conferences?
https://apidays.typeform.com/to/ILJeAaV8

Learn more on APIscene, the global media made by the community for the community:
https://www.apiscene.io

Explore the API ecosystem with the API Landscape:
https://apilandscape.apiscene.io/

apidays

April 15, 2025
Tweet

More Decks by apidays

Other Decks in Programming

Transcript

  1. Trustworthy Generative AI: Role Of Observability and Guardrails Santanu Dey

    VP, Emerging Technology at a leading Singapore bank
  2. Intro – Santanu Dey • 20+ years in tech industry

    • VP, Emerging Technology in a leading Singapore bank • Formerly at Apigee, Iguazio, AWS helped enterprises adopt digital technologies • Have been driving AI adoption and innovation in the enterprise for last 8+ years in various tech roles
  3. Contents • What are the challenges for trustworthy gen AI

    • Key threats for enterprise adoption • Mitigation strategies
  4. Input Manipulation • Prompt Injection can bypass constraints to achieve

    goals, causing agents to execute harmful, or restricted actions • Resource overload is deliberately exhausting system resources of the AI system leading to system degradation/failure Behavior risk • Hallucinations generate plausible but false or unverified information • Goal Misalignment occurs when AI agent pursues unintended goals or takes unsafe actions due to vague system instructions, or poor design Repudiation • Lack of reliable traceability against attacks, undesirable or inaccurate behavior • Tool misuse unintended use of integrated tools resulting in data exfiltration or executing dangerous commands Data risks • Memory Poisoning to inject long-term memory or context history, causing it to recall and act on corrupted data • Vector inversion attacks to reconstruct sensitive data Access Privilege & Identity Risk • Impersonation / Identity Theft to gain unauthorized access or manipulating the agent’s behavior • Role Confusion leading to potential privilege escalation, unauthorized actions Potential threats for Gen AI applications
  5. AI guardrail Key Mechanisms: • Input & Output Filtering: Toxicity

    filters, jailbreak detection, prompt injection blockers • Policy Constraints: Rule-based access to tools, actions, or sensitive data • Refusal Frameworks: Configured refusal when prompt violates ethical or safety boundaries • System Prompt Hardening: Clear intent definition, role boundaries, and tool invocation rules Real-time controls to prevent unsafe or unintended actions by the AI system
  6. Observability for gen AI Key features of AI observability •

    Prompt Flow Logs: Traceable chain-of-thought and agent tool usage • Metric Tracking: Hallucination rate, goal deviation, prompt injection detection rate • Attribution & Explainability: Source reliability, cited content accuracy • Alerts on Anomalies: Sudden spike in toxic responses, unusual API/tool use Provide real-time visibility and post-hoc accountability of AI decisions
  7. Key Observability Metrics for GenAI Applications • Prompt Injection Attempts

    • 5% of inputs contain indirect or obfuscated prompt injection attempts. • Toxicity Score • 2% of responses above toxicity threshold score • Sensitive Data Leakage Incidents • Model exposes PII like Social Security Numbers or account IDs in output. • Tool Misuse Incidents • Agent uses a deletion tool during an unrelated or unauthorized task. • Role Confusion Rate • An assistant-level agent executes commands meant only for system/admin roles. Safety & Security • Hallucination Rate • 12% of generated content includes fabricated facts or sources. • Bias Score / Fairness Index • Job recommendation responses show preference for one gender. • Goal Deviation Rate • Agent asked to summarize generates a creative fictional story instead. • Toxic Continuation Rate • Benign user input leads to inappropriate or toxic continuation. Ethical Alignment • Repeatability Score • Same prompt yields 3 different answers across sessions. • Chain of Thought Coherence • Missing or inconsistent steps in multi-turn logical reasoning. • Tool Invocation Consistency • Agent inconsistently uses tools for the same task across runs. • Function Call Accuracy • 20% of tool/API calls fail due to incorrect parameters or structure. Consistency & Accuracy • Response Latency • Average 350ms response time at the 99th percentile. • Token Utilization Efficiency • 25% of generated tokens are filler or irrelevant to the user task. • Cost Tracking • Daily spend includes $90 on LLM inference and $30 on embedding lookups. • Throughput • System supports 15 user prompts/sec at peak load. • Context Truncation Incidents • Model omits relevant details due to prompt length limits, impacting accuracy. Operational • Attribution Accuracy • Only 60% of cited links match verifiable external sources. • Log Completeness • 30% of actions like tool use, memory updates, or user context changes are missing from logs. • Memory Access Transparency • 50% of memory reads/writes are not linked back to a clear user interaction. • Rationale Extraction Accuracy • In 40% of responses, the model cannot explain the steps taken or decisions made. Explainability & Traceability
  8. Thorough adversarial testing before release Key test strategies • Prompt

    Injection & Jailbreak Scenarios: Custom fuzzing & adversarial prompts • Tool Misuse Simulations: Test unsafe use of external APIs or plugins • Goal Misalignment Testing: Verify agent stays aligned with intended outcomes • Bias & Toxicity Evaluations: Fairness probes and DEI-aligned testing datasets. Continuous pre-deployment red-teaming for GenAI-specific threats
  9. Existing application & data security practices are still needed Key

    security practices • Encryption for data at rest and in transit: protects against vector inversion, leakage • Data Governance: PII redaction, secure vector database access, embedding protection • Access Control: Role-based access control, 2FA at the app, zero-trust principles • Human in the Loop: Identity verification, escalation workflows for high-impact actions Secure the underlying infrastructure, identity, and data layers of GenAI apps
  10. References • Gartner Research: Introduce AI Observability to Supervise Generative

    AI - https://www.gartner.com/en/documents/4708899 • OWASP Top 10 for LLM Applications by OWASP.org at https://genai.owasp.org/download/43299/?tmstv=1731900559 • Agentic AI – Threats and Mitigations by OWASP.org at https://genai.owasp.org/resource/agentic-ai- threats-and-mitigations/