apidays Singapore 2025 - Trustworthy Generative AI: The Role of Observability and Guardrails by Santanu Dey (OCBC)

Trustworthy Generative AI: Role Of Observability and Guardrails Santanu Dey
VP, Emerging Technology at a leading Singapore bank

Intro – Santanu Dey • 20+ years in tech industry
• VP, Emerging Technology in a leading Singapore bank • Formerly at Apigee, Iguazio, AWS helped enterprises adopt digital technologies • Have been driving AI adoption and innovation in the enterprise for last 8+ years in various tech roles

Contents • What are the challenges for trustworthy gen AI
• Key threats for enterprise adoption • Mitigation strategies

Gen AI applications

Potential threats for Gen AI applications • Malicious prompt •
Resource overload

Potential threats for Gen AI applications • Poor context retrieval
• Data poisoning

Potential threats for Gen AI applications • System prompt issues
• Inefficient prompts

Potential threats for Gen AI applications • Access privilege issues

Input Manipulation • Prompt Injection can bypass constraints to achieve
goals, causing agents to execute harmful, or restricted actions • Resource overload is deliberately exhausting system resources of the AI system leading to system degradation/failure Behavior risk • Hallucinations generate plausible but false or unverified information • Goal Misalignment occurs when AI agent pursues unintended goals or takes unsafe actions due to vague system instructions, or poor design Repudiation • Lack of reliable traceability against attacks, undesirable or inaccurate behavior • Tool misuse unintended use of integrated tools resulting in data exfiltration or executing dangerous commands Data risks • Memory Poisoning to inject long-term memory or context history, causing it to recall and act on corrupted data • Vector inversion attacks to reconstruct sensitive data Access Privilege & Identity Risk • Impersonation / Identity Theft to gain unauthorized access or manipulating the agent’s behavior • Role Confusion leading to potential privilege escalation, unauthorized actions Potential threats for Gen AI applications

AI guardrail Key Mechanisms: • Input & Output Filtering: Toxicity
filters, jailbreak detection, prompt injection blockers • Policy Constraints: Rule-based access to tools, actions, or sensitive data • Refusal Frameworks: Configured refusal when prompt violates ethical or safety boundaries • System Prompt Hardening: Clear intent definition, role boundaries, and tool invocation rules Real-time controls to prevent unsafe or unintended actions by the AI system

Observability for gen AI Key features of AI observability •
Prompt Flow Logs: Traceable chain-of-thought and agent tool usage • Metric Tracking: Hallucination rate, goal deviation, prompt injection detection rate • Attribution & Explainability: Source reliability, cited content accuracy • Alerts on Anomalies: Sudden spike in toxic responses, unusual API/tool use Provide real-time visibility and post-hoc accountability of AI decisions

Key Observability Metrics for GenAI Applications • Prompt Injection Attempts
• 5% of inputs contain indirect or obfuscated prompt injection attempts. • Toxicity Score • 2% of responses above toxicity threshold score • Sensitive Data Leakage Incidents • Model exposes PII like Social Security Numbers or account IDs in output. • Tool Misuse Incidents • Agent uses a deletion tool during an unrelated or unauthorized task. • Role Confusion Rate • An assistant-level agent executes commands meant only for system/admin roles. Safety & Security • Hallucination Rate • 12% of generated content includes fabricated facts or sources. • Bias Score / Fairness Index • Job recommendation responses show preference for one gender. • Goal Deviation Rate • Agent asked to summarize generates a creative fictional story instead. • Toxic Continuation Rate • Benign user input leads to inappropriate or toxic continuation. Ethical Alignment • Repeatability Score • Same prompt yields 3 different answers across sessions. • Chain of Thought Coherence • Missing or inconsistent steps in multi-turn logical reasoning. • Tool Invocation Consistency • Agent inconsistently uses tools for the same task across runs. • Function Call Accuracy • 20% of tool/API calls fail due to incorrect parameters or structure. Consistency & Accuracy • Response Latency • Average 350ms response time at the 99th percentile. • Token Utilization Efficiency • 25% of generated tokens are filler or irrelevant to the user task. • Cost Tracking • Daily spend includes $90 on LLM inference and $30 on embedding lookups. • Throughput • System supports 15 user prompts/sec at peak load. • Context Truncation Incidents • Model omits relevant details due to prompt length limits, impacting accuracy. Operational • Attribution Accuracy • Only 60% of cited links match verifiable external sources. • Log Completeness • 30% of actions like tool use, memory updates, or user context changes are missing from logs. • Memory Access Transparency • 50% of memory reads/writes are not linked back to a clear user interaction. • Rationale Extraction Accuracy • In 40% of responses, the model cannot explain the steps taken or decisions made. Explainability & Traceability

Thorough adversarial testing before release Key test strategies • Prompt
Injection & Jailbreak Scenarios: Custom fuzzing & adversarial prompts • Tool Misuse Simulations: Test unsafe use of external APIs or plugins • Goal Misalignment Testing: Verify agent stays aligned with intended outcomes • Bias & Toxicity Evaluations: Fairness probes and DEI-aligned testing datasets. Continuous pre-deployment red-teaming for GenAI-specific threats

Existing application & data security practices are still needed Key
security practices • Encryption for data at rest and in transit: protects against vector inversion, leakage • Data Governance: PII redaction, secure vector database access, embedding protection • Access Control: Role-based access control, 2FA at the app, zero-trust principles • Human in the Loop: Identity verification, escalation workflows for high-impact actions Secure the underlying infrastructure, identity, and data layers of GenAI apps

References • Gartner Research: Introduce AI Observability to Supervise Generative
AI - https://www.gartner.com/en/documents/4708899 • OWASP Top 10 for LLM Applications by OWASP.org at https://genai.owasp.org/download/43299/?tmstv=1731900559 • Agentic AI – Threats and Mitigations by OWASP.org at https://genai.owasp.org/resource/agentic-ai- threats-and-mitigations/

Thank you!

apidays Singapore 2025 - Trustworthy Generative...

apidays Singapore 2025 - Trustworthy Generative AI: The Role of Observability and Guardrails by Santanu Dey (OCBC)

apidays PRO

More Decks by apidays

Other Decks in Programming

Featured

Transcript

Trustworthy Generative AI: Role Of Observability and Guardrails Santanu Dey

Intro – Santanu Dey • 20+ years in tech industry

Contents • What are the challenges for trustworthy gen AI

Gen AI applications

Potential threats for Gen AI applications • Malicious prompt •

Potential threats for Gen AI applications • Poor context retrieval

Potential threats for Gen AI applications • System prompt issues

Potential threats for Gen AI applications • Access privilege issues

Input Manipulation • Prompt Injection can bypass constraints to achieve

AI guardrail Key Mechanisms: • Input & Output Filtering: Toxicity

Observability for gen AI Key features of AI observability •

Key Observability Metrics for GenAI Applications • Prompt Injection Attempts

Thorough adversarial testing before release Key test strategies • Prompt

Existing application & data security practices are still needed Key

References • Gartner Research: Introduce AI Observability to Supervise Generative

Thank you!