

February 08, 2026

apidays Paris 2025 | Architecting Intelligence: Building at Scale with NVIDIA Nemotron

Architecting Intelligence: Building at Scale with NVIDIA Nemotron
Ziv Ilan, Super AI Startups at NVIDIA

Scale your Enterprise AI from pilot to production. Delivered at GenerationAI Paris 2025, Ziv Ilan (NVIDIA) explores how to architect intelligent systems using the NVIDIA Nemotron family. This session focuses on achieving enterprise-readiness through an API-first mindset, ensuring scalability and performance for complex AI deployments.

------------------------------------

Conference Details:
Conference: GenerationAI Paris 2025, part of FOST (Future of Software Technologies)
Theme: Enterprise GenAI-readiness with the API mindset
Date: 1 - 3 December 2026 • CNIT Forest – Paris
--------------------------

Resources from apidays:
Join our upcoming conferences: https://www.apidays.global/
Read the latest API news: https://www.apiscene.io
Explore the API Landscape: https://apilandscape.apiscene.io/


Transcript

  1. AI is Not a One Size Fits All Problem

    Advancing Open Source and Specialized AI
    AI Systems of Models
    Specialized AI
    AI Efficiency
  2. Nemotron: building blocks that enable you to architect your intelligent AI agent

    NVIDIA Nemotron: Open Family of AI Models, Datasets and Techniques
    Data: 9T tokens, 30M samples, 1M compute hours
    Models: Nano, Super, Ultra; 1000 derivatives
    Libraries: NeMo-RL, Minitron, NAS
    Research: 200+ papers
  3. NVIDIA Nemotron Models Available Today

    Categories: Reasoning / Agentic, Vision Language, Information Retrieval, Content Safety
    Nemotron Nano 2: High efficiency SLM
    Llama Nemotron Super: Optimal accuracy-efficiency LLM
    Llama Nemotron Ultra: Highest accuracy powered by world knowledge
    Nemotron Nano 2 VL: Video understanding and image reasoning
    Nemotron Parse 1.1: Document extraction
    Nemotron Page Elements: Structured images detection & classification
    Nemotron Table Structure: Rows, columns detection & preservation in markdown
    Nemotron Graphic Elements: Chart component detection & classification
    Llama Nemotron Embed: Multilingual text question-answering retrieval
    Llama Nemotron Rerank: Multilingual fine-tuned reranking model
    Llama Nemotron Safety Guard: Multilingual content safety
  4. Nemotron Datasets

    Open, transparent training data for every stage of AI development
    Categories: Foundations, Agentic AI, Sovereign AI, Alignment
    Nemotron-Pre v1: Code, STEM, safety, chat, multilingual
    Nemotron-ClimbLab: Optimized pre-training mix
    Nemotron-Post v1: Code, STEM, chat, reasoning
    Nemotron-VLM2: OCR training
    Nemotron-MIND: Structured mathematical dialogues
    OpenCodeReasoning: Synthetic reasoning for code
    OpenMathReasoning: Large-scale math reasoning
    OpenCodeInstruct: Largest open instruction tuning
    Granary: ASR speech data (25 languages)
    Nemotron-Personas: Synthesized region-specific demographics
    Nemotron-PII: Sensitive data mitigation
    Nemotron-Content-Safety 2: Human-LLM safe interactions
    Nemotron-AIQ-Agentic-Safety: Contextual safety risks
    Nemotron-Safety-Guard 3: Multilingual safety training
    HelpSteer 2: Alignment: helpful, factual, coherent
  5. Leverage Open Techniques to Build Custom, Efficient Models

    Nemotron Advanced Using NVIDIA Techniques
    Groups: Architecture, Accuracy, Efficiency, Recipes
    Hybrid Mamba-Transformer: Efficient arch for best accuracy, throughput
    Mixture of Experts: Improve compute and latency
    NeMo RL: Align models with human preferences
    NeMo Data curation accelerates training and improves accuracy
    Minitron, Neural Arch Search: Right-size models for higher efficiency
    Thinking Budget: Minimize model overthinking
    Efficient Video Sampling: Process longer videos
    Quantization: Lowers inference cost while maintaining accuracy
    Technical Reports: Detailed model development recipes
    Starter Kits: Instructions to build custom agents with sample Jupyter Notebooks
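
To make the "Quantization" item concrete, here is a minimal sketch of symmetric int8 quantization in plain Python. The slide does not say which quantization scheme NVIDIA applies to Nemotron models, so this is the generic textbook technique, not NVIDIA's recipe; the function names and the sample weights are made up for illustration.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map floats to integer codes in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only to illustrate the round trip.
weights = [0.82, -1.27, 0.05, 0.0, 0.4]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding to the nearest code bounds the per-value error by half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each value is stored in one byte instead of four, which is where the lower inference cost comes from; the round-trip error stays within half a quantization step, which is the sense in which accuracy is maintained.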
  6. How AI Agents Work

    AI Agents use Tools and Collaborate to Complete Complex Projects
    Diagram labels: PROMPT, I/O, THINKING, TOOL USE, COMPUTER USE, MEMORY, OTHER AGENTS
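
The loop implied by the diagram labels (prompt, thinking, tool use, memory, output) can be sketched as follows. This is a toy under stated assumptions: `stub_llm` is a hypothetical rule-based stand-in for a model call, `calculator` is an invented tool, and none of the names come from the NeMo Agent Toolkit.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]


def calculator(expression: str) -> str:
    """Toy tool: adds whitespace-separated integers, e.g. '2 3 4' -> '9'."""
    return str(sum(int(tok) for tok in expression.split()))


def stub_llm(prompt: str, memory: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Hypothetical 'thinking' step; returns (action, argument).

    A real agent would call a model here. Fixed rules keep the example offline.
    """
    tool_results = [content for role, content in memory if role == "tool"]
    if tool_results:
        return "final", f"The sum is {tool_results[-1]}."
    if prompt.startswith("add "):
        return "calculator", prompt[len("add "):]
    return "final", "I cannot help with that."


def run_agent(prompt: str, tools: Dict[str, Tool], max_steps: int = 5) -> str:
    memory: List[Tuple[str, str]] = [("user", prompt)]
    for _ in range(max_steps):
        action, argument = stub_llm(prompt, memory)  # thinking
        if action == "final":
            return argument                          # output
        result = tools[action](argument)             # tool use
        memory.append(("tool", result))              # memory
    return "Step limit reached."


answer = run_agent("add 2 3 4", {"calculator": calculator})
print(answer)
```

The step limit is a deliberate design choice: an agent that keeps requesting tools without converging is cut off rather than looping forever.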
  7. Building Agentic Systems is Complex

    Agentic applications should integrate with existing tools, heterogeneous data, and perform reliably
    Architectural Complexity: Many tools exist, but don’t work together
    Performance: System-level accelerations require knowledge of the entire system
    Repeatability: Challenging to guarantee consistent results
    Code Reuse: Fragmented solutions result in duplicate work
  8. Typical Agent Chatbot Architecture

    Projects start small and increase in complexity over time
    Diagram labels: Fast API Server, Agent, Tool 1, Tool 2, LLM, Evaluation System, Profiler, Memory, Telemetry, Observability, Retriever, Config
  9. NeMo Agent Toolkit: OSS to enable rapid agentic development

    An open-source Python library for scaling enterprise-ready agentic AI systems through evaluation-driven development
    Components around NeMo Agent Toolkit: Agents, Guardrails, Evaluation, NVIDIA NIM, Tools/Workflows, Observability, Profiling, Optimization
  10. NeMo Agent Toolkit Reduces Agentic System Complexity

    Flexible integration, configuration, monitoring, and optimization
    Safety & Security: Function-level security controls for agent workflows*
    Observability Tracing: End-to-end tracing across frameworks and agent invocations
    Evaluation System: Customizable evaluation framework for agent workflow components
    Configuration System: YAML-based configuration for rapid agent deployment and prototyping
    Workflow Optimization: Agent Hyperparameter Optimizer, NVIDIA Dynamo integration for accelerated performance*
    Profiler: Performance analysis and runtime bottleneck identification
    Framework Agnostic: Multi-framework support: LangChain, LangGraph, CrewAI, Semantic Kernel, Google ADK
    Agentic Ecosystem Connectors: Easily connect with MCP tools, Memory providers, custom plugins
    * Coming in a future release
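
The idea behind a configuration system like the one listed above is that components are registered once and then wired together from declarative config. The sketch below illustrates that pattern only. It uses a plain Python dict in place of YAML to stay dependency-free, and the registry, the component names, and the `workflow.steps` schema are all hypothetical; they do not reproduce the NeMo Agent Toolkit's actual configuration format.

```python
from typing import Callable, Dict

# Hypothetical component registry: name -> text-processing function.
REGISTRY: Dict[str, Callable[[str], str]] = {}


def register(name: str):
    """Decorator that makes a function addressable by name from config."""
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        REGISTRY[name] = fn
        return fn
    return wrap


@register("strip")
def strip_text(text: str) -> str:
    return text.strip()


@register("lowercase")
def lowercase(text: str) -> str:
    return text.lower()


def build_workflow(config: dict) -> Callable[[str], str]:
    """Assemble a pipeline from the ordered step names in the config."""
    steps = []
    for name in config["workflow"]["steps"]:
        if name not in REGISTRY:
            raise ValueError(f"unknown component: {name}")
        steps.append(REGISTRY[name])

    def run(text: str) -> str:
        for step in steps:
            text = step(text)
        return text

    return run


# Stand-in for a parsed YAML file (schema invented for this example).
config = {"workflow": {"steps": ["strip", "lowercase"]}}
workflow = build_workflow(config)
print(workflow("  Hello NeMo  "))
```

Changing the workflow means editing the config, not the code, and the same registered components can be reused across several configs.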
  11. Boost Agent Performance, Reduce Development Time

    By Adding NeMo Agent Toolkit to your AI Agent
    Use case: Healthcare Virtual Assistant for patient Q&A + NeMo Agent Toolkit
    1.7X Faster Response Times
    1.4X Higher Throughput
    57% Fewer Lines of Code
    • YAML enables component reuse
    • Write once, reuse anywhere
    • Higher code quality
    • Lower developer onboarding time
    • Supports more concurrent users with stable response times
    • Increases request processing speed
    • Handles more requests in less time through efficient execution
    • Smaller code base, shortens execution time
  12. NVIDIA NeMo Developer Workflow

    Build, Deploy and Optimize Specialized AI Agents
    Stage labels, in slide order: Define Use Case; Prepare, Generate Data; Select Model; Build Agent; Guardrail; Optimize; Monitor; Deploy; Data Flywheel
    Tool labels, in slide order: NeMo Curator, NeMo Data Designer, NVIDIA Nemotron, NeMo Retriever, NeMo Evaluator, NeMo Agent Toolkit, NeMo Retriever, NVIDIA NIM, NeMo Agent Toolkit, NeMo RL, NeMo Customizer, NeMo Evaluator, Connect to Data, NeMo Guardrails
  13. AI-Q (Pronounced ‘IQ’) NVIDIA Deep Researcher

    NVIDIA AI Blueprint Uses NeMo Agent Toolkit to Connect AI Agents, Data, and Tools
    Flow: User or Machine Prompt, Agent (Plan, Reflect, Refine), Report
    Reason: Llama Nemotron
    Web Search: Tavily
    Generate: Llama Nemotron
    RAG: Embedding (NeMo Retriever), Reranking (NeMo Retriever), Extraction (NeMo Retriever), Vector Database (NVIDIA cuVS), Enterprise Files
    Report Generation: Llama 3.3
    • Open-source blueprint for research & reporting
    • Provides observability and transparency
    • Increases agent accuracy with reasoning
    • Upload your own data sources
    • Guide plan and report in real-time
    • Change report output dynamically
    • Synthesize hours of material in minutes
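
The RAG part of the blueprint pairs an embedding-based retrieval stage with a reranking stage. A minimal sketch of that two-stage pattern is shown below, assuming toy scoring functions: bag-of-words cosine similarity stands in for an embedding model, and an ordered-word-pair check stands in for a reranking model. Neither reflects how NeMo Retriever works internally, and the documents and query are invented.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': bag-of-words counts. A real system would call a model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: List[str], k: int) -> List[str]:
    """Stage 1: cheap similarity search that keeps the top-k candidates."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def rerank(query: str, candidates: List[str]) -> List[str]:
    """Stage 2: toy reranker favouring documents that keep the query's word order."""
    words = query.lower().split()
    bigrams = {" ".join(pair) for pair in zip(words, words[1:])}

    def score(doc: str) -> int:
        return sum(1 for b in bigrams if b in doc.lower())

    return sorted(candidates, key=score, reverse=True)


doc_a = "toolkit notes for agent builders"
doc_b = "the agent toolkit connects agents to data and tools"
doc_c = "quarterly sales report"

candidates = retrieve("agent toolkit", [doc_a, doc_b, doc_c], k=2)
ranked = rerank("agent toolkit", candidates)
print(candidates)
print(ranked)
```

The first stage is cheap and recall-oriented, so it can scan the whole collection; the second stage only sees the short candidate list, so it can afford a more expensive, precision-oriented score.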
  14. Profiling Functions

    A generic method for monitoring and measuring functions
    • Tagging along with evaluation, the profiler can simulate large numbers of users with multiple requests
    • The telemetry stream is collected under load to track each component as it runs in the workflow
    • The profiler can perform many different types of analysis:
      • Token efficiency
      • Bottleneck analysis
      • Tool count
      • Prompt uniqueness
    • Support for profile-guided optimization is coming in the future
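
The bullets above describe three ideas that can be sketched with the standard library: timing each function, driving the workflow with simulated concurrent users, and analysing the collected timings. The sketch below is not the NeMo Agent Toolkit profiler; the decorator, the `retrieve` and `generate` stand-ins (which only sleep), and the metric definitions are assumptions made for illustration.

```python
import threading
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from functools import wraps

TIMINGS = defaultdict(list)  # function name -> list of durations in seconds
_LOCK = threading.Lock()


def profiled(fn):
    """Record the wall-clock duration of every call to fn."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            with _LOCK:
                TIMINGS[fn.__name__].append(elapsed)
    return wrapper


@profiled
def retrieve(query):
    time.sleep(0.005)  # stand-in for a fast retrieval call
    return [query]


@profiled
def generate(query, docs):
    time.sleep(0.02)   # stand-in for a slower LLM call
    return f"answer to {query}"


def workflow(query):
    return generate(query, retrieve(query))


def simulate(users, requests_per_user):
    """Run the workflow for several simulated users, one worker thread per user."""
    prompts = [f"user{u}-req{r}" for u in range(users) for r in range(requests_per_user)]
    with ThreadPoolExecutor(max_workers=users) as pool:
        list(pool.map(workflow, prompts))
    return prompts


def bottleneck():
    """Name of the function with the largest total recorded time."""
    totals = {name: sum(durations) for name, durations in TIMINGS.items()}
    return max(totals, key=totals.get)


def prompt_uniqueness(prompts):
    """Fraction of prompts that are distinct."""
    return len(set(prompts)) / len(prompts)


prompts = simulate(users=4, requests_per_user=3)
print(bottleneck(), prompt_uniqueness(prompts))
```

Wrapping functions with a decorator keeps the measurement independent of what the function does, which matches the slide's description of a generic method for monitoring and measuring functions.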
  15. Visualizing Other Behavior

    NeMo Agent Toolkit Profiler also collects fine-grained telemetry per function, tool, and LLM regardless of framework
    Visualized are just some examples of workflow behavior extracted from Profiler output.
    Certain CVEs can be quite large and contain a lot of information, increasing the prompt length.
    It appears that the tool for retrieving source code was called the most (including in retries) as the agent attempted to confirm the presence of vulnerabilities.