LLM based AI Agents Overview -What, Why, How-

masatoto
July 22, 2024

I gave a talk on AI agents.
These slides are an English translation of the original Japanese slides.

Tokyo AI Events
https://connpass.com/event/324085/

Transcript

  1. LLM based AI Agents Overview -What, Why, How-
    July 18, 2024
    DENTSU SOKEN INC., Masato Ota
    Applied Machine Learning and Artificial Intelligence Seminar: AI Agents
  2. Self-Introduction
    ▍Masato Ota
    ▍DENTSU SOKEN INC.
      • AI product development, technical verification, and PoC projects at a systems integrator (SIer).
    ▍AI Technologies of Interest
      • LLM-based Autonomous Agents
      • Predictive Uncertainty
      • Human in the Loop & XAI
    ▍My technical surveys in Japanese (masatoto on SpeakerDeck)
      • Weekly AI Agents News!
      • Overview of LLM Multi-Agents
      • Research Trends in LLM Agents at ICLR 2024
    X: @ottamm_190
  3. Agenda
    ▍What is an AI agent?
    ▍Why develop AI agents?
    ▍How to develop AI agents?
    ▍Challenges of AI agents
    We will broadly explore AI agents from both developer and business perspectives. A case study of applying an agent to a help desk makes the discussion concrete. Please refer to the publicly available materials for research trends.
  4. First, let's take a broad look at the applications of AI agents
    ▍Business Applications of AI Agents (automating operations)
      • Consumer-Oriented: travel and mobility planning; price comparison and product recommendations; account and subscription management
      • Internal/Back Office: meeting scheduling; document creation and review for legal and HR purposes; cost management; in-office QA
      • Core Business: customer support; software development; business data analysis; patents, literature, and market research
    ▍Research Applications of AI Agents
      • Agentic AI System: Agents autonomously perform tasks and operations.
      • Multi-Agent System: Multiple agents cooperate and compete to simulate and solve problems.
      • Embodied Agents: Agents interact with the physical or virtual environment to achieve goals.
      • Computer Control Agents: Agents automate tasks including web navigation on a computer.
  5. Three types of AI agents for problem-solving
    ▍Various agent applications utilize one of the following technologies.
      • Multi-Agent Collaboration: Simplifies complex problems by role-based decomposition.
      • LLM-based Autonomous Agents: Builds agent architecture.
      • RL Agents: Learns policy from interactions with the environment.
    [Figure labels: Environment, Planning, Tool Use, Self-Correction, Output; Following Instructions, Memory, Planning, Evaluation, Code Generation; Engineer, Director, Researcher, Writer; Environment, Policy, Action, State, Reward; "Today's Scope"]
    Note:
    ※ The pink text corresponds to the core of the agent.
    ※ Agents were also researched as distributed AI during the second AI boom.
  6. What is the Ideal Capability of AI Agents in Business?
    ▍AI agents can autonomously execute various human task processes.
    ▍The ideal is for the outcomes of the agent's actions to become the task processes.
    ▍There is a trade-off between the versatility of tasks and execution efficiency.
    [Figure: one AI Agent receiving three kinds of instructions.
      • Survey Analysis, "Formulate Hypotheses!": Data Understanding, Reporting, Analysis, Aggregation
      • Help Desk, "Answer Questions!": Understanding Customer Situations, Search Manuals, Search Past Q&A, Answer
      • Cost Management, "Generate Termination Lists!": Get Subscription List, Usage Analysis, Filtering, Propose Termination]
  7. AI Agents from a Developer's Perspective
    ▍AI agents can be divided into task-dependent and task-independent parts.
    ▍By tuning the parts that depend on tasks, they can be specialized for specific tasks.
    ▍LLMs and Agent Architectures are designed to be versatile and independent of specific tasks.
    [Figure: components of an AI Agent. Independent of tasks: LLM, Agent Architecture. Dependent on tasks: Prompt, Knowledge, Tools.]
  8. What is Agent Architecture?
    ▍Agent Architecture is a general-purpose workflow that autonomously solves problems using LLMs.
    ▍The workflow consists of planning, memory including knowledge, and actions using tools.
    [Figure: three architecture types.
      • Sequential type (Planning, Action, Observation, repeated): ReAct (2022), Reflexion (2023)
      • Plan & Action type (Planning Phase, then Execution Phase with Action and Self-Correction against the Env): ReWOO, LLM Compiler (2023)
      • Tree type (States branching into Actions, with Evaluation): LATS, ToT (2023)]
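To make the Sequential type concrete, here is a minimal sketch of a plan-act-observe loop in the spirit of ReAct. It is not the paper's implementation: `scripted_policy` is a hypothetical stand-in for the LLM, the `search` tool returns canned text, and `max_steps` is an assumed safeguard.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str, str]  # (action, argument, observation)


def scripted_policy(question: str, history: List[Step]) -> Tuple[str, str]:
    """Hypothetical stand-in for the LLM: search once, then finish."""
    if not history:
        return ("search", question)
    # Use the latest observation as the final answer.
    return ("finish", history[-1][2])


def run_sequential_agent(
    question: str,
    policy: Callable[[str, List[Step]], Tuple[str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Alternate between choosing an action and observing its result."""
    history: List[Step] = []
    for _ in range(max_steps):  # the step cap guards against endless loops
        action, argument = policy(question, history)
        if action == "finish":
            return argument
        observation = tools[action](argument)
        history.append((action, argument, observation))
    return "No answer within the step limit."


# Canned tool used only for illustration.
tools = {"search": lambda query: f"doc about {query}"}
answer = run_sequential_agent("data storage", scripted_policy, tools)
print(answer)
```

Because every decision depends on the full history, there is no separate plan to inspect before execution, which is the property the later slides contrast with Plan & Action.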
  9. What Does it Mean to Solve Problems Autonomously?
    ▍Understand the intent of the task
    ▍Plan the steps to solve the problem
    ▍Decide and execute the actions
    ▍Adapt to information obtained from the environment
    To enhance the above capabilities, the following technologies are also essential:
      • Theory of Mind: Understand the state of the counterpart
      • Utilization of Memory: Make decisions based on past experiences and knowledge
      • Self-Correction: Correct errors in actions and plans
      • Self-Evolution: Continuously improve performance based on experiences
  10. Current Summary
    ▍What is an AI Agent?
      • The definition of an AI agent varies depending on the technology.
      • In this scope, it is software that autonomously performs various human tasks and work processes.
      • Composed of an LLM and agent architecture that are independent of specific tasks, plus knowledge and tools that depend on tasks.
      • Capable of understanding the intent of a task, planning a solution path, and executing actions independently.
      • Can adapt based on information obtained from the environment.
    ▍Why Develop AI Agents?
    ▍How to Develop AI Agents?
    ▍Challenges of AI Agents
  11. LLM, RAG and RPA Workflows are not Agents
    ▍LLM: Generates text based on prompts
      • Cannot execute actions
      • Cannot retrieve environment information
    ▍RAG: Generates text by passing document search results as text prompts
      • Cannot continue searching until the goal is achieved
      • Cannot use search results to decide what to search for next
      • Requires customization from search to prompt entry for each purpose
    ▍RPA Workflows: Automates business processes using LLM with no-code workflows
      • Requires customization for each business workflow
      • Cannot adapt to environmental changes even if a workflow is created
      • Complex tasks turn into complex workflows with many branches
    [Figure labels: LLM (Prompt, Response); RAG (Prompt, Docs, LLM)]
  12. Versatile and Customizable AI Agents for Solving Business Challenges
    ▍Once developed, the agent architecture can be universally applied, enabling rapid PoC!
    ▍Recent AI agents are being used for RAG tasks, business data analysis, and RPA in business.
    Verifying Internal Issues Quickly (DX Department, one Agent Architecture); reduces the effort of individual development:
      • Legal. Knowledge: internal regulations, contracts. Tools: Word, Search.
      • Sales. Knowledge: proposals, commercial materials. Tools: PowerPoint, CRM.
      • HR. Knowledge: HR policies, recruitment-related. Tools: Excel, Outlook, Teams.
    Applying to Similar Business Processes (Help Desk Agent, one Agent Architecture); expands horizontally:
      • IT Support. Knowledge: IT manuals. Tools: JIRA, FAQ.
      • Inspection Support. Knowledge: inspection manuals. Tools: Vision API, system integration.
      • Customer Support. Knowledge: usage manuals. Tools: Outlook, help site search.
  13. Expansion of AI Agent Development Frameworks
    AI Agent = LLM + Agent Architecture + Prompt + Knowledge + Tools
    (LLM and Agent Architecture are independent of tasks; Prompt, Knowledge, and Tools are dependent on tasks.)
    ▍Citizen Development: Develop parts dependent on tasks
    ▍Professional Development: Develop from the agent architecture
    Type of Agent / Development Framework / Features:
      • Agents that make it easy to test the task-dependent parts (Knowledge & Tools)
        Frameworks: GPTs, Copilot Studio, Agents for Amazon Bedrock, Vertex AI Agents, Dify Agents
        Features: Agent architecture cannot be changed. Prompt may be auto-generated.
      • Agents that allow selection of LLM and Agent Architecture and prompt writing
        Frameworks: LangChain Agents, LlamaIndex Agents
        Features: Various default agent architectures are available.
      • Agents that can be developed from Agent Architecture
        Frameworks: LangGraph, Assistants API, LLM generation APIs
        Features: Supports threads and state management necessary for agent development.
      • Multi-agent type
        Frameworks: LangGraph, AutoGen, CrewAI
        Features: Enables communication design between agents.
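The formula AI Agent = LLM + Agent Architecture + Prompt + Knowledge + Tools can be read as a configuration in which only the task-dependent fields change between use cases. The sketch below illustrates that reading; `AgentConfig`, `specialize`, and all field values are hypothetical names chosen for this example, not part of any framework listed above.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class AgentConfig:
    # Task-independent parts
    llm: str
    architecture: str
    # Task-dependent parts
    prompt: str
    knowledge: List[str] = field(default_factory=list)
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)


def specialize(
    base: AgentConfig,
    prompt: str,
    knowledge: List[str],
    tools: Dict[str, Callable[[str], str]],
) -> AgentConfig:
    """Reuse the LLM and architecture; swap only the task-dependent parts."""
    return replace(base, prompt=prompt, knowledge=list(knowledge), tools=dict(tools))


# Placeholder values; a real project would name its own model and architecture.
base = AgentConfig(llm="example-llm", architecture="plan_and_action", prompt="")

it_support = specialize(
    base,
    "You are an IT support agent.",
    ["IT manuals"],
    {"faq_search": lambda query: f"FAQ hit for {query}"},
)
customer_support = specialize(
    base,
    "You are a customer support agent.",
    ["Usage manuals"],
    {"help_site_search": lambda query: f"help page for {query}"},
)
```

In this framing, citizen development corresponds to calling `specialize` with new prompts, knowledge, and tools, while professional development corresponds to changing what `architecture` refers to.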
  14. Current Summary
    ▍What is an AI Agent?
      • The definition of an AI agent varies depending on the technology.
      • In this scope, it is software that autonomously performs various human tasks and work processes.
      • Composed of an LLM and agent architecture that are independent of specific tasks, plus knowledge and tools that depend on tasks.
      • Capable of understanding the intent of a task, planning a solution path, and executing actions independently.
      • Can adapt based on information obtained from the environment.
    ▍Why Develop AI Agents?
      • Enhanced development frameworks have lowered the barriers for both citizen and professional development.
      • By changing only task-dependent parts, it is possible to quickly verify RAG, RPA, and data analysis.
    ▍How to Develop AI Agents?
    ▍Challenges of AI Agents
  15. Methods for Developing AI Agents
    ▍Learning: Give agent capabilities to the LLM
      • Modify model structure (Language Action Model)
      • Post-hoc learning to enhance reasoning capabilities
      • Instruction tuning for function calling capabilities
      • Fine-tuning for agent behavior
      • Apply RL algorithms for planning capabilities
    ▍Inference: Utilize the agent capabilities of the LLM (Today's Topics)
      • Prompt engineering
      • Develop agent workflows
      • Develop tools for function calling capabilities
      • Design memory and search systems
      • Manage knowledge
    [Figure labels: Agent Capabilities, LLM, LLM Workflow]
  16. Steps for Developing AI Agents
    1. Write out the business process and the ideal behavior patterns for the agent
    2. Develop tools for actions
    3. Write out the business knowledge that the LLM lacks
    4. Develop the agent architecture
    5. Prompt engineering
    AI Agent = LLM + Agent Architecture + Prompt + Knowledge + Tools
  17. Problem Setting for the Help Desk Agent
    ▍Help Desk Issue: Inquiries about the internal ChatGPT and RAG solutions developed by our company
    ▍Agent Task: Create response drafts for initial inquiry handling
    [Figure labels: Customer, Fresh Desk, Help Desk, Portal, Developer; Agent: identify items to be answered and draft responses.]
  18. Step 1: Write out the business process and the ideal behavior patterns for the agent
    ▍Ask the person in charge "when, what tools to use, and what to do."
    ▍Write out the ideal plans and actions of the agent.
    Interview Results
      • Ideal Plan: Collect similar questions and gather information necessary for responses.
      • Ideal Actions:
        - Search help sites for basic product specifications
        - Search development documents for error messages
        - Search the web (MS Learn) for Azure mechanisms
  19. Step 2: Develop tools for actions
    ▍Tools can retrieve environment information, update the environment state, or perform calculations.
    ▍Function definitions for Function Calling should be refined repeatedly to improve accuracy.
    Tools:
      • Python function using Bing API. Input: query. Process: perform a single search. Output: title, html, url.
      • Python function using Azure AI Search. Input: query (Japanese). Process: retrieve three articles. Output: (title, text) × 3.
    Flow: Prompt → LLM generates the function name and args → Execute the Python function
    Function definition for Function Calling:
      {
        "type": "function",
        "function": {
          "name": "function name",
          "description": "function description",
          "parameters": {
            "type": "object",
            "properties": {
              "query": {
                "type": "string",
                "description": "parameter description"
              }
            },
            "required": ["query"]
          }
        }
      }
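The flow on this slide (the LLM generates a function name and arguments, then a Python function is executed) can be sketched as a small dispatcher. This is an illustration under assumptions: `search_help_site` is a hypothetical tool returning canned data rather than calling Bing or Azure AI Search, and the tool call is hard-coded instead of coming from an LLM.

```python
import json
from typing import Any, Callable, Dict, List, Tuple


def search_help_site(query: str) -> List[Tuple[str, str]]:
    """Hypothetical tool: returns canned (title, text) pairs."""
    return [(f"Help: {query}", f"Text about {query}")]


# Function definition in the same shape as the schema on the slide.
SEARCH_HELP_SITE_SCHEMA: Dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "search_help_site",
        "description": "Search the help site for basic product specifications.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search keywords"}
            },
            "required": ["query"],
        },
    },
}

SCHEMAS: Dict[str, Dict[str, Any]] = {"search_help_site": SEARCH_HELP_SITE_SCHEMA}
FUNCTIONS: Dict[str, Callable[..., Any]] = {"search_help_site": search_help_site}


def execute_tool_call(name: str, arguments_json: str) -> Any:
    """Validate a generated tool call against its schema, then run it."""
    if name not in FUNCTIONS:
        raise KeyError(f"Unknown tool: {name}")
    arguments = json.loads(arguments_json)
    required = SCHEMAS[name]["function"]["parameters"]["required"]
    missing = [key for key in required if key not in arguments]
    if missing:
        raise ValueError(f"Missing required arguments: {missing}")
    return FUNCTIONS[name](**arguments)


# Stand-in for what an LLM would generate: a name plus JSON-encoded arguments.
result = execute_tool_call("search_help_site", '{"query": "login error"}')
print(result)
```

Checking required arguments before execution gives a clear error to feed back to the model when a generated call is malformed.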
  20. Step 3: Write out the business knowledge that the LLM lacks
    ▍Write out the prerequisite knowledge needed for tasks.
    ▍Updating and managing knowledge becomes more challenging as the number of knowledge documents grows.
      • Even in the era of expert systems, knowledge acquisition and management were issues.
    Prerequisite knowledge for internal ChatGPT:
      • What is the product for?
      • Who will use it?
      • What are the main functions?
      • How is it developed?
    Prompt:
      You are a help desk agent. You will provide initial responses to user inquiries. Users will ask about the following products. {Knowledge} …
    Book: エキスパート・システム: 考え方・作り方・使い方 (DSライブラリー) [Expert Systems: Concepts, Construction, and Use (DS Library)]
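One simple way to keep knowledge maintainable is to store it separately from the prompt text and inject it into the {Knowledge} slot at run time. The sketch below assumes a plain dictionary of question and answer pairs; the answers are placeholders, and `render_knowledge` and `build_system_prompt` are hypothetical helper names.

```python
from typing import Dict

PROMPT_TEMPLATE = (
    "You are a help desk agent. "
    "You will provide initial responses to user inquiries.\n"
    "Users will ask about the following products.\n"
    "{knowledge}"
)


def render_knowledge(entries: Dict[str, str]) -> str:
    """Render question/answer pairs as one bullet line each."""
    return "\n".join(f"- {question} {answer}" for question, answer in entries.items())


def build_system_prompt(entries: Dict[str, str]) -> str:
    """Fill the knowledge slot of the template."""
    return PROMPT_TEMPLATE.format(knowledge=render_knowledge(entries))


# Placeholder answers; the real text would come from the product team.
internal_chat_knowledge = {
    "What is the product for?": "(to be written)",
    "Who will use it?": "(to be written)",
    "What are the main functions?": "(to be written)",
    "How is it developed?": "(to be written)",
}

prompt = build_system_prompt(internal_chat_knowledge)
print(prompt)
```

Keeping the entries in one structure means an update to the product description touches data only, not the prompt wording.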
  21. Step 4: Develop the agent architecture
    ▍For the help desk, we adopted the Plan & Action architecture.
      • ReAct, a Sequential-type architecture, offers few points where humans can intervene. On complicated tasks it can keep inferring endlessly, accumulate errors, and be hard to debug.
      • Plan & Action allows human checks at each plan and action stage, so complicated tasks can be decomposed and evaluated individually.
    ▍Points to consider for the architecture:
      • Should the plan be static, or change dynamically based on action outcomes?
      • The granularity of the plan (should subtasks be interdependent or completely independent?)
      • Where to perform self-correction (after actions, after subtask responses, or after the final response?)
    [Figure: Workflow of the proposed agent. Inquiry → Planning → (1:N) Tool Use → Self-Correction → SubTask Response → Final Response]
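The workflow above (Inquiry, Planning, then Tool Use, Self-Correction, and SubTask Response per subtask, then Final Response) might be wired together as follows. This is a sketch under several assumptions, not the author's implementation: the plan is static, subtasks are independent, self-correction happens right after each tool call, every LLM step is replaced by a hypothetical stub, and the documents are canned text.

```python
from typing import Callable, Dict, List

Subtask = Dict[str, str]


def run_plan_and_action(
    inquiry: str,
    plan_fn: Callable[[str], List[Subtask]],
    tools: Dict[str, Callable[[str], str]],
    reflect_fn: Callable[[Subtask, str], Dict[str, str]],
    answer_fn: Callable[[Subtask, str], str],
    finalize_fn: Callable[[str, List[str]], str],
    max_retries: int = 1,
) -> str:
    subtask_answers: List[str] = []
    for subtask in plan_fn(inquiry):  # Planning: one inquiry, N subtasks
        query = subtask["query"]
        result = ""
        for _ in range(max_retries + 1):
            result = tools[subtask["tool"]](query)  # Tool Use
            verdict = reflect_fn(subtask, result)  # Self-Correction
            if verdict["status"] == "OK":
                break
            query = verdict["advice"]  # retry with the suggested query
        subtask_answers.append(answer_fn(subtask, result))  # SubTask Response
    return finalize_fn(inquiry, subtask_answers)  # Final Response


# --- Hypothetical stubs standing in for LLM calls and search tools ---
DOCS = {"storage": "canned text about storage", "opt-out": "canned text about opt-out"}


def lookup(query: str) -> str:
    return DOCS.get(query, "")


def plan_fn(inquiry: str) -> List[Subtask]:
    return [
        {"goal": "storage location", "tool": "help_site", "query": "storage"},
        {"goal": "opt-out handling", "tool": "dev_docs", "query": "optout"},
    ]


def reflect_fn(subtask: Subtask, result: str) -> Dict[str, str]:
    if result:
        return {"status": "OK", "advice": "Nothing"}
    return {"status": "NG", "advice": "opt-out"}


def answer_fn(subtask: Subtask, result: str) -> str:
    return f"{subtask['goal']}: {result}"


def finalize_fn(inquiry: str, answers: List[str]) -> str:
    return " / ".join(answers)


tools = {"help_site": lookup, "dev_docs": lookup}
final = run_plan_and_action(
    "Where is the data stored?", plan_fn, tools, reflect_fn, answer_fn, finalize_fn
)
print(final)
```

Each stage is a separate function, so a human reviewer or a test can inspect the plan, a tool result, or a reflection verdict in isolation, which is the debugging advantage the slide attributes to Plan & Action.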
  22. Tips: Paper Referenced for Agent Architecture
    ▍ScreenAgent: A Vision Language Model-driven Computer Control Agent
    ▍Computer Control Agent
      • Tasks include PowerPoint editing and purchasing items on EC sites.
      • Tools involve mouse operations and keyboard inputs.
      • Observations are based on screenshots.
      • The agent architecture does not depend on specific tasks, making it reusable.
    ▍Features of the Agent Architecture
      • Plan & Action type architecture.
      • In Reflection, it generates one of three types: retry, continue, or replan, based on subtask content and screenshots.
    Niu, Runliang, et al. "Screenagent: A vision language model-driven computer control agent." arXiv preprint arXiv:2402.07945 (2024).
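The three reflection outcomes named on this slide (retry, continue, replan) can be modeled as a small state transition over a plan. How each outcome is handled below is an assumption made for illustration and is not taken from the ScreenAgent paper; `advance`, `run`, and `replan_fn` are hypothetical names, and the decisions are scripted rather than generated from screenshots.

```python
from enum import Enum
from typing import Callable, List, Tuple


class Reflection(Enum):
    RETRY = "retry"
    CONTINUE = "continue"
    REPLAN = "replan"


def advance(
    plan: List[str],
    index: int,
    decision: Reflection,
    replan_fn: Callable[[List[str], int], List[str]],
) -> Tuple[List[str], int]:
    """Return the plan and subtask index to execute next."""
    if decision is Reflection.CONTINUE:
        return plan, index + 1  # move on to the next subtask
    if decision is Reflection.RETRY:
        return plan, index  # run the same subtask again
    return replan_fn(plan, index), 0  # REPLAN: start over with a new plan


def run(
    plan: List[str],
    decisions: List[Reflection],
    replan_fn: Callable[[List[str], int], List[str]],
) -> List[str]:
    """Execute subtasks, applying one scripted reflection after each."""
    trace: List[str] = []
    index = 0
    for decision in decisions:
        if index >= len(plan):
            break
        trace.append(plan[index])
        plan, index = advance(plan, index, decision, replan_fn)
    return trace


trace = run(
    ["open", "type"],
    [Reflection.CONTINUE, Reflection.RETRY, Reflection.REPLAN, Reflection.CONTINUE],
    replan_fn=lambda plan, index: ["click"],
)
print(trace)
```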
  23. Step 5: Prompt engineering
    ▍Introducing techniques to improve accuracy
    ▍Planning Prompt
      • Use JSON mode
      • Reduce the number of subtasks in the plan
      • Generate what you want to achieve with each tool to make reflection easier
    ▍Tool Use Prompt
      • Eliminate ambiguity in function names and descriptions
      • Avoid tool specificity for each application (db or search)
      • Use multiple search indices for different purposes (index for help site, for dev docs)
    ▍Reflection Prompt
      • Evaluate whether subtasks can be achieved from the tool's execution results
      • If the tool is not effective, generate suggestions for the next tool or parameters
    [Figure: agent workflow. Inquiry → Planning → (1:N) Tool Use → Self-Correction → SubTask Response → Final Response]
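When the planning prompt returns JSON, the plan can be validated before any tool runs. The sketch below checks three of the tips above: the plan is machine-readable, the number of subtasks is small, and every subtask states what it wants to achieve. The plan layout (`subtasks`, `tool`, `goal`), the tool names, and the limit of three are assumptions for this example.

```python
import json
from typing import Dict, List

# Hypothetical tool names and limit, chosen only for this example.
ALLOWED_TOOLS = {"search_help_site_docs", "search_developer_docs"}
MAX_SUBTASKS = 3


def parse_plan(raw: str) -> List[Dict[str, str]]:
    """Parse a JSON plan and reject plans that would be hard to reflect on."""
    plan = json.loads(raw)
    subtasks = plan["subtasks"]
    if not 1 <= len(subtasks) <= MAX_SUBTASKS:
        raise ValueError(f"Expected 1 to {MAX_SUBTASKS} subtasks, got {len(subtasks)}")
    for subtask in subtasks:
        if subtask["tool"] not in ALLOWED_TOOLS:
            raise ValueError(f"Unknown tool: {subtask['tool']}")
        if not subtask["goal"].strip():
            raise ValueError("Each subtask must state what it wants to achieve")
    return subtasks


# Stand-in for planner output produced in JSON mode.
raw_plan = json.dumps(
    {
        "subtasks": [
            {"tool": "search_help_site_docs", "goal": "Find the data storage location"},
            {"tool": "search_developer_docs", "goal": "Learn how chat data is handled"},
        ]
    }
)

subtasks = parse_plan(raw_plan)
print([subtask["goal"] for subtask in subtasks])
```

The stored `goal` gives the reflection step a concrete criterion to compare against the tool's execution result.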
  24. Overall Agent Workflow
    Question: Where is the data used stored? I opted out, so is it safe?
    Planning:
      • Subtask 1: Use search_know_narrator_help_site_docs to find out 'Know Narrator Chat data storage location.'
      • Subtask 2: Use search_know_narrator_developer_docs to learn about 'Azure OpenAI Service data handling.'
      • Subtask 3: Use search_microsoft_learn_docs to understand 'Azure OpenAI Service data handling after opting out.'
    For each subtask (Subtask 1, 2, 3):
      • Tool Calling: "name": "search_know_narrator_developer_docs", "arguments": {"query": "Chat data storage location"}
      • Tool Execution Results: [{"filename": "KnowNarratorChat_Service_Specs.pdf", "content": ~~~~~}, {"filename": "sys_message_template.md", "content": ~~~~~}]
      • Reflection: {"status": "OK", "advice": "Nothing"}
      • Subtask Answer: Chat history and application logs are stored in Azure's storage service, specifically in the database Azure XxxDB used by the application. …
    Answer: Let me explain the handling of chat data in Know Narrator Chat. The data entered into Know Narrator Chat and the response data from AI are stored in Azure's storage service. Specifically, the application's database uses Azure XxxDB, where chat history and logs used within the application are stored. …
  25. Current Summary
    ▍What is an AI Agent?
      • The definition of an AI agent varies depending on the technology.
      • In this scope, it is software that autonomously performs various human tasks and work processes.
      • Composed of an LLM and agent architecture that are independent of specific tasks, plus knowledge and tools that depend on tasks.
      • Capable of understanding the intent of a task, planning a solution path, and executing actions independently.
      • Can adapt based on information obtained from the environment.
    ▍Why Develop AI Agents?
      • Enhanced development frameworks have lowered the barriers for both citizen and professional development.
      • By changing only task-dependent parts, it is possible to quickly verify RAG, RPA, and data analysis.
    ▍How to Develop AI Agents?
      1. Write out the business process and consider the ideal behavior patterns for the agent.
      2. Develop tools and knowledge for task-dependent parts.
      3. Develop the agent architecture and finally perform prompt engineering.
    ▍Challenges of AI Agents
  26. Technical Challenges of AI Agents
    ▍AI agents need understanding, planning, action, and adaptability skills.
    ▍Understanding Skill
      • Interpret the intent behind user questions and long instructions.
      • Address the frame problem by discerning relevant from irrelevant background information.
      • Infer the situational context and mental states of users (Theory of Mind).
    ▍Planning Skill
      • Formulate feasible procedures to resolve tasks.
      • Derive execution plans in the absence of prior experience, emulating human problem-solving.
      • Avoid the repetitive generation of identical subtasks.
    ▍Action Skill
      • Adapt actions based on environmental feedback, modifying plans and next steps flexibly.
      • Avoid repeating similar actions that make no progress.
      • Ensure precise prompt engineering to avoid minor query alterations and redundant searches.
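Besides prompt engineering, one possible guard against redundant searches is to track which (tool, query) pairs have already been executed and skip near-duplicates. The sketch below treats queries as equal when they contain the same words regardless of case and order; that normalization rule and the `ActionHistory` class are assumptions for illustration, not a technique described in the talk.

```python
from typing import Set, Tuple


def normalize(query: str) -> str:
    """Treat queries with the same words (any case, any order) as equal."""
    return " ".join(sorted(set(query.lower().split())))


class ActionHistory:
    """Remembers executed (tool, query) pairs to flag redundant actions."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, str]] = set()

    def is_redundant(self, tool: str, query: str) -> bool:
        key = (tool, normalize(query))
        if key in self._seen:
            return True
        self._seen.add(key)
        return False


history = ActionHistory()
first = history.is_redundant("search_help_site", "chat data storage")
variant = history.is_redundant("search_help_site", "Storage chat data")
other_tool = history.is_redundant("search_dev_docs", "chat data storage")
print(first, variant, other_tool)
```

When a call is flagged, the agent could be prompted to change tools or rephrase substantially instead of issuing a minor variation of the same query.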
  27. Summary
    ▍What is an AI Agent?
      • The definition of an AI agent varies depending on the technology.
      • In this scope, it is software that autonomously performs various human tasks and work processes.
      • Composed of an LLM and agent architecture that are independent of specific tasks, plus knowledge and tools that depend on tasks.
      • Capable of understanding the intent of a task, planning a solution path, and executing actions independently.
      • Can adapt based on information obtained from the environment.
    ▍Why Develop AI Agents?
      • Enhanced development frameworks have lowered the barriers for both citizen and professional development.
      • By changing only task-dependent parts, it is possible to quickly verify RAG, RPA, and data analysis.
    ▍How to Develop AI Agents?
      1. Write out the business process and consider the ideal behavior patterns for the agent.
      2. Develop tools and knowledge for task-dependent parts.
      3. Develop the agent architecture and finally perform prompt engineering.
    ▍Challenges of AI Agents
      • Understanding, planning, and adaptability are challenging in practice.
  28. References
    ▍What is an AI Agent?
      • Research Application Example: DENTSU SOKEN, Research Trends in LLM Agents at ICLR 2024
      • 石田亨. (1995). エージェントを考える (<特集>「エージェントの基礎と応用」) [Thinking about Agents (Special Issue: "Foundations and Applications of Agents")]. 人工知能, 10(5), 663-667.
      • 秋田興一郎. (1989). エキスパート・システム: 考え方・作り方・使い方 (DSライブラリー) [Expert Systems: Concepts, Construction, and Use (DS Library)].
      • Survey of Agents: Masterman, Tula, et al. "The landscape of emerging ai agent architectures for reasoning, planning, and tool calling: A survey." arXiv preprint arXiv:2404.11584 (2024).
      • Survey of Agents: Wang, Lei, et al. "A survey on large language model based autonomous agents." Frontiers of Computer Science 18.6 (2024): 186345.
      • Business Application of Agents: SIERRA, The Guide to AI Agents
    ▍Why Develop AI Agents?
      • RPA: Insight Partners, AI Agents are disrupting automation: Current approaches, market solutions and recommendations
      • RAG: LlamaIndex, RAG in 2024: advancing to agents
      • Text-to-Analytics Agents: Hong, Sirui, et al. "Data interpreter: An LLM agent for data science." arXiv preprint arXiv:2402.18679 (2024).
      • Fundamentals of Agent Development: DeepLearningAI, Functions, Tools and Agents with LangChain
    ▍How to Develop AI Agents?
      • Help Desk Agent: DENTSU SOKEN, Introduction of Internal Initiatives? What is a Help Desk Agent?
      • Reference for Agent Architecture: Niu, Runliang, et al. "Screenagent: A vision language model-driven computer control agent." arXiv preprint arXiv:2402.07945 (2024).
    ▍Challenges of AI Agents
      • Help Desk Agent: DENTSU SOKEN, Introduction of Internal Initiatives? What is a Help Desk Agent?