LLMs in Text-Based Virtual Worlds • Simulating Human-like Daily Activities with Desire-driven Autonomy 推論 • MISR: Measuring Instrumental Self-Reasoning in Frontier Models • RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios 自己修正 • Meta-Reflection: A Feedback-Free Reflection Learning Framework • Understanding the Dark Side of LLMs’ Intrinsic Self-Correction ツール利用 • Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage メモリ • Memory-Augmented Agent Training for Business Document Understanding • On the Structural Memory of LLM Agents
from Large Language Models • MALT: Improving Reasoning with Multi-Agent LLM Training • Personalized Multimodal Large Language Models: A Survey 安全性 • SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents • Towards Action Hijacking of Large Language Model-based Agent • Agent-SafetyBench: Evaluating the Safety of LLM Agents ベンチマーク • TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks • LegalAgentBench: Evaluating LLM Agents in Legal Domain Agent Framework • Large Action Models: From Inception to Implementation • EscapeBench: Pushing Language Models to Think Outside the Box • Practical Considerations for Agentic LLM Systems • Challenges in Human-Agent Communication • Specifications: The missing link to making the development of LLM systems an engineering discipline
for Automatic Patent Generation • Hacking CTFs with Plain Agents • Enhancing LLMs for Impression Generation in Radiology Reports through a Multi-Agent System Digital Agent • Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction • AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials • The BrowserGym Ecosystem for Web Agent Research • PAFFA: Premeditated Actions For Fast Agents • Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms Data Agent • A Survey on Large Language Model-based Agents for Statistics and Data Science • DataLab: A Unified Platform for LLM-Powered Business Intelligence • AutoDCWorkflow: LLM-based Data Cleaning Workflow Auto-Generation and Benchmark • Towards Agentic Schema Refinement
Programming through LLM Multi-Agent Collaboration Embodied Agent • Navigation World Models • From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons Multi Agent System • ROMAS: A Role-Based Multi-Agent System for Database monitoring and Planning • A Survey on Multi-Generative Agent System: Recent Advances and New Frontiers • Seeker: Towards Exception Safety Code Generation with Intermediate Language Agents Framework • GENMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration • A Survey on Large Language Model-Based Social Agents in Game-Theoretic Scenarios • From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents • LMAgent: A Large-scale Multimodal Agents Society for Multi-user Simulation Agentic RAG • Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models • A Collaborative Multi-Agent Approach to Retrieval-Augmented Generation Across Diverse Data
a Multi-Agent System • 放射線科レポートにおける所見から印象を生成するタスクを支援するマルチエージェントシステムを提案 印象とは所見を要約し、臨床医が患者の診断や治療を迅速に判断するための要となる内容 1. Retrieval:類似過去レポートをベクトルDBから検索 2. Radiologist:所見を基に印象を生成 3. Reviewer:印象の一貫性と正確性を検証し、修正を提案 エージェントのワークフロー Agentic AI Systems 12月16日 更新分