Overview of Agent Development Frameworks and Integration with W&B Weave

AI Agent Framework Overview & Integration with W&B Weave, 2025/5, Keisuke Kamata

2
1. Overview of AI Agent Framework
2. W&B Weave
3. Integration of AI Agent Framework and W&B Weave

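3 Keisuke Kamata (@olachinkei)
Background: Faculty of Engineering; Graduate School of Informatics
Roles: Engagement Manager; Lead Data Scientist; Healthcare team lead; Manager, AI Solution Engineer
Areas of work:
• Deep Learning / generative AI
• Healthcare / protein language models
• Animal experiments
• Biosignal processing
• Causal inference
• Offline A/B testing
• Machine learning
• Healthcare / COVID-19 countermeasures
Recent generative AI activities:
• Development of the Nejumi Leaderboard
• BioNeMo2 Contributor
• Development of the Japanese wandbot
• Internal agent development (today's topic)
• …
Blogs, white papers, and similar:
• W&B generative AI white paper
• AI Agent evaluation blog
• Comparison of human and automatic evaluation, with いちから
• MCP blog
• Authoring the GENIAC evaluation guide
• …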

4 What is an AI agent?
"Agent" can be defined in several ways. Some customers define agents as fully autonomous systems that operate independently over extended periods, using various tools to accomplish complex tasks. Others use the term to describe more prescriptive implementations that follow predefined workflows. At Anthropic, we categorize all these variations as agentic systems, but draw an important architectural distinction between workflows and agents:
• Workflows are systems where LLMs and tools are orchestrated through predefined code paths.
• Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.
In other words, the defining feature of an agent is that the LLM itself flexibly decides how to achieve the goal.
Source: Building effective agents, Anthropic
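The workflow/agent distinction above can be sketched in a few lines of Python. This is a minimal illustration, not code from the deck: the tools `search` and `summarize` are hypothetical, and `fake_llm_policy` is a hard-coded stand-in for a real LLM call, used so the example runs without any API.

```python
from typing import Callable, Dict, List

# Hypothetical tools; names and behavior are illustrative only.
def search(query: str) -> str:
    return f"results for {query!r}"

def summarize(text: str) -> str:
    return f"summary of {text}"

TOOLS: Dict[str, Callable[[str], str]] = {"search": search, "summarize": summarize}

def run_workflow(question: str) -> str:
    """Workflow: the code path is fixed in advance (search, then summarize)."""
    return summarize(search(question))

def fake_llm_policy(question: str, history: List[str]) -> str:
    """Stand-in for an LLM that picks the next action from what has happened so far."""
    if not history:
        return "search"
    if len(history) == 1:
        return "summarize"
    return "finish"

def run_agent(question: str,
              policy: Callable[[str, List[str]], str] = fake_llm_policy,
              max_steps: int = 5) -> str:
    """Agent: the model chooses which tool to call next and when to stop."""
    history: List[str] = []
    current = question
    for _ in range(max_steps):  # hard cap so a bad policy cannot loop forever
        action = policy(question, history)
        if action == "finish":
            break
        current = TOOLS[action](current)
        history.append(action)
    return current
```

With this stub policy both functions produce the same result; the difference is where the control flow lives. In `run_workflow` it is written by the developer, while in `run_agent` it is delegated to the policy, which is why a step cap such as `max_steps` is usually needed.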

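5 Agency Level (smolagents guide)
• A system is an agentic system if it contains an AI workflow or an AI agent
• Combinations of AI workflows and AI agents also exist
• Framing the discussion in terms of agency level should make it easier to reach a precise shared understanding
• Agent frameworks are being developed to abstract away the implementation of high-agency-level functionality
Source: Building effective agents, Anthropic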

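6 Low-code tools
For cases where you want to develop a little more casually, or where data scientists build with low code.
• Specialized for building LLM agents
• Chatbot building / conversation design
• General-purpose automation / AI-integration extensions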

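7 Differences between AI agent frameworks
Axis: Low Floor / Low Ceiling to High Floor / High Ceiling. Position on this axis largely tracks the agent's level of abstraction: frameworks that are easy to pick up with little code have the downside that fine-grained control is harder to apply.
• Balance of approachability and capability: OpenAI Agents SDK
• Enterprise-grade functionality: Amazon Bedrock Agents. IAM configuration and knowledge-base integration are built in. The setup hurdle (IAM and so on) is somewhat higher than with agent frameworks, but the path to enterprise use is correspondingly shorter.
• Support for (pre-defined) workflows: being able to build pre-defined workflows makes it easier to secure stability as a system. OpenAI and Anthropic have both stated that raising the agency level is not necessarily better.
• Memory, Human feedback, Code Interpreter, Integration, Fault tolerance
Note: Are multi-agent communication and streaming becoming possible in every framework?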

9 GenAI: easy to demo, hard to productionize

10 Weights & Biases AI developer platform

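11 Weights & Biases AI developer platform: meeting the high demands of generative AI development across fields and use cases
AI model development: Models (Training, Fine-tuning, Deployment)
• Pre-training: large-scale training
• Fine-tuning: customize models with in-house data
• Optimization: hyperparameter tuning
• Analysis: visualize and explore data and metrics
Components: Experiments, Sweeps, Automations, Table
AI application development: Weave (GenAI Application Development)
• Prototyping: build an initial version of the AI app
• Improvement: evaluate and optimize accuracy, latency, cost, and safety
• Deploy: deployment and guardrails
• Observe: monitoring and feedback collection
Components: Playground | Traces, Guardrails, Evaluations | Leaderboards, User feedback
Governance: supports compliance, collaboration, and security
Components: Registry | Lineage | Reports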

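12 Basic usage of W&B Weave
With a single @weave.op() decorator, all code involved in calling generative AI APIs is versioned, saved, and shared.

    import weave

    weave.init("my-project")  # placeholder project name; the slide omits the argument

    @weave.op()
    def get_relevant_documents(question: str):
        docs = []  # retrieval logic is elided on the slide
        return docs

    get_relevant_documents("example question")

Secure deployment for enterprises (diagram labels): W&B client, frontend, W&B server; customer security domain, W&B security domain.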

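13 Traces
• Fully monitor behavior during development and after deployment
• Weave automatically records all input and output data
• Detailed information is recorded in an easy-to-navigate trace tree
• Latency and cost can also be recorded (calculated automatically for models that have an integration)
A broad range of integrations, and more…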
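To make the idea of a trace tree concrete, here is a minimal standard-library sketch of a decorator that records inputs, outputs, and latency for nested calls. It is an illustration of the concept only and is not how Weave is implemented; `traced`, `TRACE_ROOTS`, `retrieve`, and `answer` are hypothetical names, and the sketch is single-threaded.

```python
import functools
import time
from typing import Any, Callable, Dict, List

TRACE_ROOTS: List[Dict[str, Any]] = []  # finished top-level calls
_stack: List[Dict[str, Any]] = []       # calls currently in progress

def traced(fn: Callable) -> Callable:
    """Record inputs, output, and latency of each call in a nested tree."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        record: Dict[str, Any] = {
            "op": fn.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "children": [],
        }
        parent = _stack[-1] if _stack else None
        _stack.append(record)
        start = time.perf_counter()
        try:
            record["output"] = fn(*args, **kwargs)
            return record["output"]
        except Exception as exc:
            record["error"] = repr(exc)  # keep failed calls in the tree as well
            raise
        finally:
            record["latency_s"] = time.perf_counter() - start
            _stack.pop()
            # Attach to the caller's record, or to the roots if there is no caller.
            (parent["children"] if parent else TRACE_ROOTS).append(record)
    return wrapper

@traced
def retrieve(question: str) -> list:
    return ["doc about " + question]

@traced
def answer(question: str) -> str:
    docs = retrieve(question)
    return f"answer based on {len(docs)} document(s)"
```

Calling `answer("weave")` leaves one root record in `TRACE_ROOTS` whose `children` list holds the nested `retrieve` call, which is the shape a trace-tree viewer would render.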

14 Evaluation
• Define your own evaluation methods and measure the accuracy and performance of models and outputs across a variety of scenarios
• Automatic generation of system comparison reports
• Human feedback is also supported
Metrics bundled with Weave: Hallucination, Summarization, Moderation (based on OpenAI moderation API), Similarity, JSON strings, XML strings, Pydantic data models, Context entity recall (from RAGAS), Context Relevancy (from RAGAS), and more…
User-defined metrics: RAGAS, EvalForge, LangChain, LlamaIndex, HEMM, and more…
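As a rough illustration of what "defining your own evaluation method" amounts to, the sketch below runs a model over a small dataset and averages each scorer's result. It uses only the standard library and does not use Weave's evaluation API; `exact_match`, `length_ok`, `toy_model`, and `DATASET` are hypothetical names invented for the example.

```python
from statistics import mean
from typing import Any, Callable, Dict, List

Example = Dict[str, Any]
Scorer = Callable[[Example, Any], float]

def exact_match(example: Example, output: Any) -> float:
    """Custom scorer: 1.0 when the output equals the expected answer."""
    return 1.0 if output == example["expected"] else 0.0

def length_ok(example: Example, output: Any) -> float:
    """Custom scorer: 1.0 when the output is at most 20 characters long."""
    return 1.0 if len(str(output)) <= 20 else 0.0

def evaluate(model: Callable[[str], Any],
             dataset: List[Example],
             scorers: List[Scorer]) -> Dict[str, float]:
    """Run the model once per example, then average every scorer over the results."""
    rows = [(ex, model(ex["question"])) for ex in dataset]
    return {s.__name__: mean(s(ex, out) for ex, out in rows) for s in scorers}

def toy_model(question: str) -> str:
    """Stand-in for an LLM application; it knows exactly one answer."""
    return {"capital of France?": "Paris"}.get(question, "I don't know")

DATASET: List[Example] = [
    {"question": "capital of France?", "expected": "Paris"},
    {"question": "capital of Japan?", "expected": "Tokyo"},
]
```

Running `evaluate(toy_model, DATASET, [exact_match, length_ok])` returns one aggregate score per scorer, which is the kind of table a comparison report between two systems would be built from.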

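15 Versioning of models, data, and prompts
• Models, datasets, and prompts can be saved and versioned within Weave
• Retrieving them also takes only a few lines
Note: a "model" here means the combination of data and the code that defines the model's behavior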
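The versioning behavior described above can be pictured with a small content-addressed store: publishing an object under a name creates a new version only when its content changes, and any earlier version can be read back by reference. This is a conceptual sketch under that assumption, not Weave's API; `VersionStore` and the `name:vN` reference format are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List

class VersionStore:
    """In-memory store where each change to a named object creates a new version."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}  # name -> ordered content digests
        self._blobs: Dict[str, str] = {}           # digest -> serialized JSON

    def publish(self, name: str, obj: Any) -> str:
        """Save obj under name and return a reference such as 'prompt:v0'."""
        payload = json.dumps(obj, sort_keys=True, ensure_ascii=False)
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
        history = self._versions.setdefault(name, [])
        if not history or history[-1] != digest:  # unchanged content reuses the latest version
            history.append(digest)
            self._blobs[digest] = payload
        return f"{name}:v{len(history) - 1}"

    def get(self, ref: str) -> Any:
        """Load the object stored at a reference returned by publish()."""
        name, _, version = ref.rpartition(":v")
        digest = self._versions[name][int(version)]
        return json.loads(self._blobs[digest])
```

For example, publishing `{"text": "Answer briefly."}` as `"prompt"` twice yields `prompt:v0` both times, while publishing an edited prompt yields `prompt:v1`, and `get("prompt:v0")` still returns the original.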

17 W&B Weave Integration
LLM Provider
• Amazon Bedrock • Anthropic • Cerebras • Cohere • Google • Groq • Hugging Face Hub • LiteLLM • Microsoft Azure • MistralAI • NVIDIA NIM • OpenAI • Open Router • Together AI
Frameworks
• OpenAI Agents SDK • CrewAI • Dify • Smolagents • LangChain • LlamaIndex • DSPy • Instructor • PydanticAI
Protocol
• MCP

Differences in AI Agent Frameworks
Axis: Low Floor / Low Ceiling to High Floor / High Ceiling. The higher the level of abstraction in an agent, the harder it becomes to maintain fine-grained control. Frameworks that are easy to use with minimal code often come with this downside.
• Ease of use & balance of capabilities: OpenAI Agents SDK
• Enterprise-level functionalities: Amazon Bedrock Agents
  • Some agent frameworks come with built-in support for IAM configuration and integration with knowledge bases.
  • While setting up IAM and similar components may require more effort than using prepackaged frameworks, it brings you closer to enterprise-grade deployment.
• Support for (pre-defined) workflows
  • Defining clear workflows can help ensure greater system stability.
  • Increasing the agent's level of autonomy (Agency Level) isn't always better; even OpenAI and Anthropic have acknowledged this.
• Memory, Human feedback, Code Interpreter, Integration, Fault tolerance
* Are multi-agent communication and streaming becoming available in all frameworks?
