Why Eval-Driven Development Is Your Path To Production https://www.forbes.com/councils/forbestechcouncil/2025/04/04/escaping-ai-demo-hell-why-eval-driven-development-is-your-path-to-production/
to Solve the #1 Blocker for Getting AI Agents in Production | LangChain Interrupt https://interrupt.langchain.com/videos/building-reliable-agents-agent-evaluations
LLM Outputs with Human Preferences
As evaluation proceeds, users themselves change or refine their criteria for judging LLM outputs.
[2404.12272] Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences https://arxiv.org/abs/2404.12272
Code is generated from the requirements (Plan first, then build.) GitHub has also released Spec Kit.
Kiro: The AI IDE for prototype to production https://kiro.dev/
github/spec-kit: 💫 Toolkit to help you get started with Spec-Driven Development https://github.com/github/spec-kit
can it be redefined in this way?
Managing the development of large software systems: concepts and techniques | Proceedings of the 9th international conference on Software Engineering https://dl.acm.org/doi/10.5555/41765.41801