コンテンツ審査を題材とした生成AI機能実装のベストプラクティス

© 2025, Amazon Web Services, Inc. or its affiliates. All
rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. コンテンツ審査を題材とした生成AI機能実装のベストプラクティス⽯⾒和也アマゾンウェブサービスジャパン合同会社デジタルサービス技術本部シニアソリューションアーキテクト

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 2 本セッションについて内容本セッションでは、コンテンツ審査を題材として、⽣成 AI を実プロダクトに実装する際に直⾯する考慮点を整理し、具体的な打ち⼿をご紹介します (L200 〜 L300) 想定される対象者 • ⽣成AI (特にAmazon Bedrock) の利⽤経験はあるが、いざプロダクトに導⼊するとなると求められる品質とのギャップを感じている • 実際に⽣成AIをプロダクト導⼊する際に直⾯する問題や対策を理解しておきたい ※ RAGやOSS LLMの話は含みません。Amazon BedrockでClaudeを扱うケースに着⽬します

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 3 ECサイトへのコメント投稿を審査する場合を考える「このパンすごい美味しかった」「値段の割にボリュームがない」誹謗中傷スパム投稿 LLM 解決できないか︖ ❌ ❌

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 4 ECサイトへのコメント投稿を審査する場合を考えるアプリケーションサーバー Amazon Bedrock 次のコメントを不適切か判断してください。「このパンすごい美味しかった」 Prompt

© 2025, Amazon Web Services, Inc. or its affiliates. All
rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its aﬃliates. All rights reserved. Amazon Conﬁdential and Trademark. 5 実際プロダクションに導⼊するとなると考慮点は多いそもそも⼈⼒や従来の MLモデルではだめなの︖ 全てのコメントをLLMで処理すると案外⾼くつきそう想像している基準で精度良く分類してくれない⽣成AI APIからエラーが返ってきた時にどうしようか︖ セキュリティ⾯も気をつけてと⾔われたが何を気にする︖ レスポンスが案外遅いリリース後の監視や改善はどうする︖ アプリケーションサーバー Amazon Bedrock 次のコメントを不適切か判断してください。「このパンすごい美味しかった」 Prompt

rights reserved. Amazon Conﬁdential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 6 実際プロダクションに導⼊するとなると考慮点は多い他の⼿段との棲み分けコスト精度可⽤性・スループットセキュリティレスポンス速度 LLMOps アプリケーションサーバー Amazon Bedrock 次のコメントを不適切か判断してください。「このパンすごい美味しかった」 Prompt

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 7 LLMの実導⼊における考慮点と打ち⼿他の⼿段との棲み分け⼈、ルールベース、従来のMLモデル精度評価 / モデル / Prompt / タスク分解コストモデルサイズ / タスク分解 / ⾮同期処理可⽤性・スループットクオータ / リトライ / 複数リージョン / ⾮同期処理レスポンス速度モデル選定 / リージョン LLMOps APIのメトリクス監視 / LLMOps Tool セキュリティ (別セッションでカバー) 「AI-T2-03: ⽣成 AI アプリケーション開発におけるセキュリティ・コンプライアンスのポイント」

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its aﬃliates. All rights reserved. Amazon Conﬁdential and Trademark. LLMと従来の手法の棲み分け • 人力や従来のMLモデルではだめなのか？ • どのような場合にLLMの活用がハマるのか？ 1 / 6

rights reserved. Amazon Conﬁdential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 9 コンテンツレビューにおけるLLM活⽤の棲み分けどれを選ぶか、どれを組み合わせるのが良いかは (当たり前だが) 状況による⼈間ルールベースの処理従来のMLモデル LLM コスト中 - 中準備期間⾼中精度中専⾨⼈材エンジニアデータサイエンティストエンジニア導⼊後の処理速度遅⾼⾼⾼不要低早早早低 - ⾼低 ※ 強い利点がある部分を⾚でハイライトしている低低

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 10 コンテンツレビューにおけるLLM活⽤の棲み分け⼈間ルールベースの処理従来のMLモデル LLM ⾼い精度が求められるが、今すぐ本件にアサインできる⾼度な⼈材がいない場合 • 有⼈コンテンツ監視専⾨企業や、社内のカスタマーポートで対応 • エンジニアがいればNGワードなどルールベースの処理から実装コスト中 - 中準備期間⾼中精度中専⾨⼈材エンジニアデータサイエンティストエンジニア導⼊後の処理速度遅⾼⾼⾼不要低早早早低 - ⾼低低低

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 11 コンテンツレビューにおけるLLM活⽤の棲み分け⼈間ルールベースの処理従来のMLモデル LLM ⼤規模なプロダクトでレビュー数も多い + データサイエンティスト組織が存在する場合 • データのラベリングから⾼度なMLモデルの学習・推論までの流れを実装 • ⼈の作業が求められる領域を⼤幅に削減コスト中 - 中準備期間⾼中精度中専⾨⼈材エンジニアデータサイエンティストエンジニア導⼊後の処理速度遅⾼⾼⾼不要低早早早低 - ⾼低低低

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 12 コンテンツレビューにおけるLLM活⽤の棲み分け⼈間ルールベースの処理従来のMLモデル LLM MLモデルの改善や運⽤に機械学習⼈材を割くまでの体制は取りづらいが、今いるアプリケーションエンジニアで可能な限り⾃動化を試みたい場合 • LLMの活⽤により例えばアプリケーションエンジニアも試⾏錯誤可能にコスト中 - 中準備期間⾼中精度中専⾨⼈材エンジニアデータサイエンティストエンジニア導⼊後の処理速度遅⾼⾼⾼不要低早早早低 - ⾼低低低

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 13 コンテンツレビューにおけるLLM活⽤の棲み分け⼈間ルールベースの処理従来のMLモデル LLM 既に従来のMLモデルで対処できている組織であっても、以下の場合LLM活⽤が有⽤ • 従来のMLモデルで判断しきれなかった領域を更に⾼度なLLMで⾃動判別したい • 学習データは少ないが、整備されている審査ガイドラインを活⽤したい • 審査の判断理由や改善点も含めて出⼒させたいコスト中 - 中準備期間⾼中精度中専⾨⼈材エンジニアデータサイエンティストエンジニア導⼊後の処理速度遅⾼⾼⾼不要低早早早低 - ⾼低低低

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its aﬃliates. All rights reserved. Amazon Conﬁdential and Trademark. 精度改善に向けた打ち手 • 精度改善はどこから進めていけばよいのか？ • Promptの修正を繰り返すものの目指すべき精度とのギャップが埋まらない 2 / 6

rights reserved. Amazon Conﬁdential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 15 精度改善の近道は簡易的な評価の仕組み作りから試⾏錯誤するうちに評価基準が徐々に変化することも多いので、まずは改善の Iterationを回すために参考になるような最低限の評価の仕組みを⽤意評価データの作り⽅︓ • ⼈が評価基準を参考にテストしたい⼊⼒と出⼒のペアを作成 • 評価基準を元にLLMで⼊⼒と出⼒のペアを作成 • サービスのログに対して⼈がアノテーション (カスタマーサポートが⼈⼿でチェックしたものを活⽤する場合もこちら) 実際の評価︓ • 審査してOK/NGの2値判定するような場合は正誤が明確で機械的に判断可能 • チャットボットの回答など評価が曖昧なものは⼈やLLM(LLM-as-a-Judge)で判断

rights reserved. Amazon Conﬁdential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 16 最先端のモデルを活⽤する ※ Amazon Bedrock上でもClaude 3.5 Sonnet v2 はオレゴンリージョンで利⽤可能 ※ Claude 3.5 Haikuも、text only版から近⽇公開予定。性能はClaude 3 Opus相当 2024/10/22にClaude 3.5 Sonnet (v2)がリリース、Claude 3.5 Haikuがアナウンス

rights reserved. Amazon Conﬁdential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 17 Promptの原則をおさえるあなたはECサイトのコメントをレビューする役割が与えられています。以下に与えられたガイドラインに正確に従って、投稿コメントが不適切な内容かどうかを判断してください。ガイドライン︓ <guideline> 詳細なガイドライン </guideline> 投稿コメント︓ <comment> メッセージ </comment> レビュー結果を以下のようなJSON形式で出⼒してください: { “violation”: <メッセージが不適切なら”true”、適切なら”false” のbool値>, “explanation”: <ガイドライン違反がある場合のみ含めてください> } https://docs.anthropic.com/ja/docs/build-with-claude/prompt-engineering/overview Anthropic User Guide

rights reserved. Amazon Conﬁdential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 18 Promptの原則をおさえる冗⻑で膨⼤なガイドラインを全て含めると精度低下の原因に⼈⼿やLLMで事前に整理するのも重要審査項⽬ごとのNGなコメント例を多数加えるのも⼀つのアプローチあなたはECサイトのコメントをレビューする役割が与えられています。以下に与えられたガイドラインに正確に従って、投稿コメントが不適切な内容かどうかを判断してください。ガイドライン︓ <guideline> 詳細なガイドライン </guideline> 投稿コメント︓ <comment> メッセージ </comment> レビュー結果を以下のようなJSON形式で出⼒してください: { “violation”: <メッセージが不適切なら”true”、適切なら”false” のbool値>, “explanation”: <ガイドライン違反がある場合のみ含めてください> }

rights reserved. Amazon Conﬁdential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 19 タスク分解と Prompt Chaining 複雑な作業を⼀つのLLMで処理するのではなく、いくつかのサブタスクに分割しそれぞれ個別のLLMで処理するアプローチ https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/chain-prompts#example-analyzing-a-legal-contract-with-chaining 項⽬Aの審査項⽬Bの審査項⽬Cの審査最終的な精密審査 Pros: 各タスクの精度が向上、サブタスクごとに個別で改善しやすい (Traceability) Cons: コストや処理時間が増加する可能性がある。⼯夫次第では削減の場合も

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. コスト最適化に向けた打ち手 • 全てのコメントをLLMで処理すると案外高くのではないか？ 3 / 6

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 21 コスト感の確認 200⽂字相当のコメントが 1⽇に100個投稿されたとして、これを処理する料⾦はどれぐらいを想像しますか︖ Claude 3.5 Haiku (軽量モデル): 約〇〇円 / ⽉ Claude 3.5 Sonnet (⾼性能モデル): 約〇〇円 / ⽉ 200⽂字相当のコメントが 1秒に1個投稿された場合は︖ Claude 3.5 Haiku: 約〇〇円 / ⽉ Claude 3.5 Sonnet: 約〇〇円 / ⽉

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 22 コスト感の確認 200⽂字相当のコメントが 1⽇に100個投稿されたとして、これを処理する料⾦はどれぐらいを想像しますか︖ Claude 3.5 Haiku (軽量モデル): 約 150円 / ⽉ Claude 3.5 Sonnet (⾼性能モデル): 約 2千円 / ⽉ 200⽂字相当のコメントが 1秒に1個投稿された場合は︖ Claude 3.5 Haiku: 約 15万円 / ⽉ Claude 3.5 Sonnet: 約 200万円 / ⽉

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 23 コストと複雑さに関するメンタルモデル利⽤規模が⼩さい場合過度に最適化せず、シンプルなアプローチを推奨例︓⾼精度なLLM⼀つで完結させる利⽤規模が⼤きい場合エンジニアリングや科学的な取り組みに注⼒して⻑期的なインフラコスト最適化に注⼒する価値あり例︓タスクを分解してそれぞれにLLMを適⽤ Amazon science: How task decomposi3on and smaller LLMs can make AI more affordable

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 24 モデルによるコストの違い Claude 3.5 Haiku 料⾦ (※2) ※1 ここでは特定の指標の値ではなく定性的な表記をしている ※2 100万トークンあたりのInput token料⾦ (USD)。output tokenも⽐率は同じ 3 0.25 性能(※1) Claude 3.5 Sonnet 10倍+の差 Claude 3 Opus 15 50倍+の差

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 25 タスク分解もコスト最適化に繋がる複雑な作業を⼀つのLLMで処理するのではなく、いくつかのサブタスクに分割しそれぞれ個別のLLMで処理するアプローチ複雑なレビューを⾼性能LLMでまとめて処理簡易な審査 (軽量LLM) 項⽬Aのみ審査(軽量LLM) 複雑な審査 (⾼性能LLM) 項⽬Bのみ審査(軽量LLM)

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 26 ⾮同期処理化によるコスト最適化 Amazon Bedrockのバッチ推論機能を利⽤すると、通常24時間以内にオンデマンドの料⾦の 50% で処理できる利⽤例︓ • パーソナライズされたメール⽂⾯を定期的に作成 • RAGなどの⽤途に埋め込みベクトルを⼀括作成 https://aws.amazon.com/jp/about-aws/whats-new/2024/08/amazon-bedrock-fms-batch-inference-50-price/ Amazon Bedrock 処理したい⼊⼒プロンプト群を含めたjsonlファイル推論結果⾮同期でまとめて処理

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 可用性・スループット • 生成AI APIからたまに4xx, 5xxのエラーが返ってくるのはなぜだろうか？ • 今後更に大規模な利用を見込んでいるが何か備えはできるのだろうか？ 4 / 6

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 28 まずクオータを確認 https://docs.aws.amazon.com/general/latest/gr/bedrock.html 英語のドキュメントやService Quotasで、⼀分あたりのリクエスト数やトークン数等の制限（クオータ）を確認。モデルやリージョンによって異なるクオータに当たっている場合、Bedrockではレスポンスで429,400が返却されるクオータに当たっていないが、過負荷等でリクエストを処理出来ない場合503を返却クオータを超えて利⽤したい場合は⼀度AWSの営業やSAにご相談ください

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 29 リトライの仕組みを必ず意識 APIを呼び出す際にリトライの仕組みがあると扱いやすい SDKによってもリトライの挙動は少しずつ異なるので⼀度確認を例えば、Pythonで利⽤されるboto3の場合、特に重要な429, 503に対してデフォルト設定で最⼤5回API呼び出しを再試⾏する https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html ❌ ❌

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 30 AWS バ " クボ % ン複数のリージョンを合わせて利⽤するケースもある Amazon Bedrockには、Cross-region inferenceという機能があり、⾃動的に複数リージョンを利⽤することで、より可⽤性⾼く安定した推論を実現 30 リージョン 1 アプリケーションエンドポイントリージョン 2 候補リージョンの混雑状況に応じて⾃動ルーティング https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html 第 1 候補第 2 候補 ※ 現在⽶国とヨーロッパが対象

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 31 ⾮同期処理化 UX的に可能であれば、⾮同期処理となるような実装も有⽤過負荷で⼀時的に後段のAPIが利⽤不能な場合も、エンドユーザーへの影響を⼩さくできる Amazon SQS でキューイング AWS Lambda Amazon Bedrock

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 33 モデルによるレスポンス速度の違い Claude 3 Haiku レスポンス速度 (※2) ※1 ここでは特定の指標の値ではなく定性的な表記をしている ※2 同リージョン、⼊⼒・出⼒100トークンの場合の値。タイミングにより変動するので⽬安 2秒 1秒強性能(※1) Claude 3.5 Sonnet 約2倍の差がある

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 34 レスポンス速度観点でのリージョン選定よく選択肢に挙がるリージョン • 北部バージニア (us-east-1) 約144 ms • オレゴン (us-west-2) 約96 ms • 東京 (ap-northeast-1) N/A 東京リージョンにアプリケーションがある場合、確かに東京がレスポンスは早く返るのだが、クオータの上限から考えるとまずはオレゴンを推奨東京からの参考round trip latency ※ レイテンシは AWS Network Manager Infrastructure Performance でのある断⾯での参考値

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its aﬃliates. All rights reserved. Amazon Conﬁdential and Trademark. 35 レスポンス速度の改善には何が寄与するのかレスポンス速度はモデル、トークン数、ネットワーク遅延に主に影響されるよりネットワーク距離が近いリージョンを利⽤できるかより軽量なモデルを利⽤して性能が許容されるか⼊出⼒のトークンを削減できるか • Promptに含めるコンテキスト量の削減 • 出⼒を最⼩限になるよう指定 • キャシュ • Fine Tuning レスポンスの遅さを感じさせない UXにできるか • ストリーミング出⼒ • 処理が進んでいるようなメッセージ

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. LLMOps • 生成AI APIのメトリクスはどうモニタリングはどうするか？ • LLMの実行ログを確認してPromptの改善に繋げたい 6 / 6

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 37 Bedrockのメトリクスを確認するためのダッシュボードが存在 CloudWatchの⾃動ダッシュボードにBedrock専⽤ダッシュボードの⽤意ありどのモデルがどれぐらい使われているか、クオータに当たりスロットリングしているか、そもそもエラーになっているか等が⼀⽬で分かる https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#home:dashboards/Bedrock

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its aﬃliates. All rights reserved. Amazon Conﬁdential and Trademark. 38 LLMの実⾏結果をデバッグ、評価、改善するようなツールも広がりを⾒せている • Bedrockの「Model invocation logging」の設定で、LLMの⼊出⼒ログをCloudWatch LogsやS3に格納可能 • LangSmithやLangFuseをはじめ、トレース、評価、改善も含めた LLMOpsツールも拡張が続いている https://langfuse.com/

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark. 40 再掲︓実際プロダクションに導⼊するとなると考慮点は多いそもそも⼈⼒や従来の MLモデルではだめなの︖ 全てのコメントをLLMで処理すると案外⾼くつきそう想像している基準で精度良く分類してくれない⽣成AI APIからエラーが返ってきた時にどうしようか︖ セキュリティ⾯も気をつけてと⾔われたが何を気にする︖ レスポンスが案外遅いリリース後の監視や改善はどうする︖ アプリケーションサーバー Amazon Bedrock 次のコメントを不適切か判断してください。「このパンすごい美味しかった」 Prompt

rights reserved. Amazon Confidential and Trademark. © 2025, Amazon Web Services, Inc. or its aﬃliates. All rights reserved. Amazon Conﬁdential and Trademark. 41 まとめ︓LLMの実導⼊における考慮点と打ち⼿本セッションでは、コンテンツ審査を題材として、⽣成 AI を実プロダクトに実装する際に直⾯する考慮点を整理し、具体的な打ち⼿をご紹介します他の⼿段との棲み分け⼈、ルールベース、従来のMLモデル精度評価 / モデル / Prompt / タスク分解コストモデルサイズ / タスク分解 / ⾮同期処理可⽤性・スループットクオータ / リトライ / 複数リージョン / ⾮同期処理レスポンス速度モデル選定 / リージョン LLMOps APIのメトリクス監視 / LLMOps Tool セキュリティ (別セッションでカバー) 「AI-T2-03: ⽣成 AI アプリケーション開発におけるセキュリティ・コンプライアンスのポイント」

rights reserved. Amazon Conﬁdential and Trademark. Thank you!

コンテンツ審査を題材とした 生成AI機能実装のベストプラクティス

コンテンツ審査を題材とした 生成AI機能実装のベストプラクティス

More Decks by kazuya iwami

Featured

Transcript

コンテンツ審査を題材とした生成AI機能実装のベストプラクティス

コンテンツ審査を題材とした生成AI機能実装のベストプラクティス