Learning AI Implementation in Android Apps with Androidify: Experience and Implementation Through an Official Google Sample

Tokyo / Takahiro Menju, Google Developers Expert for Android

What is Androidify? An app that lets you create your own Android bot

Androidify has a repository on GitHub, so you can read the code!

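Androidify is full of little mysteries:
• "The shutter button becomes enabled when a person appears in the camera"
• "Generating an image from a text prompt"
• "Removing the background from a photo to make a sticker"
• ...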

00 In this session, we look at these experiences and how they are implemented.
I have also posted an article on Qiita, where you can find the actual code: https://qiita.com/takahirom/items/c9717bc323b46b2b6cef

01 The shutter button becomes enabled when a person appears in the camera

When a person appears in the camera, the shutter button turns green in real time. How does the app decide that a person is in the frame?

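Technology behind it: ML Kit Pose Detection API
• An API for detecting a person's pose on-device in real time
• Detects the positions of 33 landmarks (body joints)
• Returns a confidence value (inFrameLikelihood) from 0.0 to 1.0 for each landmark
How a person is detected: once three landmarks, NOSE, LEFT_SHOULDER, and RIGHT_SHOULDER, are detected, the app decides that "a person is present".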
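The three-landmark check above can be sketched in plain Java so it runs without Android. This is a minimal sketch under assumptions: the likelihoods arrive as a map keyed by landmark name (in a real app they would come from ML Kit's `Pose.getPoseLandmark(...)` and `PoseLandmark.getInFrameLikelihood()`), and the 0.5 cut-off is a hypothetical value, since the slide does not state the threshold Androidify uses.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper; not Androidify's actual code.
final class PersonDetector {
    private PersonDetector() {}

    // Landmarks the slide says must be present: nose and both shoulders.
    static final List<String> REQUIRED = List.of("NOSE", "LEFT_SHOULDER", "RIGHT_SHOULDER");

    // Assumed cut-off; the slide does not say which threshold Androidify uses.
    static final float MIN_LIKELIHOOD = 0.5f;

    /**
     * Returns true when every required landmark is present with an
     * inFrameLikelihood at or above MIN_LIKELIHOOD. A missing key models
     * a landmark that was not detected in this frame.
     */
    static boolean isPersonInFrame(Map<String, Float> inFrameLikelihoods) {
        for (String name : REQUIRED) {
            Float likelihood = inFrameLikelihoods.get(name);
            if (likelihood == null || likelihood < MIN_LIKELIHOOD) {
                return false;
            }
        }
        return true;
    }
}
```

Evaluated once per camera frame, the result would then toggle the shutter button's enabled state.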

ML Kit Pose Detection API as used in Androidify: specs and characteristics
• Library: com.google.mlkit:pose-detection:18.0.0-beta5
• Model: the pose model built into ML Kit
• Impact on app download size: ~10.1 MB ※
• Input/output: input InputImage (camera frame); output Pose (33 PoseLandmarks)
• Where it runs: on-device
• Supported devices: almost all (minSdkVersion = 21)
• Performance: ~30 FPS on Pixel 3 XL ※
※ https://developers.google.com/ml-kit/vision/pose-detection/android

02 Automatically writing the prompt for Android bot generation

Press "Help me write" and the app writes a prompt for you automatically, even if you have not typed anything.

Technology behind it: ML Kit Prompt API or Firebase AI Logic Gemini API
Flow: Start → Remote Config useGeminiNano()? If No → Firebase AI Logic Gemini API. If Yes → is the model downloaded? If No → Firebase AI Logic Gemini API; if Yes → ML Kit Prompt API (Gemini Nano).
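The routing flow above can be expressed as a small decision function. This is a sketch, not Androidify's code: the enum and method names are hypothetical, and the two booleans stand in for the Remote Config `useGeminiNano()` value and whatever model-availability check the app performs.

```java
// Hypothetical names: the slide shows only the decision flow, not the real code.
enum TextBackend {
    ML_KIT_PROMPT_API_GEMINI_NANO, // on-device
    FIREBASE_AI_LOGIC_GEMINI_API   // cloud
}

final class BackendRouter {
    private BackendRouter() {}

    /**
     * Follows the flowchart: Gemini Nano is used only when the Remote Config
     * flag is on AND the on-device model is already downloaded.
     * Every "No" branch falls back to the cloud Gemini API.
     */
    static TextBackend choose(boolean useGeminiNanoFlag, boolean nanoModelDownloaded) {
        if (!useGeminiNanoFlag) {
            return TextBackend.FIREBASE_AI_LOGIC_GEMINI_API;
        }
        if (!nanoModelDownloaded) {
            return TextBackend.FIREBASE_AI_LOGIC_GEMINI_API;
        }
        return TextBackend.ML_KIT_PROMPT_API_GEMINI_NANO;
    }
}
```

Keeping the decision in one pure function makes it easy to test every branch without a device or network.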


Technology behind it: ML Kit Prompt API
• Runs generative AI on-device
• Uses Gemini Nano
• Officially, it is the Prompt API of ML Kit's GenAI APIs
• Uses AICore (a mechanism available since Android 14)
• Previously there was the "Google AI Edge SDK", but it was deprecated in favor of the ML Kit Prompt API
https://developers.google.com/ml-kit/genai/prompt/android

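ML Kit Prompt API as used in Androidify: specs and characteristics
• Library: com.google.mlkit:genai-prompt:1.0.0-alpha1
• Model: Gemini Nano; the version depends on what is installed on the device (next slide)
• Impact on app download size: none, because the model is downloaded per device (excluding the basic library itself)
• Input/output: input image and text; output text
• Where it runs: on-device
• Supported devices: next slide
https://developers.google.com/ml-kit/genai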

ML Kit Prompt API as used in Androidify: supported devices
The devices listed on the slide (see https://developers.google.com/ml-kit/genai)


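Technology behind it: Firebase AI Logic Gemini API
• Runs generative AI in the cloud
• Can use Gemini models for generative AI and Imagen models for image generation
• Can basically do whatever the underlying AI APIs can do
• Formerly "Vertex AI in Firebase"; it has been renamed
• Requires App Check to prevent abuse
https://firebase.google.com/docs/ai-logic/models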

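Firebase AI Logic Gemini API as used in Androidify: specs and characteristics
• Library: com.google.firebase:firebase-ai (Firebase BoM 34.2.0)
• Model: Gemini models and Imagen models are available
• Impact on app download size: none, since processing happens in the cloud (excluding the basic library itself)
• Input/output: explained on the next slide
• Where it runs: cloud
• Supported devices: basically anything (minSdkVersion = 23)
https://firebase.google.com/docs/ai-logic/models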

Firebase AI Logic Gemini API as used in Androidify: input/output formats (https://firebase.google.com/docs/ai-logic/models)
• Input: text, code, PDF, images, video, audio
• Output: text, code, images, audio

So how does the app actually write the prompt for Android bot generation automatically?

How is the prompt for Android bot generation written automatically?
Input/output format (a prompt shared by ML Kit and Firebase AI Logic; see https://firebase.google.com/docs/ai-logic/models): input (default input) is text; output is text (this example is from Gemini Nano).
Input:
Generate 10 different random prompts as a comma separated list for a description of what a person looks like for android bot generation: include hair color texture and length, clothing including colors and details (like the persons shirt and pants or dress and collar types), with accessories. Make them, fun, safe and all different, dont include gender or ethnicity or dangerous content. For example "wearing blue jeans, gray ruffly blouse, holding a magnifying glass with sparkly shoes and brown wavy hair." The prompt should:
- it cannot contain gender or ethnicity or dangerous content.
- it cannot contain nudity or explicit content.
- it cannot contain any weapons or violent references.
- it cannot contain references to drugs or other illicit substances.
- it cannot contain hate speech or other offensive language.
- it cannot contain blood or gore or violence.
- it cannot contain political symbolism.
Output (from Gemini Nano):
wearing a bright pink sundress with a white lace overlay, holding a seashell and a starfish, with long, curly brown hair and bright blue eyes.
holding a worn leather-bound book and a vintage camera, with short, choppy black hair and a mischievous grin.
wearing a cozy red sweater and jeans, with a beanie and a scarf, and long, straight brown hair.
wearing a sparkly silver dress with a plunging neckline, with long, flowing blonde hair and red lipstick.
wearing a simple white t-shirt and black leggings, with a baseball cap and sneakers, and short, buzzed brown hair.
wearing a colorful patchwork jacket and jeans, with a bandana around their neck and long, braided brown hair.
wearing a long, flowing purple dress with a floral print, with a wide-brimmed hat and long, curly black hair.
...
It uses a one-shot prompt.
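The one-shot structure above (the task, a single example, then the rules) can be sketched as a small prompt builder. The `OneShotPrompt.build` helper is hypothetical; the slide shows only the finished prompt text, not how Androidify assembles it.

```java
import java.util.List;

// Hypothetical helper illustrating the one-shot layout; not Androidify's code.
final class OneShotPrompt {
    private OneShotPrompt() {}

    /**
     * Builds a one-shot prompt: the task, exactly one worked example,
     * then each rule as a "- " line.
     */
    static String build(String task, String example, List<String> rules) {
        StringBuilder sb = new StringBuilder(task.trim());
        // One-shot: a single example shows the model the expected output shape.
        sb.append(" For example \"").append(example).append("\"");
        sb.append("\nThe prompt should:");
        for (String rule : rules) {
            sb.append("\n- ").append(rule);
        }
        return sb.toString();
    }
}
```

Adding a second or third example would turn this into a few-shot prompt; a single example keeps the prompt short, which helps with an on-device model.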

03 Showing an error when an unsafe image is provided


Technology behind it: again, ML Kit Prompt API or Firebase AI Logic Gemini API

How is "showing an error when an unsafe image is provided" done?
Input/output format (ML Kit, Gemini Nano; see https://firebase.google.com/docs/ai-logic/models): input (default input) is text and an image; output is text (from Gemini Nano).
Prompt:
[TASK] You are a Validator. Analyze the attached image and determine its validity based on the rules.
[RULES] VALID if AND ONLY if:
1. PRIMARY subject is a person showing their head and shoulder
2. The image MUST NOT contain: Nudity, Explicit content, Illegal weapons, Violent references, Drugs, Illicit substances, Hate speech, Offensive language, Blood, Gore, or Violence.
[OUTPUT] Return ONLY one string. Check sequentially. Output the first failure code that applies:
1. Is the PRIMARY subject NOT a person (e.g., animal, object, landscape)? -> "not_a_person"
2. Is the person present but missing face/head/shoulders or too blurry? -> "not_enough_detail"
3. Does the image violate any negative policy (Rule 2)? -> "policy_violation"
4. If all rules are passed: -> null
Output: not_a_person
It probably should have returned policy_violation, but it returned not_a_person.
The rules are written out in an organized way.
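Because this prompt asks Gemini Nano for a single code string, the app has to map the reply back to a result. A minimal sketch, assuming hypothetical `ImageValidation` and `NanoValidatorParser` names: it normalizes quotes, whitespace, and case because a small model may not echo the code exactly, and it treats anything unexpected as a failure rather than a pass.

```java
import java.util.Locale;

// Hypothetical result type for the validator prompt's single-string output.
enum ImageValidation { VALID, NOT_A_PERSON, NOT_ENOUGH_DETAIL, POLICY_VIOLATION, UNRECOGNIZED }

final class NanoValidatorParser {
    private NanoValidatorParser() {}

    /**
     * Maps the raw reply to a result. The prompt asks for one of three failure
     * codes or null; quotes, surrounding whitespace, and case are stripped first.
     * Anything else is UNRECOGNIZED and should be handled as a failure.
     */
    static ImageValidation parse(String raw) {
        if (raw == null) {
            return ImageValidation.UNRECOGNIZED;
        }
        String code = raw.trim().replace("\"", "").toLowerCase(Locale.ROOT);
        switch (code) {
            case "null":
                return ImageValidation.VALID;
            case "not_a_person":
                return ImageValidation.NOT_A_PERSON;
            case "not_enough_detail":
                return ImageValidation.NOT_ENOUGH_DETAIL;
            case "policy_violation":
                return ImageValidation.POLICY_VIOLATION;
            default:
                return ImageValidation.UNRECOGNIZED;
        }
    }
}
```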

How is "showing an error when an unsafe image is provided" done?
Input/output format (Firebase AI Logic; see https://firebase.google.com/docs/ai-logic/models): input (default input) is text and an image; output is text.
Prompt:
You are to analyze the provided image and determine if it is acceptable and appropriate based on specific criteria. In the JSON response, respond with the result 'success' as set to true or false based on results. If the image is considered invalid, include the relevant reason as to why it is invalid in the 'error' property.
A photo is only valid if:
- it is a photo of a person, at least showing their shoulders and head, it can be a full body photo
- it must be a photo of a person
- the photo has a clear main person in it, if there are people in the background ignore them
- it cannot contain nudity or explicit content
- it cannot contain illegal weapons or violent references
- it cannot contain references to drugs or other illicit substances
- it cannot contain hate speech or other offensive language
- it cannot contain blood or gore or violence.
Output: { "success": false, "error": "policy_violation" }
This prompt appears to simply list what needs to be done.
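For the cloud prompt, the reply is JSON with `success` and `error` fields. The sketch below extracts those two fields with regular expressions only to stay dependency-free; the `CloudValidation` type is hypothetical, and a real app would more likely use a JSON library or the model's structured (JSON) output support.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder for the {"success": ..., "error": ...} reply shown on the slide.
final class CloudValidation {
    private static final Pattern SUCCESS = Pattern.compile("\"success\"\\s*:\\s*(true|false)");
    private static final Pattern ERROR = Pattern.compile("\"error\"\\s*:\\s*\"([^\"]*)\"");

    final boolean success;
    final Optional<String> error;

    private CloudValidation(boolean success, Optional<String> error) {
        this.success = success;
        this.error = error;
    }

    /**
     * Minimal regex extraction that only handles this flat two-field shape.
     * A missing or malformed "success" field is treated as a failure.
     */
    static CloudValidation parse(String json) {
        Matcher s = SUCCESS.matcher(json);
        boolean ok = s.find() && Boolean.parseBoolean(s.group(1));
        Matcher e = ERROR.matcher(json);
        Optional<String> err = e.find() ? Optional.of(e.group(1)) : Optional.empty();
        return new CloudValidation(ok, err);
    }
}
```

Defaulting to failure when `success` cannot be read keeps a garbled reply from letting an unchecked image through.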

04 Generating an Android bot from text


Technology behind it: Firebase AI Logic Gemini API

Flow: Start → Remote Config useImagen()? If No → Firebase AI Logic Gemini API with the fine-tuned model; otherwise → Firebase AI Logic Gemini API with Imagen.

How is "generating an Android bot from text" done? (https://android-developers.googleblog.com/2025/05/androidify-how-androidify-leverages-gemini-firebase-ml-kit.html)
• A fine-tuned model?
◦ The Imagen 3 model is fine-tuned with Supervised Fine-Tuning (SFT).
▪ This method can update all of the model's weights.
◦ It is trained on image-text pairs.
◦ This reportedly makes it possible to create specific, appealing Android bots.

How is "generating an Android bot from text" done?
Input/output format (Firebase AI Logic; see https://firebase.google.com/docs/ai-logic/models): input (default input) is text; output is an image.
Prompt:
This 3D rendered, cartoonish Android mascot rendered in a photorealistic style, with the {skinTone} skin color and {prompt}. The figure is centered against a white background gives the figurine a unique and collectible appeal.
(The output image on the slide was created with Imagen that is not fine-tuned.)
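The template's `{skinTone}` and `{prompt}` placeholders are filled with the user's choices before the request is sent. A sketch using plain string replacement; `BotPromptTemplate.fill` is a hypothetical helper, and only the template text itself comes from the slide.

```java
// Hypothetical helper; only TEMPLATE is taken from the slide.
final class BotPromptTemplate {
    private BotPromptTemplate() {}

    static final String TEMPLATE =
        "This 3D rendered, cartoonish Android mascot rendered in a photorealistic style, "
        + "with the {skinTone} skin color and {prompt}. The figure is centered against a "
        + "white background gives the figurine a unique and collectible appeal.";

    /** Substitutes the chosen skin tone and the user's prompt into the template. */
    static String fill(String skinTone, String prompt) {
        return TEMPLATE.replace("{skinTone}", skinTone).replace("{prompt}", prompt);
    }
}
```

For example, `fill("light green", "wearing a red scarf")` yields a prompt containing "with the light green skin color and wearing a red scarf."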

05 Generating an Android bot from an image


Technology behind it: it is split into two stages
Start → generate a prompt from the image → generate an image from the prompt
• Stage 1, generating a prompt from the image: ML Kit Prompt API or Firebase AI Logic Gemini API
• Stage 2, generating an image from the prompt: the same as "Generating an Android bot from text"
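The two-stage design can be sketched as function composition, which also shows why the second stage can be shared with the text-only flow. `TwoStagePipeline` is hypothetical and generic over the image and output types so it runs without Android; in the app, stage 1 would call the ML Kit Prompt API or Firebase AI Logic, and stage 2 the image model.

```java
import java.util.function.Function;

/**
 * Hypothetical two-stage pipeline: stage 1 turns a photo into a text
 * description, stage 2 turns a prompt into a bot image.
 */
final class TwoStagePipeline<I, O> {
    private final Function<I, String> describe; // stage 1: image -> prompt
    private final Function<String, O> generate; // stage 2: prompt -> image

    TwoStagePipeline(Function<I, String> describe, Function<String, O> generate) {
        this.describe = describe;
        this.generate = generate;
    }

    /** Image input runs both stages. */
    O fromImage(I image) {
        return generate.apply(describe.apply(image));
    }

    /** Text input skips stage 1 and reuses stage 2 unchanged. */
    O fromText(String prompt) {
        return generate.apply(prompt);
    }
}
```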

How is "generating an Android bot from an image" done? Stage 1: generating a prompt from the image
Input/output format: input (default input) is text and an image; output is text describing the Android bot's features.
This is the prompt for Gemini Nano; it is shown enlarged on the next slides.

## Role
You are an expert image analyst specializing in generating detailed, objective descriptions of people.
## Task
Your task is to describe the person in the provided image in vivid detail, following the guidelines and examples below.
## Guidelines
- Start with the overall mood or impression of the person (e.g., serene, joyful, pensive).
- Describe the person's physical appearance, focusing on hair (color, style, length) and any visible facial features.
- Detail the clothing, including the type of garments, style, color, and material.
- Mention any accessories, such as glasses, hats, or jewelry.
- Describe the immediate surroundings, including any objects, animals, or other people interacting with the subject.
## Constraints
- The output must be a single, coherent paragraph.
- If no person is visible in the image, state that clearly and do not describe anything else.
- Provide only the description. Do not add any introductory or concluding remarks.
The prompt assigns the model a role.

## Examples
### Example 1: Standard Case
Input: [Image of a person on a picnic blanket with a dog]
Output: A highly detailed and realistic portrayal of a person with a serene and pleasant mood. The figure has short, chin-length, straight dark black hair. No facial hair is present. Blue mirrored sunglasses are resting on top of its head. The figure is wearing a loose-fitting, light gray kimono-like top with a V-neckline and wide, elbow-length sleeves. This top features intricate, colorful embroidery in muted red, green, and yellow floral patterns on the front and sleeves. On its bottom, the figure wears loose-fitting, light gray wide-leg pants made of a soft, flowing material. No footwear is visible. The figure is seated on a red and white checkered picnic blanket. Next to it on the blanket is a clear plastic bottle. It is interacting with a black and white Pomeranian-like dog, which has black fur with distinct white markings on its chest, legs, and face, and a leash attached to its collar. The overall depiction aims for a clear and life-like appearance.
### Example 2: Corner Case (No Person)
Input: [Image of an empty park bench]
Output: No person is visible in the image.
## Input
{{image}}
## Output Reminder
Take a deep breath, read the instructions again, read the inputs again. Each instruction is crucial and must be executed with utmost care and attention to detail.
Description:
The prompt uses few-shot examples.
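Example 2 in the few-shot prompt teaches the model a fixed reply for photos without a person. One plausible use, sketched below with hypothetical names, is to check for that sentence before starting image generation; the slides do not show how Androidify actually handles it.

```java
// Hypothetical guard between stage 1 (description) and stage 2 (image generation).
final class DescriptionGuard {
    private DescriptionGuard() {}

    // Corner-case reply taught by Example 2 of the few-shot prompt.
    static final String NO_PERSON = "No person is visible in the image.";

    /**
     * Returns true when the description can be passed on to image generation.
     * The comparison ignores surrounding whitespace and case, since a small
     * model may not reproduce the sentence exactly.
     */
    static boolean canGenerate(String description) {
        if (description == null || description.isBlank()) {
            return false;
        }
        return !description.trim().equalsIgnoreCase(NO_PERSON);
    }
}
```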

How is "generating an Android bot from an image" done? Stage 1: generating a prompt from the image
Input/output format: input (default input) is text and an image; output is text describing the Android bot's features.
This is the prompt for Firebase AI Logic; it is shown in full on the next slides.

Extract detailed information about the human subject included in the provided image. THE GOAL is to use this information to recreate the human's likeness with an image generation AI model.
* Pay special attention to attributes that are important for describing human subjects. Provide rich visual detail for attributes such as:
- Hair: Describe the hair in detail, including its style (e.g., layered bob, loose waves, tight curls), length (e.g., chin-length, shoulder-length, cascading), and color. For hair color, be specific about the particular shade of hair (e.g. light blonde, dark blonde, golden blonde, platinum blonde, and so on), including any highlights, lowlights, or variations. If applicable, meticulously describe any bangs (e.g., blunt, side-swept, wispy), braids (e.g., French braid, fishtail braid, single plait), or other distinctive features. Explicitly name the hairstyle if known (e.g., pixie cut, updo, ponytail). If the subject does not have hair, describe it as bald.
- Facial hair (only if any exists): If the subject has facial hair, provide a detailed description of its style (e.g., goatee, full beard, mustache), length (e.g., stubble, short, long), texture (e.g., coarse, fine, wiry), and color (including any variations). Explicitly name the facial hair style if known. If the subject does not have facial hair, describe it as no facial hair.
- Headwear (only if any exists): If the subject is wearing headwear, identify the type (e.g., baseball cap, fedora, beanie), color, and material. Describe any visually distinct details such as patterns (e.g., plaid, stripes, floral), textures (e.g., knit, leather, straw), or embellishments (e.g., embroidery, sequins, ribbons). Specify its position on the head (e.g., tilted back, covering the ears). Include the name of the headwear if possible.
- Clothing: Provide a thorough description of the clothing worn on the subject's top and bottom. For each garment, detail the style (e.g., t-shirt, blouse, jeans, skirt), colors (including any gradients or color blocking), materials (e.g., cotton, silk, denim), patterns (e.g., polka dots, floral, paisley, geometric), embellishments (e.g., buttons, zippers, lace), and fit (e.g., tight, loose, tailored). Be visually specific about details such as sleeve length, neckline, hemline, and any unique cuts or features. Include the name of the clothing items if known (e.g., A-line skirt, Henley shirt). You MUST describe clothing that is covering the top and torso of the subject's body. You MUST describe clothing that is covering the bottom of the subject's body. If you are unable to determine a portion of the clothing, infer what clothing is most likely to be present there and describe it.

* Do not describe any body piercings such as nose piercings or naval piercings. Only describing earrings is permissible.
* Do not describe nails, nail polish, or rings if present on the subject's fingers or toes. Avoid including the words "fingers" or "toes" in your description.
* Do not describe any branded logos or icons or emblems that may be included in the image or as part of the subject.
* DO NOT describe any text, slogans, typography or items that depict alphanumeric symbols that may be found in the image. Instead, describe it as a generic placeholder and blur it.
* Do not describe any blood or gore or open wounds, if they are present on the subject.
* Do not describe the pose that the subject is in, such as sitting, standing, dancing, or waving. Only describe what the subject looks like. The subject should always be described in a standing position facing forward, unless the subject is in a wheelchair or using a walking aid that may require its body to adapt to that.
* Do not include any adult substances inappropriate for children, nor any actions directly related to their use. Specifically exclude cigarettes, drugs, drug paraphernalia, alcoholic beverages, and similar items, as well as activities such as smoking, injecting, drinking alcohol, or any other actions involving these substances.
* If the subject is holding any weapon that suggests violence, only describe it if it is obviously a toy, a fake prop that won't actually cause harm, or is used in common sports. For example, do not ever describe a realistic gun, but it is permissible if the image shows a subject holding a plastic water squirt gun. Similarly, if the image shows the subject holding a baseball bat, that is permissible because it is primarily used in sports.
* ID badges or any accessories or items that include a person's face MUST be stated together with the phrase "that has no image." For example, "ID badge that has no image," etc.
* Your description should start with a high level overview of the new image starting with style. Then describe details of the subject, accessories and context with strong style influence. Then finish with details about the style.
* Do not use the word "Subject" in description.
* Never say "the image".
* Never use the suffix "-esque" or "-style".
* Do not say rendered, rendering, or digital.
* Only respond with new image description as a paragraph.
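Several of the prompt's final rules are simple word bans, which makes them cheap to double-check after generation. This post-check is a hypothetical addition, not something the slides say Androidify does, and it only covers the word-level rules ("Subject", "the image", the "-esque"/"-style" suffixes, and rendered/rendering/digital).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical linter for the word-level rules at the end of the prompt.
final class DescriptionLint {
    private DescriptionLint() {}

    private static final Pattern[] BANNED = {
        Pattern.compile("\\bsubject\\b", Pattern.CASE_INSENSITIVE),
        Pattern.compile("\\bthe image\\b", Pattern.CASE_INSENSITIVE),
        // A word joined to the banned suffix with a hyphen, e.g. "cartoon-style".
        Pattern.compile("\\w+-(esque|style)\\b", Pattern.CASE_INSENSITIVE),
        Pattern.compile("\\b(rendered|rendering|digital)\\b", Pattern.CASE_INSENSITIVE),
    };

    /** Returns the regex of every rule the description violates (empty if none). */
    static List<String> violations(String description) {
        List<String> hits = new ArrayList<>();
        for (Pattern p : BANNED) {
            if (p.matcher(description).find()) {
                hits.add(p.pattern());
            }
        }
        return hits;
    }
}
```

A non-empty result could trigger a retry, or simply be logged to see how well the model follows these rules.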

06 Creating an image with a background in the vibe you like


Technology behind it: Firebase AI Logic Gemini API

How is "creating an image with a background in the vibe you like" done?
Input/output format (Firebase AI Logic): input (default input) is text and an image; output is an image.
Prompt:
Add the input image android bot as the main subject to the result, it should be the most prominent element of the resultant image, large and filling the foreground - more than 50% of the resultant frame, standing in the center of the frame with the central focus, and the background just underneath the content. Always include the input Android Bot in the final result image as the subject of the image. It should be prominently featured in the foreground, center of the frame, without any adjustments other than the lighting of the surrounding environment. There should only be one of the bots in the image.
style="3d animation style, simplified shapes, mouthless character, realistic physics simulation"
Do not alter the input Android Bot image, do not change its shape or add any hands, eyes, mouths etc. Do not change the characters color scheme.
The background is described as follows: This is a soft, vibrant 3D illustration of a minimalist outdoor DJ stage setup, rendered with a meticulous blend of realism and rounded, toy-like objects, creating a clean aesthetic. The enti...

07 Removing the image background to make a sticker


Technology behind it: ML Kit Subject Segmentation API
• Extracts multiple subjects from the background on-device
◦ Lets you make stickers, swap backgrounds, and so on
https://developers.google.com/ml-kit/vision/subject-segmentation/android
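To illustrate what a segmentation mask is used for, the sketch below turns a per-pixel foreground confidence mask into a sticker by making background pixels transparent. ML Kit can return such a mask (enabled with `enableForegroundConfidenceMask()` on its options builder) as well as a ready-made foreground bitmap, so this helper is illustrative only; the `StickerMask` name and the threshold parameter are assumptions.

```java
/**
 * Hypothetical helper: applies a foreground confidence mask to packed ARGB
 * pixels (the int layout returned by android.graphics.Bitmap#getPixels).
 */
final class StickerMask {
    private StickerMask() {}

    static int[] apply(int[] argbPixels, float[] confidence, float threshold) {
        if (argbPixels.length != confidence.length) {
            throw new IllegalArgumentException("mask and image sizes differ");
        }
        int[] out = new int[argbPixels.length];
        for (int i = 0; i < argbPixels.length; i++) {
            // Keep the pixel as-is if it is likely foreground; otherwise
            // keep its RGB bits but clear the alpha byte (fully transparent).
            out[i] = confidence[i] >= threshold ? argbPixels[i] : (argbPixels[i] & 0x00FFFFFF);
        }
        return out;
    }
}
```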

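ML Kit Subject Segmentation API as used in Androidify: specs and characteristics
• Library: com.google.android.gms:play-services-mlkit-subject-segmentation:16.0.0-beta1
• Model: the Subject Segmentation model built into ML Kit
• Impact on app download size: ~200 KB (the model is downloaded automatically via Google Play services, but the library size still seems to count)
• Input/output: input image; output image or image mask
• Where it runs: on-device
• Supported devices: basically anything (minSdkVersion = 24)
• Performance: 200 ms on Pixel 7 Pro
https://developers.google.com/ml-kit/vision/subject-segmentation/android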

08 Summary of the AI technologies used in Androidify

Summary of the AI technologies used in Androidify
ML Kit
• Prompt API (Gemini Nano)
• Pose Detection API
• Subject Segmentation API
Firebase
• Firebase AI Logic Gemini API (Gemini, Imagen)

09 Takeaways

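Takeaways
• ML other than generative AI
◦ There were good use cases for non-generative ML, for example where low latency matters
◦ ML Kit can do many things, so it is worth exploring
• Generative AI
◦ Keeping separate prompts for Nano and for cloud LLMs can help
▪ For Nano, prompt engineering techniques are useful
◦ It pays to set up routing carefully, for example falling back to a cloud LLM when Nano is unavailable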
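The routing takeaway can also apply at call time: even when the flags say to use Nano, the on-device call may still fail. A minimal sketch of a try-then-fallback helper; `FallbackRouter.withFallback` is hypothetical, and the two suppliers stand in for real ML Kit and Firebase AI Logic calls.

```java
import java.util.function.Supplier;

// Hypothetical helper; the suppliers stand in for real on-device and cloud calls.
final class FallbackRouter {
    private FallbackRouter() {}

    /**
     * Tries the on-device call first and falls back to the cloud call if it
     * throws (e.g. Nano unsupported on this device or the model not ready).
     */
    static <T> T withFallback(Supplier<T> onDevice, Supplier<T> cloud) {
        try {
            return onDevice.get();
        } catch (RuntimeException e) {
            return cloud.get();
        }
    }
}
```

Because the takeaway also suggests separate prompts per backend, each supplier would naturally carry its own prompt.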
