takahirom
November 22, 2025

Learning AI Implementation in Android Apps with Androidify: Experience and Implementation through Google's Official Sample

Slides from DevFest Tokyo 2025.

Androidify is an official Android sample app published by Google. It lets you create your own Android bot (Android robot) character, and it implements a range of features built on the latest AI technology.

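For example, you can try AI features such as "the capture button is enabled when a person appears in the camera," "generate an image from a prompt," and "remove the background from a photo to make a sticker."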

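This article explains these features in the order "experience (what you can do)" → "the technology behind it (how it works)." Learning from a codebase that actually runs gives you practical knowledge that you can apply to your own app right away.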

Transcript

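  1. Behind the scenes: ML Kit Pose Detection API

    • An API for detecting a person's pose on-device in real time
    • Detects the positions of 33 landmarks (joint points)
    • Returns a confidence value (inFrameLikelihood) from 0.0 to 1.0 for each landmark
    Person detection method: when the three points NOSE, LEFT_SHOULDER, and RIGHT_SHOULDER are detected, the app judges that a person is present.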
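The three-landmark rule on this slide can be sketched as follows. This is an illustration in Python, not the app's own code. The `Landmark` type, the function name, and the `MIN_LIKELIHOOD` cutoff are assumptions, because the slide does not say which likelihood value counts as "detected".

```python
from dataclasses import dataclass

# Landmark names taken from the slide; ML Kit reports 33 landmarks in total.
REQUIRED_LANDMARKS = ("NOSE", "LEFT_SHOULDER", "RIGHT_SHOULDER")

# Hypothetical cutoff: the slide does not state a threshold.
MIN_LIKELIHOOD = 0.8


@dataclass(frozen=True)
class Landmark:
    """Illustrative stand-in for a pose landmark."""
    name: str
    in_frame_likelihood: float  # 0.0 to 1.0, mirrors inFrameLikelihood


def is_person_present(landmarks, min_likelihood=MIN_LIKELIHOOD):
    """Return True when every required landmark is present and confident enough."""
    by_name = {lm.name: lm for lm in landmarks}
    return all(
        name in by_name and by_name[name].in_frame_likelihood >= min_likelihood
        for name in REQUIRED_LANDMARKS
    )
```

A caller would evaluate this for each camera frame and enable the capture button only while it returns True.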
  2. ML Kit Pose Detection API as used in Androidify: specifications and characteristics

    Library: com.google.mlkit:pose-detection:18.0.0-beta5
    Model: ML Kit built-in Pose model
    Impact on app download size: ~10.1MB ※
    Input/output format: input is an InputImage (camera frame); output is a Pose (33 PoseLandmarks)
    Execution: on-device processing
    Supported devices: almost all (minSdkVersion = 21); Pixel 3XL: ~30FPS ※
    ※ https://developers.google.com/ml-kit/vision/pose-detection/android
  3. Behind the scenes: ML Kit Prompt API or Firebase AI Logic Gemini API

    Flowchart: Start → Remote Config useGeminiNano() → model already downloaded → ML Kit Prompt API (Gemini Nano). A "No" at either check leads to Firebase AI Logic Gemini API.
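The routing in this flowchart can be sketched as a small decision function. This is an illustration in Python under assumptions: the two boolean parameters stand in for the Remote Config `useGeminiNano()` flag and the "model already downloaded" check, and the names `Backend` and `choose_backend` are hypothetical.

```python
from enum import Enum


class Backend(Enum):
    ML_KIT_PROMPT_API = "ML Kit Prompt API (Gemini Nano)"
    FIREBASE_AI_LOGIC = "Firebase AI Logic Gemini API"


def choose_backend(use_gemini_nano: bool, model_downloaded: bool) -> Backend:
    """Pick on-device Gemini Nano only when both checks pass."""
    if not use_gemini_nano:
        # Remote Config says not to use Gemini Nano: go to the cloud.
        return Backend.FIREBASE_AI_LOGIC
    if not model_downloaded:
        # Nano is allowed but the model is not on the device yet.
        return Backend.FIREBASE_AI_LOGIC
    return Backend.ML_KIT_PROMPT_API
```

Keeping the decision in one place makes it easy to fall back to the cloud whenever the on-device model is unavailable.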
  4. Behind the scenes: ML Kit Prompt API or Firebase AI Logic Gemini API

    Flowchart: Start → Remote Config useGeminiNano() → model already downloaded → ML Kit Prompt API (Gemini Nano). A "No" at either check leads to Firebase AI Logic Gemini API.
  5. Behind the scenes: ML Kit Prompt API

    • Runs generative AI on-device
    • Uses Gemini Nano
    • Formally, it is the Prompt API of ML Kit's GenAI APIs
    • Uses AI Core (a mechanism available since Android 14)
    • There used to be a "Google AI Edge SDK", but it was deprecated and replaced by the ML Kit Prompt API
    https://developers.google.com/ml-kit/genai/prompt/android
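  6. ML Kit Prompt API as used in Androidify: specifications and characteristics

    Library: com.google.mlkit:genai-prompt:1.0.0-alpha1
    Model: Gemini Nano; the version depends on what is installed on the device (next slide)
    Impact on app download size: none, because the model is downloaded per device (excluding the impact of the base library)
    Input/output format: input is an image and text; output is text
    Execution: on-device processing
    Supported devices: next slide
    https://developers.google.com/ml-kit/genai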
  7. Behind the scenes: ML Kit Prompt API or Firebase AI Logic Gemini API

    Flowchart: Start → Remote Config useGeminiNano() → model already downloaded → ML Kit Prompt API (Gemini Nano). A "No" at either check leads to Firebase AI Logic Gemini API.
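  8. Behind the scenes: Firebase AI Logic Gemini API

    • Runs generative AI in the cloud
    • Both Gemini (generative AI) and the Imagen model for image generation are available
    • It can do essentially anything an AI API can do
    • It was originally called "Vertex AI in Firebase"; the name has changed
    • App Check is required to prevent abuse
    https://firebase.google.com/docs/ai-logic/models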
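  9. Firebase AI Logic Gemini API as used in Androidify: specifications and characteristics

    Library: com.google.firebase:firebase-ai (Firebase BOM 34.2.0)
    Model: Gemini models and Imagen models are available
    Impact on app download size: none, because processing happens in the cloud (excluding the impact of the base library)
    Input/output format: explained on the next slide
    Execution: cloud processing
    Supported devices: essentially any (min sdk version = 23)
    https://firebase.google.com/docs/ai-logic/models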
  10. Firebase AI Logic Gemini API as used in Androidify: specifications and characteristics

    Input/output format (https://firebase.google.com/docs/ai-logic/models)
    Input: text, code, PDF, image, video, audio
    Output: text, code, image, audio
  11. How are prompts for Android bot generation created automatically?

    https://firebase.google.com/docs/ai-logic/models
    Input/output format (prompt shared by ML Kit and Firebase AI Logic)
    Input (default input): text
      Generate 10 different random prompts as a comma separated list for a description of what a person looks like for android bot generation: include hair color texture and length, clothing including colors and details (like the persons shirt and pants or dress and collar types), with accessories. Make them, fun, safe and all different, dont include gender or ethnicity or dangerous content. For example "wearing blue jeans, gray ruffly blouse, holding a magnifying glass with sparkly shoes and brown wavy hair."
      The prompt should:
      - it cannot contain gender or ethnicity or dangerous content.
      - it cannot contain nudity or explicit content.
      - it cannot contain any weapons or violent references.
      - it cannot contain references to drugs or other illicit substances.
      - it cannot contain hate speech or other offensive language.
      - it cannot contain blood or gore or violence.
      - it cannot contain political symbolism.
    Output: text (from Gemini Nano)
      wearing a bright pink sundress with a white lace overlay, holding a seashell and a starfish, with long, curly brown hair and bright blue eyes.
      holding a worn leather-bound book and a vintage camera, with short, choppy black hair and a mischievous grin.
      wearing a cozy red sweater and jeans, with a beanie and a scarf, and long, straight brown hair.
      wearing a sparkly silver dress with a plunging neckline, with long, flowing blonde hair and red lipstick.
      wearing a simple white t-shirt and black leggings, with a baseball cap and sneakers, and short, buzzed brown hair.
      wearing a colorful patchwork jacket and jeans, with a bandana around their neck and long, braided brown hair.
      wearing a long, flowing purple dress with a floral print, with a wide-brimmed hat and long, curly black hair.
      ...
    Note: a one-shot prompt is used.
  12. How is "show an error when an unsafe image is supplied" handled?

    https://firebase.google.com/docs/ai-logic/models
    Input/output format (ML Kit Gemini Nano)
    Input (default input): text and image
      [TASK] You are a Validator. Analyze the attached image and determine its validity based on the rules.
      [RULES] VALID if AND ONLY if:
      1. PRIMARY subject is a person showing their head and shoulder
      2. The image MUST NOT contain: Nudity, Explicit content, Illegal weapons, Violent references, Drugs, Illicit substances, Hate speech, Offensive language, Blood, Gore, or Violence.
      [OUTPUT] Return ONLY one string. Check sequentially. Output the first failure code that applies:
      1. Is the PRIMARY subject NOT a person (e.g., animal, object, landscape)? -> "not_a_person"
      2. Is the person present but missing face/head/shoulders or too blurry? -> "not_enough_detail"
      3. Does the image violate any negative policy (Rule 2)? -> "policy_violation"
      4. If all rules are passed: -> null
    Output: text (from Gemini Nano)
      not_a_person
    Notes: The rules are laid out in an organized way. In this example the model probably should have returned policy_violation, but it returned not_a_person.
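A caller has to turn the single string returned for this prompt into a decision. The sketch below is a Python illustration, not the app's code. The three failure codes and `null` come from the prompt on the slide. The function name, the `unrecognized_output` label, and the choice to treat unexpected output as invalid are assumptions.

```python
# Failure codes listed in the [OUTPUT] section of the prompt.
FAILURE_CODES = ("not_a_person", "not_enough_detail", "policy_violation")


def parse_validation_output(raw: str):
    """Return (is_valid, failure_code) for the model's one-string answer."""
    # Models often wrap the answer in quotes or add whitespace; normalize first.
    cleaned = raw.strip().strip('"').strip().lower()
    if cleaned == "null":
        return True, None
    if cleaned in FAILURE_CODES:
        return False, cleaned
    # Assumption: fail closed when the output is not one of the expected strings.
    return False, "unrecognized_output"
```

Because a small on-device model may pick the wrong code, as the note on this slide observes, the caller should probably rely mainly on the valid/invalid distinction.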
  13. How is "show an error when an unsafe image is supplied" handled?

    https://firebase.google.com/docs/ai-logic/models
    Input/output format (Firebase AI Logic)
    Input (default input): text and image
      You are to analyze the provided image and determine if it is acceptable and appropriate based on specific criteria. In the JSON response, respond with the result 'success' as set to true or false based on results. If the image is considered invalid, include the relevant reason as to why it is invalid in the 'error' property. A photo is only valid if:
      - it is a photo of a person, at least showing their shoulders and head, it can be a full body photo
      - it must be a photo of a person
      - the photo has a clear main person in it, if there are people in the background ignore them
      - it cannot contain nudity or explicit content
      - it cannot contain illegal weapons or violent references
      - it cannot contain references to drugs or other illicit substances
      - it cannot contain hate speech or other offensive language
      - it cannot contain blood or gore or violence.
    Output: text
      { "success": false, "error": "policy_violation" }
    Note: this prompt appears to simply list what needs to be done.
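The JSON answer requested by this prompt can be read with a small helper. This is a Python sketch under assumptions: the `success` and `error` keys come from the slide, while the function name and the fallback labels `unparseable_response` and `unknown_error` are hypothetical.

```python
import json


def parse_validation_json(raw: str):
    """Return (is_valid, error) from a response like {"success": false, "error": "..."}."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Assumption: a response that is not valid JSON is treated as invalid.
        return False, "unparseable_response"
    if not isinstance(data, dict):
        return False, "unparseable_response"
    if data.get("success") is True:
        return True, None
    # Use the model's stated reason when present.
    return False, data.get("error") or "unknown_error"
```

Compared with the one-string output used for Gemini Nano, the JSON form carries the verdict and the reason in separate fields.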
  14. Behind the scenes: Firebase AI Logic Gemini API

    Flowchart: Start → Remote Config useImagen(). If "No", use Firebase AI Logic Gemini API with the fine-tuned model; otherwise use Firebase AI Logic Gemini API with Imagen.
  15. How is an Android bot generated from text?

    https://android-developers.googleblog.com/2025/05/androidify-how-androidify-leverages-gemini-firebase-ml-kit.html
    • What is the fine-tuned model?
      ◦ The Imagen 3 model is fine-tuned using Supervised Fine-Tuning (SFT).
        ▪ This method can update all of the model's weights.
      ◦ It is trained on pairs of images and text.
      ◦ It reportedly produces specific, appealing Android bots.
  16. How is an Android bot generated from text?

    https://firebase.google.com/docs/ai-logic/models
    Input/output format (Firebase AI Logic)
    Input (default input): text
      This 3D rendered, cartoonish Android mascot rendered in a photorealistic style, with the {skinTone} skin color and {prompt}. The figure is centered against a white background gives the figurine a unique and collectible appeal.
    Output: image (created with Imagen that is not fine-tuned)
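Filling the `{skinTone}` and `{prompt}` placeholders in this template can be sketched as follows. This is a Python illustration: the template text and placeholder names come from the slide, while the function name and the sample values are hypothetical.

```python
# Template text copied from the slide, including its {skinTone} and {prompt} placeholders.
IMAGE_PROMPT_TEMPLATE = (
    "This 3D rendered, cartoonish Android mascot rendered in a photorealistic style, "
    "with the {skinTone} skin color and {prompt}. The figure is centered against a "
    "white background gives the figurine a unique and collectible appeal."
)


def build_image_prompt(skin_tone: str, description: str) -> str:
    """Substitute both placeholders; plain replace avoids issues with braces in user text."""
    return (
        IMAGE_PROMPT_TEMPLATE
        .replace("{skinTone}", skin_tone)
        .replace("{prompt}", description)
    )
```

For example, `build_image_prompt("green", "wearing a red scarf")` yields a prompt containing "with the green skin color and wearing a red scarf."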
  17. Behind the scenes: the process is split into two stages

    Flow: Start → generate a prompt from the image (ML Kit Prompt API or Firebase AI Logic Gemini API) → generate an image from the prompt
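The two-stage flow can be sketched as a function that chains two steps. This is a Python illustration under assumptions: `describe_image` and `generate_image` are hypothetical callables standing in for the image-to-prompt call and the image generation call, and no real API is invoked here.

```python
from typing import Callable


def generate_bot_from_photo(
    photo: bytes,
    describe_image: Callable[[bytes], str],
    generate_image: Callable[[str], bytes],
) -> bytes:
    """Stage 1 turns the photo into a text description; stage 2 turns that text into an image."""
    description = describe_image(photo)
    return generate_image(description)


# Usage with fake stand-ins for both stages.
def fake_describe(photo: bytes) -> str:
    return f"a person described from {len(photo)} bytes"


def fake_generate(prompt: str) -> bytes:
    return prompt.encode("utf-8")


bot_image = generate_bot_from_photo(b"\x00\x01\x02", fake_describe, fake_generate)
```

Passing the stages in as parameters keeps the first stage swappable between an on-device and a cloud backend.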
  18. How is an Android bot generated from an image? Generating a prompt from the image

    Input/output format
    Input (default input): text and image
      ## Role
      You are an expert image analyst specializing in generating detailed, objective descriptions of people.
      ## Task
      Your task is to describe the person in the provided image in vivid detail, following the guidelines and examples below.
      ## Guidelines
      - Start with the overall mood or impression of the person (e.g., serene, joyful, pensive).
      - Describe the person's physical appearance, focusing on hair (color, style, length) and any visible facial features.
      - Detail the clothing, including the type of garments, style, color, and material.
      - Mention any accessories, such as glasses, hats, or jewelry.
      - Describe the immediate surroundings, including any objects, animals, or
    Output: text (features of the Android bot)
    Notes: this is the prompt for Gemini Nano. The prompt is shown enlarged on the following slides.
  19. (Prompt for Gemini Nano, enlarged)

    ## Role
    You are an expert image analyst specializing in generating detailed, objective descriptions of people.
    ## Task
    Your task is to describe the person in the provided image in vivid detail, following the guidelines and examples below.
    ## Guidelines
    - Start with the overall mood or impression of the person (e.g., serene, joyful, pensive).
    - Describe the person's physical appearance, focusing on hair (color, style, length) and any visible facial features.
    - Detail the clothing, including the type of garments, style, color, and material.
    - Mention any accessories, such as glasses, hats, or jewelry.
    - Describe the immediate surroundings, including any objects, animals, or other people interacting with the subject.
    ## Constraints
    - The output must be a single, coherent paragraph.
    - If no person is visible in the image, state that clearly and do not describe anything else.
    - Provide only the description. Do not add any introductory or concluding remarks.
    ## Examples
    ### Example 1: Standard Case
    Input: [Image of a person on a picnic blanket with a dog]
    Output: A highly detailed and realistic portrayal of a person with a serene and pleasant mood. The figure has short, chin-length, straight dark black hair. No facial hair is present. Blue mirrored sunglasses are resting on top of its head. The figure is wearing a loose-fitting, light gray kimono-like top with a V-neckline and wide, elbow-length sleeves. This top features intricate, colorful embroidery in muted red, green, and yellow floral patterns on the front and sleeves. On its
    Note: the prompt assigns a Role.
  20. (Prompt for Gemini Nano, enlarged, continued)

    ## Examples
    ### Example 1: Standard Case
    Input: [Image of a person on a picnic blanket with a dog]
    Output: A highly detailed and realistic portrayal of a person with a serene and pleasant mood. The figure has short, chin-length, straight dark black hair. No facial hair is present. Blue mirrored sunglasses are resting on top of its head. The figure is wearing a loose-fitting, light gray kimono-like top with a V-neckline and wide, elbow-length sleeves. This top features intricate, colorful embroidery in muted red, green, and yellow floral patterns on the front and sleeves. On its bottom, the figure wears loose-fitting, light gray wide-leg pants made of a soft, flowing material. No footwear is visible. The figure is seated on a red and white checkered picnic blanket. Next to it on the blanket is a clear plastic bottle. It is interacting with a black and white Pomeranian-like dog, which has black fur with distinct white markings on its chest, legs, and face, and a leash attached to its collar. The overall depiction aims for a clear and life-like appearance.
    ### Example 2: Corner Case (No Person)
    Input: [Image of an empty park bench]
    Output: No person is visible in the image.
    ## Input
    {{image}}
    ## Output Reminder
    Take a deep breath, read the instructions again, read the inputs again. Each instruction is crucial and must be executed with utmost care and attention to detail.
    Description:
    Note: few-shot prompting is used.
  21. How is an Android bot generated from an image? Generating a prompt from the image

    Input/output format
    Input (default input): text and image
      Extract detailed information about the human subject included in the provided image. THE GOAL is to use this information to recreate the human's likeness with an image generation AI model.
      * Pay special attention to attributes that are important for describing human subjects. Provide rich visual detail for attributes such as:
      - Hair: Describe the hair in detail, including its style (e.g., layered bob, loose waves, tight curls), length (e.g., chin-length, shoulder-length, cascading), and color. For hair color, be specific about the particular shade of hair (e.g. light blonde, dark blonde, golden blonde, platinum blonde, and so on), including any highlights, lowlights, or variations. If applicable, meticulously describe any bangs (e.g., blunt, side-swept, wispy), braids (e.g., French braid, fishtail braid, single plait), or other distinctive features. Explicitly name the hairstyle if known (e.g., pixie cut, updo, ponytail). If the subject does not have hair, describe it as bald.
      - Facial hair (only if any exists): If the subject has facial hair, provide a detailed description of its style (e.g., goatee, full beard, mustache), length
    Output: text (features of the Android bot)
    Note: this is the prompt for Firebase AI Logic.
  22. Extract detailed information about the human subject included in the provided image. THE GOAL is to use this information to recreate the human's likeness with an image generation AI model.

    * Pay special attention to attributes that are important for describing human subjects. Provide rich visual detail for attributes such as:
    - Hair: Describe the hair in detail, including its style (e.g., layered bob, loose waves, tight curls), length (e.g., chin-length, shoulder-length, cascading), and color. For hair color, be specific about the particular shade of hair (e.g. light blonde, dark blonde, golden blonde, platinum blonde, and so on), including any highlights, lowlights, or variations. If applicable, meticulously describe any bangs (e.g., blunt, side-swept, wispy), braids (e.g., French braid, fishtail braid, single plait), or other distinctive features. Explicitly name the hairstyle if known (e.g., pixie cut, updo, ponytail). If the subject does not have hair, describe it as bald.
    - Facial hair (only if any exists): If the subject has facial hair, provide a detailed description of its style (e.g., goatee, full beard, mustache), length (e.g., stubble, short, long), texture (e.g., coarse, fine, wiry), and color (including any variations). Explicitly name the facial hair style if known. If the subject does not have facial hair, describe it as no facial hair.
    - Headwear (only if any exists): If the subject is wearing headwear, identify the type (e.g., baseball cap, fedora, beanie), color, and material. Describe any visually distinct details such as patterns (e.g., plaid, stripes, floral), textures (e.g., knit, leather, straw), or embellishments (e.g., embroidery, sequins, ribbons). Specify its position on the head (e.g., tilted back, covering the ears). Include the name of the headwear if possible.
    - Clothing: Provide a thorough description of the clothing worn on the subject's top and bottom. For each garment, detail the style (e.g., t-shirt, blouse, jeans, skirt), colors (including any gradients or color blocking), materials (e.g., cotton, silk, denim), patterns (e.g., polka dots, floral, paisley, geometric), embellishments (e.g., buttons, zippers, lace), and fit (e.g., tight, loose, tailored). Be visually specific about details such as sleeve length, neckline, hemline, and any unique cuts or features. Include the name of the clothing items if known (e.g., A-line skirt, Henley shirt). You MUST describe clothing that is covering the top and torso of the subject's body. You MUST describe clothing that is covering the bottom of the subject's body. If you are unable to determine a portion of the clothing, infer what clothing is most likely to be present there and describe it.
  23. * Do not describe any body piercings such as nose piercings or naval piercings. Only describing earrings is permissible.

    * Do not describe nails, nail polish, or rings if present on the subject's fingers or toes. Avoid including the words "fingers" or "toes" in your description.
    * Do not describe any branded logos or icons or emblems that may be included in the image or as part of the subject.
    * DO NOT describe any text, slogans, typography or items that depict alphanumeric symbols that may be found in the image. Instead, describe it as a generic placeholder and blur it.
    * Do not describe any blood or gore or open wounds, if they are present on the subject.
    * Do not describe the pose that the subject is in, such as sitting, standing, dancing, or waving. Only describe what the subject looks like. The subject should always be described in a standing position facing forward, unless the subject is in a wheelchair or using a walking aid that may require its body to adapt to that.
    * Do not include any adult substances inappropriate for children, nor any actions directly related to their use. Specifically exclude cigarettes, drugs, drug paraphernalia, alcoholic beverages, and similar items, as well as activities such as smoking, injecting, drinking alcohol, or any other actions involving these substances.
    * If the subject is holding any weapon that suggests violence, only describe it if it is obviously a toy, a fake prop that won't actually cause harm, or is used in common sports. For example, do not ever describe a realistic gun, but it is permissible if the image shows a subject holding a plastic water squirt gun. Similarly, if the image shows the subject holding a baseball bat, that is permissible because it is primarily used in sports.
    * ID badges or any accessories or items that include a person's face MUST be stated together with the phrase "that has no image." For example, "ID badge that has no image," etc.
    * Your description should start with a high level overview of the new image starting with style. Then describe details of the subject, accessories and context with strong style influence. Then finish with details about the style.
    * Do not use the word "Subject" in description.
    * Never say "the image".
    * Never use the suffix "-esque" or "-style".
    * Do not say rendered, rendering, or digital.
    * Only respond with new image description as a paragraph.
  24. How is an Android bot generated from an image? Generating a prompt from the image

    Input/output format (Firebase AI Logic)
    Input (default input): text and image
      Add the input image android bot as the main subject to the result, it should be the most prominent element of the resultant image, large and filling the foreground - more than 50% of the resultant frame, standing in the center of the frame with the central focus, and the background just underneath the content. Always include the input Android Bot in the final result image as the subject of the image. It should be prominently featured in the foreground, center of the frame, without any adjustments other than the lighting of the surrounding environment. There should only be one of the bots in the image. style="3d animation style, simplified shapes, mouthless character, realistic physics simulation" Do not alter the input Android Bot image, do not change its shape or add any hands, eyes, mouths etc. Do not change the characters color scheme. The background is described as follows: This is a soft, vibrant 3D illustration of a minimalist outdoor DJ stage setup, rendered with a meticulous blend of realism and rounded, toy-like objects, creating a clean aesthetic. The enti...
    Output: image
  25. ML Kit Subject Segmentation API as used in Androidify: specifications and characteristics

    Library: com.google.android.gms:play-services-mlkit-subject-segmentation:16.0.0-beta1
    Model: ML Kit built-in Subject Segmentation model
    Impact on app download size: ~200KB (the model is downloaded automatically via Google Play services, but the library size appears to have an effect)
    Input/output format: input is an image; output is an image or an image mask
    Execution: on-device processing
    Supported devices: essentially any (min sdk version = 24); 200 ms on a Pixel 7 Pro
    https://developers.google.com/ml-kit/vision/subject-segmentation/android
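To picture how an image mask turns a photo into a sticker with the background removed, here is a minimal Python sketch. It is an illustration under assumptions rather than the ML Kit API: the mask is modeled as per-pixel foreground confidence values from 0.0 to 1.0, pixels are plain RGB tuples, and the `threshold` of 0.5 and the function name are hypothetical.

```python
def apply_foreground_mask(pixels, mask, threshold=0.5):
    """Return RGBA rows where background pixels are made fully transparent.

    pixels: rows of (r, g, b) tuples.
    mask:   rows of floats in 0.0..1.0 with the same shape as pixels.
    """
    if len(pixels) != len(mask) or any(len(p) != len(m) for p, m in zip(pixels, mask)):
        raise ValueError("pixels and mask must have the same shape")
    return [
        [
            # Keep the color, set alpha to opaque for subject pixels and 0 for background.
            (r, g, b, 255 if confidence >= threshold else 0)
            for (r, g, b), confidence in zip(pixel_row, mask_row)
        ]
        for pixel_row, mask_row in zip(pixels, mask)
    ]
```

A hard threshold gives crisp edges; scaling alpha by the confidence value instead would give softer edges.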
  26. Summary of the AI technologies used in Androidify

    ML Kit: Prompt API (Gemini Nano), Pose Detection API, Subject Segmentation API
    Firebase: Firebase AI Logic Gemini API (Gemini, Imagen)
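  27. Takeaways

    • ML other than generative AI
      ◦ There were places where non-generative ML was the right fit, for example because of its low latency
      ◦ ML Kit can do many things, so it is worth exploring
    • Generative AI
      ◦ Writing separate prompts for Nano and for a cloud LLM can be useful
        ▪ Prompt engineering techniques help with Nano
      ◦ It is a good idea to set up routing, for example falling back to a cloud LLM when Nano is unavailable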