Gen AI on Android

Sa-ryong Kang
September 24, 2025

Transcript

  1. Why Should I Use GenAI on Android? • How Can I

    Use GenAI in My App? • How Does Gemini Work? • Prompt Engineering Deep Dive • What's Next?
  2. We already covered this at I/O '25 in "Gemini Nano on Android: Building with

    on-device gen AI" (youtu.be/mP9QESmEDls): ◦ On-device GenAI use cases ◦ GenAI APIs powered by Gemini Nano ◦ Actual apps leveraging Gemini Nano
  3. (image-only slide)

  4. (image-only slide)
  5. Evolution of Gemini Nano • Starting with the Pixel 8 Pro: 1.8B or 3.25B params*

    • Starting with the Pixel 9 series: internationalization, image input • Starting with the Pixel 10 series: even better i18n, MatFormer * Gemini: A Family of Highly Capable Multimodal Models - arxiv.org/abs/2312.11805
  6. AICore Optimization • Confident Adaptive Language Modeling (aka Early Exit)

    • Speculative Decoding • Prefix Caching (coming) • MatFormer sub-model
  7. GenAI APIs built for on-device tasks • Summarization • Proofreading • Rewrite

    • Image Description • Automatic Speech Recognition (coming soon) ◦ A usage sketch follows below
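    For illustration, a minimal Kotlin sketch of calling the on-device Summarization API. The class and method names used here (SummarizerOptions, Summarization.getClient, SummarizationRequest, runInference, the summary property, and the guava await import) are assumptions based on the ML Kit GenAI beta documentation and may not match the SDK version you use; treat this as a sketch of the call flow, not the definitive API.

        // Hedged sketch: on-device summarization with the ML Kit GenAI APIs.
        // All class/method names below are assumptions and may differ in the shipped SDK.
        import android.content.Context
        import com.google.mlkit.genai.summarization.Summarization
        import com.google.mlkit.genai.summarization.SummarizationRequest
        import com.google.mlkit.genai.summarization.SummarizerOptions
        import kotlinx.coroutines.guava.await   // assumes a ListenableFuture-based API

        suspend fun summarizeArticle(context: Context, article: String): String {
            val options = SummarizerOptions.builder(context)
                .setInputType(SummarizerOptions.InputType.ARTICLE)
                .setOutputType(SummarizerOptions.OutputType.ONE_BULLET)
                .build()
            val summarizer = Summarization.getClient(options)
            // In a real app, check feature availability and download the model first (omitted).
            val request = SummarizationRequest.builder(article).build()
            return summarizer.runInference(request).await().summary
        }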
  8. Currently supporting 33 devices • Pixel 10, Pixel 10 Pro / XL • Pixel 9,

    Pixel 9 Pro / Pro XL / Pro Fold • Magic 7 Pro, Magic 7 • iQOO 13 • Razr 60 Ultra • OnePlus 13, OnePlus 13s • Find N5, Find X8, Find X8 Pro • POCO F7 Ultra • realme GT 7 Pro • Galaxy Z Fold7 • Galaxy S25, Galaxy S25+, Galaxy S25 Ultra • vivo X200, vivo X200 Pro • Xiaomi 15 Ultra, Xiaomi 15
  9. Prompt API • Currently experimental, available through the Google AI Edge SDK

    • Beta release coming soon! ◦ Production-ready ◦ As part of the ML Kit SDK (see the sketch below)
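    A hedged Kotlin sketch of the Prompt API through the experimental Google AI Edge SDK. The package, class, and builder names (com.google.ai.edge.aicore.GenerativeModel, generationConfig) follow the experimental SDK and may change when the ML Kit beta ships; the movie-review prompt is only an illustrative example.

        import android.content.Context
        import com.google.ai.edge.aicore.GenerativeModel
        import com.google.ai.edge.aicore.generationConfig

        // Hedged sketch: prompting Gemini Nano via the experimental Google AI Edge SDK.
        suspend fun classifyReview(appContext: Context, review: String): String? {
            val model = GenerativeModel(
                generationConfig = generationConfig {
                    context = appContext       // the on-device SDK needs an Android Context
                    temperature = 0.2f         // low temperature when output format matters
                    topK = 16
                    maxOutputTokens = 64
                }
            )
            val prompt = "Determine whether a movie review is positive or negative.\n" +
                "Review: $review"
            return model.generateContent(prompt).text
        }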
  10. “Determine whether a movie review is positive or negative. This

    is very important to my career.” Q: Is the accuracy improved?
  11. How Does Gemini Work? • Tokenization • Prefix • Stringify • Input Embedding

    • Attention Layer • Sampling
  12. 1. Preprocessing - Tokenization • The token lookup table depends on the

    NPU, etc. • In Gemini Nano, ◦ 1 token ≈ 1.3 to 2 Japanese characters on average ◦ (a TokenCounter API is coming to the Nano Prompt API later) • Example: "Hello! How are you doing today?" → [Hello, !, _How, _are, _you, _doing, _today, ?] → [4521, 235341, 2250, 708, 692, 3900, 3646, 235336] (see the sketch below)
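    A toy Kotlin illustration of the lookup step. The token IDs are the ones shown on the slide; a real tokenizer is model-specific and is not exposed as a plain map like this.

        // Toy illustration of the tokenization lookup (IDs taken from the slide).
        val vocab = mapOf(
            "Hello" to 4521, "!" to 235341, "_How" to 2250, "_are" to 708,
            "_you" to 692, "_doing" to 3900, "_today" to 3646, "?" to 235336
        )

        fun tokenize(pieces: List<String>): List<Int> = pieces.map { vocab.getValue(it) }

        fun main() {
            val pieces = listOf("Hello", "!", "_How", "_are", "_you", "_doing", "_today", "?")
            println(tokenize(pieces))  // [4521, 235341, 2250, 708, 692, 3900, 3646, 235336]
        }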
  13. 1. Preprocessing - Embedding • What is an embedding? ◦ A multidimensional

    vector ◦ You can perform mathematical operations on it • For example, "Paris is to France as London is to _____."
  14. Embeddings (2-D illustration plotting Paris, France, and London)

    embd(???) = embd(France) - embd(Paris) + embd(London)
  15. Embeddings (same 2-D illustration; England lands at the result)

    embd(England) = embd(France) - embd(Paris) + embd(London) (see the sketch below)
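    A small Kotlin sketch of the embedding-arithmetic analogy. The 2-D vectors are invented for illustration; real embeddings have hundreds of dimensions.

        // Sketch of the analogy with made-up 2-D vectors.
        fun sub(a: DoubleArray, b: DoubleArray) = DoubleArray(a.size) { a[it] - b[it] }
        fun add(a: DoubleArray, b: DoubleArray) = DoubleArray(a.size) { a[it] + b[it] }

        fun main() {
            val paris  = doubleArrayOf(2.0, 5.0)
            val france = doubleArrayOf(2.0, 1.0)
            val london = doubleArrayOf(6.0, 5.0)
            // embd(???) = embd(France) - embd(Paris) + embd(London)
            val result = add(sub(france, paris), london)
            println(result.toList())  // [6.0, 1.0] -- where "England" would be plotted
        }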
  16. 2. Decode • The decoder is, in effect, a text auto-completer ◦

    The LLM takes text as input and generates probabilities for which "word" (token) comes next ◦ The next "word" is selected ("sampled") from this probability distribution and appended to the input (see the sketch below)
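    A conceptual Kotlin sketch of that loop. nextTokenDistribution() stands in for the model's forward pass and is stubbed with random numbers so the example compiles; it is not a real SDK call.

        import kotlin.random.Random

        // Stub for the model's forward pass: a probability for every token in the vocabulary.
        fun nextTokenDistribution(tokens: List<Int>, vocabSize: Int = 8): DoubleArray {
            val raw = DoubleArray(vocabSize) { Random.nextDouble() }
            val sum = raw.sum()
            return DoubleArray(vocabSize) { raw[it] / sum }
        }

        // Greedy "sampling": take the most probable token.
        fun pickNext(probs: DoubleArray): Int = probs.indices.maxByOrNull { probs[it] } ?: 0

        fun decode(prompt: List<Int>, maxNewTokens: Int, eosToken: Int): List<Int> {
            val tokens = prompt.toMutableList()
            repeat(maxNewTokens) {
                val probs = nextTokenDistribution(tokens)  // probabilities over the vocabulary
                val next = pickNext(probs)                 // select the next "word" (token)
                tokens += next                             // append it to the input and repeat
                if (next == eosToken) return tokens
            }
            return tokens
        }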
  17. Example Prompt: Translate this to Japanese, using a casual tone.

    User Input: What do you like? Answer: 何
  18. Example Prompt: Translate this to Japanese, using a casual tone.

    User Input: What do you like? Answer: 何が
  19. Example Prompt: Translate this to Japanese, using a casual tone.

    User Input: What do you like? Answer: 何が好き
  20. Example Prompt: Translate this to Japanese, using a casual tone.

    User Input: What do you like? Answer: 何が好き? (casual Japanese for "What do you like?")
  21. Decoder - Self-Attention • The self-attention layer is responsible

    for seeing how embeddings relate to one another. For example, ◦ Clarify words with multiple meanings, e.g., 最中 ("in the middle of", or a type of sweet), queen ◦ "Flavor" the meaning, e.g., 仕事の最中 ("in the middle of work"), Queen of England ◦ Conversely, incorporate context from preceding words
  22. AICore Optimization (again) • Confident Adaptive Language Modeling (aka Early

    Exit) • Speculative Decoding • Prefix Caching (coming) • MatFormer sub-model
  23. Decode - Output Probabilities • Aka logits • Embeddings are converted

    into per-token probabilities • Then sampled to produce the output token (see the sketch below)
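    A conceptual Kotlin sketch of turning logits into a sampled token with temperature and top-K. This is an illustration of the sampling idea, not the AICore implementation; a temperature of 0 would be handled as a plain greedy argmax in practice.

        import kotlin.math.exp
        import kotlin.random.Random

        fun sampleToken(logits: DoubleArray, temperature: Double, topK: Int): Int {
            // Keep only the K most probable candidate tokens.
            val candidates = logits.indices.sortedByDescending { logits[it] }.take(topK)
            // Temperature rescales the logits: lower -> sharper, more deterministic.
            val maxLogit = candidates.maxOf { logits[it] }
            val weights = candidates.map { exp((logits[it] - maxLogit) / temperature) }
            val total = weights.sum()
            // Sample an index from the resulting probability distribution.
            var r = Random.nextDouble() * total
            for (i in candidates.indices) {
                r -= weights[i]
                if (r <= 0) return candidates[i]
            }
            return candidates.last()
        }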
  24. Don't panic • Let's assume: ◦ The LLM is an employee

    who is naturally smart but lacks experience ◦ You're a kind, patient manager who is good at micro-management • 1st try: "Summarize the meeting notes." • Observe, reflect, then refine: "Summarize the meeting notes in a single paragraph. Then write a markdown list of the speakers and each of their key points. Finally, list the next steps or action items suggested by the speakers, if any."
  25. An Untested Prompt Is the Root of All Evil • Do

    not (completely) trust your own intelligence • 1. Prepare evaluation samples ◦ Recommendation: 200+ (with diversity) • 2. Decide metrics to evaluate quality ◦ E.g., accuracy (正解率), precision (適合率), recall (再現率), F1 score (see the sketch below) • 3. Evaluate ⇒ Refine ⇒ Iterate
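    A minimal Kotlin sketch of step 2: computing accuracy, precision, recall, and F1 over a labeled evaluation set. Binary labels are used for simplicity; guard the divisions against empty counts in real code.

        data class Sample(val expected: Boolean, val predicted: Boolean)

        fun evaluate(samples: List<Sample>) {
            val tp = samples.count { it.expected && it.predicted }
            val fp = samples.count { !it.expected && it.predicted }
            val fn = samples.count { it.expected && !it.predicted }
            val tn = samples.count { !it.expected && !it.predicted }
            val accuracy = (tp + tn).toDouble() / samples.size
            val precision = tp.toDouble() / (tp + fp)
            val recall = tp.toDouble() / (tp + fn)
            val f1 = 2 * precision * recall / (precision + recall)
            println("accuracy=$accuracy precision=$precision recall=$recall f1=$f1")
        }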
  26. Inference Parameters • Temperature ◦ For accuracy: lower is better (closer

    to 0) ◦ For creativity: set higher for more random responses • Top-K ◦ For accuracy: a value of 1 chooses the most probable token from the entire vocabulary ◦ For creativity: set 40 to select from a larger number of possible tokens • Definitions ◦ Temperature: controls the degree of randomness in token selection. Range: [0.0f, 1.0f] ◦ Top-K: determines how many tokens to select from to determine the next token. Range: [1, vocab] ◦ Output Token Limit: max amount of text output from one prompt
  27. No • Hallucinations occur for various reasons • Note

    that an LLM's accuracy is never 100% • When repetition occurs, greedy decoding makes it worse • Recommended starting temperatures: ◦ 0.2, if output format is important ◦ 0.8, for summarization ▪ Side effect: broken output format ▪ But don't worry; we have Constrained Decoding
  28. Recommendations • Temperature: start with 0.5 • Top-K: start with 40 • Output

    Token Limit: start with 100 (only in the case of Nano) (see the sketch below)
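    The recommended starting values, expressed as a generation config for the experimental Google AI Edge SDK used in the earlier sketch. The builder and property names are assumptions and may change in the ML Kit beta.

        import android.content.Context
        import com.google.ai.edge.aicore.generationConfig

        // Hedged sketch: recommended starting values as an on-device generation config.
        fun recommendedConfig(appContext: Context) = generationConfig {
            context = appContext      // required by the on-device SDK
            temperature = 0.5f        // start with 0.5, then tune per use case
            topK = 40                 // start with 40
            maxOutputTokens = 100     // starting point for Nano
        }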
  29. Really? • Yes. This is counter-intuitive, but it generally leads

    to better performance. ◦ Emotional stimuli can enrich the original prompt's representation. • Give it a try with "This is very important to my career." in your prompts ;) Source: LLMs Understand and Can Be Enhanced by Emotional Stimuli - arxiv.org/abs/2307.11760
  30. When you need to reframe your prompt • Negative instructions

    ◦ See Language Models Are Not Naysayers • Regression problems (scoring) Source: arxiv.org/abs/2306.08189
  31. Common Practices • Set a Proper Role / Persona • Premise

    Order Matters ◦ The attention layer is better optimized for ordered processing ◦ See Premise Order Matters in Reasoning • Add a Delimiter (especially for Nano) ◦ E.g., "###" (see the sketch below) Source: arxiv.org/abs/2402.08939
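    An illustrative Kotlin prompt builder applying the practices above: a role/persona first, premises in a deliberate order, and "###" delimiters between sections. The prompt content itself is made up for demonstration.

        fun buildPrompt(role: String, premises: List<String>, task: String, input: String): String =
            buildString {
                appendLine(role)
                appendLine("###")
                premises.forEach { appendLine("- $it") }  // keep premises in their logical order
                appendLine("###")
                appendLine(task)
                appendLine("###")
                append(input)
            }

        fun main() {
            println(
                buildPrompt(
                    role = "You are a careful assistant that classifies movie reviews.",
                    premises = listOf(
                        "A review is positive if it recommends the movie.",
                        "Otherwise it is negative."
                    ),
                    task = "Classify the review below as positive or negative.",
                    input = "I couldn't stop smiling the whole time."
                )
            )
        }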
  32. Common Practices (cont.) • Write Prompts in English ◦ But

    don't panic: use Gemini to translate your prompt. ◦ See Do Multilingual Language Models Think Better in English? Source: arxiv.org/abs/2308.01223
  33. Common Practices (cont.) • Use Chain-of-Thought Prompting ◦ One- or

    few-shot examples that spell out the thought process ◦ See Chain-of-Thought Prompting Elicits Reasoning • Additionally, ask the model to describe its thought process ◦ "Additionally, briefly explain the main reasons supporting your decision to help me understand your thought process." (see the sketch below) Source: arxiv.org/abs/2201.11903
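    An illustrative one-shot chain-of-thought prompt as a Kotlin string; the example review and the spelled-out reasoning are invented for demonstration.

        val chainOfThoughtPrompt = """
            Determine whether a movie review is positive or negative.

            Review: "The plot dragged, but the soundtrack was incredible."
            Thought process: The reviewer criticizes the plot but praises the soundtrack,
            and the overall tone leans appreciative, so the sentiment is positive.
            Answer: positive

            Review: "I walked out halfway through."
            Thought process:
        """.trimIndent()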
  34. In case of few-shot examples Extract the technical specifications from

    the text below in a JSON format. <EXAMPLE> INPUT: Google Nest Wifi, network speed up to 1200Mpbs, 2.4GHz and 5GHz frequencies, WP3 protocol OUTPUT: { "product":"Google Nest Wifi", "speed":"1200Mpbs", "frequencies": ["2.4GHz", "5GHz"], "protocol":"WP3" } </EXAMPLE> Google Pixel 7, 5G network, 8GB RAM, Tensor G2 processor, 128GB of storage, Lemongrass
  35. In case of few-shot examples { "product": "Google Pixel 7",

    "network": "5G", "ram": "8GB", "processor": "Tensor G2", "storage": "128GB", "color": "Lemongrass" }
  36. In-Context Learning: the Most Powerful Prompting Skill • LLMs are very

    effective at learning from examples • Recommendation: pick examples that target common mistakes • See Many-Shot In-Context Learning Source: arxiv.org/abs/2404.11018
  37. Let Gemini Improve Your Prompt! • Automated Prompt Optimization ◦

    See A Systematic Survey of Automatic Prompt Optimization Techniques • We are working on a way to help developers achieve automated prompt optimization! Source: arxiv.org/abs/2502.16923
  38. Case Study – Kakao T: Old Prompt (highly optimized by internal/external

    engineering teams). Common mistakes: extracts sender information instead of recipient; incorrectly splits basic and detailed address components. The prompt: "Given a message, extract the recipient's basic address, detail address, name, and phone number. - Output ONLY a single, valid JSON object. - Use the following structure: { "name": "extracted_name" or null, "phone": "extracted_phone_number" or null, "basic_address": "extracted_basic_address" or null, "detail_address": "extracted_detail_address" or null } - Name is the recipient's name. If multiple names are present, choose the recipient's only. - Phone number is the recipient's phone number. If multiple phone numbers are present, choose the recipient's only. - Retain the original spelling and format from the message. - Recipient is sometimes marked as: [ ... ] - Basic address consists of province, city and street. - Detail address is the remainder of the basic address. Apartment name, unit, suite, and floor number should be included in detail address, not in basic address. - The followings are example of apartment name which should be included in detail address: [ ... ] - If the information of a field is missing, set as null. Note that the field should be null, not the string "null". Here is the message to extract: {input}"
  39. Case Study – Kakao T: New Prompt (more detailed step-by-step processing

    instructions; more concrete explanations of address splitting logic; after this, we added 2-shot examples, omitted here). The prompt: "You are a ... highly accurate data extraction AI, specializing in Korean logistics and contact information...Please follow these instructions with extreme care. # OUTPUT SPECIFICATION You MUST output ONLY a single, valid JSON object ... # CORE LOGIC & PROCESSING STEPS ### **STEP 1: IDENTIFY THE RECIPIENT (CRITICAL FIRST STEP)** Your primary goal is to find the recipient. Use this hierarchy of rules: * **Rule A: Explicit Recipient First** ... * **Rule B: Implied Recipient (The Exception Rule)**... * **Rule C: Sequential Information**... ### **STEP 2: EXTRACT & CLEAN EACH FIELD FOR THE IDENTIFIED RECIPIENT** Once you have identified the recipient, extract their information precisely as follows. * **`"name"` Extraction & Cleaning:**... * **`"phone"` Extraction:**... * **`"basic_address"` and `"detail_address"` Splitting Logic:**... * **`basic_address` Definition:** This is the standard Korean "Road Name Address" ... * **`detail_address` Definition:** This is **ABSOLUTELY EVERYTHING** that comes after the building number in the full address string.... --- Here is the message to extract. Analyze it carefully and provide ONLY the final JSON object. {input}"
  40. Thank you very much! • Start your GenAI journey with:

    ◦ d.android.com/ai, kaggle.com/whitepaper-prompt-engineering • Feel free to reach out to me if you have … ◦ Any good idea / plan for agentive AI ◦ Any plan to implement a GenAI use case on Android ◦ Interest in early evaluation of vertical subtitles in ExoPlayer ◦ [email protected]