Gen AI on Android

Sa-ryong Kang
September 24, 2025

Transcript

  1. Why Should I Use GenAI on Android? • How Can I

    Use GenAI in My App? • How Does Gemini Work? • Prompt Engineering Deep Dive • What's Next?
  2. We already covered this at I/O '25 in "Gemini Nano on Android: Building with

    on-device gen AI" (youtu.be/mP9QESmEDls): ◦ On-device GenAI use cases ◦ GenAI APIs powered by Gemini Nano ◦ Actual apps leveraging Gemini Nano
  3. (image-only slide)

  4. (image-only slide)
  5. Evolution of Gemini Nano • Starting with the Pixel 8 Pro: 1.8B or 3.25B params*

    • Starting with the Pixel 9 series: internationalization, image input • Starting with the Pixel 10 series: even better i18n, MatFormer * Gemini: A Family of Highly Capable Multimodal Models - arxiv.org/abs/2312.11805
  6. AICore Optimization • Confident Adaptive Language Modeling (aka Early Exit)

    • Speculative Decoding • Prefix Caching (coming) • MatFormer sub-model
  7. GenAI APIs built for on-device tasks • Summarization • Proofreading • Rewrite

    • Image Description • Automatic Speech Recognition (coming soon) ◦ A usage sketch follows below
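    For illustration, a minimal Kotlin sketch of calling the on-device Summarization API. The class and method names used here (SummarizerOptions, Summarization.getClient, SummarizationRequest, runInference, the summary property, and the guava await import) are assumptions based on the ML Kit GenAI beta documentation and may not match the SDK version you use; treat this as a sketch of the call flow, not the definitive API.

        // Hedged sketch: on-device summarization with the ML Kit GenAI APIs.
        // All class/method names below are assumptions and may differ in the shipped SDK.
        import android.content.Context
        import com.google.mlkit.genai.summarization.Summarization
        import com.google.mlkit.genai.summarization.SummarizationRequest
        import com.google.mlkit.genai.summarization.SummarizerOptions
        import kotlinx.coroutines.guava.await   // assumes a ListenableFuture-based API

        suspend fun summarizeArticle(context: Context, article: String): String {
            val options = SummarizerOptions.builder(context)
                .setInputType(SummarizerOptions.InputType.ARTICLE)
                .setOutputType(SummarizerOptions.OutputType.ONE_BULLET)
                .build()
            val summarizer = Summarization.getClient(options)
            // In a real app, check feature availability and download the model first (omitted).
            val request = SummarizationRequest.builder(article).build()
            return summarizer.runInference(request).await().summary
        }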
  8. Currently supporting 33 devices • Pixel 10, Pixel 10 Pro / XL • Pixel 9,

    Pixel 9 Pro / Pro XL / Pro Fold • Magic 7 Pro, Magic 7 • iQOO 13 • Razr 60 Ultra • OnePlus 13, OnePlus 13s • Find N5, Find X8, Find X8 Pro • POCO F7 Ultra • realme GT 7 Pro • Galaxy Z Fold7 • Galaxy S25, Galaxy S25+, Galaxy S25 Ultra • vivo X200, vivo X200 Pro • Xiaomi 15 Ultra, Xiaomi 15
  9. Prompt API • Currently experimental, available through the Google AI Edge SDK

    • Beta release coming soon! ◦ Production-ready ◦ As part of the ML Kit SDK (see the sketch below)
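    A hedged Kotlin sketch of the Prompt API through the experimental Google AI Edge SDK. The package, class, and builder names (com.google.ai.edge.aicore.GenerativeModel, generationConfig) follow the experimental SDK and may change when the ML Kit beta ships; the movie-review prompt is only an illustrative example.

        import android.content.Context
        import com.google.ai.edge.aicore.GenerativeModel
        import com.google.ai.edge.aicore.generationConfig

        // Hedged sketch: prompting Gemini Nano via the experimental Google AI Edge SDK.
        suspend fun classifyReview(appContext: Context, review: String): String? {
            val model = GenerativeModel(
                generationConfig = generationConfig {
                    context = appContext       // the on-device SDK needs an Android Context
                    temperature = 0.2f         // low temperature when output format matters
                    topK = 16
                    maxOutputTokens = 64
                }
            )
            val prompt = "Determine whether a movie review is positive or negative.\n" +
                "Review: $review"
            return model.generateContent(prompt).text
        }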
  10. “Determine whether a movie review is positive or negative. This

    is very important to my career.” Q: Is the accuracy improved?
  11. How Does Gemini Work? • Tokenization • Prefix • Stringify • Input Embedding

    • Attention Layer • Sampling
  12. 1. Preprocessing - Tokenization • The token lookup table depends on the

    NPU, etc. • In Gemini Nano, ◦ 1 token ≈ 1.3 to 2 Japanese characters on average ◦ (a TokenCounter API is coming to the Nano Prompt API later) • Example: "Hello! How are you doing today?" → [Hello, !, _How, _are, _you, _doing, _today, ?] → [4521, 235341, 2250, 708, 692, 3900, 3646, 235336] (see the sketch below)
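    A toy Kotlin illustration of the lookup step. The token IDs are the ones shown on the slide; a real tokenizer is model-specific and is not exposed as a plain map like this.

        // Toy illustration of the tokenization lookup (IDs taken from the slide).
        val vocab = mapOf(
            "Hello" to 4521, "!" to 235341, "_How" to 2250, "_are" to 708,
            "_you" to 692, "_doing" to 3900, "_today" to 3646, "?" to 235336
        )

        fun tokenize(pieces: List<String>): List<Int> = pieces.map { vocab.getValue(it) }

        fun main() {
            val pieces = listOf("Hello", "!", "_How", "_are", "_you", "_doing", "_today", "?")
            println(tokenize(pieces))  // [4521, 235341, 2250, 708, 692, 3900, 3646, 235336]
        }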
  13. 1. Preprocessing - Embedding • What is an embedding? ◦ A multidimensional

    vector ◦ You can perform mathematical operations on it • For example, "Paris is to France as London is to _____."
  14. Embeddings (2-D illustration plotting Paris, France, and London)

    embd(???) = embd(France) - embd(Paris) + embd(London)
  15. Embeddings (same 2-D illustration; England lands at the result)

    embd(England) = embd(France) - embd(Paris) + embd(London) (see the sketch below)
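    A small Kotlin sketch of the embedding-arithmetic analogy. The 2-D vectors are invented for illustration; real embeddings have hundreds of dimensions.

        // Sketch of the analogy with made-up 2-D vectors.
        fun sub(a: DoubleArray, b: DoubleArray) = DoubleArray(a.size) { a[it] - b[it] }
        fun add(a: DoubleArray, b: DoubleArray) = DoubleArray(a.size) { a[it] + b[it] }

        fun main() {
            val paris  = doubleArrayOf(2.0, 5.0)
            val france = doubleArrayOf(2.0, 1.0)
            val london = doubleArrayOf(6.0, 5.0)
            // embd(???) = embd(France) - embd(Paris) + embd(London)
            val result = add(sub(france, paris), london)
            println(result.toList())  // [6.0, 1.0] -- where "England" would be plotted
        }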
  16. 2. Decode • The decoder is, in effect, a text auto-completer ◦

    The LLM takes text as input and generates probabilities for which "word" (token) comes next ◦ The next "word" is selected ("sampled") from this probability distribution and appended to the input (see the sketch below)
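    A conceptual Kotlin sketch of that loop. nextTokenDistribution() stands in for the model's forward pass and is stubbed with random numbers so the example compiles; it is not a real SDK call.

        import kotlin.random.Random

        // Stub for the model's forward pass: a probability for every token in the vocabulary.
        fun nextTokenDistribution(tokens: List<Int>, vocabSize: Int = 8): DoubleArray {
            val raw = DoubleArray(vocabSize) { Random.nextDouble() }
            val sum = raw.sum()
            return DoubleArray(vocabSize) { raw[it] / sum }
        }

        // Greedy "sampling": take the most probable token.
        fun pickNext(probs: DoubleArray): Int = probs.indices.maxByOrNull { probs[it] } ?: 0

        fun decode(prompt: List<Int>, maxNewTokens: Int, eosToken: Int): List<Int> {
            val tokens = prompt.toMutableList()
            repeat(maxNewTokens) {
                val probs = nextTokenDistribution(tokens)  // probabilities over the vocabulary
                val next = pickNext(probs)                 // select the next "word" (token)
                tokens += next                             // append it to the input and repeat
                if (next == eosToken) return tokens
            }
            return tokens
        }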
  17. Example Prompt: Translate this to Japanese, using a casual tone.

    User Input: What do you like? Answer: 何
  18. Example Prompt: Translate this to Japanese, using a casual tone.

    User Input: What do you like? Answer: 何が
  19. Example Prompt: Translate this to Japanese, using a casual tone.

    User Input: What do you like? Answer: 何が好き
  20. Example Prompt: Translate this to Japanese, using a casual tone.

    User Input: What do you like? Answer: 何が好き? (casual Japanese for "What do you like?")
  21. Decoder - Self-Attention • The self-attention layer is responsible

    for seeing how embeddings relate to one another. For example, ◦ Clarify words with multiple meanings, e.g., 最中 ("in the middle of", or a type of sweet), queen ◦ "Flavor" the meaning, e.g., 仕事の最中 ("in the middle of work"), Queen of England ◦ Conversely, incorporate context from preceding words
  22. AICore Optimization (again) • Confident Adaptive Language Modeling (aka Early

    Exit) • Speculative Decoding • Prefix Caching (coming) • MatFormer sub-model
  23. Decode - Output Probabilities • Aka logits • Embeddings are converted

    into per-token probabilities • Then sampled to produce the output token (see the sketch below)
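    A conceptual Kotlin sketch of turning logits into a sampled token with temperature and top-K. This is an illustration of the sampling idea, not the AICore implementation; a temperature of 0 would be handled as a plain greedy argmax in practice.

        import kotlin.math.exp
        import kotlin.random.Random

        fun sampleToken(logits: DoubleArray, temperature: Double, topK: Int): Int {
            // Keep only the K most probable candidate tokens.
            val candidates = logits.indices.sortedByDescending { logits[it] }.take(topK)
            // Temperature rescales the logits: lower -> sharper, more deterministic.
            val maxLogit = candidates.maxOf { logits[it] }
            val weights = candidates.map { exp((logits[it] - maxLogit) / temperature) }
            val total = weights.sum()
            // Sample an index from the resulting probability distribution.
            var r = Random.nextDouble() * total
            for (i in candidates.indices) {
                r -= weights[i]
                if (r <= 0) return candidates[i]
            }
            return candidates.last()
        }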
  24. Don't panic • Let's assume: ◦ The LLM is an employee

    who is naturally smart but lacks experience ◦ You're a kind, patient manager who is good at micro-management • 1st try: "Summarize the meeting notes." • Observe, reflect, then refine: "Summarize the meeting notes in a single paragraph. Then write a markdown list of the speakers and each of their key points. Finally, list the next steps or action items suggested by the speakers, if any."
  25. An Untested Prompt Is the Root of All Evil • Do

    not (completely) trust your own intelligence • 1. Prepare evaluation samples ◦ Recommendation: 200+ (with diversity) • 2. Decide metrics to evaluate quality ◦ E.g., accuracy (正解率), precision (適合率), recall (再現率), F1 score (see the sketch below) • 3. Evaluate ⇒ Refine ⇒ Iterate
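    A minimal Kotlin sketch of step 2: computing accuracy, precision, recall, and F1 over a labeled evaluation set. Binary labels are used for simplicity; guard the divisions against empty counts in real code.

        data class Sample(val expected: Boolean, val predicted: Boolean)

        fun evaluate(samples: List<Sample>) {
            val tp = samples.count { it.expected && it.predicted }
            val fp = samples.count { !it.expected && it.predicted }
            val fn = samples.count { it.expected && !it.predicted }
            val tn = samples.count { !it.expected && !it.predicted }
            val accuracy = (tp + tn).toDouble() / samples.size
            val precision = tp.toDouble() / (tp + fp)
            val recall = tp.toDouble() / (tp + fn)
            val f1 = 2 * precision * recall / (precision + recall)
            println("accuracy=$accuracy precision=$precision recall=$recall f1=$f1")
        }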
  26. Inference Parameters • Temperature ◦ For accuracy: lower is better (closer

    to 0) ◦ For creativity: set higher for more random responses • Top-K ◦ For accuracy: a value of 1 chooses the most probable token from the entire vocabulary ◦ For creativity: set 40 to select from a larger number of possible tokens • Definitions ◦ Temperature: controls the degree of randomness in token selection. Range: [0.0f, 1.0f] ◦ Top-K: determines how many tokens to select from to determine the next token. Range: [1, vocab] ◦ Output Token Limit: max amount of text output from one prompt
  27. No • Hallucinations occur for various reasons • Note

    that an LLM's accuracy is never 100% • When repetition occurs, greedy decoding makes it worse • Recommended starting temperatures: ◦ 0.2, if output format is important ◦ 0.8, for summarization ▪ Side effect: broken output format ▪ But don't worry; we have Constrained Decoding
  28. Recommendations • Temperature: start with 0.5 • Top-K: start with 40 • Output

    Token Limit: start with 100 (only in the case of Nano) (see the sketch below)
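    The recommended starting values, expressed as a generation config for the experimental Google AI Edge SDK used in the earlier sketch. The builder and property names are assumptions and may change in the ML Kit beta.

        import android.content.Context
        import com.google.ai.edge.aicore.generationConfig

        // Hedged sketch: recommended starting values as an on-device generation config.
        fun recommendedConfig(appContext: Context) = generationConfig {
            context = appContext      // required by the on-device SDK
            temperature = 0.5f        // start with 0.5, then tune per use case
            topK = 40                 // start with 40
            maxOutputTokens = 100     // starting point for Nano
        }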
  29. Really? • Yes. This is counter-intuitive, but it generally leads

    to better performance. ◦ Emotional stimuli can enrich the original prompt's representation. • Give it a try with "This is very important to my career." in your prompts ;) Source: LLMs Understand and Can Be Enhanced by Emotional Stimuli - arxiv.org/abs/2307.11760
  30. When you need to reframe your prompt • Negative instructions

    ◦ See Language Models Are Not Naysayers • Regression problems (scoring) Source: arxiv.org/abs/2306.08189
  31. Common Practices • Set a Proper Role / Persona • Premise

    Order Matters ◦ The attention layer is better optimized for ordered processing ◦ See Premise Order Matters in Reasoning • Add a Delimiter (especially for Nano) ◦ E.g., "###" (see the sketch below) Source: arxiv.org/abs/2402.08939
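    An illustrative Kotlin prompt builder applying the practices above: a role/persona first, premises in a deliberate order, and "###" delimiters between sections. The prompt content itself is made up for demonstration.

        fun buildPrompt(role: String, premises: List<String>, task: String, input: String): String =
            buildString {
                appendLine(role)
                appendLine("###")
                premises.forEach { appendLine("- $it") }  // keep premises in their logical order
                appendLine("###")
                appendLine(task)
                appendLine("###")
                append(input)
            }

        fun main() {
            println(
                buildPrompt(
                    role = "You are a careful assistant that classifies movie reviews.",
                    premises = listOf(
                        "A review is positive if it recommends the movie.",
                        "Otherwise it is negative."
                    ),
                    task = "Classify the review below as positive or negative.",
                    input = "I couldn't stop smiling the whole time."
                )
            )
        }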
  32. Common Practices (cont.) • Write Prompts in English ◦ But

    don't panic: use Gemini to translate your prompt. ◦ See Do Multilingual Language Models Think Better in English? Source: arxiv.org/abs/2308.01223
  33. Common Practices (cont.) • Use Chain-of-Thought Prompting ◦ One- or

    few-shot examples that spell out the thought process ◦ See Chain-of-Thought Prompting Elicits Reasoning • Additionally, ask the model to describe its thought process ◦ "Additionally, briefly explain the main reasons supporting your decision to help me understand your thought process." (see the sketch below) Source: arxiv.org/abs/2201.11903
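    An illustrative one-shot chain-of-thought prompt as a Kotlin string; the example review and the spelled-out reasoning are invented for demonstration.

        val chainOfThoughtPrompt = """
            Determine whether a movie review is positive or negative.

            Review: "The plot dragged, but the soundtrack was incredible."
            Thought process: The reviewer criticizes the plot but praises the soundtrack,
            and the overall tone leans appreciative, so the sentiment is positive.
            Answer: positive

            Review: "I walked out halfway through."
            Thought process:
        """.trimIndent()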
  34. In case of few-shot examples Extract the technical specifications from

    the text below in a JSON format. <EXAMPLE> INPUT: Google Nest Wifi, network speed up to 1200Mpbs, 2.4GHz and 5GHz frequencies, WP3 protocol OUTPUT: { "product":"Google Nest Wifi", "speed":"1200Mpbs", "frequencies": ["2.4GHz", "5GHz"], "protocol":"WP3" } </EXAMPLE> Google Pixel 7, 5G network, 8GB RAM, Tensor G2 processor, 128GB of storage, Lemongrass
  35. In case of few-shot examples { "product": "Google Pixel 7",

    "network": "5G", "ram": "8GB", "processor": "Tensor G2", "storage": "128GB", "color": "Lemongrass" }
  36. In-Context Learning: the Most Powerful Prompting Skill • LLMs are very

    effective at learning from examples • Recommendation: pick examples that target common mistakes • See Many-Shot In-Context Learning Source: arxiv.org/abs/2404.11018
  37. Let Gemini Improve Your Prompt! • Automated Prompt Optimization ◦

    See A Systematic Survey of Automatic Prompt Optimization Techniques • We are working on a way to help developers achieve automated prompt optimization! Source: arxiv.org/abs/2502.16923
  38. Case Study – Kakao T: Old Prompt (highly optimized by internal/external

    engineering teams). Common mistakes: extracts sender information instead of recipient; incorrectly splits basic and detailed address components. The prompt: "Given a message, extract the recipient's basic address, detail address, name, and phone number. - Output ONLY a single, valid JSON object. - Use the following structure: { "name": "extracted_name" or null, "phone": "extracted_phone_number" or null, "basic_address": "extracted_basic_address" or null, "detail_address": "extracted_detail_address" or null } - Name is the recipient's name. If multiple names are present, choose the recipient's only. - Phone number is the recipient's phone number. If multiple phone numbers are present, choose the recipient's only. - Retain the original spelling and format from the message. - Recipient is sometimes marked as: [ ... ] - Basic address consists of province, city and street. - Detail address is the remainder of the basic address. Apartment name, unit, suite, and floor number should be included in detail address, not in basic address. - The followings are example of apartment name which should be included in detail address: [ ... ] - If the information of a field is missing, set as null. Note that the field should be null, not the string "null". Here is the message to extract: {input}"
  39. Case Study – Kakao T: New Prompt (more detailed step-by-step processing

    instructions; more concrete explanations of address splitting logic; after this, we added 2-shot examples, omitted here). The prompt: "You are a ... highly accurate data extraction AI, specializing in Korean logistics and contact information...Please follow these instructions with extreme care. # OUTPUT SPECIFICATION You MUST output ONLY a single, valid JSON object ... # CORE LOGIC & PROCESSING STEPS ### **STEP 1: IDENTIFY THE RECIPIENT (CRITICAL FIRST STEP)** Your primary goal is to find the recipient. Use this hierarchy of rules: * **Rule A: Explicit Recipient First** ... * **Rule B: Implied Recipient (The Exception Rule)**... * **Rule C: Sequential Information**... ### **STEP 2: EXTRACT & CLEAN EACH FIELD FOR THE IDENTIFIED RECIPIENT** Once you have identified the recipient, extract their information precisely as follows. * **`"name"` Extraction & Cleaning:**... * **`"phone"` Extraction:**... * **`"basic_address"` and `"detail_address"` Splitting Logic:**... * **`basic_address` Definition:** This is the standard Korean "Road Name Address" ... * **`detail_address` Definition:** This is **ABSOLUTELY EVERYTHING** that comes after the building number in the full address string.... --- Here is the message to extract. Analyze it carefully and provide ONLY the final JSON object. {input}"
  40. Thank you very much! • Start your GenAI journey with:

    ◦ d.android.com/ai, kaggle.com/whitepaper-prompt-engineering • Feel free to reach out to me if you have … ◦ Any good idea / plan for agentive AI ◦ Any plan to implement a GenAI use case on Android ◦ Interest in early evaluation of vertical subtitles in ExoPlayer ◦ [email protected]