Getting Started with On-Device AI in Flutter - Flutter Alliance 2025

박제창 @jaichangpark


October 21, 2025

Transcript

  1. 4

  2. 5

  3. Flutter with AI (slide 7)

     • http, dio
     • Firebase AI Logic
       ◦ firebase_ai: https://pub.dev/packages/firebase_ai
     • Genkit (Firebase Genkit)
       ◦ https://genkit.dev/
       ◦ https://flutter.dev/events/building-agentic-apps#flutter-genkit
     • dartantic_ai
       ◦ https://pub.dev/packages/dartantic_ai
     (A minimal firebase_ai sketch follows below.)
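
     A minimal cloud-side sketch using the firebase_ai package (Firebase AI Logic), for contrast with the on-device path later in the deck. It assumes Firebase is already initialized for the project; the model name is illustrative.

     import 'package:firebase_ai/firebase_ai.dart';

     Future<void> askGemini(String prompt) async {
       // Gemini Developer API backend; FirebaseAI.vertexAI() is the Vertex AI alternative.
       final model = FirebaseAI.googleAI().generativeModel(model: 'gemini-2.5-flash');
       final response = await model.generateContent([Content.text(prompt)]);
       print(response.text);
     }
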
  4. 8

  5. Gemini API: Data Collection & How Google Uses Your Data (slide 9)

     In addition to the general terms above ("How Google Uses Your Data" under "Unpaid Services" and "Paid Services"), when using Grounding with Google Search, Google will store prompts, contextual information that you may provide, and output for thirty (30) days for the purposes of creating Grounded Results and Search Suggestions, and the stored information can be used for debugging and testing of systems that support Grounding with Google Search. When using Grounding with Google Search via paid quota of the Gemini API, this processing for debugging and testing of systems is in accordance with the Data Processing Addendum for Products Where Google is a Data Processor.

  6. Gemini API: Unpaid vs. Paid Services (slide 10)

     @source: https://ai.google.dev/gemini-api/terms

     • Applicability
       Unpaid: Cloud Billing inactive (using free quota in AI Studio / Gemini API)
       Paid: Cloud Billing active (applies to ALL usage, including free quota)
     • Primary Data Goal
       Unpaid: Improve & develop Google products and machine learning technologies
       Paid: Service delivery and policy violation detection
     • Used for Model Improvement?
       Unpaid: YES (your content is used)
       Paid: NO (your prompts/responses are not used)
     • Human Review
       Unpaid: Possible (for quality improvement)
       Paid: Not conducted (no review for improvement purposes)
     • Privacy Measures (Content)
       Unpaid: Data is disconnected from your Google Account/API key before review
       Paid: Processed according to the Data Processing Addendum (DPA)
     • Logging & Storage
       Unpaid: Retained for development and improvement purposes
       Paid: Logged for a limited time (solely for violation detection / legal disclosure)
     • User Caution
       Unpaid: DO NOT SUBMIT sensitive or confidential information
       Paid: Usage details (tokens, billing, settings) are still collected under the Privacy Policy
     • EU/EEA/UK Specifics
       Unpaid: Users in these regions follow the Paid Service rules, even when using free services
       Paid: N/A

  7. Anthropic & OpenAI: consumer and commercial data policies (slide 11)

     @source: https://www.anthropic.com/news/updates-to-our-consumer-terms
     @source: https://help.openai.com/en/articles/5722486-how-your-data-is-used-to-improve-model-performance

     • Default Data Use for Model Training
       Anthropic (Consumer: Claude Free/Pro/Max): Opt-in is required (user choice, effective Oct 2025).
       OpenAI (Consumer: ChatGPT, Sora): Opt-out is required (data is used unless the user explicitly turns it off).
       OpenAI (Commercial: API, Enterprise): Opted out by default; data is NOT used for training unless the customer explicitly opts in (e.g., providing feedback in Playground).
     • Data Retention Period
       Anthropic (Consumer): Dual policy: 30 days (if opted out), 5 years (if opted in, for model consistency and safety improvement).
       OpenAI (Consumer): Not explicitly stated for consumer content.
       OpenAI (Commercial): API abuse-monitoring logs kept 30 days (default); ZDR/MAM allow exclusion of customer content from retention.
     • Opt-Out Exceptions
       Anthropic (Consumer): 1. Data flagged for safety review. 2. Explicit feedback.
       OpenAI (Consumer): 1. User feedback (thumbs up/down). 2. Data already used in past training runs.
       OpenAI (Commercial): Explicit opt-in required (e.g., providing feedback).
     • User Control Mechanisms
       Anthropic (Consumer): Set preference in Privacy Settings; deleting an individual conversation stops future training on it.
       OpenAI (Consumer): Privacy Portal or Settings; Temporary Chat option (no history, no training).
       OpenAI (Commercial): Zero Data Retention (ZDR) and Modified Abuse Monitoring (MAM) controls available for eligible clients.

  8. 12

  9. 15

  10. 3 Core Components of On-Device AI (slide 17)

      • Model (The Brain): a trained file with pattern-recognition capabilities; it must be made lightweight and optimized for mobile environments.
      • Hardware (The Body): the physical device that actually executes the model's computations, determining speed and efficiency.
      • Inference Engine / Runtime (The Translator): the software layer that takes the model file (1), converts it into instructions the hardware (2) can understand, and runs inference.

  11. On-Device AI: Hardware (slide 18)

      • Samsung Galaxy S25: expected release Q1 2025; Qualcomm Snapdragon 8; 12GB-16GB RAM
      • Google Pixel 10: October 2025; Google Tensor G5; 12GB RAM
      • Apple iPhone 17 Pro: September 2025; Apple A19 Pro; 12GB RAM
      • Apple iPhone 17 (Standard): September 2025; Apple A19; 8GB RAM

  12. 20

  13. 24

  14. 27

  15. 28

  16. Gemini Nano (slide 31)

      Gemini Nano allows for up to 12,000 input tokens.
      https://android-developers.googleblog.com/2025/08/the-latest-gemini-nano-with-on-device-ml-kit-genai-apis.html
      (A hedged platform-channel sketch for reaching the ML Kit GenAI APIs from Flutter follows below.)
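
      The ML Kit GenAI APIs that expose Gemini Nano are Android-native, so a Flutter app would reach them through a platform channel. The sketch below is illustrative only: the channel name, method name, and argument shape are assumptions, and the Android side (a MethodChannel handler that calls the ML Kit GenAI APIs in Kotlin/Java) is not shown.

      import 'package:flutter/services.dart';

      // Hypothetical channel; no first-party Flutter plugin is implied here.
      const MethodChannel _genAi = MethodChannel('example.app/mlkit_genai');

      Future<String?> summarizeOnDevice(String text) {
        // Keep the input within Gemini Nano's limit (up to 12,000 input tokens).
        return _genAi.invokeMethod<String>('summarize', <String, String>{'text': text});
      }
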
  17. Google AI Edge Gallery (slide 32)

      • Google AI Edge: core APIs and tools for on-device ML.
      • LiteRT: lightweight runtime for optimized model execution.
      • LLM Inference API: powering on-device Large Language Models.
      • Hugging Face Integration: for model discovery and download.
      https://github.com/google-ai-edge/gallery

  18. AFM (Apple Foundation Model) (slide 34)

      A ~3B-parameter on-device model optimized for Apple silicon through architectural innovations such as KV-cache sharing and 2-bit quantization-aware training. Context window: the system model supports up to 4,096 tokens.
      https://arxiv.org/abs/2507.13575

  19. 37

  20. Steps (slide 39)

      • (Recommended) Explore existing Flutter packages, or create your own
      • Build the UI
      • Download the model (locally or from cloud storage)
      • Add inference functionality
      • Review the results

  21. flutter_gemma (slide 42, STEP 1)

      by Sasha Denisov (@ShuregDenisov, @DenisovAV)
      • Model Download Support
      • Multimodal Support
      • Function Calling
      • Thinking Mode
      • Embedding
      • Platform Channel, Pigeon
      https://pub.dev/packages/flutter_gemma
      https://github.com/DenisovAV/flutter_gemma/tree/main
      (A hedged function-calling sketch follows below.)
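
      The deck highlights function calling but does not show a tool definition, so here is a rough sketch of what one might look like; the Tool field names and the JSON-schema-style parameters map are assumptions based on common function-calling conventions, so check the flutter_gemma documentation for the actual API.

      // Illustrative only; field names are assumed, not confirmed flutter_gemma API.
      final _tools = [
        const Tool(
          name: 'get_battery_level',
          description: 'Returns the current device battery level as a percentage.',
          parameters: {
            'type': 'object',
            'properties': {}, // this example takes no arguments
          },
        ),
      ];
      // Passed later as createChat(tools: _tools, supportsFunctionCalls: true), as shown in STEP 3.
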
  22. flutter_gemma: native ↔ Flutter (slide 43, STEP 1)

      • Platform Channel
        ◦ Event Channel
      • Pigeon
        ◦ A code-generator tool that makes communication between Flutter and the host platform type-safe, easier, and faster.
      • MediaPipe
        ◦ tasks-genai
      (An illustrative Pigeon definition follows below.)
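
      To make the Pigeon point concrete, an illustrative definition is shown below; the API name and methods are invented for the example and are not flutter_gemma's actual Pigeon schema.

      // pigeons/llm_api.dart (hypothetical file)
      import 'package:pigeon/pigeon.dart';

      @HostApi()
      abstract class LlmInferenceApi {
        // Generated as a type-safe call into the host platform (Kotlin/Swift).
        void createModel(String modelPath, int maxTokens);

        // @async maps to an asynchronous native implementation (e.g., MediaPipe tasks-genai).
        @async
        String generateResponse(String prompt);
      }

      Running dart run pigeon --input pigeons/llm_api.dart (with the desired output-path options) generates matching Dart and native bindings, replacing hand-written MethodChannel plumbing.
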
  23. cactus (slide 44, STEP 1)

      by Cactus Compute, Inc.
      • Language Model (LLM)
        ◦ Streaming Completions
        ◦ Function Calling (Experimental)
      • Embedding
      • VLM (v0)
      • ffi (Dart FFI bridge to native)
      https://cactuscompute.com/
      https://pub.dev/packages/cactus
      https://github.com/cactus-compute/cactus-flutter

  24. flutter_gemma: init FlutterGemmaPlugin & modelManager (slide 49, STEP 3)

      final gemma = FlutterGemmaPlugin.instance;

      @override
      void initState() {
        super.initState();
        modelManager = gemma.modelManager;
        init();
      }

  25. flutter_gemma: model download (slide 50, STEP 3)

      final spec = MobileModelManager.createInferenceSpec(
        name: "gemma-3n-E4B-it-litert-lm",
        modelUrl:
            "https://huggingface.co/google/gemma-3n-E4B-it-litert-lm/resolve/main/gemma-3n-E4B-it-int4.litertlm?download=true",
      );

  26. flutter_gemma: model download (slide 51, STEP 3)

      modelManager?.downloadModelWithProgress(
        spec,
        token: "<Huggingface Token>",
      ).listen(
        (progress) {
          print('Loading progress: $progress%');
        },
        onDone: () {
          print('Model loading complete.');
          init();
        },
        onError: (error) {
          print('Error loading model: $error');
        },
      );

  27. flutter_gemma: createModel (slide 53, STEP 3)

      final model = await gemma.createModel(
        modelType: ModelType.gemmaIt,
        preferredBackend: PreferredBackend.gpu,
        supportImage: true,
        maxTokens: 2048,
        maxNumImages: 1,
      );

  28. flutter_gemma: createChat (slide 54, STEP 3)

      chat = await model.createChat(
        modelType: ModelType.gemmaIt,
        supportImage: true,
        supportsFunctionCalls: true,
        tools: _tools,
        isThinking: false,
      );

      Future<InferenceChat> createChat({
        double temperature = .8,
        int randomSeed = 1,
        int topK = 1,
        double? topP,
        int tokenBuffer = 256,
        String? loraPath,
        bool? supportImage,
        List<Tool> tools = const [],
        bool? supportsFunctionCalls,
        bool isThinking = false, // Add isThinking parameter
        ModelType? modelType, // Add modelType parameter
      })

  29. flutter_gemma: addQueryChunk (slide 55, STEP 3)

      final message = _selectedImageBytes != null
          ? Message.withImage(
              text: text.trim(),
              imageBytes: _selectedImageBytes!,
              isUser: true,
            )
          : Message.text(text: text.trim(), isUser: true);

      await chat?.addQueryChunk(message);

  31. flutter_gemma: generateChatResponse (sync) (slide 57, STEP 3)

      ModelResponse response = await chat.generateChatResponse();
      if (response is TextResponse) {
        print(response.token);
      }

  32. flutter_gemma: generateChatResponseAsync (async) (slide 58, STEP 3)

      chat.generateChatResponseAsync().listen(
        (ModelResponse response) {
          if (response is TextResponse) {
            print(response.token);
          }
        },
        onDone: () {
          print('Chat stream closed');
        },
        onError: (error) {
          print('Chat error: $error');
        },
      );
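
      A small extension of the streamed call above: collecting the tokens into a buffer so the full reply can be shown as it grows. The surrounding state handling is an assumption about the host widget, not part of flutter_gemma.

      final buffer = StringBuffer();
      chat.generateChatResponseAsync().listen(
        (ModelResponse response) {
          if (response is TextResponse) {
            buffer.write(response.token);
            // e.g. setState(() => _reply = buffer.toString()); inside a StatefulWidget
          }
        },
        onDone: () => print('Final reply: $buffer'),
        onError: (error) => print('Chat error: $error'),
      );
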
  33. flutter_gemma: overall flow (slide 59, STEP 3)

      init FlutterGemmaPlugin & modelManager
        → does the model exist? (ensureModelReady)
            False → model download
            True  → continue
        → createModel → createChat → addQueryChunk
        → generateChatResponse / generateChatResponseAsync
      (A condensed end-to-end sketch follows below.)
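
      Putting the previous steps together, a condensed end-to-end sketch using only the calls shown on slides 49-58; it assumes the model file has already been downloaded (slides 50-51) and omits error handling and UI state.

      Future<void> runOnDeviceChat(String prompt) async {
        final gemma = FlutterGemmaPlugin.instance;

        // 1) Create the inference model (model already downloaded).
        final model = await gemma.createModel(
          modelType: ModelType.gemmaIt,
          preferredBackend: PreferredBackend.gpu,
          maxTokens: 2048,
        );

        // 2) Open a chat session.
        final chat = await model.createChat(modelType: ModelType.gemmaIt);

        // 3) Send the user message and read one response.
        await chat.addQueryChunk(Message.text(text: prompt, isUser: true));
        final response = await chat.generateChatResponse();
        if (response is TextResponse) {
          print(response.token);
        }
      }
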
  34. cactus (slides 60-64, STEP 3)

      final lm = CactusLM();

      await lm.downloadModel(
        downloadProcessCallback: (progress, status, isError) {},
      );
      await lm.initializeModel();

      final resp = await lm.generateCompletion(
        messages: [
          ChatMessage(content: '', role: "system"),
          ChatMessage(content: '?', role: "user"),
        ],
      );

  39. cactus: generateCompletionStream (slide 65, STEP 3)

      The same call is also available as a streaming variant, generateCompletionStream, for token-by-token output instead of a single generateCompletion result.

  40. Next (slide 67)

      • RAG (Retrieval-Augmented Generation)
      • Text Embedding
        ◦ Gecko
        ◦ EmbeddingGemma
        ◦ Qwen3 Embedding
      • Agents (Local)
      (A minimal retrieval sketch follows below.)
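
      To make the RAG direction concrete, a minimal retrieval sketch in plain Dart: cosine similarity over precomputed document embeddings (for example from EmbeddingGemma), with the best match prepended to the prompt before it is sent to the on-device LLM. The embedding source and data shapes are assumptions for illustration.

      import 'dart:math' as math;

      /// Cosine similarity between two embedding vectors of equal length.
      double cosine(List<double> a, List<double> b) {
        var dot = 0.0, na = 0.0, nb = 0.0;
        for (var i = 0; i < a.length; i++) {
          dot += a[i] * b[i];
          na += a[i] * a[i];
          nb += b[i] * b[i];
        }
        return dot / (math.sqrt(na) * math.sqrt(nb));
      }

      /// Pick the document closest to the query embedding and build the augmented prompt.
      String buildRagPrompt(
        String question,
        List<double> queryEmbedding,
        Map<String, List<double>> docEmbeddings, // document text -> precomputed embedding
      ) {
        final best = docEmbeddings.entries
            .reduce((a, b) =>
                cosine(queryEmbedding, a.value) >= cosine(queryEmbedding, b.value)
                    ? a
                    : b)
            .key;
        return 'Context:\n$best\n\nQuestion: $question';
      }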