Getting Started with On-Device AI in Flutter - Flutter Alliance 2025

박제창 @jaichangpark


October 21, 2025

Transcript

  1. 4

  2. 5

  3. Flutter with AI (slide 7)

     • http, dio
     • Firebase AI Logic
       ◦ firebase_ai: https://pub.dev/packages/firebase_ai
     • Genkit (Firebase Genkit)
       ◦ https://genkit.dev/
       ◦ https://flutter.dev/events/building-agentic-apps#flutter-genkit
     • dartantic_ai
       ◦ https://pub.dev/packages/dartantic_ai
     (A minimal firebase_ai sketch follows below.)
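
     A minimal cloud-side sketch using the firebase_ai package (Firebase AI Logic), for contrast with the on-device path later in the deck. It assumes Firebase is already initialized for the project; the model name is illustrative.

     import 'package:firebase_ai/firebase_ai.dart';

     Future<void> askGemini(String prompt) async {
       // Gemini Developer API backend; FirebaseAI.vertexAI() is the Vertex AI alternative.
       final model = FirebaseAI.googleAI().generativeModel(model: 'gemini-2.5-flash');
       final response = await model.generateContent([Content.text(prompt)]);
       print(response.text);
     }
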
  4. 8

  5. Gemini API: Data Collection & How Google Uses Your Data (slide 9)

     In addition to the general terms above ("How Google Uses Your Data" under "Unpaid Services" and "Paid Services"), when using Grounding with Google Search, Google will store prompts, contextual information that you may provide, and output for thirty (30) days for the purposes of creating Grounded Results and Search Suggestions, and the stored information can be used for debugging and testing of systems that support Grounding with Google Search. When using Grounding with Google Search via paid quota of the Gemini API, this processing for debugging and testing of systems is in accordance with the Data Processing Addendum for Products Where Google is a Data Processor.

  6. Gemini API: Unpaid vs. Paid Services (slide 10)

     @source: https://ai.google.dev/gemini-api/terms

     • Applicability
       Unpaid: Cloud Billing inactive (using free quota in AI Studio / Gemini API)
       Paid: Cloud Billing active (applies to ALL usage, including free quota)
     • Primary Data Goal
       Unpaid: Improve & develop Google products and machine learning technologies
       Paid: Service delivery and policy violation detection
     • Used for Model Improvement?
       Unpaid: YES (your content is used)
       Paid: NO (your prompts/responses are not used)
     • Human Review
       Unpaid: Possible (for quality improvement)
       Paid: Not conducted (no review for improvement purposes)
     • Privacy Measures (Content)
       Unpaid: Data is disconnected from your Google Account/API key before review
       Paid: Processed according to the Data Processing Addendum (DPA)
     • Logging & Storage
       Unpaid: Retained for development and improvement purposes
       Paid: Logged for a limited time (solely for violation detection / legal disclosure)
     • User Caution
       Unpaid: DO NOT SUBMIT sensitive or confidential information
       Paid: Usage details (tokens, billing, settings) are still collected under the Privacy Policy
     • EU/EEA/UK Specifics
       Unpaid: Users in these regions follow the Paid Service rules, even when using free services
       Paid: N/A

  7. Anthropic & OpenAI: consumer and commercial data policies (slide 11)

     @source: https://www.anthropic.com/news/updates-to-our-consumer-terms
     @source: https://help.openai.com/en/articles/5722486-how-your-data-is-used-to-improve-model-performance

     • Default Data Use for Model Training
       Anthropic (Consumer: Claude Free/Pro/Max): Opt-in is required (user choice, effective Oct 2025).
       OpenAI (Consumer: ChatGPT, Sora): Opt-out is required (data is used unless the user explicitly turns it off).
       OpenAI (Commercial: API, Enterprise): Opted out by default; data is NOT used for training unless the customer explicitly opts in (e.g., providing feedback in Playground).
     • Data Retention Period
       Anthropic (Consumer): Dual policy: 30 days (if opted out), 5 years (if opted in, for model consistency and safety improvement).
       OpenAI (Consumer): Not explicitly stated for consumer content.
       OpenAI (Commercial): API abuse-monitoring logs kept 30 days (default); ZDR/MAM allow exclusion of customer content from retention.
     • Opt-Out Exceptions
       Anthropic (Consumer): 1. Data flagged for safety review. 2. Explicit feedback.
       OpenAI (Consumer): 1. User feedback (thumbs up/down). 2. Data already used in past training runs.
       OpenAI (Commercial): Explicit opt-in required (e.g., providing feedback).
     • User Control Mechanisms
       Anthropic (Consumer): Set preference in Privacy Settings; deleting an individual conversation stops future training on it.
       OpenAI (Consumer): Privacy Portal or Settings; Temporary Chat option (no history, no training).
       OpenAI (Commercial): Zero Data Retention (ZDR) and Modified Abuse Monitoring (MAM) controls available for eligible clients.

  8. 12

  9. 15

  10. 3 Core Components of On-Device AI (slide 17)

      • Model (The Brain): a trained file with pattern-recognition capabilities; it must be made lightweight and optimized for mobile environments.
      • Hardware (The Body): the physical device that actually executes the model's computations, determining speed and efficiency.
      • Inference Engine / Runtime (The Translator): the software layer that takes the model file (1), converts it into instructions the hardware (2) can understand, and runs inference.

  11. On-Device AI: Hardware (slide 18)

      • Samsung Galaxy S25: expected release Q1 2025; Qualcomm Snapdragon 8; 12GB-16GB RAM
      • Google Pixel 10: October 2025; Google Tensor G5; 12GB RAM
      • Apple iPhone 17 Pro: September 2025; Apple A19 Pro; 12GB RAM
      • Apple iPhone 17 (Standard): September 2025; Apple A19; 8GB RAM

  12. 20

  13. 24

  14. 27

  15. 28

  16. Gemini Nano (slide 31)

      Gemini Nano allows for up to 12,000 input tokens.
      https://android-developers.googleblog.com/2025/08/the-latest-gemini-nano-with-on-device-ml-kit-genai-apis.html
      (A hedged platform-channel sketch for reaching the ML Kit GenAI APIs from Flutter follows below.)
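
      The ML Kit GenAI APIs that expose Gemini Nano are Android-native, so a Flutter app would reach them through a platform channel. The sketch below is illustrative only: the channel name, method name, and argument shape are assumptions, and the Android side (a MethodChannel handler that calls the ML Kit GenAI APIs in Kotlin/Java) is not shown.

      import 'package:flutter/services.dart';

      // Hypothetical channel; no first-party Flutter plugin is implied here.
      const MethodChannel _genAi = MethodChannel('example.app/mlkit_genai');

      Future<String?> summarizeOnDevice(String text) {
        // Keep the input within Gemini Nano's limit (up to 12,000 input tokens).
        return _genAi.invokeMethod<String>('summarize', <String, String>{'text': text});
      }
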
  17. Google AI Edge Gallery (slide 32)

      • Google AI Edge: core APIs and tools for on-device ML.
      • LiteRT: lightweight runtime for optimized model execution.
      • LLM Inference API: powering on-device Large Language Models.
      • Hugging Face Integration: for model discovery and download.
      https://github.com/google-ai-edge/gallery

  18. AFM (Apple Foundation Model) (slide 34)

      A ~3B-parameter on-device model optimized for Apple silicon through architectural innovations such as KV-cache sharing and 2-bit quantization-aware training. Context window: the system model supports up to 4,096 tokens.
      https://arxiv.org/abs/2507.13575

  19. 37

  20. Steps (slide 39)

      • (Recommended) Explore existing Flutter packages, or create your own
      • Build the UI
      • Download the model (locally or from cloud storage)
      • Add inference functionality
      • Review the results

  21. flutter_gemma (slide 42, STEP 1)

      by Sasha Denisov (@ShuregDenisov, @DenisovAV)
      • Model Download Support
      • Multimodal Support
      • Function Calling
      • Thinking Mode
      • Embedding
      • Platform Channel, Pigeon
      https://pub.dev/packages/flutter_gemma
      https://github.com/DenisovAV/flutter_gemma/tree/main
      (A hedged function-calling sketch follows below.)
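
      The deck highlights function calling but does not show a tool definition, so here is a rough sketch of what one might look like; the Tool field names and the JSON-schema-style parameters map are assumptions based on common function-calling conventions, so check the flutter_gemma documentation for the actual API.

      // Illustrative only; field names are assumed, not confirmed flutter_gemma API.
      final _tools = [
        const Tool(
          name: 'get_battery_level',
          description: 'Returns the current device battery level as a percentage.',
          parameters: {
            'type': 'object',
            'properties': {}, // this example takes no arguments
          },
        ),
      ];
      // Passed later as createChat(tools: _tools, supportsFunctionCalls: true), as shown in STEP 3.
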
  22. flutter_gemma: native ↔ Flutter (slide 43, STEP 1)

      • Platform Channel
        ◦ Event Channel
      • Pigeon
        ◦ A code-generator tool that makes communication between Flutter and the host platform type-safe, easier, and faster.
      • MediaPipe
        ◦ tasks-genai
      (An illustrative Pigeon definition follows below.)
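
      To make the Pigeon point concrete, an illustrative definition is shown below; the API name and methods are invented for the example and are not flutter_gemma's actual Pigeon schema.

      // pigeons/llm_api.dart (hypothetical file)
      import 'package:pigeon/pigeon.dart';

      @HostApi()
      abstract class LlmInferenceApi {
        // Generated as a type-safe call into the host platform (Kotlin/Swift).
        void createModel(String modelPath, int maxTokens);

        // @async maps to an asynchronous native implementation (e.g., MediaPipe tasks-genai).
        @async
        String generateResponse(String prompt);
      }

      Running dart run pigeon --input pigeons/llm_api.dart (with the desired output-path options) generates matching Dart and native bindings, replacing hand-written MethodChannel plumbing.
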
  23. cactus (slide 44, STEP 1)

      by Cactus Compute, Inc.
      • Language Model (LLM)
        ◦ Streaming Completions
        ◦ Function Calling (Experimental)
      • Embedding
      • VLM (v0)
      • ffi (Dart FFI bridge to native)
      https://cactuscompute.com/
      https://pub.dev/packages/cactus
      https://github.com/cactus-compute/cactus-flutter

  24. flutter_gemma: init FlutterGemmaPlugin & modelManager (slide 49, STEP 3)

      final gemma = FlutterGemmaPlugin.instance;

      @override
      void initState() {
        super.initState();
        modelManager = gemma.modelManager;
        init();
      }

  25. flutter_gemma: model download (slide 50, STEP 3)

      final spec = MobileModelManager.createInferenceSpec(
        name: "gemma-3n-E4B-it-litert-lm",
        modelUrl:
            "https://huggingface.co/google/gemma-3n-E4B-it-litert-lm/resolve/main/gemma-3n-E4B-it-int4.litertlm?download=true",
      );

  26. flutter_gemma: model download (slide 51, STEP 3)

      modelManager?.downloadModelWithProgress(
        spec,
        token: "<Huggingface Token>",
      ).listen(
        (progress) {
          print('Loading progress: $progress%');
        },
        onDone: () {
          print('Model loading complete.');
          init();
        },
        onError: (error) {
          print('Error loading model: $error');
        },
      );

  27. flutter_gemma: createModel (slide 53, STEP 3)

      final model = await gemma.createModel(
        modelType: ModelType.gemmaIt,
        preferredBackend: PreferredBackend.gpu,
        supportImage: true,
        maxTokens: 2048,
        maxNumImages: 1,
      );

  28. flutter_gemma: createChat (slide 54, STEP 3)

      chat = await model.createChat(
        modelType: ModelType.gemmaIt,
        supportImage: true,
        supportsFunctionCalls: true,
        tools: _tools,
        isThinking: false,
      );

      Future<InferenceChat> createChat({
        double temperature = .8,
        int randomSeed = 1,
        int topK = 1,
        double? topP,
        int tokenBuffer = 256,
        String? loraPath,
        bool? supportImage,
        List<Tool> tools = const [],
        bool? supportsFunctionCalls,
        bool isThinking = false, // Add isThinking parameter
        ModelType? modelType, // Add modelType parameter
      })

  29. flutter_gemma: addQueryChunk (slide 55, STEP 3)

      final message = _selectedImageBytes != null
          ? Message.withImage(
              text: text.trim(),
              imageBytes: _selectedImageBytes!,
              isUser: true,
            )
          : Message.text(text: text.trim(), isUser: true);

      await chat?.addQueryChunk(message);

  31. flutter_gemma: generateChatResponse (sync) (slide 57, STEP 3)

      ModelResponse response = await chat.generateChatResponse();
      if (response is TextResponse) {
        print(response.token);
      }

  32. flutter_gemma: generateChatResponseAsync (async) (slide 58, STEP 3)

      chat.generateChatResponseAsync().listen(
        (ModelResponse response) {
          if (response is TextResponse) {
            print(response.token);
          }
        },
        onDone: () {
          print('Chat stream closed');
        },
        onError: (error) {
          print('Chat error: $error');
        },
      );
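
      A small extension of the streamed call above: collecting the tokens into a buffer so the full reply can be shown as it grows. The surrounding state handling is an assumption about the host widget, not part of flutter_gemma.

      final buffer = StringBuffer();
      chat.generateChatResponseAsync().listen(
        (ModelResponse response) {
          if (response is TextResponse) {
            buffer.write(response.token);
            // e.g. setState(() => _reply = buffer.toString()); inside a StatefulWidget
          }
        },
        onDone: () => print('Final reply: $buffer'),
        onError: (error) => print('Chat error: $error'),
      );
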
  33. flutter_gemma: overall flow (slide 59, STEP 3)

      init FlutterGemmaPlugin & modelManager
        → does the model exist? (ensureModelReady)
            False → model download
            True  → continue
        → createModel → createChat → addQueryChunk
        → generateChatResponse / generateChatResponseAsync
      (A condensed end-to-end sketch follows below.)
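
      Putting the previous steps together, a condensed end-to-end sketch using only the calls shown on slides 49-58; it assumes the model file has already been downloaded (slides 50-51) and omits error handling and UI state.

      Future<void> runOnDeviceChat(String prompt) async {
        final gemma = FlutterGemmaPlugin.instance;

        // 1) Create the inference model (model already downloaded).
        final model = await gemma.createModel(
          modelType: ModelType.gemmaIt,
          preferredBackend: PreferredBackend.gpu,
          maxTokens: 2048,
        );

        // 2) Open a chat session.
        final chat = await model.createChat(modelType: ModelType.gemmaIt);

        // 3) Send the user message and read one response.
        await chat.addQueryChunk(Message.text(text: prompt, isUser: true));
        final response = await chat.generateChatResponse();
        if (response is TextResponse) {
          print(response.token);
        }
      }
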
  34. cactus (slides 60-64, STEP 3)

      final lm = CactusLM();

      await lm.downloadModel(
        downloadProcessCallback: (progress, status, isError) {},
      );
      await lm.initializeModel();

      final resp = await lm.generateCompletion(
        messages: [
          ChatMessage(content: '', role: "system"),
          ChatMessage(content: '?', role: "user"),
        ],
      );

  39. cactus: generateCompletionStream (slide 65, STEP 3)

      The same call is also available as a streaming variant, generateCompletionStream, for token-by-token output instead of a single generateCompletion result.

  40. Next (slide 67)

      • RAG (Retrieval-Augmented Generation)
      • Text Embedding
        ◦ Gecko
        ◦ EmbeddingGemma
        ◦ Qwen3 Embedding
      • Agents (Local)
      (A minimal retrieval sketch follows below.)
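
      To make the RAG direction concrete, a minimal retrieval sketch in plain Dart: cosine similarity over precomputed document embeddings (for example from EmbeddingGemma), with the best match prepended to the prompt before it is sent to the on-device LLM. The embedding source and data shapes are assumptions for illustration.

      import 'dart:math' as math;

      /// Cosine similarity between two embedding vectors of equal length.
      double cosine(List<double> a, List<double> b) {
        var dot = 0.0, na = 0.0, nb = 0.0;
        for (var i = 0; i < a.length; i++) {
          dot += a[i] * b[i];
          na += a[i] * a[i];
          nb += b[i] * b[i];
        }
        return dot / (math.sqrt(na) * math.sqrt(nb));
      }

      /// Pick the document closest to the query embedding and build the augmented prompt.
      String buildRagPrompt(
        String question,
        List<double> queryEmbedding,
        Map<String, List<double>> docEmbeddings, // document text -> precomputed embedding
      ) {
        final best = docEmbeddings.entries
            .reduce((a, b) =>
                cosine(queryEmbedding, a.value) >= cosine(queryEmbedding, b.value)
                    ? a
                    : b)
            .key;
        return 'Context:\n$best\n\nQuestion: $question';
      }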