
On-Device LLMs on Android Devices, 2025 (Continued)

Masayuki Suda
October 14, 2025

Transcript

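  1. Contents

    01. Introduction: self-introduction, session recap
    02. In-depth comparison of models on Llama.cpp: model comparison, demo
    03. Comparison of models on MediaPipe: model comparison, demo
    04. The reality of GPU support: papers, summary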
  2. Llama.cpp: models to compare

    Models compared this time:
    • Qwen2.5 3B: 2GB
    • Llama-3-ELYZA-JP-8B: 4.9GB
    • Phi-3.5 Mini: 2.2GB
    • Suzume-llama-3-8B: 4.9GB
    • Llama 3.2 1B Instruct Q4_K_M: 891MB
    • Llama 3.2 3B Q4_K_M: 2.3GB
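  3. Llama.cpp: model comparison

    | Metric | Qwen2.5 3B | Llama-3-ELYZA-JP-8B | Phi-3.5 Mini | Suzume-llama-3-8B | Llama 3.2 1B Instruct | Llama 3.2 3B |
    |---|---|---|---|---|---|---|
    | Quality / task completion | × | ○ | ▲ | ○ | ▲ | ▲ |
    | Average memory usage | 2138MB | 4446MB | 3266MB | 4558MB | 1058MB | 2406MB |
    | Tokens generated per second | 4.8 | 1.1 | 3.1 | 1.1 | 11.1 | 2.9 |
    | Battery consumption | 0.6% | 1.20% | 0.7% | 3% | 0.01% | 0.45% |
    | License | Qwen Research | Meta Llama 3 Community | MIT | Meta Llama 3 Community | Llama 3.2 Community | Llama 3.2 Community |

    Quality ratings: ◎ excellent, ○ good, ▲ fair, × poor.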
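One way to read the Llama.cpp table is to ask which models no other model beats on every axis at once, i.e. the Pareto front. The sketch below is a minimal illustration under stated assumptions. It maps the quality symbols to an ordinal score (× = 0, ▲ = 1, ○ = 2, ◎ = 3), which is not part of the slide. It treats higher quality, higher tokens/s and lower memory as better, and it leaves battery out. The names `QUALITY_SCORE`, `dominates` and `pareto_front` are hypothetical.

```python
# Pareto-front sketch over the Llama.cpp measurements in the slide 3 table.
# Assumption: quality symbols mapped to an ordinal score (× < ▲ < ○ < ◎).
QUALITY_SCORE = {"×": 0, "▲": 1, "○": 2, "◎": 3}

# name: (quality symbol, tokens per second, average memory in MB), copied from the table
MODELS = {
    "Qwen2.5 3B": ("×", 4.8, 2138),
    "Llama-3-ELYZA-JP-8B": ("○", 1.1, 4446),
    "Phi-3.5 Mini": ("▲", 3.1, 3266),
    "Suzume-llama-3-8B": ("○", 1.1, 4558),
    "Llama 3.2 1B Instruct": ("▲", 11.1, 1058),
    "Llama 3.2 3B": ("▲", 2.9, 2406),
}


def dominates(a, b):
    """True if a is no worse than b on every axis and strictly better on at least one."""
    qa, sa, ma = QUALITY_SCORE[a[0]], a[1], a[2]
    qb, sb, mb = QUALITY_SCORE[b[0]], b[1], b[2]
    no_worse = qa >= qb and sa >= sb and ma <= mb
    strictly_better = qa > qb or sa > sb or ma < mb
    return no_worse and strictly_better


def pareto_front(models):
    """Return the sorted names of models that no other model dominates."""
    front = []
    for name, stats in models.items():
        beaten = any(
            dominates(other_stats, stats)
            for other_name, other_stats in models.items()
            if other_name != name
        )
        if not beaten:
            front.append(name)
    return sorted(front)


if __name__ == "__main__":
    print(pareto_front(MODELS))
```

Under this mapping only Llama 3.2 1B Instruct (fastest and smallest) and Llama-3-ELYZA-JP-8B (higher quality) survive. ELYZA edges out Suzume only on memory. A different symbol-to-score mapping could change the result.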
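  5. License comparison table

    | License | Commercial use | Restrictions at large scale | Main obligations / restrictions |
    |---|---|---|---|
    | Meta Llama 3 Community | Allowed | If MAU exceeds 700 million, a separate license must be requested from Meta. | Must display "Built with Meta Llama 3". May not be used to improve other LLMs. |
    | Llama 3.2 Community | Allowed | If MAU exceeds 700 million, a separate license must be requested from Meta. | Must display "Built with Llama". May not be used to improve other LLMs. |
    | Qwen Research | Not allowed | Commercial use requires a separate license agreement. | Research or non-commercial use only. Modified parts must be clearly marked and copyright notices retained. |
    | MIT License | Allowed | None | Keep the copyright and permission notice |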
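The first two columns of the license table can be read as a small decision rule. Is the planned use commercial, and does the expected MAU cross a threshold? The sketch below is a simplified encoding of only those two columns, as described on the slide. The attribution and usage obligations are not modelled. `LicenseTerms` and `needs_separate_license` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LicenseTerms:
    commercial_use: bool      # "Commercial use" column
    mau_limit: Optional[int]  # MAU above which a separate license is needed; None = no such clause


# Simplified encoding of the slide 5 table (obligations column not modelled).
LICENSES = {
    "Meta Llama 3 Community": LicenseTerms(commercial_use=True, mau_limit=700_000_000),
    "Llama 3.2 Community": LicenseTerms(commercial_use=True, mau_limit=700_000_000),
    "Qwen Research": LicenseTerms(commercial_use=False, mau_limit=None),
    "MIT License": LicenseTerms(commercial_use=True, mau_limit=None),
}


def needs_separate_license(license_name: str, commercial: bool, monthly_active_users: int) -> bool:
    """True if the planned use falls outside what the base license grants, per the table."""
    terms = LICENSES[license_name]
    if commercial and not terms.commercial_use:
        return True  # e.g. Qwen Research: commercial use needs its own agreement
    return terms.mau_limit is not None and monthly_active_users > terms.mau_limit


if __name__ == "__main__":
    print(needs_separate_license("Qwen Research", commercial=True, monthly_active_users=10_000))
```

For example, a commercial app built on Qwen2.5 3B would need a separate agreement regardless of scale. A Llama 3.2 app would need one only above 700 million MAU.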
  6. MediaPipe Tasks Inference API: overview

    • Runs on LiteRT (formerly TensorFlow Lite)
    • Gemma family models (.task format) recommended
    • High-level API specialized for LLMs
    • Supports NNAPI / GPU inference
    • Optimized for mobile devices
  7. MediaPipe Tasks Inference API: models

    • Gemma3 270M: 304MB (lightest)
    • Gemma3 1B: 554MB (high performance, recommended)
    • Gemma3 NANO 2B: 3GB (highest performance)
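  8. MediaPipe Tasks Inference API: models

    | Metric | Gemma3 270M | Gemma3 1B | Gemma3 NANO 2B |
    |---|---|---|---|
    | Quality / task completion | ▲ | ◎ | ◎ |
    | Average memory usage | 456MB | 832MB | 875MB |
    | Tokens generated per second | 17.2 | 14.1 | 7.3 |
    | Battery consumption | 0.0001% | 0.001% | 0.07% |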
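Tokens per second translates directly into how long a user waits for a reply: seconds = tokens / rate. The sketch below applies that to the MediaPipe decode rates from the slide 8 table. It assumes a hypothetical 200-token reply and a constant decode rate. It ignores prefill (time to first token), so real waits would be somewhat longer.

```python
# Rough wait-time estimate from the decode rates in the slide 8 table.
# Assumptions: a hypothetical 200-token reply, a constant decode rate,
# and no prefill / time-to-first-token cost.
TOKENS_PER_SECOND = {
    "Gemma3 270M": 17.2,
    "Gemma3 1B": 14.1,
    "Gemma3 NANO 2B": 7.3,
}


def decode_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to emit num_tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    reply_tokens = 200  # hypothetical reply length
    for name, rate in TOKENS_PER_SECOND.items():
        print(f"{name}: {decode_seconds(reply_tokens, rate):.1f} s")
```

For a 200-token reply this gives about 11.6 s, 14.2 s and 27.4 s respectively. So the NANO 2B model roughly doubles the wait compared with Gemma3 1B at the same quality rating.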
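  9. Reports that the GPU is not always the better choice

    • Challenging GPU Dominance (2025): on the iPhone 15 Pro, the CPU was faster than the GPU. Reasons given: data-transfer overhead and the effect of CPU-side optimizations.
    • Large LLM Benchmarking on Mobile (2024): across SoC comparisons, there were many cases where the GPU was not ahead.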
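The transfer-overhead argument can be made concrete with a toy cost model. The GPU may process work faster, but it pays a fixed cost per call to move data and launch kernels. Setting overhead + W/gpu_rate = W/cpu_rate gives the break-even work size W* = overhead / (1/cpu_rate - 1/gpu_rate). Below W*, the CPU finishes first. The model, its function names and the numbers in the usage line are all hypothetical illustrations, not figures from the cited papers.

```python
# Toy cost model for why a mobile GPU can lose to the CPU.
# Assumptions (not from the cited papers): work is measured in abstract
# operations, each device has a fixed throughput, and the GPU path pays a
# fixed transfer/launch overhead per call.


def cpu_time(work: float, cpu_rate: float) -> float:
    return work / cpu_rate


def gpu_time(work: float, gpu_rate: float, overhead: float) -> float:
    return overhead + work / gpu_rate


def break_even_work(cpu_rate: float, gpu_rate: float, overhead: float) -> float:
    """Work size above which the GPU path is faster; inf if the GPU never catches up."""
    if gpu_rate <= cpu_rate:
        return float("inf")
    return overhead / (1 / cpu_rate - 1 / gpu_rate)


if __name__ == "__main__":
    # Hypothetical numbers: GPU twice as fast as the CPU, overhead worth 5 CPU-units of work.
    print(break_even_work(cpu_rate=1.0, gpu_rate=2.0, overhead=5.0))
    for work in (4.0, 20.0):
        print(work, cpu_time(work, 1.0), gpu_time(work, 2.0, 5.0))
```

With these made-up values the break-even point is 10 units of work. A call of 4 units finishes in 4 on the CPU versus 7 on the GPU, while a call of 20 units takes 20 versus 15. Token-by-token decoding issues many small calls, so it tends to sit on the CPU-friendly side of that threshold.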
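  10. Limits of the GPU alone, and hybrid strategies

    • HeteroLLM (2025): proposes coordinated execution across CPU, GPU, and NPU
    • Parallel CPU-GPU Execution (2025): under GPU constraints, using the CPU alongside the GPU is more efficient
    • Transformer-Lite (2024): stresses that GPU-specific optimization techniques are necessary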
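The intuition behind running the CPU alongside the GPU can be shown with a toy split of one matrix multiplication. Its output rows are divided between the two devices, which then work in parallel. Wall time is the slower of the two shares plus a synchronization cost whenever both devices are used. This is a simple brute-force proportional split for illustration only. It is not the scheduling method of HeteroLLM or of the other papers. The rates and the `sync` cost in the usage are hypothetical.

```python
# Toy model: split one matmul's output rows between GPU and CPU running in parallel.
# Illustrative only; rates are in rows per unit time and all values are hypothetical.


def split_cost(rows_gpu: int, total_rows: int, gpu_rate: float, cpu_rate: float, sync: float) -> float:
    """Wall time when rows_gpu rows go to the GPU and the remainder to the CPU."""
    rows_cpu = total_rows - rows_gpu
    t = max(rows_gpu / gpu_rate, rows_cpu / cpu_rate)
    if 0 < rows_gpu < total_rows:
        t += sync  # both devices used: pay a synchronization/merge cost
    return t


def best_split(total_rows: int, gpu_rate: float, cpu_rate: float, sync: float = 0.0):
    """Brute-force the number of GPU rows that minimizes wall time."""
    best = min(
        range(total_rows + 1),
        key=lambda k: split_cost(k, total_rows, gpu_rate, cpu_rate, sync),
    )
    return best, split_cost(best, total_rows, gpu_rate, cpu_rate, sync)


if __name__ == "__main__":
    print(best_split(10, gpu_rate=3.0, cpu_rate=2.0))            # cheap sync: hybrid wins
    print(best_split(10, gpu_rate=3.0, cpu_rate=2.0, sync=1.5))  # costly sync: GPU only
```

With cheap synchronization, giving 6 of 10 rows to the GPU cuts wall time from about 3.33 (GPU only) to 2.0. With a sync cost of 1.5, the best choice falls back to GPU only. This matches the slide's point that hybrid execution pays off only when coordination overhead is kept small.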
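  11. GPU comparison with Gemma3 NANO

    | Metric | 1B | 1B GPU | NANO 2B | NANO 2B GPU |
    |---|---|---|---|---|
    | Quality / task completion | ◎ | ○ | ◎ | ○ |
    | Average memory usage | 832MB | 1587MB | 875MB | 4615MB |
    | Tokens generated per second | 14.1 | 16.1 | 7.3 | 8.4 |
    | Battery consumption | 0.001% | 1% | 0.07% | 2% |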
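The trade-off in the Gemma3 NANO GPU table is easier to see as ratios of the GPU run to the non-GPU run. The sketch below computes them from the slide's figures. It assumes the columns without "GPU" are the CPU backend; the table does not state this explicitly. `gpu_vs_cpu` is a hypothetical helper name.

```python
# GPU-vs-CPU ratios computed from the slide 11 table.
# Assumption: the columns without "GPU" are the CPU backend.
MEASUREMENTS = {
    # name: (tokens per second, average memory in MB, battery consumption in %)
    "1B": (14.1, 832, 0.001),
    "1B GPU": (16.1, 1587, 1.0),
    "NANO 2B": (7.3, 875, 0.07),
    "NANO 2B GPU": (8.4, 4615, 2.0),
}


def gpu_vs_cpu(cpu_key, gpu_key, data=MEASUREMENTS):
    """Return (speedup, memory ratio, battery ratio) of the GPU run relative to the CPU run."""
    cpu_tps, cpu_mem, cpu_batt = data[cpu_key]
    gpu_tps, gpu_mem, gpu_batt = data[gpu_key]
    return gpu_tps / cpu_tps, gpu_mem / cpu_mem, gpu_batt / cpu_batt


if __name__ == "__main__":
    for cpu_key in ("1B", "NANO 2B"):
        speedup, mem, batt = gpu_vs_cpu(cpu_key, cpu_key + " GPU")
        print(f"{cpu_key}: {speedup:.2f}x tokens/s, {mem:.1f}x memory, {batt:.0f}x battery")
```

For both models the GPU adds only about 14 to 15% throughput. Memory grows about 1.9x for 1B and about 5.3x for NANO 2B, and reported battery use rises sharply. The quality rating also drops from ◎ to ○.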