On-Device LLMs on Android Devices 2025 (Continued)

Masayuki Suda
October 14, 2025

Transcript

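  1. Table of contents

     • 01. Introduction: self-introduction, session recap
     • 02. Thorough comparison of Llama.cpp models: model comparison, demo
     • 03. Comparison of MediaPipe models: model comparison, demo
     • 04. The reality of GPU support: papers, summary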
  2. Llama.cpp: Models compared in this session

     • Qwen2.5 3B: 2GB
     • Llama-3-ELYZA-JP-8B: 4.9GB
     • Phi-3.5 Mini: 2.2GB
     • Suzume-llama-3-8B: 4.9GB
     • Llama 3.2 1B Instruct Q4_K_M: 891MB
     • Llama 3.2 3B Q4_K_M: 2.3GB
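  3. Llama.cpp: Model comparison

     | Metric | Qwen2.5 3B | Llama-3-ELYZA-JP-8B | Phi-3.5 Mini | Suzume-llama-3-8B | Llama 3.2 1B Instruct | Llama 3.2 3B |
     |---|---|---|---|---|---|---|
     | Quality / task completion | × | ○ | ▲ | ○ | ▲ | ▲ |
     | Average memory usage | 2138MB | 4446MB | 3266MB | 4558MB | 1058MB | 2406MB |
     | Tokens generated per second | 4.8 | 1.1 | 3.1 | 1.1 | 11.1 | 2.9 |
     | Battery consumption | 0.6% | 1.20% | 0.7% | 3% | 0.01% | 0.45% |
     | License | Qwen Research | Meta Llama 3 Community | MIT | Meta Llama 3 Community | Llama 3.2 Community | Llama 3.2 Community |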
  4. Llama.cpp: Model comparison (same table as slide 3)
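
The Llama.cpp comparison reads naturally as a selection problem: pick the fastest model that fits the device's memory and meets a quality bar. The following is a minimal Python sketch of that filter, using only the figures from the comparison table. The `ModelResult` type, the `shortlist` helper, the 3500 MB budget, and the reading of the rating symbols (○ good, ▲ fair, × poor) are assumptions made for illustration and are not part of the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelResult:
    name: str
    quality: str           # rating symbol from the slide (assumed: ○ good, ▲ fair, × poor)
    avg_memory_mb: int     # average memory usage
    tokens_per_sec: float  # tokens generated per second
    battery_pct: float     # battery consumption


# Values copied from the Llama.cpp comparison table.
RESULTS = [
    ModelResult("Qwen2.5 3B", "×", 2138, 4.8, 0.6),
    ModelResult("Llama-3-ELYZA-JP-8B", "○", 4446, 1.1, 1.20),
    ModelResult("Phi-3.5 Mini", "▲", 3266, 3.1, 0.7),
    ModelResult("Suzume-llama-3-8B", "○", 4558, 1.1, 3.0),
    ModelResult("Llama 3.2 1B Instruct", "▲", 1058, 11.1, 0.01),
    ModelResult("Llama 3.2 3B", "▲", 2406, 2.9, 0.45),
]


def shortlist(results, max_memory_mb, accepted_quality=("○", "▲")):
    """Return models within the memory budget and quality bar, fastest first."""
    fits = [
        r for r in results
        if r.avg_memory_mb <= max_memory_mb and r.quality in accepted_quality
    ]
    return sorted(fits, key=lambda r: r.tokens_per_sec, reverse=True)


if __name__ == "__main__":
    # Hypothetical budget: roughly 3.5 GB of memory available to the app.
    for r in shortlist(RESULTS, max_memory_mb=3500):
        print(f"{r.name}: {r.tokens_per_sec} tok/s, {r.avg_memory_mb} MB")
```

Restricting `accepted_quality` to `("○",)` leaves only the two 8B models, both at 1.1 tokens per second, which makes the quality versus speed trade-off in the table explicit.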
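  5. License comparison table

     | License | Commercial use | Restrictions at large scale | Main obligations and restrictions |
     |---|---|---|---|
     | Meta Llama 3 Community | Allowed | Above 700 million MAU, a separate license must be requested from Meta. | Must display "Built with Meta Llama 3". Use to improve other LLMs is prohibited. |
     | Llama 3.2 Community | Allowed | Above 700 million MAU, a separate license must be requested from Meta. | Must display "Built with Llama". Use to improve other LLMs is prohibited. |
     | Qwen Research | Not allowed | Commercial use requires a separate license agreement. | Only research and non-commercial use is permitted. Modifications must be indicated and the copyright notice retained. |
     | MIT License | Allowed | None | None |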
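
The rows of the license table can be written down as a small lookup, which is convenient when several candidate models are screened at once. This Python sketch encodes only what the table states; the `LICENSES` dictionary, the `commercial_use_status` function, and the returned strings are hypothetical names chosen for illustration.

```python
# "MAU over 700 million" threshold from the license table.
LLAMA_MAU_THRESHOLD = 700_000_000

# One entry per row of the table: is commercial use allowed, and is there an MAU limit?
LICENSES = {
    "Meta Llama 3 Community": {"commercial": True, "mau_limit": LLAMA_MAU_THRESHOLD},
    "Llama 3.2 Community": {"commercial": True, "mau_limit": LLAMA_MAU_THRESHOLD},
    "Qwen Research": {"commercial": False, "mau_limit": None},
    "MIT": {"commercial": True, "mau_limit": None},
}


def commercial_use_status(license_name: str, monthly_active_users: int) -> str:
    """Summarize the commercial-use row of the table for a given MAU count."""
    terms = LICENSES[license_name]
    if not terms["commercial"]:
        return "separate license agreement required"
    limit = terms["mau_limit"]
    if limit is not None and monthly_active_users > limit:
        return "separate license from Meta required"
    return "allowed"


if __name__ == "__main__":
    for name in LICENSES:
        print(name, "->", commercial_use_status(name, monthly_active_users=50_000))
```

The display and usage obligations in the last column (for example the "Built with Llama" notice) are not modeled here and still apply when the result is "allowed".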
  6. MediaPipe Tasks Inference API: Overview

     • Runs on LiteRT (formerly TensorFlow Lite)
     • Gemma family models (.task format) recommended
     • High-level API specialized for LLMs
     • Supports NNAPI / GPU inference
     • Optimized for mobile devices
  7. MediaPipe Tasks Inference API: Models

     • Gemma3 270M: 304MB (lightest)
     • Gemma3 1B: 554MB (high performance, recommended)
     • Gemma3 NANO 2B: 3GB (highest performance)
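  8. MediaPipe Tasks Inference API: Model comparison

     | Metric | Gemma3 270M | Gemma3 1B | Gemma3 NANO 2B |
     |---|---|---|---|
     | Quality / task completion | ▲ | ◎ | ◎ |
     | Average memory usage | 456MB | 832MB | 875MB |
     | Tokens generated per second | 17.2 | 14.1 | 7.3 |
     | Battery consumption | 0.0001% | 0.001% | 0.07% |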
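  9. Reports that the GPU is not always the better choice

     • Challenging GPU Dominance (2025)
       • On the iPhone 15 Pro, the CPU was faster than the GPU
       • Reasons: transfer overhead and the effect of CPU optimizations
     • Large LLM Benchmarking on Mobile (2024)
       • In SoC comparisons, there are many cases where the GPU has no advantage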
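  10. Limits of GPU-only execution and hybrid strategies

      • HeteroLLM (2025)
        • Proposes cooperative execution across CPU + GPU + NPU
      • Parallel CPU-GPU Execution (2025)
        • Under GPU constraints, using the CPU alongside the GPU is more efficient
      • Transformer-Lite (2024)
        • Stresses that GPU-specific optimization techniques are required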
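  11. GPU comparison with Gemma3 NANO

      | Metric | 1B | 1B GPU | NANO 2B | NANO 2B GPU |
      |---|---|---|---|---|
      | Quality / task completion | ◎ | ○ | ◎ | ○ |
      | Average memory usage | 832MB | 1587MB | 875MB | 4615MB |
      | Tokens generated per second | 14.1 | 16.1 | 7.3 | 8.4 |
      | Battery consumption | 0.001% | 1% | 0.07% | 2% |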
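
To make the GPU trade-off in the last table concrete, the throughput gain can be set against the extra memory. The Python sketch below computes both ratios from the table's figures. The `RUNS` structure and the `gpu_tradeoff` helper are hypothetical, and treating the columns without a "GPU" label as the non-GPU baseline is an assumption, since the slide does not name that backend.

```python
# (average memory in MB, tokens per second, battery %) copied from the GPU comparison table.
# "baseline" is the column without a "GPU" label (assumed to be the non-GPU backend).
RUNS = {
    "Gemma3 1B": {"baseline": (832, 14.1, 0.001), "gpu": (1587, 16.1, 1.0)},
    "Gemma3 NANO 2B": {"baseline": (875, 7.3, 0.07), "gpu": (4615, 8.4, 2.0)},
}


def gpu_tradeoff(baseline, gpu):
    """Return the GPU's throughput gain and memory growth relative to the baseline."""
    base_mem, base_tps, _ = baseline
    gpu_mem, gpu_tps, _ = gpu
    return {
        "speedup": round(gpu_tps / base_tps, 2),
        "memory_ratio": round(gpu_mem / base_mem, 2),
    }


if __name__ == "__main__":
    for model, runs in RUNS.items():
        t = gpu_tradeoff(runs["baseline"], runs["gpu"])
        print(f"{model}: {t['speedup']}x tokens/s for {t['memory_ratio']}x memory")
```

On these figures the GPU runs gain roughly 14 to 15 percent in tokens per second while using about 1.9 times (1B) and 5.3 times (NANO 2B) the memory, with a lower quality rating and higher battery consumption in both cases.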