BWAI 2026 Cloud x Pangyo Gemma 4 Hands-on - 박제창

2026 BWAI Cloud x Pangyo
Build with AI
2026-05-31 (Sunday), 2:00 PM to 4:00 PM
박제창

JaiChangPark

May 31, 2026

Transcript

  1. Cloud x Pangyo: Agenda

    We learn about Gemma 4, set up the environment needed for the hands-on, and run Gemma 4 directly in an offline environment.
    • Gemma 4
    • Ollama, LM Studio
    • Hands-on: Exercise 1
    • Hands-on: Exercise 2
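The offline run described in the agenda can be sketched as a request to a locally running Ollama server. This is a minimal sketch, assuming Ollama's default local endpoint `http://localhost:11434/api/generate`. The model tag `gemma4` is a hypothetical placeholder, since the transcript does not state the tag; it should be replaced with whatever `ollama list` reports on the machine.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "gemma4"  # hypothetical tag, not from the deck; check `ollama list`


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL_TAG) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running and the model pulled, `generate("Summarize what Gemma is in one sentence.")` would return the model's reply as a string. No internet connection is needed once the weights are on disk.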
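  2. Gemma 3 and Gemma 3n

    • Model sizes: Gemma 3: 270M, 1B, 4B, 12B, 27B. Gemma 3n: E2B, E4B.
    • Input: Gemma 3: text only (270M/1B), text + image (4B/12B/27B). Gemma 3n: text + image + video + audio.
    • Output: Gemma 3: text only. Gemma 3n: text only.
    • Context: Gemma 3: 32K (270M/1B), 128K (4B/12B/27B). Gemma 3n: 32K.
    • Core architecture: Gemma 3: core dense-family VLM. Gemma 3n: PLE caching + MatFormer + conditional loading.
    • Key message: Gemma 3: a general-purpose Gemma with image understanding and long context. Gemma 3n: on-device execution.
    * PLE: Per-Layer Embeddings
    GDG KR X MUG KR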
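  3. Comparison by item: E2B, E4B, 26B A4B, 31B

    • Architecture: E2B: Dense. E4B: Dense. 26B A4B: MoE. 31B: Dense.
    • Parameters: E2B: 2.3B effective (5.1B incl. embeddings). E4B: 4.5B effective (8B incl. embeddings). 26B A4B: 25.2B total, 3.8B active. 31B: 30.7B.
    • Layers: E2B: 35. E4B: 42. 26B A4B: 30. 31B: 60.
    • Context: E2B: 128K. E4B: 128K. 26B A4B: 256K. 31B: 256K.
    • Sliding window: E2B: 512. E4B: 512. 26B A4B: 1024. 31B: 1024.
    • Input modalities: E2B: text, image, audio. E4B: text, image, audio. 26B A4B: text, image. 31B: text, image.
    • Vision encoder: E2B: ~150M. E4B: ~150M. 26B A4B: ~550M. 31B: ~550M.
    • Audio encoder: E2B: ~300M. E4B: ~300M. 26B A4B: none. 31B: none.
    • Positioning:
      E2B: E2B/E4B target mobile and edge devices; the small models are optimized for efficient local execution on laptops and mobile.
      E4B: Same on-device family as E2B, with more effective parameters.
      26B A4B: MoE with 25.2B total / 3.8B active parameters; suited to faster inference than the dense 31B.
      31B: 30.7B dense; targets consumer GPUs and workstations.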
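To make the parameter counts above concrete, a rough weight-memory estimate is parameters times bytes per parameter. The sketch below uses the totals from the slide. The precisions (16-bit, 8-bit, 4-bit) are illustrative assumptions, and the estimate ignores the KV cache, activations, and runtime overhead, so real usage will be higher.

```python
# Parameter counts in billions, taken from the slide above.
# Assumptions: E2B/E4B use the "incl. embeddings" figure, and 26B A4B uses
# the total count, assuming all experts are kept in memory.
TOTAL_PARAMS_B = {"E2B": 5.1, "E4B": 8.0, "26B A4B": 25.2, "31B": 30.7}


def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight memory in GB (10^9 bytes): params * bits / 8."""
    return params_billion * bits_per_param / 8


def active_fraction(active_billion: float, total_billion: float) -> float:
    """Share of parameters used per token in a mixture-of-experts model."""
    return active_billion / total_billion


for name, params in TOTAL_PARAMS_B.items():
    estimates = ", ".join(
        f"{bits}-bit: {weight_memory_gb(params, bits):.1f} GB" for bits in (16, 8, 4)
    )
    print(f"{name}: {estimates}")

print(f"26B A4B active fraction: {active_fraction(3.8, 25.2):.1%}")
```

Under these assumptions the 31B model needs about 15.35 GB for 4-bit weights, and the 26B A4B model activates roughly 15% of its parameters per token, which is consistent with the slide's note that it suits faster inference than the dense 31B.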
  4. Core Capabilities

    Gemma 4 models handle a broad range of tasks across text, vision, and audio. Key capabilities include:
    • Thinking – Built-in reasoning mode that lets the model think step-by-step before answering.
    • Long Context – Context windows of up to 128K tokens (E2B/E4B) and 256K tokens (26B A4B/31B).
    • Image Understanding – Object detection, document/PDF parsing, screen and UI understanding, chart comprehension, OCR (including multilingual), handwriting recognition, and pointing. Images can be processed at variable aspect ratios and resolutions.
    • Video Understanding – Analyze video by processing sequences of frames.
    • Interleaved Multimodal Input – Freely mix text and images in any order within a single prompt.
    • Function Calling – Native support for structured tool use, enabling agentic workflows.
    • Coding – Code generation, completion, and correction.
    • Multilingual – Out-of-the-box support for 35+ languages, pre-trained on 140+ languages.
    • Audio (E2B and E4B only) – Automatic speech recognition (ASR) and speech-to-translated-text translation across multiple languages.
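The function-calling capability listed above can be illustrated with a small dispatcher on the application side. This is a sketch under assumptions: the JSON shape `{"name": ..., "arguments": {...}}` and the `get_weather` tool are hypothetical stand-ins, not Gemma 4's actual tool-call format, which should be taken from the official model documentation.

```python
import json
from typing import Any, Callable, Dict


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would call a weather service."""
    return f"Weather lookup requested for {city}"


# Registry of tools the application is willing to run for the model.
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch_tool_call(raw: str) -> Any:
    """Parse a model-emitted tool call (assumed JSON shape) and run the tool."""
    call = json.loads(raw)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))
```

In an agentic loop, the result of `dispatch_tool_call` would be sent back to the model as the tool response so that it can continue its answer. Rejecting unknown tool names keeps the model from triggering code the application did not register.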
  7. Model (LLM), input, output, Model Context Protocol (MCP), system prompt, Skills

    Design the goal to implement, for example with a PRD.
    AGENTS.md, DESIGN.md, MEMORY.md, and so on.
  8. Model (LLM), input, output, Model Context Protocol (MCP), system prompt, Skills

    Find and use the Skills suited to the implementation. As Skills accumulate, context size and cost can increase.
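The trade-off noted above, that more Skills can mean more context and cost, can be sketched as loading only the relevant Skills under a token budget. Everything here is a hypothetical illustration: the `Skill` structure, the keyword matching, the token counts, and the budget are assumptions, not part of MCP or of any agent framework.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Skill:
    name: str
    keywords: Tuple[str, ...]
    tokens: int  # assumed size of the skill's instructions once loaded


def select_skills(task: str, skills: List[Skill], budget: int) -> List[Skill]:
    """Keep skills whose keywords appear in the task, smallest first, within budget."""
    task_lower = task.lower()
    relevant = [s for s in skills if any(k in task_lower for k in s.keywords)]
    chosen: List[Skill] = []
    used = 0
    for skill in sorted(relevant, key=lambda s: s.tokens):
        if used + skill.tokens <= budget:
            chosen.append(skill)
            used += skill.tokens
    return chosen
```

Choosing the smallest relevant Skills first is only one possible policy. Ranking by relevance score or by how often a Skill was actually used would be equally reasonable, depending on how the agent is evaluated.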