Upgrade to Pro — share decks privately, control downloads, hide ads and more …

GPT-4o with iOS

GPT-4o with iOS

「Mobile勉強会 Wantedly × チームラボ × Sansan #14」での発表資料です。
https://wantedly.connpass.com/event/316051/

記事:
[GPT-4oのマルチモーダル入力をiOSから試す](https://zenn.dev/shu223/articles/gpt4o-multimodal)

shu223

May 21, 2024
Tweet

More Decks by shu223

Other Decks in Programming

Transcript

  1. ࣗݾ঺հ • అ मҰ • @shu223 (GitHub, Zenn, Qiita, note,

    Docswell, 𝕏, YouTube, Podcast, etc...) • ॻ੶ʢ঎ۀग़൛4࡭ɺݸਓग़൛ଟ਺ @BOOTHʣ:
  2. 2024.5.13 GPT-4oൃද Hello GPT-4o | OpenAI 1ߦ໨ɿ We’re announcing GPT-4o,

    our new flagship model that can reason across audio, vision, and text in real time. ʮϚϧνϞʔμϧʯ͕؊
  3. GPT-4oͷϞμϦςΟ GPT-4o (“o” for “omni”) is a step towards much

    more natural human-computer interaction—it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. • ೖྗɿ ςΩετɾԻ੠ɾը૾ɾಈըͷ͋ΒΏΔ૊Έ߹Θͤ • ग़ྗɿ ςΩετɾԻ੠ɾը૾ͷ͋ΒΏΔ૊Έ߹Θͤ
  4. End-to-end With GPT-4o, we trained a single new model end-to-

    end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Ի੠ → [Whisper] → ςΩετ → [GPT] → ςΩετ → [TTS] → Ի੠ ⇓ Ի੠ → [GPT] → Ի੠
  5. Chat Completion APIʹը૾ɾಈըΛ౤͛Δ API Reference - OpenAI API • ը૾ɿ

    "image_url" ʹը૾σʔλ΍URLΛೖΕΔ • ಈըɿ ϑϨʔϜը૾Λෳ਺ೖΕΔ "content": [ { "type": "image_url", "image_url": ... } ]
  6. ʢิ଍ʣOpenAI APIͷVisionػೳʹ͍ͭͯ • GPT-4 Turbo with VisionϞσϧ͕ϦϦʔε͞Εͨ2023೥ 11݄ʹAPI͕௥Ճ • ಉ݄ɺOpenAIͷAPIυΩϡϝϯτʹVisionػೳͷ࢖͍ํ͕௥

    Ճ • ಈըΛݸʑͷϑϨʔϜը૾ʹ෼ׂͯ͠Ϟσϧʹೖྗ͢Δ͜ͱ Ͱಈըͷ಺༰ཧղ͕Ͱ͖Δͱઆ໌͞Ε͍ͯΔ → ͭ·Γ͜ͷ΁Μ͸GPT-4oͷ৽ػೳͰ͸ͳ͍
  7. iOS×GPT-4oͰϦΞϧλΠϜ ಈըཧղ • "detail" ͸ "low" Λࢦఆ • 512x512ʹϦαΠζ&Ϋϩοϓ •

    લϑϨʔϜͷઆ໌ςΩετ΋ಉࠝ • ฒྻԽ͸·ͩ • ʢίʔυ͸ެ։༧ఆʣ
  8. TTS APIʢԻ੠ग़ྗʣ • ετϦʔϛϯάग़ྗΛαϙʔτ͍ͯ͠Δ • υΩϡϝϯτɿ Text to speech -

    OpenAI API • Python࣮૷ • ࢒೦ͳ͕Β MacPaw/OpenAI Ͱ͸ετϦʔϜग़ྗ͸αϙʔτ ͍ͯ͠ͳ͔ͬͨʢιʔεಡΜͰ֬ೝʣ • ʢContributionνϟϯεʂʣ