
"LINE Planet" and AI: Conversations with AI

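Text-based communication with AI agents is already familiar to many people. The next step is to consider how users can interact with AI by voice inside a service. This session explains why Voice AI agents are needed and why you should prepare for them, and introduces real-world examples.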

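Using samples, we also explain why LINE Planet's VoIP technology is needed for conversations with AI. We look forward to seeing everyone who wants practical insights into successfully adopting AI-based communication.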

Published by LINEヤフーTech (LY Corporation Tech)

Transcript

  1. About Me
    • Name: Jeong Deokbeom (aka Tiger)
    • Role: Product Manager
    • Experience:
      • 2007–2010: Staff, Online Education Company
      • 2011–2013: Marketer, Email ASP Company
      • 2013–2020: Service Manager (Admin, Developers, Partners), SaaS Company
      • 2020–2022: Partnership Manager, SaaS Company
      • 2022–now: Product Manager, LINE Plus
    • Interests: Collecting uniquely designed Coca-Cola bottles
  2. Agenda
    1. Introducing LINE Planet
    2. Why Voice in AI?
    3. Voice AI Agent Cases
    4. Technical Requirements for Building a Voice AI Agent
    5. Ways to Improve Responses
    6. Voice AI Agents for Group Calls
  3. LINE Planet: VoIP Platform Service
    Extend your service experience: easy integration, high quality, reliability.
    Leveraging LINE's 14 years of experience in network and audio/video technologies, LINE Planet offers solutions for group audio/video calls and low-latency interactive live streaming. With LINE Planet, you can quickly develop cost-effective, low-latency, high-quality interactive audio/video services.
    Core Features: ✓ Quality ✓ Scalability ✓ Proven Experience
  4. The Evolution of Communication
    Accelerated by infrastructure and hardware advancements.
    Remote communication: Text → Voice → Video
    Now: AI communication
  5. Voice AI Agent Cases: Customer Support Agent (Company: eHealth)
    • AI voice agents make health plan shopping easier.
    • 24/7 instant response, no waiting.
    • Collects information and connects customers to licensed agents.
    • Doubles customer interest compared with human agents.
    • 74% are happy to use AI for faster help.
  6. Voice AI Agent Cases: AI Tutor Agent (Company: eSelf)
    • Personalized AI tutor avatars are being introduced for students worldwide.
    • Israel is piloting this with 10,000 students in May, with Harvard as academic adviser.
    • The AI tutors adapt to each student's needs and support learning with fast, interactive responses.
  7. Voice AI Agent Cases: Entertainment (Company: Character.ai)
    • Start real-time voice calls with AI characters instantly in the app or on the web.
    • Low latency for fast, natural conversations.
    • Wide variety of voices, accents, and personalities; multiple languages supported.
    • Interrupt, or switch between voice and text, at any time.
  8. Voice AI Agent Cases: Game AI NPCs
    • AI-powered NPCs now enable natural, unscripted conversations with players.
    • Context-aware, adaptive NPCs are being tested in immersive AR/VR environments.
    • Multiplayer frameworks let players join or interrupt group chats with multiple AI NPCs.
    • Companies such as Inworld and Convai create NPCs with unique personalities that react to player actions and storylines.
  9. Voice AI Agent Chain Model Processing Pipeline
    [Diagram: User (Mic, Speaker) ⇄ Network ⇄ AI: Any STT → Any LLM → Any TTS]
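The chain model can be read as three swappable stages connected in series. The sketch below is a minimal illustration that assumes hypothetical `SpeechToText`, `LanguageModel`, and `TextToSpeech` interfaces with toy stand-ins (`EchoSTT`, `UpperLLM`, `BytesTTS`); it is not LINE Planet's API or any vendor SDK.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def reply(self, text: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class ChainPipeline:
    """Hypothetical chain model: audio in -> STT -> LLM -> TTS -> audio out."""
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def handle_utterance(self, audio: bytes) -> bytes:
        text = self.stt.transcribe(audio)   # speech -> text
        answer = self.llm.reply(text)       # text -> response text
        return self.tts.synthesize(answer)  # response text -> speech


# Toy stand-ins so the sketch runs; real engines would be plugged in instead.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")  # pretend the "audio" is UTF-8 text


class UpperLLM:
    def reply(self, text: str) -> str:
        return f"You said: {text.upper()}"


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


pipeline = ChainPipeline(EchoSTT(), UpperLLM(), BytesTTS())
print(pipeline.handle_utterance(b"hello"))  # b'You said: HELLO'
```

Because each stage depends only on its interface, any STT, LLM, or TTS engine can be substituted, which is the point of the "Any STT / Any LLM / Any TTS" labels. The cost of the serial chain is that the stage latencies add up.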
  10. Voice AI Agent Chain Model Processing Pipeline
    [Diagram: User (Mic, Speaker) ⇄ Network ⇄ AI: Any STT → Any LLM → Any TTS; adds a Pre-process stage: Noise suppression, Acoustic Echo canceller]
  11. Core Technology Requirement: Ensuring Clean Audio Input
    Noise Suppression: Keep conversations clear by blocking unwanted background sounds.
    Filters out background and environmental noise using AI-powered noise cancellation and spectral subtraction. Only the speaker's voice is captured, even in noisy environments, which improves the clarity and accuracy of voice recognition.
    Acoustic Echo Cancellation: Eliminate echoes for seamless communication.
    Prevents speaker output from being picked up again by the microphone, which would otherwise cause distracting echoes during calls. Adaptive echo cancellation, acoustic feedback suppression, and real-time echo path estimation keep audio natural and echo-free, even in challenging acoustic environments.
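Spectral subtraction, one of the techniques named above, can be sketched on a single frame: take the spectrum of the noisy frame, subtract an estimated noise magnitude spectrum, keep the noisy phase, and transform back. This textbook sketch uses a naive O(n²) DFT from the standard library and assumes a noise estimate is already available (for example, measured on a speech-free segment). It does not represent LINE Planet's AI noise suppressor.

```python
import cmath
import math


def dft(x: list[complex]) -> list[complex]:
    """Naive O(n^2) discrete Fourier transform (fine for a short demo frame)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: list[complex]) -> list[complex]:
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def magnitude_spectrum(frame: list[float]) -> list[float]:
    return [abs(v) for v in dft([complex(s) for s in frame])]


def spectral_subtract(noisy: list[float], noise_mag: list[float],
                      floor: float = 0.0) -> list[float]:
    """Subtract an estimated noise magnitude spectrum from one frame."""
    cleaned = []
    for value, n_mag in zip(dft([complex(s) for s in noisy]), noise_mag):
        mag = max(abs(value) - n_mag, floor)  # clamp negative magnitudes to the floor
        cleaned.append(cmath.rect(mag, cmath.phase(value)))  # keep the noisy phase
    # For real input the result is conjugate-symmetric, so the imaginary part is ~0.
    return [v.real for v in idft(cleaned)]


# Demo: a 4-cycle "voice" tone plus a 13-cycle tonal "noise" in a 64-sample frame.
N = 64
tone = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
noise = [0.3 * math.sin(2 * math.pi * 13 * t / N) for t in range(N)]
noisy = [a + b for a, b in zip(tone, noise)]
cleaned = spectral_subtract(noisy, magnitude_spectrum(noise))
print(max(abs(c - s) for c, s in zip(cleaned, tone)))  # close to 0
```

With a perfect noise estimate the tone is recovered almost exactly; with real, imperfect estimates, plain subtraction leaves "musical noise" artifacts, which is one reason AI-based suppressors are used instead.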
  12. Voice AI Agent Chain Model Processing Pipeline
    [Diagram: User (Mic, Speaker) ⇄ Network (Latency / Congestion Control) ⇄ AI: Any STT → Any LLM → Any TTS; Pre-process: Noise suppression, Acoustic Echo canceller]
  13. Core Technology Requirement: Network
    Latency Control: How quickly the system can respond.
    Minimizing response delay is essential for real-time conversation: end-to-end processing must be completed within 100–300 ms to keep the interaction natural and human-like. This requires optimizing the entire pipeline and processing in parallel across all system components so that there are no noticeable pauses in the dialogue.
    Congestion Control: Reliable data transmission even under network load.
    Maintaining quality during network congestion is crucial, because excessive traffic causes packet loss and degrades the user experience. Adaptive bitrate streaming, buffering strategies, server load balancing, and priority queue management keep performance stable and communication uninterrupted, even when many users are connected simultaneously.
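Adaptive bitrate control, listed above among the congestion-control techniques, can be sketched as a loss-driven controller with additive increase and multiplicative decrease (AIMD). The class name, thresholds, step sizes, and bitrate bounds below are hypothetical values chosen for illustration; the deck does not describe LINE Planet's actual algorithm.

```python
from dataclasses import dataclass


@dataclass
class AimdBitrateController:
    """Hypothetical loss-based AIMD bitrate controller for an audio stream."""
    bitrate_kbps: float = 32.0
    min_kbps: float = 8.0
    max_kbps: float = 64.0
    increase_kbps: float = 2.0    # additive step while the network looks healthy
    decrease_factor: float = 0.7  # multiplicative back-off on congestion
    loss_threshold: float = 0.05  # packet-loss ratio treated as congestion

    def on_report(self, packets_sent: int, packets_lost: int) -> float:
        """Update and return the target bitrate from one receiver report."""
        loss = packets_lost / packets_sent if packets_sent else 0.0
        if loss > self.loss_threshold:
            self.bitrate_kbps *= self.decrease_factor
        else:
            self.bitrate_kbps += self.increase_kbps
        # Keep the target inside the configured bounds.
        self.bitrate_kbps = max(self.min_kbps, min(self.max_kbps, self.bitrate_kbps))
        return self.bitrate_kbps


ctrl = AimdBitrateController()
for sent, lost in [(100, 0), (100, 1), (100, 12), (100, 20), (100, 0)]:
    print(f"loss={lost / sent:.0%} -> {ctrl.on_report(sent, lost):.1f} kbps")
```

Backing off multiplicatively reacts quickly once loss appears, while the small additive step probes for spare capacity slowly; this is the same trade-off TCP congestion control makes.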
  14. Voice AI Agent Chain Model Processing Pipeline
    [Diagram: User (Mic, Speaker) ⇄ Network (Latency / Congestion Control) ⇄ AI: VAD → Any STT → Turn-Detection → Any LLM → Any TTS; Pre-process: Noise suppression, Acoustic Echo canceller]
  15. Core Technology Requirement: For Natural Conversation
    Interruption Handling: Detects and manages overlapping speech or interruptions during conversation.
    Uses Voice Activity Detection (VAD) and advanced turn-taking algorithms to identify when multiple speakers talk at once or interrupt each other. This keeps the dialogue smooth and natural, supports barge-in, and ensures that important information is not lost.
    Turn Detection: Recognizes when a speaker has finished or is only pausing.
    Accurately detects the end of a user's utterance, or a thoughtful pause, to choose the best moment for the system to respond. By analyzing speech patterns and context, the system avoids premature interruptions and enables more human-like, context-aware interaction in real time.
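A minimal frame-based sketch of these ideas: an energy threshold stands in for VAD, a run of silent frames (a "hangover") marks the end of a turn, and user speech that starts while the agent is talking is flagged as barge-in. `TurnDetector`, the energy threshold, and the 500 ms silence window are hypothetical; production systems typically use trained VAD models and, as the slide notes, speech patterns and context rather than silence alone.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Event(Enum):
    SPEECH_START = auto()
    END_OF_TURN = auto()
    BARGE_IN = auto()


def frame_energy(frame: list[float]) -> float:
    """Mean squared amplitude of one audio frame."""
    return sum(s * s for s in frame) / len(frame)


@dataclass
class TurnDetector:
    energy_threshold: float = 0.01  # hypothetical VAD threshold
    frame_ms: int = 20
    end_silence_ms: int = 500       # silence required before ending the turn
    in_speech: bool = False
    silence_ms: int = 0

    def process(self, frame: list[float], agent_speaking: bool) -> list[Event]:
        events: list[Event] = []
        voiced = frame_energy(frame) >= self.energy_threshold
        if voiced:
            self.silence_ms = 0
            if not self.in_speech:
                self.in_speech = True
                events.append(Event.SPEECH_START)
                if agent_speaking:
                    # User talks over the agent: the caller should stop TTS playback.
                    events.append(Event.BARGE_IN)
        elif self.in_speech:
            self.silence_ms += self.frame_ms
            if self.silence_ms >= self.end_silence_ms:
                # Short pauses are tolerated; only a long silence ends the turn.
                self.in_speech = False
                self.silence_ms = 0
                events.append(Event.END_OF_TURN)
        return events


# Demo: speech, a 200 ms pause, more speech, then a 500 ms silence.
loud, quiet = [0.5] * 160, [0.0] * 160
detector = TurnDetector()
timeline = [loud] * 5 + [quiet] * 10 + [loud] * 2 + [quiet] * 25
for i, frame in enumerate(timeline):
    for event in detector.process(frame, agent_speaking=False):
        print(i, event.name)
```

The 200 ms pause does not end the turn, so the agent does not cut in mid-thought; the trade-off is that a longer window adds directly to response latency.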
  16. Experimental Comparison – Noise Suppressor: Experimental Conditions
    ✓ Sentence: 1980年代は大衆的な好みが本格的に台頭した時代だった。
    ✓ EN: The 1980s were an era when popular tastes rose to prominence in earnest.
    ✓ KR: 1980년대는 대중적인 취향이 본격적으로 대두한 시대였다.
    ✓ STT model: gpt-4o-transcribe
    ✓ Realtime API instruction: 'Please repeat exactly what I say, word for word, without adding any additional comments or modifications.'
    ✓ Noise Suppressor: LINE Planet
    ✓ Procedure: For each SNR level, STT results are obtained and compared with and without pre-processing.
    ✓ SNR (Signal-to-Noise Ratio): The ratio of signal power to noise power, used as an indicator of signal clarity. A higher value means the desired signal is stronger relative to the background noise.
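In decibels, the SNR defined above is SNR_dB = 10·log10(P_signal / P_noise), where P is mean power. The deck does not say how the noisy test inputs were produced; a common approach, sketched here as an assumption, is to scale a noise recording so that the mixture hits each target SNR (the results that follow use −14 dB to 4 dB in 3 dB steps). The helper names are hypothetical.

```python
import math


def mean_power(x: list[float]) -> float:
    return sum(s * s for s in x) / len(x)


def snr_db(signal: list[float], noise: list[float]) -> float:
    """SNR in dB: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


def mix_at_snr(signal: list[float], noise: list[float], target_db: float) -> list[float]:
    """Scale `noise` so that signal + scaled noise has the requested SNR."""
    # Required noise power is P_signal / 10^(target/10); amplitude scales with sqrt(power).
    wanted_noise_power = mean_power(signal) / (10.0 ** (target_db / 10.0))
    gain = math.sqrt(wanted_noise_power / mean_power(noise))
    return [s + gain * n for s, n in zip(signal, noise)]


# Demo with synthetic stand-ins for the speech and noise recordings.
speech = [math.sin(0.1 * t) for t in range(1000)]
noise = [((t * 7919) % 101) / 50.0 - 1.0 for t in range(1000)]
for target in range(-14, 5, 3):
    mixed = mix_at_snr(speech, noise, target)
    added = [m - s for m, s in zip(mixed, speech)]
    print(target, round(snr_db(speech, added), 3))
```

A negative SNR such as −14 dB means the noise carries roughly 25 times more power than the speech (10^1.4 ≈ 25.1).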
  17. Experimental Comparison – Noise Suppressor: Pre-processing vs. No Pre-processing
    SNR | STT result | Pre-process + STT result
    -14 dB | 大きな声でしゃべった。 | 1980年代は、最終的な好みが本格的に再動した時代だった。
    -11 dB | ペンギンのような形のプロペラだった。 | 1980年代は、最終的な好みが本格的に再登した時代だった。
    -8 dB | 一匹とか二匹でも入ってくるんだ。 | 1980年代は最終的な好みが本格的に再登した時代だった。
    -5 dB | 鉛筆とか消しゴムが机の上から落ちてしまいそうだった。 | 1980年代は最終的な好みが本格的に台頭した時代だった。
    -2 dB | 研究者と初入念な議論が終わった時点に退場した事態だった。 | 1980年代は最終的な好みが本格的に台頭した時代だった。
    1 dB | 1980年代は最終的な好みが本格的に再導入した時代だった。 | 1980年代は大衆的な好みが本格的に台頭した時代だった。
    4 dB | 1980年代は最終的な好みが本格的に台頭した時代だった。 | 1980年代は、大衆的な好みが本格的に台頭した時代だった。
  18. Experimental Comparison – Noise Suppressor: Pre-processing vs. No Pre-processing (Realtime API)
    SNR | Realtime API result | Pre-process + Realtime API result
    -14 dB | リンゴが木から落ちた。 | 大正時代は、待機的な車が計画的に改造した時代だった。
    -11 dB | 人はパンのみに生きるにあらず、神の口から出る一つ一つの言葉による。 | 1980年代は、大衆的な好みが本格的に再度した時代だった。
    -8 dB | 一日だけでいいので、タイムトラベルの能力が欲しいです。 | 1980年代は、最終的な好みが本格的に再度登場した時代だった。
    -5 dB | 私は歩きながら音楽を聞くのが好きです。音楽を聴くとリラックスできますし、疲れも取れます。 | 1980年代は、最終的な好みが本格的に台頭した時代だった。
    -2 dB | 今日はとても暑いですね。水をたくさん飲んでください。 | 1980年代は、最終的な好みが本格的に台頭した時代だった。
    1 dB | 研究者の八代さんは、大学を卒業した後、フランスに留学して、フランス文学を専門に勉強していました。 | 1980年代は大衆的な好みが本格的に台頭した時代だった。
    4 dB | 1998年代は、最終的な好みが本格的に台頭した時代だった。 | 1980年代は、大衆的な好みが本格的に台頭した時代だった。
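The comparisons above list raw transcripts. One way to put a number on them, assumed here rather than taken from the deck, is character error rate (CER): the Levenshtein edit distance between the reference sentence and a transcript, divided by the reference length. Character-level scoring suits Japanese, which has no spaces between words. How to treat punctuation (for example, the extra 「、」 in some outputs) is a choice; this sketch strips 「、」, 「。」, and spaces before scoring.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    """Drop punctuation and spaces so only the spoken content is scored."""
    return text.replace("、", "").replace("。", "").replace(" ", "")


def cer(reference: str, hypothesis: str) -> float:
    ref, hyp = normalize(reference), normalize(hypothesis)
    return levenshtein(ref, hyp) / len(ref)


REFERENCE = "1980年代は大衆的な好みが本格的に台頭した時代だった。"
results = {
    "4 dB, STT": "1980年代は最終的な好みが本格的に台頭した時代だった。",
    "4 dB, Pre-process + STT": "1980年代は、大衆的な好みが本格的に台頭した時代だった。",
}
for label, text in results.items():
    print(f"{label}: CER = {cer(REFERENCE, text):.3f}")
```

Lower is better: 「最終的」 in place of 「大衆的」 costs two substitutions out of 27 characters, while the unrelated transcripts at low SNR score far higher because almost every character differs.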
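  19. Acoustic Echo Cancellation: If You Don't Have AEC
    [Diagram: The AI agent's TTS output ("Hello") plays from the user's speaker, is picked up again by the user's microphone, and comes back through STT as "Text: Hello", so the agent hears its own voice. Other participants (User2, User3, …) are on the same call.]
    Without AEC, the echo disrupts not only the conversation with the AI but also communication with the other participants.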
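Echo cancellation is commonly built on an adaptive filter that learns the echo path from the far-end signal (here, the agent's TTS audio) and subtracts its estimate from the microphone signal. The sketch below uses the normalized least-mean-squares (NLMS) algorithm, a standard textbook choice; the function name, filter length, step size, and simulated echo path are hypothetical, and the deck does not say that LINE Planet's AI-based AEC works this way.

```python
import random


def nlms_echo_cancel(far_end: list[float], mic: list[float],
                     taps: int = 8, mu: float = 0.5,
                     eps: float = 1e-6) -> list[float]:
    """Remove the estimated echo of `far_end` from `mic` with an NLMS filter."""
    weights = [0.0] * taps  # running estimate of the echo path (impulse response)
    history = [0.0] * taps  # latest far-end samples, newest first
    out = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate  # residual: near-end speech plus leftover echo
        norm = eps + sum(h * h for h in history)
        # NLMS update: the step is normalized by the far-end input energy.
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        out.append(error)
    return out


# Demo: the agent's TTS audio (far end) leaks into the mic through a short echo path.
rng = random.Random(0)
far = [rng.gauss(0.0, 1.0) for _ in range(2000)]
echo_path = [0.6, 0.3, -0.1]  # hypothetical room response
mic = [sum(echo_path[k] * far[n - k] for k in range(len(echo_path)) if n - k >= 0)
       for n in range(len(far))]
residual = nlms_echo_cancel(far, mic)
print(sum(v * v for v in residual[-500:]) / sum(v * v for v in mic[-500:]))
```

Here the microphone hears only echo, so the residual shrinks toward zero once the filter converges. With the user talking at the same time (double-talk), practical cancellers also need double-talk detection so that near-end speech does not corrupt the echo path estimate.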
  20. Voice AI Agents in Group Calls
    [Diagram: User1, User2, and User3 each connect through PlanetKit to LINE Planet Cloud, which sends and receives the audio streams; the AI joins the call as another user.]
    Participants can use the agent for real-time information search and question answering during the meeting. Example requests:
    • "Please facilitate a retrospective meeting for this project."
    • "Can you see if you can join the meeting now by 'username' and ask them to let you in?"
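In a group call, each participant, including an AI agent joining as a user, typically receives a mix of everyone except itself (often called a mix-minus or N−1 mix); otherwise the agent would hear and transcribe its own TTS output. The sketch assumes equal-length PCM frames given as float lists and a simple clip to [−1, 1]; `mix_minus` is a hypothetical helper, and this is not how LINE Planet Cloud or PlanetKit actually mixes audio.

```python
def mix_minus(frames: dict[str, list[float]]) -> dict[str, list[float]]:
    """For each participant, mix every other participant's frame (N-1 mix)."""
    length = len(next(iter(frames.values())))
    total = [sum(f[i] for f in frames.values()) for i in range(length)]
    mixes = {}
    for name, own in frames.items():
        # Remove the participant's own audio from the full mix, then clip.
        mixes[name] = [max(-1.0, min(1.0, t - o)) for t, o in zip(total, own)]
    return mixes


frames = {
    "User1": [0.2, 0.1],
    "User2": [0.0, 0.3],
    "User3": [0.1, 0.0],
    "AI": [0.5, -0.5],
}
for name, mix in mix_minus(frames).items():
    print(name, [round(v, 2) for v in mix])
```

Computing one full sum and subtracting each participant's own frame costs O(N) per sample instead of O(N²) for building every N−1 mix separately. The AI's mix is what would be fed to its STT stage.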
  21. LINE Planet Features
    ⎯ 1:1 Call (Audio, Video)
    ⎯ Group Call (Audio, Video)
    ⎯ 10k participants in a group call
    ⎯ Virtual Background
    ⎯ Sub Group (= Breakout Room)
    ⎯ Media Subscription
    ⎯ Screen Share
    ⎯ Data Session
    ⎯ Recording
    ⎯ Multi-platform support (iOS, Android, Windows, macOS, Web, Flutter)
    ⎯ Quality (AI Noise Suppressor, AI Acoustic Echo Cancellation)
    ⎯ Network (Adaptive bitrate control, latency control)