11_22_CEE_Singapore_2024_Presentation.pdf

Copyright © Parallel Inc. All Rights Reserved.  • X /
GitHub: @yoheimuta    • Recent Focus: SRE, MySQL, Vitess, Flink     • Career  ◦ Mobile Game : Worked on SNS game launches at GREE   ◦ AdTech: Built the LINE Ads Platform at FreakOut and LINE   Yohei Yoshimuta   Head of Engineering, Parallel  Speaker  

Copyright © Parallel Inc. All Rights Reserved.  Our goal  
Enjoying It with Friends   Enjoying Entertainment Alone   Parallel brings friends together for shared experiences, aiming to be a central hub for all types of entertainment in a world where affordable, segmented content often drives solo consumption.  

Copyright © Parallel Inc. All Rights Reserved.  About Parallel: Your
Hangout Place   Parallel is a hangout app where you can hop on, see close friends already online, join them instantly, and enjoy content together, both in and outside the app.    

Copyright © Parallel Inc. All Rights Reserved.  Achievements   With
MAU in Japan now reaching levels similar to Twitch and Roblox, 70% Gen Z, 3-hour daily call times, over $20 million raised, and all with a team of just 16.   # of Teams 6

Copyright © Parallel Inc. All Rights Reserved.  How Parallel Embraces
AI   Parallel’s key feature is being a space where young people spend hours with friends, now enhanced with AI that joins voice chats to act as a gaming partner, study buddy and more!       Users ..    - hang out with friends in groups     - spend around 180 mins a day on voice calls    - do everything from gaming to studying and watching videos, both on and off the platform     Parallel’s standout features   Our AI ..    - acts as a conversation partner, gaming buddy, study helper, etc     - keeps group conversations on track and offers ideas    - can take on roles like a poker dealer, energize the group, spin as a DJ, or even step in as a study tutor   Parallel with AI  

Copyright © Parallel Inc. All Rights Reserved.  Our Initial Approach
with Realtime API   With the release of the Realtime API, our initial approach was to create a demo specifically centered around gaming as an example.   1. We use a Wake Word to call AI whenever users need.     2. Our AI acts as a chat buddy or game partner until friends arrive.     3. Our AI mediates conversations when in a group.     4. Our AI becomes the game commentator when people playing together.  

Copyright © Parallel Inc. All Rights Reserved.  To enable AI-powered
conversations, we have added:  - Agora Demo Server : Manages real-time communication with the OpenAI Realtime API.  - OpenAI Realtime API   System Overview   The core system is a group call application consisting of the following components:   - Parallel App  - Parallel API Server   - Agora Voice Server (SaaS)   Existing System   AI-Powered Conversations  

Copyright © Parallel Inc. All Rights Reserved.  Key Challenges in
Implementing AI-Driven Group Conversations   While the API supports audio and text processing, achieving the following complex features wasn’t straightforward and required thoughtful customization and innovation to make it all work seamlessly.   1. Real-Time Group Conversation     2. Speaker Differentiation     3. Wake Word Detection     4. Real-Time Game State Synchronization  

Copyright © Parallel Inc. All Rights Reserved.  • Solution  
◦ Merge individual streams into a single mixed stream on the client side.    • Implementation   ◦ Used Agora’s onPlaybackAudioFrame to manage mixed audio streams.  AI facilitating real-time conversations with multiple users is essential for our application. A key challenge is that the Realtime API is not designed to support multiple audio streams from different users simultaneously.   Real-Time Group Conversation with AI  

Copyright © Parallel Inc. All Rights Reserved.  Real-Time Group Conversation
System Flow  

Copyright © Parallel Inc. All Rights Reserved.  Speaker Identification  
• Solution   ◦ Use speaker volume data to distinguish individual speakers    • Implementation   ◦ Leveraged Agora’s onAudioVolumeIndication to monitor volume levels for each participant   Identifying speakers and tailoring responses for each user is essential in group conversations, but the AI model can't distinguish speakers in a single mixed audio stream.    

Copyright © Parallel Inc. All Rights Reserved.  Speaker Identification System
Flow  

Copyright © Parallel Inc. All Rights Reserved.  • Solution  
◦ Integrate wake word detection to activate the AI as needed.    • Implementation   ◦ Checked audio transcriptions for wake or sleep words  ◦ If detected, VAD (Voice Activity Detection) is turned on or off  Wake Word Detection   To enable hands-free interaction, detecting wake word in real-time is crucial. However, the system needs to maintain session continuity while accurately identifying wake word.     📝VAD identifies speech in the audio stream. When off (default is on), the AI only responds when requested by the system client  

Copyright © Parallel Inc. All Rights Reserved.  Wake Word Detection
System Flow   👍The Whisper Engine enables quick prototyping with real-time transcriptions.   🤖But a dedicated model will be needed in production for greater robustness and precision.  

Copyright © Parallel Inc. All Rights Reserved.  Real-Time Game Commentary
  • Solution   ◦ Real-time game status updates are sent from the Parallel server to the AI as user text messages via the API.    • Implementation   ◦ Only the moves (where each player places a piece) are sent to the AI, rather than the entire board state.  In 1v1 games, conversations often dwindle, creating awkward silences. AI commentary helps maintain engagement, but relying on audio input alone is insufficient to fully convey the dynamic game state.   

Copyright © Parallel Inc. All Rights Reserved.  ⚠We learned that
conveying the full board state in text led to misinterpretations.   ✅So we reduced the information to key moves and formatted it for clarity, using prompt engineering techniques.   Real-Time Game Commentary System Flow  

Copyright © Parallel Inc. All Rights Reserved.  • Rapid Prototyping
⚡  ◦ 1 engineer, 2 weeks prototype  ◦ Previously, setting up AI-driven calls required complex infrastructure, and game commentary needed intricate rule-based systems.     • Improved Tone Control 🛠  ◦ AI responses sometimes lack tonal variety, especially in commentary, missing subtle cues like excitement or surprise.   ◦ The recently added voices bring impressive tonal variety, offering natural and expressive delivery 🤯.  ◦ Yet we’re still learning to tailor the prompt effectively as they were only just released.  Learnings and Future Enhancements  

Copyright © Parallel Inc. All Rights Reserved.  Key Takeaways  
🧩Challenge 💡Solution ⚙Implementation 1. Real-Time Group Conversation Merge individual audio streams into a single mixed stream Used Agora’s onPlaybackAudioFrame for real-time audio management 2. Speaker Differentiation Identify active speaker based on volume data Leveraged onAudioVolumeIndication to detect and distinguish speakers 3. Wake Word Detection Enable AI activation via wake words Used Whisper Engine for transcription and VAD for response control 4. Real-Time Game State Synchronization Update AI with individual moves instead of full board Sent each move as a user message, refined through prompt engineering Scan the QR code to view the full presentation  

11_22_CEE_Singapore_2024_Presentation.pdf

11_22_CEE_Singapore_2024_Presentation.pdf

yoheimuta

More Decks by yoheimuta

Other Decks in Technology

Featured

Transcript

Copyright © Parallel Inc. All Rights Reserved.  1

Copyright © Parallel Inc. All Rights Reserved.  • X /

Copyright © Parallel Inc. All Rights Reserved.  Our goal

Copyright © Parallel Inc. All Rights Reserved.  About Parallel: Your

Copyright © Parallel Inc. All Rights Reserved.  Achievements   With

Copyright © Parallel Inc. All Rights Reserved.  How Parallel Embraces

Copyright © Parallel Inc. All Rights Reserved.  Our Initial Approach

Copyright © Parallel Inc. All Rights Reserved.  Demo

Copyright © Parallel Inc. All Rights Reserved.  To enable AI-powered

Copyright © Parallel Inc. All Rights Reserved.  Key Challenges in

Copyright © Parallel Inc. All Rights Reserved.  • Solution

Copyright © Parallel Inc. All Rights Reserved.  Real-Time Group Conversation

Copyright © Parallel Inc. All Rights Reserved.  Speaker Identification

Copyright © Parallel Inc. All Rights Reserved.  Speaker Identification System

Copyright © Parallel Inc. All Rights Reserved.  • Solution

Copyright © Parallel Inc. All Rights Reserved.  Wake Word Detection

Copyright © Parallel Inc. All Rights Reserved.  Real-Time Game Commentary

Copyright © Parallel Inc. All Rights Reserved.  ⚠We learned that

Copyright © Parallel Inc. All Rights Reserved.  • Rapid Prototyping

Copyright © Parallel Inc. All Rights Reserved.  Key Takeaways