Public Speaker (since 2013) Writer/Blogger for Techzine, .Net Magazine, Marketingfacts.nl, CustomerTalk.nl and Google Cloud Blog @ladysign http://www.leeboonstra.com
Google Assistant facts:
• Native Google Assistant features
• Invoke actions (Hey Google talk to <my app>)
• Customer Terms & Conditions
• Special technical requirements
Diagram labels: PUBLIC, Your business action, Action from someone else, Weather Action, Recipes Action, Nest Thermostat, Alarm Clock
You might want to build your own voice AI instead because of technical requirements, overkill, or enterprise usage.
Lee Boonstra | @ladysign
Architecture (diagram labels: with WebRTC, Docker Container, Browser plays audio)
How to get a microphone audio stream that works across all browsers?
How to make sure the audio stream can be handled as an ArrayBuffer in the back-end?
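One possible answer to the ArrayBuffer question, as a minimal sketch: convert the floating-point samples the browser produces into 16-bit PCM before sending them to the back-end. The helper name `floatTo16BitPcm` is hypothetical, and the sketch assumes mono samples in the range [-1, 1].

```javascript
// Hypothetical helper: convert Web Audio samples (floats in [-1, 1])
// into 16-bit signed little-endian PCM stored in an ArrayBuffer.
function floatTo16BitPcm(samples) {
  const buffer = new ArrayBuffer(samples.length * 2); // 2 bytes per sample
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot overflow the 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale towards -32768, positive values towards 32767.
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In a browser, the input could be the Float32Array returned by `AudioBuffer.getChannelData(0)`; the resulting ArrayBuffer can then be sent over whatever binary-capable transport the server accepts.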
Architecture (diagram labels: with WebRTC, Docker Container, Server Node.js Codebase, Docker Container, Browser plays audio)
How to stream from front-end to back-end?
How to stream bidirectional binary data?
audio to text by applying powerful neural network models in an easy-to-use API. The API recognizes 120 languages and variants to support your global user base. You can enable voice command-and-control, transcribe audio from call centers, and more.
Architecture (diagram labels: with WebRTC, Docker Container, Server Node.js Codebase, Docker Container, Cloud STT: Voice to text, Browser plays audio)
AudioBuffer? Encoding? Sample rate? Number of channels?
How to get text from the HTML5 browser microphone stream?
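The encoding, sample rate, and channel questions usually come down to describing the audio accurately in the recognition request. A minimal sketch, assuming 16-bit PCM input: the field names follow the Cloud Speech-to-Text `RecognitionConfig` as I understand it, while the helper name `buildStreamingRequest` and the defaults (16 kHz, mono, en-US) are assumptions for illustration.

```javascript
// Hypothetical helper: describe the PCM audio the server receives so a
// streaming speech-to-text request can declare it. The defaults are
// assumptions and must match what the browser actually records.
function buildStreamingRequest(options = {}) {
  const {
    sampleRateHertz = 16000,
    audioChannelCount = 1,
    languageCode = 'en-US',
  } = options;
  if (!Number.isInteger(sampleRateHertz) || sampleRateHertz <= 0) {
    throw new RangeError('sampleRateHertz must be a positive integer');
  }
  if (!Number.isInteger(audioChannelCount) || audioChannelCount <= 0) {
    throw new RangeError('audioChannelCount must be a positive integer');
  }
  return {
    config: {
      encoding: 'LINEAR16', // 16-bit signed little-endian PCM
      sampleRateHertz,
      audioChannelCount,
      languageCode,
    },
    interimResults: false, // only final transcripts in this sketch
  };
}
```

A mismatch between the declared and the actual sample rate is a common cause of empty or garbled transcripts, so the value is best read from the browser's audio context rather than hard-coded.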
• Formerly known as API.AI
  ◦ (Sept 2016, acquired by Google)
• Powered by Machine Learning:
  ◦ Natural Language Understanding (NLU)
  ◦ Intent Matching
  ◦ Conversation Training
• Cross platform
• Build faster with the Web UI
• Scalable: Separate your conversation from code
• Speech / Voice Integration
• Multi-lingual bot support (20+ languages)
• Direct integration with 15+ channels like Google Assistant, Slack, Twilio, Facebook...
Architecture (diagram labels: with WebRTC, Docker Container, Server Node.js Codebase, Docker Container, Cloud STT: Voice to text, Dialogflow: Intent Matching, Translate lang to English, Browser plays audio)
Dialogflow to detect intents.
Translate text to base language, and translate back.
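The translate, detect, translate-back flow can be sketched independently of any specific API. In this sketch `translate(text, targetLang)` and `detectIntent(text)` are hypothetical async functions supplied by the caller (for example thin wrappers around a translation service and Dialogflow); neither name comes from the slides.

```javascript
// Sketch of the flow: translate user text to the base language,
// detect the intent there, then translate the reply back.
// deps.translate and deps.detectIntent are hypothetical injected functions.
async function answerInUserLanguage(userText, userLang, deps, baseLang = 'en') {
  const { translate, detectIntent } = deps;
  // Skip both translation calls when the user already speaks the base language.
  const needsTranslation = userLang !== baseLang;
  const baseText = needsTranslation
    ? await translate(userText, baseLang)
    : userText;
  const baseReply = await detectIntent(baseText);
  return needsTranslation ? translate(baseReply, userLang) : baseReply;
}
```

Keeping the agent in a single base language means the intents only have to be trained once, at the cost of two extra translation calls per turn for other languages.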
TTS - Making use of DeepMind's WaveNet Technology
People treat you like a computer. This is where WaveNet technology comes in.
• Voices sound natural and unique
• Capture subtleties like pitch, pace, and all the pauses that convey meaning
https://deepmind.com/blog/wavenet-generative-model-raw-audio/
Architecture (diagram labels: with WebRTC, Docker Container, Server Node.js Codebase, Docker Container, Cloud TTS: Spoken voice, Cloud STT: Voice to text, Dialogflow: Intent Matching, Translate lang to English, Browser plays audio)
How to bring an AudioBuffer to the web client?
How to make it autoplay in all browsers?
Architecture (diagram labels: with WebRTC, Docker Container, Server Node.js Codebase, Docker Container, Cloud TTS: Spoken voice, Cloud STT: Voice to text, Dialogflow: Intent Matching, Translate lang to English, Browser plays audio)
Need HTTPS? SSL certificates?
seen solutions online where the microphone was streamed directly to Dialogflow, without a server part. The REST calls were made directly in the web client with JavaScript. I would consider this an anti-pattern: you will likely expose your service account / private key in your client-side code. Anyone who is handy with Chrome DevTools could steal your key and make (paid) API calls via your account. It's a better approach to always let a server handle the Google Cloud authentication. This way the service account won't be exposed to the public.
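A minimal sketch of the server-side alternative: the browser sends only a session id and text, and the server, which alone holds the credentials, talks to Dialogflow. The name `createIntentHandler` is hypothetical, and `sessionClient` is assumed to be an object shaped like the Dialogflow Node.js sessions client (a `detectIntent(request)` method resolving to `[response]`); authentication itself is left to that client and is not shown.

```javascript
// Hypothetical server-side handler. sessionClient is assumed to expose
// detectIntent(request) resolving to [response], like the Dialogflow
// Node.js sessions client; it is created on the server with the service
// account, so no key ever reaches the browser.
function createIntentHandler(sessionClient, projectId) {
  return async function handle({ sessionId, text, languageCode = 'en-US' }) {
    const request = {
      session: `projects/${projectId}/agent/sessions/${sessionId}`,
      queryInput: { text: { text, languageCode } },
    };
    const [response] = await sessionClient.detectIntent(request);
    // Return only what the browser needs, never the raw response or credentials.
    return { reply: response.queryResult.fulfillmentText };
  };
}
```

The handler could be wired to whatever transport the server already uses; the point of the sketch is only that the client-side code never sees the service account.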