Generative AI power on the web: making web apps smarter with WebGPU and WebNN

Generative AI power on the web
Making web apps smarter with WebGPU and WebNN
Christian Liebel (@christianliebel), Consultant

Generative AI everywhere

Generative AI: Overview
– Speech: OpenAI Whisper, tortoise-tts, …
– Images: Midjourney, DALL·E, Stable Diffusion, …
– Audio/Music: Musico, Soundraw, …
– Text: OpenAI GPT, LLaMa, Vicuna, …

Generative AI Cloud Providers: Examples

Generative AI Cloud Providers: Drawbacks
– Require an active internet connection
– Affected by network latency and server availability
– Data is transferred to the cloud service
– Require a subscription
→ Can we run models locally?

Large Language Models: Size Comparison
Model:Parameters   Size
phi3:3b            2.2 GB
mistral:7b         4.1 GB
llama3:8b          4.7 GB
gemma2:9b          5.4 GB
gemma2:27b         16 GB
llama3:70b         40 GB
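To put these sizes into perspective, the following sketch works out how many bits each model needs per parameter and how long a download would take. The parameter counts are the nominal values from the model tags (an assumption, e.g. "7b" read as 7 billion), 1 GB is taken as 10^9 bytes, and the 100 Mbit/s link speed is purely illustrative. Most rows come out at roughly 4.6 to 4.8 bits per parameter, which would be consistent with quantized weights rather than 16-bit ones.

```typescript
// Sizes from the comparison above; parameter counts are the nominal
// values from the model tags (an assumption, e.g. "7b" = 7 billion).
const models = [
  { tag: "phi3:3b", params: 3e9, sizeGB: 2.2 },
  { tag: "mistral:7b", params: 7e9, sizeGB: 4.1 },
  { tag: "llama3:8b", params: 8e9, sizeGB: 4.7 },
  { tag: "gemma2:9b", params: 9e9, sizeGB: 5.4 },
  { tag: "gemma2:27b", params: 27e9, sizeGB: 16 },
  { tag: "llama3:70b", params: 70e9, sizeGB: 40 },
];

// Implied storage per parameter in bits, assuming 1 GB = 10^9 bytes.
function bitsPerParameter(sizeGB: number, params: number): number {
  return (sizeGB * 1e9 * 8) / params;
}

// Download time in seconds at a given link speed in Mbit/s.
function downloadSeconds(sizeGB: number, mbitPerSecond: number): number {
  return (sizeGB * 1e9 * 8) / (mbitPerSecond * 1e6);
}

for (const m of models) {
  const bits = bitsPerParameter(m.sizeGB, m.params).toFixed(1);
  const secs = Math.round(downloadSeconds(m.sizeGB, 100));
  console.log(`${m.tag}: ~${bits} bits/parameter, ~${secs} s at 100 Mbit/s`);
}
```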

WebLLM: Demo
https://webllm.mlc.ai/

WebLLM: On NPM
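As a rough idea of what using the package looks like, here is a minimal sketch. It is written against a small structural interface that mirrors the OpenAI-style chat API of @mlc-ai/web-llm; the interface names `ChatMessage` and `ChatEngine` and the helper `ask` are illustrative assumptions, and the package's own typings are authoritative.

```typescript
// Minimal structural types for the part of the engine used here.
// They mirror the OpenAI-style interface of @mlc-ai/web-llm, but are
// an assumption made for this sketch; check the package's typings.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatEngine {
  chat: {
    completions: {
      create(request: { messages: ChatMessage[] }): Promise<{
        choices: { message: { content: string | null } }[];
      }>;
    };
  };
}

// Sends one question to the local model and returns the answer text.
async function ask(engine: ChatEngine, question: string): Promise<string> {
  const reply = await engine.chat.completions.create({
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: question },
    ],
  });
  return reply.choices[0]?.message.content ?? "";
}

// In a browser with WebGPU, the engine would come from the npm package:
//   import { CreateMLCEngine } from "@mlc-ai/web-llm";
//   const engine = await CreateMLCEngine(modelId); // one of the package's prebuilt model IDs
//   console.log(await ask(engine, "What is WebGPU?"));
```

Keeping the helper independent of the concrete engine also makes it easy to exercise with a stub before the multi-gigabyte model download is in place.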

Cache API: Storing model files locally
[Diagram labels: Internet, Hugging Face, Website (HTML/JS), Cache with model files]

Cache API: Parameter cache
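The caching idea can be sketched as a cache-first download: look in the cache, and only go to the network on a miss. This is a simplified illustration, not WebLLM's actual implementation. `ResponseLike`, `CacheLike`, `FetchLike`, and `cachedFetch` are hypothetical names; in a browser, the Cache API's `Cache` object and the global `fetch` would take their place.

```typescript
// Structural stand-ins for the browser's Response, Cache, and fetch
// (hypothetical names, so the sketch also runs outside a browser).
interface ResponseLike {
  ok: boolean;
  clone(): ResponseLike;
}

interface CacheLike {
  match(url: string): Promise<ResponseLike | undefined>;
  put(url: string, response: ResponseLike): Promise<void>;
}

type FetchLike = (url: string) => Promise<ResponseLike>;

// Returns the cached response if present; otherwise downloads the file,
// stores a copy in the cache, and returns the fresh response.
async function cachedFetch(
  url: string,
  cache: CacheLike,
  fetchFn: FetchLike,
): Promise<ResponseLike> {
  const hit = await cache.match(url);
  if (hit) return hit;

  const response = await fetchFn(url);
  if (!response.ok) throw new Error(`Download failed: ${url}`);

  // A response body can only be read once, so the cache gets a clone.
  await cache.put(url, response.clone());
  return response;
}

// Browser usage (the cache name "model-files" is an assumption):
//   const cache = await caches.open("model-files");
//   const response = await cachedFetch(modelUrl, cache, fetch);
```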

WebAssembly (Wasm)
– Bytecode for the web
– Compile target for arbitrary languages
– Can be faster than JavaScript
– WebLLM needs the model and a Wasm library to accelerate model computations
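To make "bytecode for the web" concrete, the sketch below loads a tiny hand-assembled Wasm module that exports a single `add` function and calls it from TypeScript. It is unrelated to the Wasm library WebLLM ships; real modules are produced by a compiler rather than written as bytes.

```typescript
// Hand-assembled Wasm binary exporting add(a: i32, b: i32): i32.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: function 0 uses type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, // code section: one body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // local.get 0, local.get 1, i32.add, end
]);

// Synchronous compilation is fine for a module this small; larger
// modules are usually loaded with WebAssembly.instantiateStreaming().
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule);
const add = wasmInstance.exports.add as (a: number, b: number) => number;

console.log(add(2, 3));
```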

WebGPU
– Grants low-level access to the Graphics Processing Unit (GPU)
– Near-native performance for machine learning applications
– Supported by Chromium-based browsers on Windows and macOS from version 113

Outlook: WebNN
– Grants web applications access to the Neural Processing Unit (NPU) of the system via platform-specific machine learning services (e.g., ML Compute on macOS/iOS, DirectML on Windows, …)
– Even better performance compared to WebGPU
– Currently being specified by the WebML Working Group at W3C
– Implementation in progress for Chromium-based browsers
https://webmachinelearning.github.io/webnn-intro/
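Since support for these technologies differs between browsers, an app would typically check what is available before picking a backend. The sketch below tests for the `navigator.ml` (WebNN) and `navigator.gpu` (WebGPU) entry points and for Wasm support. `Backend`, `NavigatorLike`, and `availableBackends` are hypothetical names, and the presence of an entry point is only a first hint: creating a WebNN context or requesting a WebGPU adapter can still fail on a given device.

```typescript
type Backend = "webnn" | "webgpu" | "wasm";

// Only the two entry points this sketch looks at.
interface NavigatorLike {
  gpu?: unknown; // WebGPU entry point (navigator.gpu)
  ml?: unknown; // WebNN entry point (navigator.ml)
}

// Lists candidate backends in the order WebNN, WebGPU, Wasm.
function availableBackends(nav: NavigatorLike, hasWasm: boolean): Backend[] {
  const backends: Backend[] = [];
  if (nav.ml !== undefined) backends.push("webnn");
  if (nav.gpu !== undefined) backends.push("webgpu");
  if (hasWasm) backends.push("wasm");
  return backends;
}

// Works in browsers and falls back to an empty object elsewhere.
const nav = (globalThis as { navigator?: NavigatorLike }).navigator ?? {};
console.log(availableBackends(nav, typeof WebAssembly === "object"));
```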

WebNN: near-native inference performance
[Chart not reproduced.] Source: Intel. Browser: Chrome Canary 118.0.5943.0, DUT: Dell/Linux/i7-1260P, single p-core, Workloads: MediaPipe solution models (FP32, batch=1)

WebLLM: Caveats
– Due to the Same-Origin Policy, models can’t be shared across origins (e.g., https://example.org cannot access models cached by https://test.example.org).
– Downloading LLMs multiple times leads to very high storage consumption.
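What counts as "the same origin" can be checked with the standard `URL` class: scheme, host, and port must all match, so a subdomain is already a different origin. The helper name `sameOrigin` is made up for this sketch.

```typescript
// Two URLs share an origin only if scheme, host, and port all match.
function sameOrigin(a: string, b: string): boolean {
  return new URL(a).origin === new URL(b).origin;
}

// A subdomain is a different origin, so cached models are not shared.
console.log(sameOrigin("https://example.org", "https://test.example.org")); // false

// Different paths on the same host share one origin and one cache.
console.log(sameOrigin("https://example.org/app", "https://example.org/models")); // true
```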

Prompt API
[Diagram labels: Internet, Website (HTML/JS), Browser, Operating System, Apple Intelligence, Gemini Nano]

Prompt API
– Part of Chrome’s Built-In AI initiative
– Exploratory API for local experiments and use case determination
– Downloads Gemini Nano into Google Chrome
– Model can be shared across origins
– Uses native APIs directly
– Fine-tuning API might follow in the future
https://developer.chrome.com/docs/ai/built-in

Prompt API: First Glance

Prompt API demo: Smart Form Filler
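The transcript does not show how the demo is built, so the following is only a hypothetical sketch of how a form filler could work: ask the model to extract named fields as JSON, then parse the answer defensively. All names (`PromptFn`, `buildPrompt`, `parseFields`, `fillForm`) are assumptions. The model call is passed in as a plain function because the Prompt API is exploratory and its exact surface is not reproduced here.

```typescript
// Stand-in for the model call (e.g. a Prompt API session's prompt method).
type PromptFn = (text: string) => Promise<string>;

// Builds an instruction asking for the form fields as a JSON object.
function buildPrompt(fields: string[], userText: string): string {
  return [
    "Extract the following form fields from the text below.",
    `Fields: ${fields.join(", ")}`,
    "Answer with a JSON object only; use null for unknown values.",
    `Text: ${userText}`,
  ].join("\n");
}

// Pulls the first {...} span out of the model output and keeps only
// string values for the requested fields; everything else stays null.
function parseFields(fields: string[], modelOutput: string): Record<string, string | null> {
  const result: Record<string, string | null> = {};
  for (const field of fields) result[field] = null;

  const start = modelOutput.indexOf("{");
  const end = modelOutput.lastIndexOf("}");
  if (start === -1 || end <= start) return result;

  try {
    const parsed = JSON.parse(modelOutput.slice(start, end + 1)) as Record<string, unknown>;
    for (const field of fields) {
      const value = parsed[field];
      if (typeof value === "string") result[field] = value;
    }
  } catch {
    // Malformed JSON: leave all fields empty rather than guessing.
  }
  return result;
}

async function fillForm(prompt: PromptFn, fields: string[], userText: string) {
  return parseFields(fields, await prompt(buildPrompt(fields, userText)));
}
```

Parsing defensively matters here because small local models do not reliably return clean JSON.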

Performance: Comparison (tokens/sec)
WebLLM (Mistral-7b, M1)    22.98
WebLLM (Mistral-7b, M3)    33.96
OpenAI (GPT-4)             19.08
Azure OpenAI (GPT-4)       38.75
Groq (Mixtral-8x7b)        564.63
Sources: WebLLM/Groq: own tests (23.03.2024); OpenAI/Azure OpenAI: https://mcplusa.com/comparing-performance-of-openai-gpt-4-and-microsoft-azure-gpt-4/ (31.08.2023)
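To translate throughput into waiting time, the sketch below computes how long each setup would need for a response of a given length. It assumes the chart's values and labels appear in the same order, and the 500-token response length is an arbitrary illustration. Time to first token and network latency are not included.

```typescript
// Throughput figures from the comparison, in tokens per second.
// Assumption: values and labels are paired in the order they appear.
const throughput: [string, number][] = [
  ["WebLLM (Mistral-7b, M1)", 22.98],
  ["WebLLM (Mistral-7b, M3)", 33.96],
  ["OpenAI (GPT-4)", 19.08],
  ["Azure OpenAI (GPT-4)", 38.75],
  ["Groq (Mixtral-8x7b)", 564.63],
];

// Pure generation time; ignores model loading and network latency.
function secondsFor(tokens: number, tokensPerSecond: number): number {
  return tokens / tokensPerSecond;
}

for (const [name, rate] of throughput) {
  console.log(`${name}: ${secondsFor(500, rate).toFixed(1)} s for 500 tokens`);
}
```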

Stable Diffusion
– Text-to-image model
– Generates 512x512px images from a prompt
– Runs on “commodity” hardware (with 8 GB VRAM)
– Open-source
Prompt: A guinea pig eating a watermelon

Web Stable Diffusion
– Specialized version of the Stable Diffusion model for the web
– 2 GB in size
– Subject to usage conditions: https://huggingface.co/runwayml/stable-diffusion-v1-5#uses
– No npm package this time
– Currently incompatible with Angular & esbuild due to Wasm imports

Web Stable Diffusion: Demo
https://websd.mlc.ai/

Local AI Models: Advantages
– Data does not leave the browser
– High availability (offline support)
– Low latency
– Stability (unaffected by external API changes)
– Low cost

Local AI Models: Disadvantages
– Lower quality than closed-source models
– High system requirements (RAM, GPU)
– Large model size, high initial bandwidth requirements, models cannot be shared across origins
– Model initialization and inference are relatively slow
– WebGPU and WebNN are currently only supported by Chromium-based browsers on macOS and Windows (WebNN only behind a flag)
– Prompt API is only an exploratory API

Alternatives: Transformers.js
– JavaScript library to run Hugging Face transformers in the browser
– Supports most of the models
https://xenova.github.io/transformers.js/

Summary
– Cloud-based models (especially OpenAI/GPT) remain the most potent models and are easier to integrate (for now)
– Due to their size and high system requirements, local generative AI models are currently interesting mainly for very specific scenarios (e.g., high privacy demands, offline availability)
– Small, specialized models are an interesting alternative (if available)
– Open-source GenAI models are becoming more compact and efficient
– Vendors are beginning to ship AI models with their devices
– Devices are becoming more powerful for AI tasks

Thank you for your kind attention! Christian Liebel @christianliebel [email protected]
