Generative-AI-Power im Web: Progressive Web Apps smarter machen

Christian Liebel, @christianliebel, Consultant

Hello, it’s me.
Christian Liebel
X: @christianliebel
Email: christian.liebel@thinktecture.com
Angular & PWA
Slides: thinktecture.com/christian-liebel

Generative AI everywhere

Single-Page Applications: Run locally on the user’s system
(Diagram: server logic with Web APIs, push service, and DBs; a web server delivering HTML, JS, CSS, and assets; a web browser running the SPA with client logic and HTML/CSS views; connections via HTTPS and WebSockets.)

Progressive Web Apps: Make SPAs offline-capable
(Diagram: website (HTML/JS), service worker, cache, internet; requests go through fetch.)
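The service worker and cache on this slide are commonly combined into a cache-first strategy. The following is a minimal sketch of that idea, not code from the talk: `SimpleCache`, `cacheFirst`, and `fetchFromNetwork` are hypothetical names, and the cache and network are passed in so the logic can run outside a browser.

```typescript
// Hypothetical minimal cache shape, loosely modeled on the browser Cache API
// (match/put), so the sketch compiles without DOM typings.
interface SimpleCache<Res> {
  match(url: string): Promise<Res | undefined>;
  put(url: string, response: Res): Promise<void>;
}

// Cache-first: answer from the cache if possible, otherwise go to the network
// and store the result for the next (possibly offline) visit.
async function cacheFirst<Res>(
  url: string,
  cache: SimpleCache<Res>,
  fetchFromNetwork: (url: string) => Promise<Res>,
): Promise<Res> {
  const cached = await cache.match(url);
  if (cached !== undefined) return cached;

  const fresh = await fetchFromNetwork(url);
  // In a real service worker the response would be cloned before caching,
  // because a Response body can only be read once.
  await cache.put(url, fresh);
  return fresh;
}
```

In a real service worker, this logic would sit inside the `fetch` event handler, with the browser's cache storage and `fetch` taking the place of the injected parameters.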

Generative AI: Overview
Speech: OpenAI Whisper, tortoise-tts, …
Images: Midjourney, DALL·E, Stable Diffusion, …
Audio/Music: Musico, Soundraw, …
Text: OpenAI GPT, LLaMa, Vicuna, …

Generative AI Cloud Providers: Examples

Generative AI Cloud Providers: Drawbacks
– Require an active internet connection
– Affected by network latency and server availability
– Data is transferred to the cloud service
– Require a subscription
→ Can we run models locally?

Large Language Models
Large: Trained on lots of data
Language: Process and generate text
Models: Programs/neural networks
Examples:
– GPT (ChatGPT, Bing Chat, …)
– Gemini, Gemma (Google)
– LLaMa (Meta AI)

Large Language Models
Token: A meaningful unit of text (e.g., a word, a part of a word, a character).
Context Window: The maximum number of tokens the model can process at once.
Parameters/weights: Internal variables learned during training, used to make predictions.
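To make the relationship between tokens and the context window concrete, here is a small sketch. It assumes a naive tokenizer that splits text into words and punctuation marks; real LLM tokenizers use learned subword vocabularies, so actual token counts differ. Both function names are hypothetical.

```typescript
// Naive stand-in for a tokenizer: words and single punctuation marks.
// Assumption for illustration only; real models use subword tokenizers.
function naiveTokenize(text: string): string[] {
  return text.match(/\w+|[^\w\s]/g) ?? [];
}

// Keeps only the most recent tokens that still fit into the context window.
function truncateToContextWindow(tokens: string[], contextWindow: number): string[] {
  return tokens.length <= contextWindow
    ? tokens
    : tokens.slice(tokens.length - contextWindow);
}

const tokens = naiveTokenize("Make SPAs offline-capable!");
// tokens: ["Make", "SPAs", "offline", "-", "capable", "!"]
const visibleToModel = truncateToContextWindow(tokens, 3);
// visibleToModel: ["-", "capable", "!"]
```

Dropping the oldest tokens is only one possible policy; an application could also summarize or reject input that exceeds the window.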

Large Language Models
Prompts serve as the universal interface: Unstructured text conveying specific semantics
Paradigm shift in software architecture: Natural language becomes a first-class citizen
Caveats: Non-determinism and hallucination, prompt injections

Large Language Models: Use Cases
Content consumption
– summarization
– translation
– answering questions about some content
– categorization
– characterizing
Content creation
– writing assistance
– proofreading
– grammar correction
– rephrasing
https://developer.chrome.com/docs/ai/built-in

Large Language Models: Size Comparison
Model:Parameters   Size
phi3:3b            2.2 GB
mistral:7b         4.1 GB
llama3:8b          4.7 GB
gemma2:9b          5.4 GB
gemma2:27b         16 GB
llama3:70b         40 GB
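The sizes above follow roughly from parameter count times bits per weight. The sketch below shows that arithmetic. It assumes the listed downloads are quantized to somewhere between 4 and 5 bits per weight, which the slide does not state, and it ignores file-format overhead. Both function names are hypothetical.

```typescript
// Rough size estimate: parameters x bits per weight / 8 bits per byte.
// With parameters given in billions, the result is in GB (10^9 bytes).
function estimateModelSizeGB(parametersInBillions: number, bitsPerWeight: number): number {
  return (parametersInBillions * bitsPerWeight) / 8;
}

// Inverse view: how many bits per weight a given download size implies.
function effectiveBitsPerWeight(sizeGB: number, parametersInBillions: number): number {
  return (sizeGB * 8) / parametersInBillions;
}

const unquantized8b = estimateModelSizeGB(8, 16); // 16 GB at 16 bits per weight
const fourBit70b = estimateModelSizeGB(70, 4);    // 35 GB at 4 bits per weight
const impliedBits = effectiveBitsPerWeight(4.7, 8); // about 4.7 bits per weight for llama3:8b
```

Under these assumptions, the same 8B model at 16 bits per weight would be more than three times the listed 4.7 GB, which is why quantization matters for in-browser use.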

WebLLM (demo)
https://webllm.mlc.ai/

WebLLM: On NPM
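The slide only notes that WebLLM is available on npm. As a hedged sketch of how an application might talk to it: the package `@mlc-ai/web-llm` exposes an engine with an OpenAI-style `chat.completions.create` method. The `ChatEngine` type below is a hand-written structural subset for illustration, and `ask` is a hypothetical helper, not part of the library.

```typescript
// Structural subset of an OpenAI-style chat engine (assumption for this sketch).
// In a browser app, a real engine could be obtained roughly like this:
//   import { CreateMLCEngine } from "@mlc-ai/web-llm";
//   const engine = await CreateMLCEngine(modelId);
// where modelId is one of the model IDs listed by the package.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface ChatEngine {
  chat: {
    completions: {
      create(request: { messages: ChatMessage[] }): Promise<{
        choices: { message: { content: string | null } }[];
      }>;
    };
  };
}

// Hypothetical helper: sends one question and returns the first answer as text.
async function ask(engine: ChatEngine, question: string): Promise<string> {
  const reply = await engine.chat.completions.create({
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: question },
    ],
  });
  return reply.choices[0]?.message.content ?? "";
}
```

Keeping the application code dependent on a small interface like this also makes it possible to swap the local engine for a cloud-based one later.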

WebLLM: Demo

Choosing a model
Selection of available models for WebLLM:
– LLaMa-3 8B Instruct
– LLaMa-3 70B Instruct
– Mistral 7B Instruct
– Gemma 2B IT
Benchmarks: https://www.theverge.com/2024/4/18/24134103/llama-3-benchmark-testing-ai-gemma-gemini-mistral

Cache API: Storing model files locally
(Diagram: website (HTML/JS), cache with model files, internet, Hugging Face.)

Cache API: Parameter cache

WebAssembly (Wasm)
Bytecode for the web
Compile target for arbitrary languages
Can be faster than JavaScript
WebLLM needs the model and a Wasm library to accelerate model computations

WebGPU
Grants low-level access to the Graphics Processing Unit (GPU)
Near-native performance for machine learning applications
Supported by Chromium-based browsers on Windows and macOS from version 113
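Because WebGPU is not available everywhere, an application would typically check for it before downloading a model. A minimal sketch follows: `navigator.gpu.requestAdapter()` is the standard entry point and resolves to `null` when no suitable adapter exists. The `NavigatorLike` and `GpuLike` types and the `hasWebGpu` helper are hypothetical, introduced so the check can run without browser typings.

```typescript
// Minimal structural stand-ins for the parts of the WebGPU API used here.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Returns true only if the API is exposed and an adapter can be obtained.
async function hasWebGpu(nav: NavigatorLike): Promise<boolean> {
  if (!nav.gpu) return false; // API not exposed by this browser
  const adapter = await nav.gpu.requestAdapter();
  return adapter !== null; // null: no suitable GPU adapter available
}
```

In a browser, the global `navigator` object would be passed in; if the check fails, the app could fall back to a cloud-based model.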

Outlook: WebNN
Grants web applications access to the Neural Processing Unit (NPU) of the system via platform-specific machine learning services (e.g., ML Compute on macOS/iOS, DirectML on Windows, …)
Even better performance compared to WebGPU
Currently being specified by the WebML Working Group at W3C
Implementation in progress for Chromium-based browsers
https://webmachinelearning.github.io/webnn-intro/

WebNN: near-native inference performance
(Chart.) Source: Intel. Browser: Chrome Canary 118.0.5943.0, DUT: Dell/Linux/i7-1260P, single p-core, Workloads: MediaPipe solution models (FP32, batch=1)

WebLLM: Caveats
– Due to the Same-Origin Policy, models can’t be shared across origins (i.e., https://example.org cannot access https://test.example.org).
– Downloading LLMs multiple times leads to very high storage consumption.
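The origin boundary behind this caveat can be checked with the standard `URL` class: two URLs share an origin only if scheme, host, and port all match. The `sameOrigin` helper below is a hypothetical name used for illustration.

```typescript
// Two URLs share cached model files only if their origins are identical.
function sameOrigin(a: string, b: string): boolean {
  return new URL(a).origin === new URL(b).origin;
}

const samePage = sameOrigin("https://example.org/app", "https://example.org/models");
const subdomain = sameOrigin("https://example.org", "https://test.example.org");
// samePage is true, subdomain is false: the subdomain would need its own model download.
```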

Prompt API
(Diagram: website (HTML/JS), browser, operating system, internet; models shown: Gemini Nano, Apple Intelligence.)

Prompt API
Part of Chrome’s Built-In AI initiative
– Exploratory API for local experiments and use case determination
– Downloads Gemini Nano into Google Chrome
– Model can be shared across origins
– Uses native APIs directly
– Fine-tuning API might follow in the future
https://developer.chrome.com/docs/ai/built-in

Prompt API: First Glance

Prompt API: Demo: Smart Form Filler
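The transcript does not show how the Smart Form Filler demo works. One plausible shape, sketched here purely as an assumption, is to ask the model for a JSON object keyed by field name and to parse the reply defensively, since model output is untrusted text. Both helpers and the prompt wording are hypothetical and independent of any specific prompt API.

```typescript
// Hypothetical helper: builds a prompt asking the model to map free text to form fields.
function buildFormFillerPrompt(fieldNames: string[], userText: string): string {
  return [
    "Extract values for the following form fields from the text below.",
    `Fields: ${fieldNames.join(", ")}`,
    "Answer with a single JSON object that uses the field names as keys.",
    "Use null for fields the text does not mention.",
    "",
    `Text: ${userText}`,
  ].join("\n");
}

// Hypothetical helper: extracts the JSON object from the reply and keeps only
// known fields with string values; anything else is ignored.
function parseFormFillerReply(fieldNames: string[], reply: string): Record<string, string> {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) return {};
  let parsed: unknown;
  try {
    parsed = JSON.parse(reply.slice(start, end + 1));
  } catch {
    return {}; // not valid JSON: fill nothing rather than guess
  }
  if (typeof parsed !== "object" || parsed === null) return {};
  const result: Record<string, string> = {};
  for (const name of fieldNames) {
    const value = (parsed as Record<string, unknown>)[name];
    if (typeof value === "string") result[name] = value;
  }
  return result;
}
```

Restricting the result to known field names is a small safeguard against the non-determinism and prompt injection caveats mentioned earlier in the deck.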

Prompt Engineering: Alternatives
(Diagram with an "Effort" axis: Prompt Engineering, Retrieval Augmented Generation, Fine-tuning, Custom model.)

Performance: Comparison (tokens/sec)
WebLLM (Mistral-7b, M1): 22.98
WebLLM (Mistral-7b, M3): 33.96
OpenAI (GPT-4): 19.08
Azure OpenAI (GPT-4): 38.75
Groq (Mixtral-8x7b): 564.63
WebLLM/Groq: Own tests (23.03.2024), OpenAI/Azure OpenAI: https://mcplusa.com/comparing-performance-of-openai-gpt-4-and-microsoft-azure-gpt-4/ (31.08.2023)

Stable Diffusion
Text-to-image model
Generates 512x512px images from a prompt
Runs on “commodity” hardware (with 8 GB VRAM)
Open-source
Prompt: A guinea pig eating a watermelon

Web Stable Diffusion
Specialized version of the Stable Diffusion model for the web
2 GB in size
Subject to usage conditions: https://huggingface.co/runwayml/stable-diffusion-v1-5#uses
No npm package this time
Currently incompatible with Angular & esbuild due to Wasm imports

Web Stable Diffusion (demo)
https://websd.mlc.ai/

Local AI Models: Advantages
– Data does not leave the browser
– High availability (offline support)
– Low latency
– Stability (unaffected by external API changes)
– Low cost

Local AI Models: Disadvantages
– Lower quality than closed-source models
– High system requirements (RAM, GPU)
– Large model size, high initial bandwidth requirements, models cannot be shared across origins
– Model initialization and inference are relatively slow
– WebGPU and WebNN are currently only supported by Chromium-based browsers on macOS and Windows (WebNN only behind a flag)
– Prompt API is only an exploratory API

Alternatives: Transformers.js
JavaScript library to run Hugging Face transformers in the browser
Supports most of the models
https://xenova.github.io/transformers.js/

Summary
– Cloud-based models (especially OpenAI/GPT) remain the most potent models and are easier to integrate (for now)
– Due to their size and high system requirements, local generative AI models are currently interesting mainly for very specific scenarios (e.g., high privacy demands, offline availability)
– Small, specialized models are an interesting alternative (if available)
– Open-source GenAI models are becoming more compact and efficient
– Vendors are beginning to ship AI models with their devices
– Devices are becoming more powerful for AI tasks

Thank you for your kind attention!
Christian Liebel, @christianliebel, christian.liebel@thinktecture.com
