
Generative AI Power on the Web: Making Progressive Web Apps Smarter

A growing number of developers intend to integrate generative AI features into their applications. So far, this path has almost always led to the cloud, but it does not have to. There are currently several promising approaches to running AI models directly on the user's machine: Hugging Face, for example, offers Transformers.js, which makes it possible to use machine learning models directly in the browser. The Web Neural Network API (WebNN) of the W3C, which is still in the specification phase, will in the future give such models access to the device's Neural Processing Unit (NPU), so that Large Language Models (LLMs) or Stable Diffusion models, for instance, can be run efficiently in the browser.

The advantages of these approaches are obvious: locally executed AI models are also available offline, user data does not leave the device, and thanks to open-source models all of this is even free of charge. Of course, the model first has to be transferred to the user's device, which must also be sufficiently powerful. In this session, Christian Liebel, Thinktecture's representative at the W3C, presents these different approaches for making your Progressive Web App smarter as well. We will discuss use cases and point out the advantages and disadvantages of each solution. Join us!

Christian Liebel

July 03, 2024

Transcript

  1. Generative AI Power on the Web: Making Progressive Web Apps Smarter

    Hello, it’s me. Christian Liebel
    X: @christianliebel
    Email: christian.liebel@thinktecture.com
    Angular & PWA
    Slides: thinktecture.com/christian-liebel
  2. Single-Page Applications

    Run locally on the user’s system
    [Diagram components: web server (HTML, JS, CSS, assets); web browser running the SPA (client logic, HTML/CSS views); server logic (web APIs, push service, DBs); connections: HTTPS, WebSockets]
  3. Progressive Web Apps

    Make SPAs offline-capable
    [Diagram components: website (HTML/JS), service worker, cache, internet; labeled operation: fetch]
  4. Generative AI: Overview

    Speech: OpenAI Whisper, tortoise-tts, …
    Images: Midjourney, DALL·E, Stable Diffusion, …
    Audio/Music: Musico, Soundraw, …
    Text: OpenAI GPT, LLaMa, Vicuna, …
  6. Generative AI Cloud Providers

    Drawbacks
    – Require an active internet connection
    – Affected by network latency and server availability
    – Data is transferred to the cloud service
    – Require a subscription
    → Can we run models locally?
  7. Large Language Models

    Large: Trained on lots of data
    Language: Process and generate text
    Models: Programs/neural networks
    Examples:
    – GPT (ChatGPT, Bing Chat, …)
    – Gemini, Gemma (Google)
    – LLaMa (Meta AI)
  8. Large Language Models

    Token: A meaningful unit of text (e.g., a word, a part of a word, a character).
    Context Window: The maximum number of tokens the model can process.
    Parameters/weights: Internal variables learned during training, used to make predictions.
  9. Large Language Models

    Prompts serve as the universal interface: Unstructured text conveying specific semantics
    Paradigm shift in software architecture: Natural language becomes a first-class citizen
    Caveats: Non-determinism and hallucination, prompt injections
  10. Large Language Models

    Use Cases
    Content consumption
    – summarization
    – translation
    – answering questions about some content
    – categorization
    – characterizing
    Content creation
    – writing assistance
    – proofreading
    – grammar correction
    – rephrasing
    https://developer.chrome.com/docs/ai/built-in
  11. Large Language Models

    Size Comparison
    Model:Parameters   Size
    phi3:3b            2.2 GB
    mistral:7b         4.1 GB
    llama3:8b          4.7 GB
    gemma2:9b          5.4 GB
    gemma2:27b         16 GB
    llama3:70b         40 GB
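The sizes on slide 11 can be sanity-checked with simple arithmetic: a model file is roughly the parameter count times the number of bits stored per weight. The sketch below assumes about 4.5 bits per weight. That value is an assumption made here for illustration, since the slide does not state the precision of the listed models, and the function name `estimateModelSizeGB` is hypothetical.

```typescript
// Rough download-size estimate for a model file.
// Assumption: size ≈ parameters × bits per weight; metadata and
// tokenizer files are ignored, so real files are somewhat larger.
function estimateModelSizeGB(parameterCount: number, bitsPerWeight: number): number {
  const bytes = (parameterCount * bitsPerWeight) / 8;
  return bytes / 1e9; // decimal gigabytes
}

// An 8-billion-parameter model at an assumed 4.5 bits per weight:
console.log(estimateModelSizeGB(8e9, 4.5)); // 4.5
// The same model with 16-bit weights:
console.log(estimateModelSizeGB(8e9, 16)); // 16
```

The 4.5 GB estimate is in the same range as the 4.7 GB listed for llama3:8b, and the 16-bit figure suggests why weights at higher precision would be much harder to ship to a browser.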
  12. Choosing a model

    Benchmarks
    https://www.theverge.com/2024/4/18/24134103/llama-3-benchmark-testing-ai-gemma-gemini-mistral
    Selection of available models for WebLLM:
    – LLaMa-3 8B Instruct
    – LLaMa-3 70B Instruct
    – Mistral 7B Instruct
    – Gemma 2B IT
  13. Storing model files locally

    Cache API
    [Diagram components: website (HTML/JS), cache with model files, internet, Hugging Face]
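Slide 13 describes keeping downloaded model files in the browser's Cache API so they are fetched only once. A minimal cache-first sketch of that idea follows. The function name `loadModelFile` and the small `ResponseLike`/`CacheLike` interfaces are hypothetical and exist only so the sketch type-checks outside a browser; in a browser, the built-in `Response`, `Cache` (from `caches.open(...)`), and `fetch` provide the members used here. This is not necessarily how WebLLM implements its caching.

```typescript
// Minimal structural types (hypothetical) covering only what the sketch uses.
interface ResponseLike {
  readonly ok: boolean;
  clone(): ResponseLike;
}
interface CacheLike {
  match(url: string): Promise<ResponseLike | undefined>;
  put(url: string, response: ResponseLike): Promise<void>;
}
type FetchLike = (url: string) => Promise<ResponseLike>;

// Cache-first lookup: return the stored file if present,
// otherwise download it once and store a copy for later visits.
async function loadModelFile(
  url: string,
  cache: CacheLike,
  fetchFn: FetchLike,
): Promise<ResponseLike> {
  const cached = await cache.match(url);
  if (cached) {
    return cached; // cache hit: no network access needed
  }
  const response = await fetchFn(url);
  if (!response.ok) {
    throw new Error(`Download failed: ${url}`);
  }
  // Store a clone, because a response body can only be consumed once.
  await cache.put(url, response.clone());
  return response;
}

// Browser usage (cache name "model-files" is an arbitrary example):
//   const cache = await caches.open("model-files");
//   const file = await loadModelFile(modelUrl, cache, fetch);
```

Passing the cache and the fetch function as parameters keeps the logic testable without a browser and makes the single network round trip explicit.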
  14. WebAssembly (Wasm)

    Bytecode for the web
    Compile target for arbitrary languages
    Can be faster than JavaScript
    WebLLM needs the model and a Wasm library to accelerate model computations
  15. WebGPU

    Grants low-level access to the Graphics Processing Unit (GPU)
    Near native performance for machine learning applications
    Supported by Chromium-based browsers on Windows and macOS from version 113
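Because WebGPU support is limited to certain browsers and platforms (slide 15), an application would typically check for it before downloading a large model. The following is a minimal sketch of such a check. `navigator.gpu` and `requestAdapter()` are part of the WebGPU API; the function name `hasWebGpu` and the `GpuLike` interface are hypothetical and only keep the sketch independent of WebGPU type definitions.

```typescript
// Hypothetical minimal shape of the WebGPU entry point (navigator.gpu).
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

// Returns true only if the WebGPU entry point exists AND an adapter
// is actually available (requestAdapter() resolves to null otherwise).
async function hasWebGpu(nav: { gpu?: GpuLike }): Promise<boolean> {
  if (!nav.gpu) {
    return false; // API not exposed by this browser
  }
  const adapter = await nav.gpu.requestAdapter();
  return adapter !== null && adapter !== undefined;
}

// Browser usage (a cast may be needed without WebGPU typings):
//   const supported = await hasWebGpu(navigator as { gpu?: GpuLike });
```

Checking for an adapter, and not just for the property, matters because a browser can expose the API on hardware where no suitable GPU is available.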
  16. Outlook: WebNN

    Grants web applications access to the Neural Processing Unit (NPU) of the system via platform-specific machine learning services (e.g., ML Compute on macOS/iOS, DirectML on Windows, …)
    Even better performance compared to WebGPU
    Currently in specification by the WebML Working Group at W3C
    Implementation in progress for Chromium-based browsers
    https://webmachinelearning.github.io/webnn-intro/
  17. WebNN: near-native inference performance

    [Chart] Source: Intel. Browser: Chrome Canary 118.0.5943.0, DUT: Dell/Linux/i7-1260P, single p-core, Workloads: MediaPipe solution models (FP32, batch=1)
  18. WebLLM

    Caveats
    – Due to the Same-Origin Policy, models can’t be shared across origins (e.g., https://example.org cannot access models stored by https://test.example.org).
    – Downloading LLMs multiple times leads to very high storage consumption.
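The caveat on slide 18 follows from how origins are defined: scheme, host, and port must all match, so a subdomain counts as a different origin with its own storage. A small sketch using the standard `URL` class illustrates the comparison; the function name `isSameOrigin` and the file paths are hypothetical.

```typescript
// Two URLs share browser storage (including cached model files)
// only if scheme, host, and port are all identical.
function isSameOrigin(a: string, b: string): boolean {
  return new URL(a).origin === new URL(b).origin;
}

// Different host (subdomain), so the cached model is not shared:
console.log(isSameOrigin("https://example.org/model.bin", "https://test.example.org/model.bin")); // false
// Same scheme, host, and port; only the path differs:
console.log(isSameOrigin("https://example.org/a", "https://example.org/b")); // true
```

Two apps on different subdomains that use the same model would therefore each download and store their own copy.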
  19. Prompt API

    [Diagram components: website (HTML/JS), browser, operating system, internet; models: Gemini Nano, Apple Intelligence]
  20. Prompt API

    Part of Chrome’s Built-In AI initiative
    – Exploratory API for local experiments and use case determination
    – Downloads Gemini Nano into Google Chrome
    – Model can be shared across origins
    – Uses native APIs directly
    – Fine-tuning API might follow in the future
    https://developer.chrome.com/docs/ai/built-in
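To give an impression of what calling the Prompt API from slide 20 might look like, here is a hedged sketch. The API is exploratory and its surface has changed between Chrome versions, so the shape assumed here (a global `LanguageModel` object with `availability()` and `create()`, and a session with `prompt()`) may not match the version shown in the talk or the one in a given browser. The function `promptLocally` and both interfaces are hypothetical wrappers.

```typescript
// Assumed (not guaranteed) shape of the built-in Prompt API.
interface PromptSession {
  prompt(input: string): Promise<string>;
}
interface LanguageModelFactory {
  availability(): Promise<string>; // assumed to report "unavailable" when unusable
  create(): Promise<PromptSession>;
}

// Runs a prompt against the browser's built-in model if one is usable;
// returns null so the caller can fall back to a cloud model or skip the feature.
async function promptLocally(
  factory: LanguageModelFactory | undefined,
  input: string,
): Promise<string | null> {
  if (!factory) {
    return null; // API not exposed by this browser
  }
  if ((await factory.availability()) === "unavailable") {
    return null; // device or browser cannot run the built-in model
  }
  const session = await factory.create();
  return session.prompt(input);
}

// Browser usage, assuming the global is named LanguageModel:
//   const factory = (globalThis as { LanguageModel?: LanguageModelFactory }).LanguageModel;
//   const answer = await promptLocally(factory, "Summarize this text: …");
```

Returning `null` instead of throwing reflects the exploratory status of the API: an application should keep working when the built-in model is missing.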
  21. Prompt API

    Demo: Smart Form Filler
  22. Prompt Engineering

    Alternatives
    [Diagram: Prompt Engineering, Retrieval Augmented Generation, Fine-tuning, Custom model; axis label: Effort]
  23. Comparison

    Performance (tokens/sec)
    WebLLM (Mistral-7b, M1): 22.98
    WebLLM (Mistral-7b, M3): 33.96
    OpenAI (GPT-4): 19.08
    Azure OpenAI (GPT-4): 38.75
    Groq (Mixtral-8x7b): 564.63
    Sources: WebLLM/Groq: own tests (23.03.2024), OpenAI/Azure OpenAI: https://mcplusa.com/comparing-performance-of-openai-gpt-4-and-microsoft-azure-gpt-4/ (31.08.2023)
  24. Stable Diffusion

    Text-to-image model
    Generates 512x512px images from a prompt
    Runs on “commodity” hardware (with 8 GB VRAM)
    Open-source
    Prompt: A guinea pig eating a watermelon
  25. Web Stable Diffusion

    Specialized version of the Stable Diffusion model for the web
    2 GB in size
    Subject to usage conditions: https://huggingface.co/runwayml/stable-diffusion-v1-5#uses
    No npm package this time
    Currently incompatible with Angular & esbuild due to Wasm imports
  26. Local AI Models

    Advantages
    – Data does not leave the browser
    – High availability (offline support)
    – Low latency
    – Stability (not affected by external API changes)
    – Low cost
  27. Local AI Models

    Disadvantages
    – Lower quality than closed-source models
    – High system requirements (RAM, GPU)
    – Large model size, high initial bandwidth requirements, models cannot be shared across origins
    – Model initialization and inference are relatively slow
    – WebGPU and WebNN are currently only supported by Chromium-based browsers on macOS and Windows (WebNN only behind a flag)
    – Prompt API is only an exploratory API
  28. Alternatives

    Transformers.js
    JavaScript library to run Hugging Face transformers in the browser
    Supports most of the models
    https://xenova.github.io/transformers.js/
  29. Summary

    – Cloud-based models (especially OpenAI/GPT) remain the most potent models and are easier to integrate (for now)
    – Due to their size and high system requirements, local generative AI models are currently interesting mainly for very specific scenarios (e.g., high privacy demands, offline availability)
    – Small, specialized models are an interesting alternative (if available)
    – Open-source GenAI models are becoming more compact and efficient
    – Vendors are beginning to ship AI models with their devices
    – Devices are becoming more powerful for AI tasks