Smarter Web Apps with Angular, WebLLM, and Prompt API: Local and Offline-Capable

Generative AI is on everyone's lips: Adobe Photoshop lets you replace objects with a simple text prompt, and Microsoft Copilot assists users in Office and Windows. With WebLLM and WebSD, we bring generative AI to your Angular app as well: locally and offline-capable.

We generate images from text prompts and add a chatbot to a todo application.

Christian Liebel

March 19, 2025

Transcript

  1. Hello, it’s me.
    Smarter Web Apps with Angular, WebLLM, and Prompt API: Local and Offline-Capable
    Christian Liebel
    X: @christianliebel
    Email: christian.liebel@thinktecture.com
    Angular & PWA
    Slides: thinktecture.com/christian-liebel
  2. Timetable
    09:00–10:30 Block 1
    10:30–11:00 Coffee Break
    11:00–12:30 Block 2
  3. Expectations
    What to expect:
    – Focus on web app development
    – Focus on Generative AI
    – Up-to-date insights: the ML/AI field is evolving fast
    – Live demos on real hardware
    – Hands-on labs
    What not to expect:
    – Deep dive into AI specifics, RAG, model finetuning or training
    – Stable libraries or specifications
    – WebSD in Angular
    – 1:1 support
    Huge downloads! High requirements! Things may break!
  4. Setup (1/2), LAB #0
    Setup complete? (Node.js, Google Chrome Canary, Editor, Git, macOS/Windows, 20 GB free disk space, 6 GB VRAM)
  5. Setup (2/2), LAB #0
      git clone https://github.com/thinktecture/angular-days-2025-spring-genai.git
      cd angular-days-2025-spring-genai
      npm i
      npm start -- --open
  6. Generative AI everywhere
    Source: https://www.apple.com/chde/apple-intelligence/
  7. Single-Page Applications: Run locally on the user’s system
    [Diagram: the web browser runs the SPA (client logic plus several HTML/CSS views), loads HTML, JS, CSS and assets from a web server via HTTPS, and talks to web APIs (server logic, databases) and a push service via HTTPS and WebSockets.]
  8. Progressive Web Apps: Make SPAs offline-capable
    [Diagram: fetch requests from the website (HTML/JS) pass through the service worker, which answers them from the cache or from the Internet.]
  9. Generative AI: Overview
    – Text: OpenAI GPT, Mistral, …
    – Audio/Music: Musico, Soundraw, …
    – Images: DALL·E, Firefly, …
    – Video: Sora, Runway, …
    – Speech: Whisper, tortoise-tts, …
  12. Generative AI Cloud Providers: Examples
  13. Generative AI Cloud Providers: Drawbacks
    – Require a (stable) internet connection
    – Subject to network latency and server availability
    – Data is transferred to the cloud service
    – Require a subscription
  14. Can we run GenAI models locally?
  15. Large Language Models
    – Large: Trained on lots of data
    – Language: Process and generate text
    – Models: Programs/neural networks
    Examples:
    – GPT (ChatGPT, Bing Chat, …)
    – Gemini, Gemma (Google)
    – LLaMa (Meta AI)
  16. Large Language Models
    – Token: A meaningful unit of text (e.g., a word, a part of a word, a character).
    – Context window: The maximum number of tokens the model can process at once.
    – Parameters/weights: Internal variables learned during training, used to make predictions.
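As a rough illustration of how tokens and the context window relate, the following sketch estimates the token count of a prompt and checks it against a window size. The ratio of four characters per token is only a common rule of thumb for English text, not a property of any particular tokenizer, and `estimateTokens` and `fitsContextWindow` are hypothetical helper names.

```typescript
// Rough heuristic (assumption): about 4 characters per token for English text.
// Real tokenizers differ per model; use the model's tokenizer for exact counts.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// True if the prompt plus the tokens reserved for the reply fit into the window.
function fitsContextWindow(
  promptText: string,
  contextWindow: number,
  reservedForReply = 256,
): boolean {
  return estimateTokens(promptText) + reservedForReply <= contextWindow;
}

const samplePrompt = "These are the todos:\n* Wash clothes\n* Pet the dog";
const sampleFits = fitsContextWindow(samplePrompt, 8192);
```

Reserving part of the window for the reply matters because the prompt and the generated answer share the same token budget.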
  17. Large Language Models
    Prompts serve as the universal interface
    – Unstructured text conveying specific semantics
    Paradigm shift in software architecture
    – Natural language becomes a first-class citizen
    Caveats
    – Non-determinism and hallucination, prompt injections
  18. Large Language Models: Size Comparison
    Model:Parameters   Size
    phi3:3b            2.2 GB
    mistral:7b         4.1 GB
    llama3:8b          4.7 GB
    gemma2:9b          5.4 GB
    gemma2:27b         16 GB
    llama3:70b         40 GB
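The sizes in this comparison can be approximated from the parameter count and the number of bits stored per weight. The sketch below assumes roughly 4-bit quantized weights, which the slide does not state, and `approxModelSizeGB` is a hypothetical helper. Real model files are somewhat larger because they contain metadata and may keep some tensors at higher precision.

```typescript
// Back-of-the-envelope estimate: size ≈ parameters × bits per weight / 8.
// Assumption: weights are stored with a uniform number of bits.
function approxModelSizeGB(parameters: number, bitsPerWeight: number): number {
  const bytes = (parameters * bitsPerWeight) / 8;
  return bytes / 1e9; // decimal gigabytes
}

const sevenB4bit = approxModelSizeGB(7e9, 4);   // 3.5
const sevenB16bit = approxModelSizeGB(7e9, 16); // 14
```

Under that assumption, a 7B model lands at about 3.5 GB, in the same ballpark as the 4.1 GB listed for mistral:7b, while unquantized 16-bit weights would need about four times as much.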
  19. Downloading a model (LAB #2, 1/4)
    In app.component.ts, add the following lines (signal is imported from @angular/core, MLCEngine from @mlc-ai/web-llm):
      protected readonly progress = signal(0);
      protected readonly ready = signal(false);
      protected engine?: MLCEngine;
  20. Downloading a model (LAB #2, 2/4)
    In app.component.ts (ngOnInit()), add the following lines. Because of the await, ngOnInit() has to be declared async; CreateMLCEngine is imported from @mlc-ai/web-llm:
      const model = 'Llama-3.2-3B-Instruct-q4f32_1-MLC';
      this.engine = await CreateMLCEngine(model, {
        initProgressCallback: ({ progress }) => this.progress.set(progress)
      });
      this.ready.set(true);
  21. Downloading a model (LAB #2, 3/4)
    In app.component.html, change the following lines:
      @if (!ready()) {
        <mat-progress-bar mode="determinate" [value]="progress() * 100"></mat-progress-bar>
      }
      <button mat-raised-button (click)="runPrompt(prompt.value, langModel.value)" [disabled]="!ready()">
  22. Downloading a model (LAB #2, 4/4)
    Launch the app via npm start. The progress bar should begin to move.
  23. Cache API: Storing model files locally
    [Diagram: the website (HTML/JS) downloads the model files from Hugging Face over the Internet and stores them in a cache.]
    Note: Due to the Same-Origin Policy, models cannot be shared across origins.
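The cache-first idea behind storing model files can be sketched as follows. To keep the example runnable outside a browser, it uses a hypothetical `ModelStore` interface as a stand-in for the browser cache; in a browser, `caches.open(name)` returns a cache object with comparable `match()` and `put()` methods that work on requests and responses. This is an illustration of the pattern only and makes no claim about how WebLLM implements its caching.

```typescript
// Minimal stand-in for the browser cache (hypothetical interface).
interface ModelStore {
  match(url: string): Promise<Uint8Array | undefined>;
  put(url: string, data: Uint8Array): Promise<void>;
}

type Downloader = (url: string) => Promise<Uint8Array>;

// Cache-first: only hit the network when the file is not stored yet.
async function loadModelFile(
  store: ModelStore,
  url: string,
  download: Downloader,
): Promise<Uint8Array> {
  const cached = await store.match(url);
  if (cached !== undefined) {
    return cached;
  }
  const data = await download(url);
  await store.put(url, data);
  return data;
}

// In-memory store for demonstration purposes.
function createMemoryStore(): ModelStore {
  const files = new Map<string, Uint8Array>();
  return {
    match: async (url) => files.get(url),
    put: async (url, data) => {
      files.set(url, data);
    },
  };
}
```

Because the store is keyed per origin in a real browser, a second website using the same model would still trigger its own download.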
  24. WebAssembly (Wasm)
    – Bytecode for the web
    – Compile target for arbitrary languages
    – Can be faster than JavaScript
    – WebLLM uses a model-specific Wasm library to accelerate model computations
  25. WebGPU
    – Grants low-level access to the Graphics Processing Unit (GPU)
    – Near-native performance for machine learning applications
    – Supported by Chromium-based browsers on Windows and macOS from version 113
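Since WebGPU is not available everywhere, an app would typically check for it before downloading a multi-gigabyte model. The sketch below relies on `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable adapter exists. The `GpuLike` and `NavigatorLike` types and the `supportsWebGPU` helper are hypothetical; they only exist so the example compiles without WebGPU type definitions and can run outside a browser.

```typescript
// Structural types so the sketch compiles without WebGPU type definitions.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Resolves to true only if the API exists and an adapter is actually available.
async function supportsWebGPU(nav: NavigatorLike | undefined): Promise<boolean> {
  if (!nav?.gpu) {
    return false;
  }
  const adapter = await nav.gpu.requestAdapter();
  return adapter !== null;
}
```

In a browser, the global `navigator` object would be passed in (possibly with a type cast, depending on the installed type definitions).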
  26. WebNN (DEMO)
    – Grants web apps access to the device’s CPU, GPU and Neural Processing Unit (NPU)
    – Being specified by the WebML Working Group at W3C
    – Implementation in progress in Chromium (behind a flag)
    – Even better performance compared to WebGPU
    Source: https://webmachinelearning.github.io/webnn-intro/
  27. WebNN: near-native inference performance
    Source: Intel. Browser: Chrome Canary 118.0.5943.0, DUT: Dell/Linux/i7-1260P, single p-core, Workloads: MediaPipe solution models (FP32, batch=1)
  28. WebNN
    Source: https://github.com/webmachinelearning/webnn/issues/375#issuecomment-2720701672
  29. Model inference (LAB #3, 1/4)
    In app.component.ts, add the following line at the top of the class:
      protected readonly reply = signal('');
  30. Model inference (LAB #3, 2/4)
    In the runPrompt() method, add the following code:
      this.reply.set('…');
      const chunks = languageModel === 'webllm'
        ? await this.inferWebLLM(userPrompt)
        : await this.inferPromptApi(userPrompt);
      for await (const chunk of chunks) {
        this.reply.set(chunk);
      }
  31. Model inference (LAB #3, 3/4)
    In the inferWebLLM() method, add the following code. Because it uses both await and yield, the method has to be an async generator (async *inferWebLLM(...)); ChatCompletionMessageParam is imported from @mlc-ai/web-llm:
      await this.engine!.resetChat();
      const messages: ChatCompletionMessageParam[] = [{ role: "user", content: userPrompt }];
      const chunks = await this.engine!.chat.completions.create({ messages, stream: true });
      let reply = '';
      for await (const chunk of chunks) {
        reply += chunk.choices[0]?.delta.content ?? '';
        yield reply;
      }
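The streaming loop from this lab step can be tried in isolation. The sketch below replaces the engine with a hypothetical `fakeDeltas` generator and shows that each yielded value is the complete reply so far, which is why `runPrompt()` can simply set the signal to the latest chunk. All three function names are illustrative and not part of WebLLM.

```typescript
// Simulated token stream standing in for the engine's streaming response
// (hypothetical helper; a real stream yields richer chunk objects).
async function* fakeDeltas(parts: string[]): AsyncGenerator<string> {
  for (const part of parts) {
    yield part;
  }
}

// Turns a stream of deltas into a stream of cumulative replies,
// mirroring the `reply += …; yield reply;` loop from the lab.
async function* accumulate(deltas: AsyncIterable<string>): AsyncGenerator<string> {
  let reply = "";
  for await (const delta of deltas) {
    reply += delta;
    yield reply;
  }
}

// Collects every intermediate value, i.e. what the signal would be set to.
async function collect(stream: AsyncIterable<string>): Promise<string[]> {
  const seen: string[] = [];
  for await (const value of stream) {
    seen.push(value);
  }
  return seen;
}
```

Yielding the cumulative text keeps the consumer stateless: it never has to concatenate anything itself.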
  32. Model inference (LAB #3, 4/4)
    In app.component.html, change the following line:
      <pre>{{ reply() }}</pre>
    You should now be able to send prompts to the model and see the responses in the template.
  33. Build issues (LAB #4)
    1. In angular.json, increase the bundle size budget for the Angular project (property architect.build.configurations.production.budgets[0].maximumError) to at least 6MB.
    2. Then, run npm run build again. This time, the build should succeed.
    3. If you stopped the development server, don’t forget to bring it back up again (npm start).
  34. Todo management (LAB #5, 1/2)
    In app.component.ts, add the following signal at the top:
      protected readonly todos = signal<Todo[]>([]);
    Add the following lines to the addTodo() method:
      const text = prompt() ?? '';
      this.todos.update(todos => [...todos, { done: false, text }]);
  35. Todo management (LAB #5, 2/2)
    In app.component.html, add the following lines to display the todos in the UI:
      @for (todo of todos(); track $index) {
        <mat-list-option>{{ todo.text }}</mat-list-option>
      }
  36. Todo management (extended) (LAB #6)
      @for (todo of todos(); track $index) {
        <mat-list-option [(selected)]="todo.done">
          {{ todo.text }}
        </mat-list-option>
      }
    ⚠ Boo! This pattern is not recommended, because it mutates the todo object inside the signal directly. Instead, you should set the changed values on the signal. But this does not play well with Angular Material…
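Setting the changed values on the signal, as recommended above, boils down to producing a new array instead of mutating a todo in place. A minimal sketch, assuming a `Todo` shape with `text` and `done` as used in the labs; `setTodoDone` is a hypothetical helper, and how it is wired to the list's selection event is left open here.

```typescript
interface Todo {
  text: string;
  done: boolean;
}

// Returns a new array in which only the todo at `index` is replaced;
// the input array and its objects are left untouched.
function setTodoDone(todos: readonly Todo[], index: number, done: boolean): Todo[] {
  return todos.map((todo, i) => (i === index ? { ...todo, done } : todo));
}

const todosBefore: Todo[] = [
  { text: "Wash clothes", done: false },
  { text: "Pet the dog", done: false },
];
const todosAfter = setTodoDone(todosBefore, 1, true);
// In the component, this could be applied as (assumption about the wiring):
// this.todos.update(todos => setTodoDone(todos, index, done));
```

Because unchanged todos keep their object identity, only the replaced entry looks new to change detection.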
  37. Chat with data: Concept and limitations
    The todo data has to be converted into natural language.
    For the sake of simplicity, we will add all todos to the prompt.
    Remember: LLMs have a context window (Mistral-7B: 8K). If you need to chat with larger sets of text, refer to Retrieval Augmented Generation (RAG).
    Example:
      These are the todos:
      * Wash clothes
      * Pet the dog
      * Take out the trash
  38. Chat with data: System prompt
    Metaprompt that defines…
    – character
    – capabilities/limitations
    – output format
    – behavior
    – grounding data
    Hallucinations and prompt injections cannot be eliminated.
    Example:
      You are a helpful assistant. Answer user questions on todos. Generate a valid JSON object. Avoid negative content. These are the user’s todos: …
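Even when the system prompt asks for a valid JSON object, the reply still has to be treated as untrusted text, since hallucinations cannot be ruled out. The following sketch shows one defensive way to read such a reply; `parseJsonReply` and the `ParsedReply` type are hypothetical, and the extraction strategy (first opening to last closing brace) is a simplification.

```typescript
type ParsedReply =
  | { ok: true; value: unknown }
  | { ok: false; raw: string };

// Tries to read a JSON object from a model reply. Models sometimes wrap the
// JSON in prose or code fences, so the text between the first "{" and the
// last "}" is extracted before parsing.
function parseJsonReply(reply: string): ParsedReply {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) {
    return { ok: false, raw: reply };
  }
  try {
    return { ok: true, value: JSON.parse(reply.slice(start, end + 1)) };
  } catch {
    return { ok: false, raw: reply };
  }
}
```

Returning a tagged result instead of throwing lets the UI fall back to showing the raw text when the model ignores the requested format.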
  39. Chat with data: Flow
    System message
    • The user has these todos: 1. … 2. … 3. …
    User message
    • How many todos do I have?
    Assistant message
    • You have three todos.
  40. Chat with data: Using a system & user prompt (LAB #7)
    Adjust the code in inferWebLLM() to include the system prompt:
      const systemPrompt = `Here's the user's todo list: ${this.todos().map(todo => `* ${todo.text} (${todo.done ? 'done' : 'not done'})`).join('\n')}`;
      const messages: ChatCompletionMessageParam[] = [
        { role: "system", content: systemPrompt },
        { role: "user", content: userPrompt }
      ];
  41. Prompt Engineering: Techniques
    – Providing examples (single shot, few shot, …)
    – Priming outputs
    – Specifying the output structure
    – Repeating instructions
    – Chain of thought
    – …
    Success also depends on the model.
    https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/advanced-prompt-engineering
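Of the techniques listed, providing examples is the easiest to express in code: example question/answer pairs are inserted as earlier user and assistant turns before the real question. A minimal sketch; `ChatMessage` mirrors the role/content shape used in the labs, while `FewShotExample` and `buildFewShotMessages` are hypothetical names.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

interface FewShotExample {
  question: string;
  answer: string;
}

// Few-shot prompting: example pairs precede the real question so the model
// can imitate their style and format.
function buildFewShotMessages(
  systemPrompt: string,
  examples: FewShotExample[],
  userPrompt: string,
): ChatMessage[] {
  const messages: ChatMessage[] = [{ role: "system", content: systemPrompt }];
  for (const example of examples) {
    messages.push({ role: "user", content: example.question });
    messages.push({ role: "assistant", content: example.answer });
  }
  messages.push({ role: "user", content: userPrompt });
  return messages;
}

const fewShotMessages = buildFewShotMessages(
  "Answer questions about the user's todo list in one short sentence.",
  [
    { question: "Is anything left to do?", answer: "Yes, two todos are not done." },
    { question: "Did I pet the dog?", answer: "No, that todo is not done yet." },
  ],
  "How many todos do I have?",
);
```

Each example consumes tokens from the context window, so the number of shots has to be balanced against the size of the grounding data.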
  42. Prompt Engineering (LAB #8)
      const systemPrompt = `You are a helpful assistant. The user will ask questions about their todo list. Briefly answer the questions. Don't try to make up an answer if you don't know it. Here's the user's todo list: ${this.todos().map(todo => `* ${todo.text} (this todo is ${todo.done ? 'done' : 'not done'})`).join('\n')} ${this.todos().length === 0 ? 'The list is empty, there are no todos.' : ''}`;
  43. Prompt Engineering: Alternatives
    Arranged along an effort axis:
    – Prompt Engineering
    – Retrieval Augmented Generation
    – Fine-tuning
    – Custom model
  44. Performance: Comparison
    [Bar chart, tokens/sec]
    WebLLM (Llama3-8b, M4)        45
    Azure OpenAI (gpt-4o-mini)    33
    Groq (Llama3-8b)              1200
    WebLLM/Groq: Own tests (14.11.2024), OpenAI/Azure OpenAI: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/provisioned-throughput (18.07.2024)
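To get a feeling for what these throughput figures mean in practice, the sketch below converts tokens per second into waiting time for a reply of a given length. It assumes that the three chart values map to the three labels in the order shown on the slide, and it ignores network latency and model loading; `secondsToGenerate` is a hypothetical helper.

```typescript
// Throughput values from the slide (tokens per second), assuming the
// values and labels appear in the same order.
const tokensPerSecond = {
  webllm: 45,
  azureOpenAi: 33,
  groq: 1200,
} as const;

// Time needed to generate a reply of the given length, ignoring
// network latency and model load time (simplifying assumption).
function secondsToGenerate(tokens: number, tps: number): number {
  return tokens / tps;
}

const localSeconds = secondsToGenerate(450, tokensPerSecond.webllm); // 10
const groqSeconds = secondsToGenerate(600, tokensPerSecond.groq);    // 0.5
```

For short answers such as a one-sentence todo summary, the local throughput is therefore still in a usable range.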
  45. Stable Diffusion
    – Open-source text-to-image model
    – Generates 512x512px images from a prompt
    – WebSD: special version of Stable Diffusion for the web (2 GB in size)
    – No npm package this time
    Prompt: A guinea pig eating a watermelon
  46. Local AI Models: Pros & Cons
    + Data does not leave the browser (privacy)
    + High availability (offline support)
    + Low latency
    + Stability (no external API changes)
    + Low cost
    – Lower quality
    – High system (RAM, GPU) and bandwidth requirements
    – Large model size, models cannot always be shared
    – Model initialization and inference are relatively slow
    – APIs are experimental
  47. Local AI Models: Mitigations
    Download the model in the background if the user is not on a metered connection.
    Helpful APIs:
    – Network Information API to estimate the network quality/determine data saver (negative standards position by Apple and Mozilla)
    – Storage Manager API to estimate the available free disk space
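The two helper APIs can feed a simple go/no-go decision before a background download. In the sketch below, `ConnectionInfo` mirrors the `saveData` and `effectiveType` properties of `navigator.connection`, and `StorageInfo` mirrors the `quota` and `usage` values returned by `navigator.storage.estimate()`. The function name, the rule that only "4g" counts as fast enough, and the 10 % headroom are assumptions made for illustration.

```typescript
// Values as they could be read from navigator.connection
// (Network Information API; not available in all browsers).
interface ConnectionInfo {
  saveData?: boolean;
  effectiveType?: string; // "slow-2g" | "2g" | "3g" | "4g"
}

// Values as returned by `await navigator.storage.estimate()`.
interface StorageInfo {
  quota?: number;
  usage?: number;
}

// Decides whether a background download of the model is reasonable.
// The 10 % headroom is an arbitrary assumption for this sketch.
function shouldDownloadModel(
  connection: ConnectionInfo | undefined,
  storage: StorageInfo,
  modelBytes: number,
): boolean {
  if (connection?.saveData) {
    return false; // the user asked to save data
  }
  if (connection?.effectiveType !== undefined && connection.effectiveType !== "4g") {
    return false; // the connection is estimated to be slow
  }
  const free = (storage.quota ?? 0) - (storage.usage ?? 0);
  return free >= modelBytes * 1.1;
}
```

When the Network Information API is missing, `connection` is `undefined` and the decision falls back to the storage check alone; asking the user explicitly would be a safer alternative in that case.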
  48. Local AI Models: Mitigations
    Hybrid modes:
    – Allow the user to switch between cloud/local execution (availability, system requirements)
    – Deploy OSS model on internal/enterprise infrastructure (privacy)
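A hybrid mode needs a small piece of selection logic that combines availability, system requirements and the user's preference. The following sketch shows one possible policy; the `Environment` fields, the `Backend` values and `chooseBackend` are hypothetical and would need to be adapted to the actual app.

```typescript
type Backend = "local" | "cloud" | "unavailable";

interface Environment {
  online: boolean;
  localModelReady: boolean; // e.g. model downloaded and WebGPU available
  preferLocal: boolean;     // user setting
}

// Local execution is used when the user prefers it or when there is no
// connection; the cloud is the fallback whenever the device is online.
function chooseBackend(env: Environment): Backend {
  if (env.localModelReady && (env.preferLocal || !env.online)) {
    return "local";
  }
  if (env.online) {
    return "cloud";
  }
  return "unavailable";
}
```

Keeping the policy in a pure function makes it easy to test and to change, for example to never fall back to the cloud when privacy is the reason for running locally.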
  49. Local AI Models: Alternatives: Prompt API
    [Diagram: the website (HTML/JS) runs in the browser, which provides the model itself (Gemini Nano) or relies on the operating system (Apple Intelligence); the Internet is shown as a separate element.]
  50. Local AI Models: Alternatives: Prompt API
    – Exploratory API for local experiments and use case determination
    – Downloads Gemini Nano into Google Chrome
    – Model is shared across origins
    – Uses native APIs directly
    – Related APIs: Translation API, Writing Assistance APIs
    https://developer.chrome.com/docs/ai/built-in
  51. Local AI Models
    https://www.google.com/chrome/canary/
    about://flags
      Enables optimization guide on device → Enabled BypassPerfRequirement
      Prompt API for Gemini Nano → Enabled
    about://on-device-internals
  52. Local AI Models (LAB #9)
    Add the following lines to the inferPromptApi() method (like inferWebLLM(), it has to be an async generator):
      const systemPrompt = `The user will ask questions about their todo list. Here's the user's todo list: ${this.todos().map(todo => `* ${todo.text} (${todo.done ? 'done' : 'not done'})`).join('\n')}`;
      const session = await window.ai.languageModel.create({ systemPrompt });
      const chunks = session.promptStreaming(userPrompt);
      let reply = '';
      for await (const chunk of chunks) {
        reply += chunk;
        yield reply;
      }
  53. Local AI Models: Alternatives: Ollama
    – Local runner for AI models
    – Offers a local server a website can connect to → allows sharing models across origins
    – Supported on macOS and Linux (Windows in Preview)
    https://webml-demo.vercel.app/
    https://ollama.ai/
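Connecting a website to a local Ollama server amounts to a plain HTTP request. The sketch below only builds the request for Ollama's `/api/chat` endpoint and does not send it; it assumes the default port 11434 and a model that has already been pulled. `buildOllamaChatRequest` and the `PreparedRequest` type are hypothetical, and the server may need to be configured to accept requests from the web app's origin.

```typescript
interface OllamaMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds (but does not send) a chat request for a locally running Ollama server.
// Assumptions: default port 11434 and a model that has already been pulled.
function buildOllamaChatRequest(
  model: string,
  messages: OllamaMessage[],
  baseUrl = "http://localhost:11434",
): PreparedRequest {
  return {
    url: `${baseUrl}/api/chat`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages, stream: false }),
    },
  };
}

const prepared = buildOllamaChatRequest("llama3:8b", [
  { role: "user", content: "How many todos do I have?" },
]);
// Sending it would look like: await fetch(prepared.url, prepared.init)
```

Since the model lives in the Ollama process rather than in the browser's per-origin storage, several web apps can use the same download.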
  54. Local AI Models: Alternatives: Hugging Face Transformers
    Pre-trained, specialized, significantly smaller models beyond GenAI
    Examples:
    – Text generation
    – Image classification
    – Translation
    – Speech recognition
    – Image-to-text
  55. Local AI Models: Alternatives: Transformers.js
    – Pre-trained, specialized, significantly smaller models beyond GenAI
    – JavaScript library to run Hugging Face transformers in the browser
    – Supports most of the models
    https://xenova.github.io/transformers.js/
  56. Summary
    – Cloud-based models remain the most powerful models
    – Due to their size and high system requirements, local generative AI models are currently of interest mainly for very specific scenarios (e.g., high privacy demands, offline availability)
    – Small, specialized models are an interesting alternative (if available)
    – Large language models are becoming more compact and efficient
    – Vendors are starting to ship AI models with their devices
    – Devices are becoming more powerful for running AI tasks
    – Experiment with the AI APIs and make your Angular app smarter!