Making Angular Apps Smarter with Generative AI: Local and Offline-capable

Christian Liebel (@christianliebel), Consultant

Hello, it’s me.
Christian Liebel
X: @christianliebel
Bluesky: @christianliebel.com
Email: christian.liebel@thinktecture.com
Angular, PWA & Generative AI
Slides: thinktecture.com/christian-liebel

Timetable (original)
09:00–10:30 Block 1
10:30–11:00 Coffee Break
11:00–12:30 Block 2
12:30–13:30 Lunch Break
13:30–15:00 Block 3
15:00–15:30 Coffee Break
15:30–17:00 Block 4

Expectations
What to expect:
– Focus on web app development
– Focus on Generative AI
– Up-to-date insights: the ML/AI field is evolving fast
– Live demos on real hardware
– 17 hands-on labs
What not to expect:
– Deep dive into AI specifics, RAG, model fine-tuning or training
– Stable libraries or specifications
Huge downloads! High requirements! Things may break!

Workshop Slides

DEMO

Demo Use Case (Workshop Edition)
DEMO

Setup
Setup complete? You need Node.js, Google Chrome, an editor, Git, macOS or Windows, 20 GB of free disk space, and 6 GB of VRAM.

WebGPU
webgpureport.org

Setup (LAB #0)
    git clone https://github.com/thinktecture/ijs-munich-2025-genai.git
    cd ijs-munich-2025-genai
    npm i

Generative AI everywhere Source: https://www.apple.com/chde/apple-intelligence/

Single-Page Applications
Run locally on the user’s system
[Diagram: web browser with SPA (client logic, HTML/CSS views); web server (HTML, JS, CSS, assets); web APIs (server logic, databases); push service; connections via HTTPS and WebSockets]

Progressive Web Apps
Make SPAs offline-capable
[Diagram: website (HTML/JS), service worker, cache, fetch, internet]

Generative AI: Overview
– Text: OpenAI GPT, Mistral, …
– Audio/Music: Musico, Soundraw, …
– Images: DALL·E, Firefly, …
– Video: Sora, Runway, …
– Speech: Whisper, tortoise-tts, …

Generative AI Cloud Providers: Examples

Generative AI Cloud Providers: Drawbacks
– Require a (stable) internet connection
– Subject to network latency and server availability
– Data is transferred to the cloud service
– Require a subscription

Can we run GenAI models locally?

Large Language Models
– Large: Trained on lots of data
– Language: Process and generate text
– Models: Programs/neural networks
Examples:
– GPT (ChatGPT, Microsoft Copilot, …)
– Gemini, Gemma (Google)
– LLaMa (Meta AI)

Large Language Models
– Token: A meaningful unit of text (e.g., a word, a part of a word, a character).
– Context window: The maximum number of tokens the model can process.
– Parameters/weights: Internal variables learned during training, used to make predictions.
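To make the relationship between tokens and the context window more tangible, here is a minimal sketch. It assumes roughly four characters per token, which is only a rule of thumb for English text; real tokenizers are model-specific. The names estimateTokens and fitsContextWindow are hypothetical and not part of any library.

```typescript
// Assumption: ~4 characters per token (rule of thumb for English text).
// Real tokenizers are model-specific and will produce different counts.
const CHARS_PER_TOKEN = 4;

// Rough token estimate for a piece of text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Checks whether the prompt plus a token budget reserved for the reply
// fits into a context window of the given size (in tokens).
function fitsContextWindow(
  prompt: string,
  contextWindow: number,
  reservedForReply = 256,
): boolean {
  return estimateTokens(prompt) + reservedForReply <= contextWindow;
}
```

A check like this could run before sending a long prompt, so the app can shorten the input instead of letting the model truncate it.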

Large Language Models
– Prompts serve as the universal interface: unstructured text conveying specific semantics
– Paradigm shift in software architecture: natural language becomes a first-class citizen
– Caveats: non-determinism and hallucination, prompt injections

Large Language Models: Size Comparison
Model:Parameters    Size
phi3:3.8b           2.2 GB
mistral:7b          4.1 GB
deepseek-r1:8b      5.2 GB
gemma3n:e4b         7.5 GB
gemma3:12b          8.1 GB
llama4:16x17b       67 GB
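The download size follows roughly from the parameter count times the bits stored per weight. The sketch below shows that arithmetic. Reading the table this way is an inference, not something the slide states: several entries work out to about 4 to 6 bits per weight, which would be consistent with quantized weights. The function names are hypothetical.

```typescript
// Estimated size in GB (10^9 bytes) for a model with the given
// parameter count and number of bits stored per weight.
function estimateModelSizeGB(parameters: number, bitsPerWeight: number): number {
  const bytes = (parameters * bitsPerWeight) / 8;
  return bytes / 1e9;
}

// Inverse direction: which bits-per-weight does a given file size imply?
// Ignores metadata and tokenizer files, so treat the result as approximate.
function impliedBitsPerWeight(sizeGB: number, parameters: number): number {
  return (sizeGB * 1e9 * 8) / parameters;
}
```

For example, 7 billion parameters at 4 bits per weight come to about 3.5 GB, while the same model at 16 bits would need about 14 GB.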

WebLLM
https://webllm.mlc.ai/
DEMO

WebLLM: On NPM

LAB #1
    npm i @mlc-ai/web-llm
    npm start -- -o

Downloading a model (LAB #2, 1/3)
In src/app/todo/todo.ts, add the following lines at the top of the class (signal is imported from @angular/core, MLCEngine and CreateMLCEngine from @mlc-ai/web-llm):
    protected readonly progress = signal(0);
    protected readonly ready = signal(false);
    protected engine?: MLCEngine;

Downloading a model (LAB #2, 2/3)
In todo.ts, add the following lines to ngOnInit(). Because of the await, ngOnInit() has to be an async method:
    this.engine = await CreateMLCEngine(MODEL, {
      // Forward the download/initialization progress to the signal
      initProgressCallback: ({ progress }) => this.progress.set(progress),
    });
    this.ready.set(true);

Downloading a model (LAB #2, 3/3)
In todo.html, change the following lines:
    @if (!ready()) {
      <mat-progress-bar mode="determinate" [value]="progress() * 100"></mat-progress-bar>
    }
    …
    <button mat-raised-button (click)="runPrompt(prompt.value, langModel.value)" [disabled]="!ready()">
The progress bar should begin to move.

Cache API: Storing model files locally
[Diagram: website (HTML/JS), cache with model files, internet, Hugging Face]
Note: Due to the Same-Origin Policy, models cannot be shared across origins.
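The idea of storing model files locally can be sketched as a cache-first lookup. This is not WebLLM's actual implementation, which handles caching internally; it is only an illustration of the pattern. CacheLike and ResponseLike are hypothetical minimal types so the sketch compiles without DOM typings; in a browser, the objects returned by caches.open() and fetch() should fit them.

```typescript
// Minimal structural types (assumption: stand-ins for the browser's
// Cache and Response objects, reduced to what this sketch needs).
interface ResponseLike {
  clone(): ResponseLike;
}
interface CacheLike {
  match(url: string): Promise<ResponseLike | undefined>;
  put(url: string, response: ResponseLike): Promise<void>;
}

// Cache-first: serve from the local cache if present, otherwise
// download once and store a copy for later (offline) use.
async function cacheFirst(
  url: string,
  cache: CacheLike,
  fetchFn: (url: string) => Promise<ResponseLike>,
): Promise<ResponseLike> {
  const cached = await cache.match(url);
  if (cached) {
    return cached; // no network needed
  }
  const response = await fetchFn(url);
  // A response body can only be read once, so store a clone
  await cache.put(url, response.clone());
  return response;
}
```

Because the cache belongs to one origin, a second website using the same model would have to download it again, which is the limitation the note above refers to.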

Cache API: Parameter cache

WebAssembly (Wasm)
– Bytecode for the web
– Compile target for arbitrary languages
– Can be faster than JavaScript
– WebLLM uses a model-specific Wasm library to accelerate model computations

WebGPU
– Grants low-level access to the Graphics Processing Unit (GPU)
– Near-native performance for machine learning applications
– Supported by Chromium-based browsers on Windows and macOS from version 113, Safari 26, and Firefox 141 on Windows
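Since WebGPU support varies by browser and platform, an app may want to check for it before downloading a multi-gigabyte model. The following is a minimal sketch of such a check. navigator.gpu and requestAdapter() are part of the WebGPU API; the GpuLike type and the function name hasUsableWebGpu are hypothetical and only exist so the sketch is self-contained.

```typescript
// Reduced shape of navigator.gpu (assumption: only what this sketch needs).
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}

// Returns true if the WebGPU entry point exists and an adapter is available.
// requestAdapter() resolves to null when no suitable GPU is found.
async function hasUsableWebGpu(nav: { gpu?: GpuLike }): Promise<boolean> {
  if (!nav.gpu) {
    return false; // API not exposed by this browser
  }
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null;
  } catch {
    return false;
  }
}
```

In the browser, the global navigator object would be passed in; the result could decide whether to offer WebLLM at all or fall back to another option.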

WebNN
– Grants web apps access to the device’s CPU, GPU and Neural Processing Unit (NPU)
– In specification by the WebML Working Group at W3C
– Implementation in progress in Chromium (behind a flag)
– Even better performance compared to WebGPU
Source: https://webmachinelearning.github.io/webnn-intro/
DEMO

WebNN: near-native inference performance Source: Intel. Browser: Chrome Canary 118.0.5943.0, DUT: Dell/Linux/i7-1260P, single p-core, Workloads: MediaPipe solution models (FP32, batch=1)

Model inference (LAB #3, 1/4)
In todo.ts, add the following line at the top of the class:
    protected readonly reply = signal('');

Model inference (LAB #3, 2/4)
In the runPrompt() method, add the following code:
    this.reply.set('…');
    // Pick the inference backend; both yield the reply as text chunks
    const chunks = languageModel === 'webllm'
      ? this.inferWebLLM(userPrompt)
      : this.inferPromptApi(userPrompt);
    let reply = '';
    for await (const chunk of chunks) {
      reply += chunk;
      this.reply.set(reply);
    }

Model inference (LAB #3, 3/4)
In the inferWebLLM() method (an async generator, since it uses both await and yield), add the following code. ChatCompletionMessageParam is imported from @mlc-ai/web-llm:
    await this.engine!.resetChat();
    const messages: ChatCompletionMessageParam[] = [{ role: "user", content: userPrompt }];
    const chunks = await this.engine!.chat.completions.create({ messages, stream: true });
    for await (const chunk of chunks) {
      // Fall back to an empty string when a chunk carries no content
      yield chunk.choices[0]?.delta.content ?? '';
    }

Model inference (LAB #3, 4/4)
In todo.html, change the following line:
    <pre>{{ reply() }}</pre>
You should now be able to send prompts to the model and see the responses in the template.
⚠ Note: Browsers support better options for streaming LLM responses: https://developer.chrome.com/docs/ai/render-llm-responses

LAB #4
Stop the development server (Ctrl+C) and run npm run build.

Build issues (LAB #4)
1. In angular.json, increase the bundle size budget of the Angular project (property architect.build.configurations.production.budgets[0].maximumError) to 10MB.
2. Then, run npm run build again. This time, the build should succeed.
3. If you stopped the development server, don’t forget to bring it back up again (npm start).

Todo management (LAB #5, 1/2)
In todo.ts, add the following signal at the top:
    protected readonly todos = signal<TodoDto[]>([]);
Add the following lines to the addTodo() method:
    const text = prompt() ?? '';
    this.todos.update(todos => [...todos, { done: false, text }]);

Todo management (LAB #5, 2/2)
In todo.html, add the following lines to display the todos in the UI:
    @for (todo of todos(); track $index) {
      <mat-list-option>{{ todo.text }}</mat-list-option>
    }

Todo management (extended) (LAB #6)
    @for (todo of todos(); track $index) {
      <mat-list-option [(selected)]="todo.done">
        {{ todo.text }}
      </mat-list-option>
    }
⚠ Boo! This pattern is not recommended. Instead, you should set the changed values on the signal. But that does not play well with Angular Material…

Chat with data: Concept and limitations
– The todo data has to be converted into natural language.
– For the sake of simplicity, we will add all todos to the prompt.
– Remember: LLMs have a context window (Mistral-7B: 8K).
– If you need to chat with larger sets of text, refer to Retrieval Augmented Generation (RAG).
Example:
    These are the todos:
    * Wash clothes
    * Pet the dog
    * Take out the trash
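Converting the todo data into natural language can be as simple as rendering a bulleted list. The sketch below produces the format from the example above. The TodoDto shape (text, done) is taken from the lab code; the function name todosToPrompt and the "(done)" marker are assumptions for illustration.

```typescript
// Shape used in the labs: each todo has a text and a done flag.
interface TodoDto {
  text: string;
  done: boolean;
}

// Renders the todo list as plain text that can be placed in a prompt.
function todosToPrompt(todos: TodoDto[]): string {
  if (todos.length === 0) {
    return 'The user has no todos.';
  }
  // Assumption: completed items are marked with "(done)"
  const lines = todos.map(todo => `* ${todo.text}${todo.done ? ' (done)' : ''}`);
  return ['These are the todos:', ...lines].join('\n');
}
```

The labs use JSON.stringify() for the same purpose, which is shorter to write; a plain-text list may use fewer tokens, though that depends on the tokenizer.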

Chat with data: System prompt
Metaprompt that defines…
– character
– capabilities/limitations
– output format
– behavior
– grounding data
Hallucinations and prompt injections cannot be eliminated.
Example:
    You are a helpful assistant. Answer user questions on todos. Generate a valid JSON object. Avoid negative content. These are the user’s todos: …

Chat with data: Flow
System message
• The user has these todos: 1. … 2. … 3. …
User message
• How many todos do I have?
Assistant message
• You have three todos.

Chat with data (LAB #7): Using a system & user prompt
Adjust the code in inferWebLLM() to include the system prompt:
    const systemPrompt = `Here's the user's todo list: ${JSON.stringify(this.todos())}`;
    const messages: ChatCompletionMessageParam[] = [
      { role: "system", content: systemPrompt },
      { role: "user", content: userPrompt }
    ];

Prompt Engineering: Techniques
– Providing examples (single shot, few shot, …)
– Priming outputs
– Specifying output structure
– Repeating instructions
– Chain of thought
– …
Success also depends on the model.
https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/prompt-engineering
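As an illustration of the "providing examples" technique, the following sketch builds a few-shot message list: example question/answer pairs are inserted as user and assistant messages between the system prompt and the actual user prompt. The types and the function name buildFewShotMessages are hypothetical; the role/content message shape mirrors the one used in the labs.

```typescript
type Role = 'system' | 'user' | 'assistant';

interface ChatMessage {
  role: Role;
  content: string;
}

// One worked example the model should imitate.
interface Example {
  question: string;
  answer: string;
}

// Builds: system prompt, then each example as a user/assistant pair,
// then the real user prompt as the final message.
function buildFewShotMessages(
  systemPrompt: string,
  examples: Example[],
  userPrompt: string,
): ChatMessage[] {
  const messages: ChatMessage[] = [{ role: 'system', content: systemPrompt }];
  for (const example of examples) {
    messages.push({ role: 'user', content: example.question });
    messages.push({ role: 'assistant', content: example.answer });
  }
  messages.push({ role: 'user', content: userPrompt });
  return messages;
}
```

Keep in mind that every example consumes part of the context window, so a small number of short examples is usually the practical choice for small local models.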

Prompt Engineering (LAB #8)
    const systemPrompt = `You are a helpful assistant. The user will ask questions about their todo list. Briefly answer the questions. Don't try to make up an answer if you don't know it. Here's the user's todo list: ${JSON.stringify(this.todos())}`;

Prompt Engineering: Alternatives
Ordered along an effort axis:
– Prompt Engineering
– Retrieval Augmented Generation
– Fine-tuning
– Custom model

Performance (LAB #9)
Adjust todo.ts as follows:
    const chunks = await this.engine!.chat.completions.create({
      messages,
      stream: true,
      stream_options: { include_usage: true }
    });
    for await (const chunk of chunks) {
      console.log(chunk.usage);
      yield chunk.choices[0]?.delta.content ?? '';
    }
Ask a new question and check your console for performance statistics.

Performance: Workshop Participants
Device                           Tokens/s (Decode)
Apple MacBook M4 Max             33.96
DELL AMD Ryzen AI                11.09
DELL                             1.7
Lenovo Core i5                   0.83
DELL i7 (integrated graphics)    3.66
Apple MacBook M4 Pro             33.63
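To put the decode rates into perspective, the arithmetic below converts between token counts, elapsed time and tokens per second. The function names are hypothetical, and the 200-token reply length is an assumed example value.

```typescript
// Decode rate: generated tokens divided by the elapsed time in seconds.
function tokensPerSecond(tokenCount: number, elapsedMs: number): number {
  if (elapsedMs <= 0) {
    return 0; // avoid division by zero for degenerate measurements
  }
  return tokenCount / (elapsedMs / 1000);
}

// How long a reply of the given length takes at a given decode rate.
function secondsForReply(replyTokens: number, rate: number): number {
  return replyTokens / rate;
}

// Assumed example: a 200-token reply on the fastest and slowest device
const fast = secondsForReply(200, 33.96); // roughly 5.9 seconds
const slow = secondsForReply(200, 0.83);  // roughly 241 seconds
console.log(fast.toFixed(1), slow.toFixed(1));
```

The same reply that streams in a few seconds on the fastest machine would take around four minutes on the slowest one, which illustrates why hardware requirements matter for local inference.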

Performance: Comparison
[Bar chart, tokens/sec: WebLLM (Llama3-8b, M4): 45; Azure OpenAI (gpt-4o-mini): 33; Groq (Llama3-8b): 1200]
Sources: WebLLM/Groq: own tests (14.11.2024); OpenAI/Azure OpenAI: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/provisioned-throughput (18.07.2024)

Prompt API (LAB #10)
In a current version of Chrome or Edge:
about://flags
– Enables optimization guide on device → Enabled BypassPerfRequirement
– Prompt API for Gemini Nano → Enabled
await LanguageModel.create();
about://components
about://on-device-internals

Prompt API
[Diagram: website (HTML/JS), browser, operating system, internet, Apple Intelligence, Gemini Nano]

Prompt API
Part of Chrome’s Built-In AI initiative
– Exploratory API for local experiments and use case determination
– Downloads Gemini Nano into Google Chrome
– Model can be shared across origins
– Uses native APIs directly
– Fine-tuning API might follow in the future
https://developer.chrome.com/docs/ai/built-in
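Because the Prompt API is experimental and the model has to be downloaded first, it is worth checking its status before calling it. The sketch below assumes the LanguageModel.availability() method and its four status strings as described in Chrome's current Prompt API documentation; since the API is still evolving, treat this as an assumption to verify. The LanguageModelLike type and the function promptApiStatus are hypothetical helpers.

```typescript
// Status values as documented for the Prompt API at the time of writing
// (assumption: these may change while the API is experimental).
type Availability = 'unavailable' | 'downloadable' | 'downloading' | 'available';

// Reduced shape of the global LanguageModel object.
interface LanguageModelLike {
  availability(): Promise<Availability>;
}

// Returns 'unsupported' if the browser does not expose the API at all,
// otherwise the availability status reported by the browser.
async function promptApiStatus(
  scope: { LanguageModel?: LanguageModelLike },
): Promise<Availability | 'unsupported'> {
  if (!scope.LanguageModel) {
    return 'unsupported';
  }
  return scope.LanguageModel.availability();
}
```

In the browser, the global object would be passed as scope. An app could then enable the Prompt API option only for 'available' and fall back to WebLLM otherwise.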

Prompt API (LAB #11)
    npm i -D @types/dom-chromium-ai
Then add "dom-chromium-ai" to the types in tsconfig.app.json.

Local AI Models (LAB #12)
Add the following lines to inferPromptApi():
    const systemPrompt = `
      The user will ask questions about their todo list.
      Here's the user's todo list: ${JSON.stringify(this.todos())}`;
    const languageModel = await LanguageModel.create({
      initialPrompts: [{ role: "system", content: systemPrompt }]
    });
    const chunks = languageModel.promptStreaming(userPrompt);
    for await (const chunk of chunks) {
      yield chunk;
    }

Local AI Models: Alternatives: Ollama
– Local runner for AI models
– Offers a local server a website can connect to → allows sharing models across origins
– Supported on Windows, Linux, macOS
https://ollama.ai/

Local RAG Demo
https://webml-demo.vercel.app/
https://github.com/jacoblee93/fully-local-pdf-chatbot/

Local AI Models: Alternatives: Hugging Face Transformers
Pre-trained, specialized, significantly smaller models beyond GenAI
Examples:
– Text generation
– Image classification
– Translation
– Speech recognition
– Image-to-text

Local AI Models: Alternatives: Transformers.js
– Pre-trained, specialized, significantly smaller models beyond GenAI
– JavaScript library to run Hugging Face transformers in the browser
https://huggingface.co/docs/transformers.js

DEMO

Use Case: Data Extraction
Example inputs:
– “Just transfer the 17.34 euros to me, my IBAN is DE02200505501015871393. I am with Hamburger Sparkasse (HASPDEHH).”
– “Nice, here is my address: Peter Müller, Rheinstr. 7, 04435 Schkeuditz”

Idea: Smart Form Filler (LLM)
Input: “Nice, here is my address: Peter Müller, Rheinstr. 7, 04435 Schkeuditz”
Target form:
    protected readonly formGroup = this.fb.group({
      firstName: [''],
      lastName: [''],
      addressLine1: [''],
      addressLine2: [''],
      city: [''],
      state: [''],
      zip: [''],
      country: [''],
    });

Form Field “Insurance numbers always start with INS.” “Try to determine the country based on the input.”

Form Field (LAB #13, 1/2)
Add the following code to form.ts:
    private fb = inject(NonNullableFormBuilder);
    protected formGroup = this.fb.group({
      name: '',
      city: '',
    });
    async fillForm(value: string) {}

Form Field (LAB #13, 2/2)
Add the following code to form.html:
    <input type="text" #form>
    <button (click)="fillForm(form.value)">Fill form</button>
    <form [formGroup]="formGroup">
      <input placeholder="Name" formControlName="name">
      <input placeholder="City" formControlName="city">
    </form>

Prompt Generator: Async Clipboard API
– Allows reading from/writing to the clipboard in an asynchronous manner
– Reading from the clipboard requires user consent first (privacy!)
– Supported by Chrome, Edge, Safari, and Firefox
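Since reading from the clipboard requires user consent, the call can fail when permission is denied. A minimal sketch of a defensive wrapper follows. readText() is part of the Async Clipboard API; the ClipboardLike type and the function readClipboardOrEmpty are hypothetical, and returning an empty string on failure is just one possible design choice.

```typescript
// Reduced shape of navigator.clipboard (only what this sketch needs).
interface ClipboardLike {
  readText(): Promise<string>;
}

// Reads text from the clipboard. Returns an empty string if the API is
// missing or the read is rejected (for example, permission denied).
async function readClipboardOrEmpty(
  clipboard: ClipboardLike | undefined,
): Promise<string> {
  if (!clipboard) {
    return '';
  }
  try {
    return await clipboard.readText();
  } catch {
    return '';
  }
}
```

In the browser, navigator.clipboard would be passed in. Depending on the app, showing a message to the user may be preferable to silently returning an empty string.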

Async Clipboard API (LAB #14, 1/2)
Add the following code to form.ts:
    async paste() {
      const content = await navigator.clipboard.readText();
      await this.fillForm(content);
    }

Async Clipboard API (LAB #14, 2/2)
Add the following code to form.html (after the “Fill form” button):
    <button (click)="paste()">Paste</button>

Prompt Generator
System message
• The form has the following setup: { "name": "", "city": "" }
User message
• I am Peter from Berlin
Assistant message
• { "name": "Peter", "city": "Berlin" }

Prompt Generator (LAB #15)
Add the following code to the fillForm() method:
    const languageModel = await LanguageModel.create({
      initialPrompts: [{
        role: 'system',
        content: `Extract the information to a JSON object of this shape: ${JSON.stringify(this.formGroup.value)}`,
      }],
    });
    const result = await languageModel.prompt(value);
    console.log(result);

Prompt Generator (Structured Output) (LAB #16)
In form.ts (fillForm() method), extend the existing prompt() call with a response constraint:
    const result = await languageModel.prompt(value, {
      responseConstraint: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          city: { type: 'string' }
        }
      }
    });
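Instead of writing the schema by hand, it could be derived from the form's current value. The sketch below assumes that every form field is a text field, which holds for the name/city form in the lab but not in general. The function name schemaFromFormValue is hypothetical. required and additionalProperties are standard JSON Schema keywords; whether a given model runtime enforces them is an assumption to verify.

```typescript
// JSON Schema for a flat object whose properties are all strings.
interface ObjectSchema {
  type: 'object';
  properties: Record<string, { type: 'string' }>;
  required: string[];
  additionalProperties: false;
}

// Derives the schema from a form value such as { name: '', city: '' }.
// Assumption: all fields are strings.
function schemaFromFormValue(value: Record<string, unknown>): ObjectSchema {
  const keys = Object.keys(value);
  const properties: Record<string, { type: 'string' }> = {};
  for (const key of keys) {
    properties[key] = { type: 'string' };
  }
  return { type: 'object', properties, required: keys, additionalProperties: false };
}
```

With this, the schema stays in sync when fields are added to the form, at the cost of losing per-field type information.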

Prompt Parser Assistant message • { "name": "Peter", "city": "Berlin" }

Prompt Parser (LAB #17)
Add the following code to form.ts (fillForm() method):
    this.formGroup.setValue(JSON.parse(result));
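JSON.parse() throws on malformed input, and setValue() expects exactly the controls of the form group, so a more defensive variant may be useful outside of a workshop setting. The following sketch validates the model output against a template object before it reaches the form. The function name parseFormValue and the fallback behavior (keep the template's value for missing or non-string fields) are assumptions for illustration.

```typescript
// Parses the model output and keeps only the keys present in the template.
// Returns null if the output is not a JSON object.
function parseFormValue<T extends Record<string, string>>(
  raw: string,
  template: T,
): T | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // model did not return valid JSON
  }
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    return null;
  }
  const source = parsed as Record<string, unknown>;
  const result: Record<string, string> = {};
  for (const key of Object.keys(template)) {
    const value = source[key];
    // Fall back to the template's value for missing or non-string fields
    result[key] = typeof value === 'string' ? value : template[key];
  }
  return result as T;
}
```

In fillForm(), the template could be this.formGroup.value, and setValue() would only be called when the function returns a non-null result.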

Prompt Parser
– Assistant message: parsing the assistant message as text/JSON/…
– JSON Mode
– Tool calling: specifying a well-defined interface via a JSON schema called by the LLM (safer, growing support)
– Structured Output

Bonus Content
Cross-Origin Storage

Bonus Content: Cross-Origin Storage
https://chromewebstore.google.com/detail/cross-origin-storage/denpnpcgjgikjpoglpjefakmdcbmlgih
https://github.com/web-ai-community/cross-origin-storage-extension

Summary: Pros & Cons
+ Data does not leave the browser (privacy)
+ High availability (offline support)
+ Low latency
+ Stability (no external API changes)
+ Low cost
– Lower quality
– High system (RAM, GPU) and bandwidth requirements
– Large model size, models cannot always be shared
– Model initialization and inference are relatively slow
– APIs are experimental

Summary
– Cloud-based models remain the most powerful models
– Due to their size and high system requirements, local generative AI models are currently of interest mainly for special scenarios (e.g., high privacy demands, offline availability)
– Small, specialized models are an interesting alternative (if available)
– Large language models are becoming more compact and efficient
– Vendors are shipping AI models with their devices
– Devices are becoming more powerful for running AI workloads
– Experiment with the AI APIs and make your Angular app smarter!

Thank you for your kind attention!
Christian Liebel
@christianliebel
christian.liebel@thinktecture.com
