Presented at https://www.meetup.com/ai-and-dl-for-enterprise/events/299910519/ to show llama.cpp running quantised models entirely locally, with no internet connection (and so no data exfiltration). The demos covered:

- summarising text to keywords (llama2 7b)
- solving physics problems (phi-2)
- augmenting and extracting facts from images (llava)
- summarising code behaviour (codellama 34b)
- digging into the Python API to extract embeddings and to watch next-token generation and its probability assignments, which reveals structure inherited from the underlying training data (see the sketch below)

The talk also includes a discussion of how quantisation works (a simplified sketch appears after the byline).
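The Python API part of the demo can be sketched roughly as below, assuming the llama-cpp-python package is installed and a quantised GGUF file is available on local disk. The model path, prompt text, and parameter values are placeholders for illustration, not the exact ones used in the talk.

```python
from llama_cpp import Llama

MODEL_PATH = "models/llama-2-7b.Q4_K_M.gguf"  # placeholder: point at any local GGUF file

# Generation instance: the quantised model is loaded entirely from local disk.
llm = Llama(model_path=MODEL_PATH, n_ctx=2048, verbose=False)

# Ask for a short completion plus the top candidate tokens at each step,
# which exposes the probability assignments behind next-token generation.
completion = llm(
    "The three laws of motion were formulated by",
    max_tokens=8,
    temperature=0.0,
    logprobs=5,  # return the 5 most likely tokens per position
)
print(completion["choices"][0]["text"])
for step in completion["choices"][0]["logprobs"]["top_logprobs"]:
    print(step)  # dict of candidate token -> log-probability

# Separate instance in embedding mode to pull out an embedding for a piece of text.
embedder = Llama(model_path=MODEL_PATH, embedding=True, verbose=False)
result = embedder.create_embedding("Quantised models run offline.")
vector = result["data"][0]["embedding"]
# Depending on library version and pooling mode this is one pooled vector
# or a list of per-token vectors.
print("embedding length:", len(vector))
```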
By: https://ianozsvald.com/. A write-up will follow at https://notanumber.email/
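The quantisation discussion mentioned above can be illustrated with a simplified block-quantisation sketch. It follows the general idea of llama.cpp's 4-bit block formats (a small block of weights sharing one scale), but it is a teaching example, not the library's actual code or storage layout.

```python
import numpy as np

BLOCK_SIZE = 32  # llama.cpp's 4-bit formats also group weights into blocks of 32

def quantise_block(weights: np.ndarray) -> tuple[float, np.ndarray]:
    """Map a block of float weights to 4-bit integers in [-8, 7] plus one shared scale."""
    max_abs = float(np.abs(weights).max())
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return scale, q

def dequantise_block(scale: float, q: np.ndarray) -> np.ndarray:
    """Recover approximate float weights from the 4-bit codes and the shared scale."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
block = rng.normal(scale=0.02, size=BLOCK_SIZE).astype(np.float32)
scale, q = quantise_block(block)
restored = dequantise_block(scale, q)
print("max absolute error:", float(np.abs(block - restored).max()))
print("storage: 4 bits per weight plus one scale per", BLOCK_SIZE, "weights")
```

The point of the example is the trade-off the talk discusses: each weight shrinks from 32 bits to roughly 4, at the cost of a small, bounded reconstruction error per block.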