Beyond the Cloud: On-premise Orchestration for Open-source LLMs

Serving Ollama models on-premise from any machine in your company's datacenter, or even from the PC you use for playing video games, and orchestrating inference as easily as using Redis, the official ollama-python/ollama-js packages, and oshepherd in your web application or Jupyter notebook.
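
In practice the application code keeps talking to the familiar Ollama client API and only the host changes. A minimal sketch with the official ollama-python package, assuming a hypothetical oshepherd API endpoint and model name (both placeholders for your own setup):

    # Minimal sketch: the official ollama-python client pointed at an oshepherd
    # API server instead of a local Ollama instance (URL and model are placeholders).
    from ollama import Client

    client = Client(host="http://oshepherd.internal.example:5001")  # hypothetical endpoint

    response = client.chat(
        model="llama3",  # placeholder: a model your ollama workers have pulled
        messages=[{"role": "user", "content": "Summarize what oshepherd does."}],
    )
    print(response["message"]["content"])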

PyCon Austria 2025: https://pycon.pyug.at/talks/beyond-the-cloud-on-premise-orchestration-for-open-source-llms/

Raul Pino

April 06, 2025

Transcript

  1. Orchestrating Ollama from home for teaching AI :) (+ rundown on local LLMs), by Raul Pino, PyCon Austria 2025
  2. Agenda
     • Intro
     • Motivation & Experimentation
       ◦ Codepeques + mnemonica.ai
     • Third-party API
     • Custom model deployment options
     • Enter Local Large Language Models (LLMs)
     • The idea: Ollama + oshepherd
       ◦ Architecture
       ◦ Usage
       ◦ Demos
     • When & Where to Use What?
     • Takeaways & beyond
  3. About me
     • Born in Venezuela.
     • 10+ years of experience as a Software Engineer & AI enthusiast (ML Engineer recently).
     • Living in Chile.
       ◦ Halborn, Distro (YC S24), Elementus, uBiome, Groupon.
     • <3 AI, Coffee, Scuba Diving, …
  4. Intro
     • All tech companies are jumping on the Large Language Models (LLMs)/AI bandwagon!
     • Specific requirements? Concerns? Use cases?
       ◦ Privacy?
         ▪ Data Exposure Risks
         ▪ Lack of Transparency
         ▪ Regulatory and Legal Compliance
       ◦ Cost $$$?
       ◦ Customization?
       ◦ Latency?
       ◦ Scalability?
       ◦ Model variety?
     • OpenAI Data Processing Addendum
  5. Intro: Get ready for your Gen AI experience!
     Platforms:
     • https://chatgpt.com/
     • https://platform.openai.com/
     Frameworks:
     • https://python.langchain.com/docs/introduction/
  6. Third-party APIs
     • OpenAI - https://platform.openai.com/
     • Anthropic - https://www.anthropic.com/api
     • Gemini - https://gemini.google.com/
     • Groq - https://groq.com/
     • Cerebras - https://www.cerebras.ai/inference
     …but the problem was access! (US sanctions => forbidden in Venezuela)
     https://help.openai.com/en/articles/5347006-openai-api-supported-countries-and-territories
  7. Custom Model Deployment Options
     • AWS SageMaker - https://aws.amazon.com/sagemaker/
       ◦ ml.g5.xlarge: $1.21 × 24 × 30 ≈ $870/month
     • AWS Bedrock - https://aws.amazon.com/bedrock/
       ◦ Input: $0.00075 per 1,000 tokens - Output: $0.001 per 1,000 tokens
     • HuggingFace - https://huggingface.co/pricing#endpoints
       ◦ $0.50 × 24 × 30 ≈ $360/month
     …but the problem was cost $$$
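
     A quick back-of-the-envelope check of the always-on hourly rates on the slide above (hourly prices come from the slide; running 24 hours a day for a 30-day month is the assumption):

        # Rough monthly cost of an always-on dedicated endpoint:
        # hourly rate (USD) x 24 hours x 30 days.
        def monthly_cost(hourly_rate_usd: float, hours_per_day: int = 24, days: int = 30) -> float:
            return hourly_rate_usd * hours_per_day * days

        print(monthly_cost(1.21))  # AWS SageMaker ml.g5.xlarge -> 871.2 (~$870/month)
        print(monthly_cost(0.50))  # HuggingFace endpoint       -> 360.0 (~$360/month)
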
  8. Enter Local OpenSource LLMs: Landscape
     • NVIDIA ChatRTX: https://www.nvidia.com/en-us/ai-on-rtx/chatrtx/
     • Jellybox: https://jellybox.com/
     • Nomic GPT4All: https://www.nomic.ai/gpt4all
     • LocalAI: https://localai.io/
     • WebLLM (WebGPU): https://webllm.mlc.ai/
     • Open WebUI: https://openwebui.com/
     • LM Studio: https://lmstudio.ai/
     • Ollama: https://ollama.com/
  9. Enter Local OpenSource LLMs: Issues
     • NVIDIA ChatRTX: https://www.nvidia.com/en-us/ai-on-rtx/chatrtx/
     • Jellybox: https://jellybox.com/
     • Nomic GPT4All: https://www.nomic.ai/gpt4all
     • LocalAI: https://localai.io/
     • WebLLM (WebGPU): https://webllm.mlc.ai/
     • Open WebUI: https://openwebui.com/
     • LM Studio: https://lmstudio.ai/
     • Ollama: https://ollama.com/
  10. Oshepherd Usage & Demos
      https://github.com/mnemonica-ai/oshepherd
      1. Deploy a Redis server.
      2. Install ollama and oshepherd in your workers.
      3. Deploy your oshepherd API server.
      pip install oshepherd
      oshepherd start-api --env-file .api.env
      oshepherd start-worker --env-file .worker.env
      Demos at last!
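
      Once the API server and workers from the steps above are running, a client can request inference the same way it would against a plain Ollama server, since the talk's premise is that the official Ollama clients work against the oshepherd API. A minimal sketch over plain HTTP, assuming an Ollama-style /api/generate endpoint; host, port and model name are placeholders for your own deployment:

         # Minimal sketch: request inference from the oshepherd API server over HTTP,
         # using the Ollama-style generate endpoint (address and model are placeholders).
         import requests

         OSHEPHERD_API = "http://your-oshepherd-api-host:5001"  # hypothetical address

         resp = requests.post(
             f"{OSHEPHERD_API}/api/generate",
             json={
                 "model": "llama3",               # any model pulled by your ollama workers
                 "prompt": "Why is the sky blue?",
                 "stream": False,                 # single JSON response instead of a stream
             },
             timeout=300,
         )
         resp.raise_for_status()
         print(resp.json()["response"])
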
  11. When & Where to Use What?
      • Specific requirements? Concerns? Use-cases?
        ◦ Privacy?
          ▪ Data Exposure Risks
          ▪ Lack of Transparency
          ▪ Regulatory and Legal Compliance
        ◦ Cost $$$?
        ◦ Customization?
        ◦ Latency?
        ◦ Scalability?
        ◦ Model variety?
        ◦ *** ACCESS!
  12. Future work?
      • Support for streams!
      • Support for Exo: https://github.com/exo-explore/exo
      https://www.youtube.com/watch?v=GBR6pHZ68Ho
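
      Stream support in oshepherd is listed above as future work; against a plain local Ollama server, the official ollama-python client already exposes streaming, so the target developer experience would look roughly like this (model name and prompt are placeholders):

         # Streaming tokens from a local Ollama server with the official ollama-python
         # client; equivalent support through oshepherd is future work, per the slide above.
         from ollama import Client

         client = Client(host="http://localhost:11434")  # default local Ollama address

         for chunk in client.chat(
             model="llama3",  # placeholder: any model you have pulled locally
             messages=[{"role": "user", "content": "Explain Redis in one sentence."}],
             stream=True,
         ):
             print(chunk["message"]["content"], end="", flush=True)
         print()
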
  13. Takeaways & beyond
      Result: We were able to help host classes and the Codepeques Day (May 2024) and release a Python package!!!
      https://pypi.org/project/oshepherd/
  14. Takeaways & beyond
      Remember:
      • Embrace AI, but not blindly
        ◦ Check your specific requirements!
      • Contribute to OpenSource
      • Help your community along the way :)
  15. Resources
      • Codepeques: https://www.instagram.com/codepeques/
      • mnemonica.ai: https://mnemonica.ai/
      • Oshepherd: https://github.com/mnemonica-ai/oshepherd
      • Redis: https://redis.io/
      • Celery: https://docs.celeryq.dev/en/stable/
      • Celery Flower: https://flower.readthedocs.io/en/latest/
      • NVIDIA ChatRTX: https://www.nvidia.com/en-us/ai-on-rtx/chatrtx/
      • Jellybox: https://jellybox.com/
      • Nomic GPT4All: https://www.nomic.ai/gpt4all
      • LocalAI: https://localai.io/
      • WebLLM (WebGPU): https://webllm.mlc.ai/
      • Open WebUI: https://openwebui.com/
      • LM Studio: https://lmstudio.ai/
      • Ollama: https://ollama.com/
      • Exo: https://github.com/exo-explore/exo
      • Local LLMs with LangChain: https://python.langchain.com/v0.1/docs/guides/development/local_llms/
      • https://generativeai.pub/the-ai-buzz-why-every-company-is-jumping-on-the-bandwagon-07e261707b64
      • https://thefr.com/news/as-brands-jump-on-ai-bandwagon-regulators-respond