apidays
December 23, 2024

apidays Paris 2024 - Make Your LLM Infrastructure Serverless, Guillaume Blaquiere, Carrefour

Make Your LLM Infrastructure Serverless
Guillaume Blaquiere, Group Data Architect at Carrefour

apidays Paris 2024 - The Future API Stack for Mass Innovation
December 3 - 5, 2024

------

Check out our conferences at https://www.apidays.global/

Do you want to sponsor or talk at one of our conferences?
https://apidays.typeform.com/to/ILJeAaV8

Learn more on APIscene, the global media made by the community for the community:
https://www.apiscene.io

Explore the API ecosystem with the API Landscape:
https://apilandscape.apiscene.io/

Transcript

  1. Cloud Run service
     Serverless container platform.
     Zero-config deployments: gcloud run deploy
     Pay only while your code runs.
     Auto-scaling to support peak traffic spikes.
  2. GPUs & LLMs: How do they work together?
     Prompt -> vector of tokens -> vector of token array values (embeddings) -> array of bytes (matrix).
     Example: the prompt "This is a prompt" is split into the tokens "This", "is", "a", "pro", "mpt".
     [Figure: the tokens shown as integer token IDs, then as a matrix of floating-point embedding values.]
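The pipeline on this slide (prompt, then token IDs, then one embedding row per token, stacked into a matrix) can be sketched in a few lines of Python. This is a toy illustration only: the vocabulary, the token IDs, the greedy longest-match rule, and the three-value embedding rows are all assumptions made for the example, not the tokenizer or model used in the talk.

```python
from typing import Dict, List

# Hypothetical toy vocabulary: token text -> integer ID (IDs are made up).
VOCAB: Dict[str, int] = {"This": 0, " is": 1, " a": 2, " pro": 3, "mpt": 4}

# Hypothetical embedding table: one row of floats per token ID.
# Real models use rows with hundreds or thousands of values.
EMBEDDINGS: List[List[float]] = [
    [0.5, 1.2, 0.35],
    [8.9, 0.54, 1.0],
    [0.4, 0.01, 1.6],
    [5.1, 1.3, 0.57],
    [1.8, 0.4, 5.6],
]


def tokenize(prompt: str) -> List[int]:
    """Split the prompt into token IDs using greedy longest-match on VOCAB."""
    ids: List[int] = []
    pos = 0
    while pos < len(prompt):
        # Pick the longest vocabulary entry that matches at the current position.
        match = max(
            (tok for tok in VOCAB if prompt.startswith(tok, pos)),
            key=len,
            default=None,
        )
        if match is None:
            raise ValueError(f"no token matches at position {pos}")
        ids.append(VOCAB[match])
        pos += len(match)
    return ids


def embed(ids: List[int]) -> List[List[float]]:
    """Look up one embedding row per token ID, producing a matrix."""
    return [EMBEDDINGS[i] for i in ids]


token_ids = tokenize("This is a prompt")
matrix = embed(token_ids)
```

The resulting matrix has one row per token. Multiplying matrices of this shape at much larger sizes is the kind of work a GPU accelerates.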
  3. No solution's perfect: Pros and Cons
     Pros:
       No overprovisioning: scale to 0.
       Scale with the traffic: pay as you use.
       Easy to use: automatic driver installation.
     Cons:
       Cold start: first-request latency.
       Limitations: regions and max instances.
       GPUs available: NVIDIA L4 only.
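The "scale to 0, pay as you use" trade-off can be made concrete with some simple arithmetic. The sketch below compares an always-on instance with one billed only while it serves traffic. The per-second rate and the busy time are hypothetical placeholders, not real Cloud Run prices, and the model ignores cold-start time and any idle time billed before an instance shuts down.

```python
def always_on_cost(rate_per_second: float, period_seconds: float) -> float:
    """Cost of keeping one instance running for the whole period."""
    return rate_per_second * period_seconds


def pay_per_use_cost(rate_per_second: float, busy_seconds: float) -> float:
    """Cost when the instance is billed only while it is serving requests."""
    return rate_per_second * busy_seconds


# Hypothetical inputs, chosen only to illustrate the calculation.
RATE = 0.001           # assumed price per instance-second (not a real price)
DAY = 24 * 60 * 60     # seconds in one day
BUSY = 60 * 60         # assume one hour of actual traffic per day

full_day = always_on_cost(RATE, DAY)
on_demand = pay_per_use_cost(RATE, BUSY)
savings_ratio = 1 - on_demand / full_day
```

With these made-up inputs the instance is busy for 1 hour out of 24, so the pay-per-use cost is 1/24 of the always-on cost. The price of that saving is the cold start listed under the cons.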
  4. Thank you!
     Carrefour: carrefour.fr
     Article: https://medium.com/google-cloud/cloud-run-gpu-make-your-llms-serverless-5188caacc667
     Find me on:
       Twitter: @gblaquiere
       Medium: @guillaume-blaquiere
       GitHub: guillaumeblaquiere
       LinkedIn: guillaume blaquiere