Deep Dive into Lambda Response Streaming

watany
August 24, 2024


This is the content of my talk at JAWS PANKRATION 2024.
https://jawspankration2024.jaws-ug.jp/ja/timetable/TT-58/

Transcript

  1. Introduction • Introduced in 2023, Response Streaming is a key feature of AWS Lambda. • It is particularly relevant in the current LLM boom, where streaming large volumes of text output is becoming the norm. • While Lambda supports this capability, comprehensive documentation is still limited.
  2. Me: Watanabe Yohei (@watany) • NTT TechnoCross Corporation • JAWS-UG Tokyo Organizer • Title https://jawsug.connpass.com/event/316451/
  3. Table of Contents 1. What is Lambda Response Streaming? 2. Managed Runtime 3. Custom Runtime 4. Lambda Web Adapter (LWA)
  4. What is "Response Streaming"? Lambda Response Streaming sends response data back to the caller as soon as it becomes available. • Efficiently returns large data, up to 20 MB • Reduces time to first byte (TTFB) to just a few milliseconds, minimizing response latency https://aws.amazon.com/jp/blogs/compute/introducing-aws-lambda-response-streaming/
  5. What is "Response Streaming"? Key focus areas: • LLM chat UI responses ◦ Examples: Claude, ChatGPT • Server-Side Rendering (SSR) ◦ Examples: Next.js, Remix
  6. Response Streaming Limitations • Types of Streaming • Data Size Limits • Buffering • Runtime-specific Limitations
  7. Types of "Response Streaming" Which types of streaming does Lambda Response Streaming support? • Yes: Transfer-Encoding: chunked, Server-Sent Events (SSE) • No: WebSocket
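Server-Sent Events need nothing beyond a chunked HTTP body, which is why they work over response streaming while WebSocket (a separate, bidirectional protocol) does not. As a rough illustration, the sketch below frames `text/event-stream` messages in Python. The `token` payload and the final `[DONE]` event are a hypothetical convention, loosely modeled on typical LLM chat APIs; Lambda itself defines no event format.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_event(data: str, event: Optional[str] = None) -> bytes:
    """Frame one Server-Sent Event as UTF-8 bytes.

    Every line of `data` becomes its own "data:" field, and a blank line
    ends the event, as the text/event-stream format requires.
    """
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return ("\n".join(lines) + "\n\n").encode("utf-8")


def token_events(tokens: Iterable[str]) -> Iterator[bytes]:
    """Yield one SSE frame per token, then a final "done" event (hypothetical protocol)."""
    for token in tokens:
        yield sse_event(json.dumps({"token": token}))
    yield sse_event("[DONE]", event="done")
```

Each yielded frame can be written to the response stream as its own chunk, with the response's Content-Type set to `text/event-stream`.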
  8. Limits Payload size limits • Buffered (normal): 6 MB • Response Streaming: 20 MB Bandwidth limits • First 6 MB: uncapped bandwidth for the initial 6 MB of your function's response. • After 6 MB: streaming is limited to 2 MB/s.
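The bandwidth limits translate into a simple lower bound on streaming time. A minimal sketch, assuming the published figures are exact and ignoring both the transfer time of the uncapped first 6 MB and any network effects: a 20 MB response has 14 MB beyond the cap, so the tail alone takes at least 14 / 2 = 7 seconds.

```python
UNCAPPED_MB = 6.0        # first 6 MB of the response: no bandwidth cap
CAPPED_RATE_MB_S = 2.0   # beyond 6 MB: 2 MB per second
STREAM_LIMIT_MB = 20.0   # maximum response size with response streaming


def min_tail_seconds(response_mb: float) -> float:
    """Lower bound on the time spent streaming the bandwidth-capped tail."""
    if response_mb > STREAM_LIMIT_MB:
        raise ValueError("exceeds the 20 MB response streaming limit")
    capped_mb = max(0.0, response_mb - UNCAPPED_MB)
    return capped_mb / CAPPED_RATE_MB_S
```

For example, `min_tail_seconds(10)` is 2.0, while anything at or below 6 MB never hits the cap.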
  9. Buffering • Undocumented issue: small amounts of response data may not be sent to the client immediately. • Estimated flush threshold: around 100 KB. • Ref: Forcing Lambda Response Streaming to Flush Content ◦ https://betterdev.blog/lambda-response-streaming-flush-content/
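To make the buffering issue concrete, here is a toy model, not Lambda's actual implementation: writes accumulate until an assumed threshold (the slide's estimate of about 100 KB) is reached, and only then does the client receive anything. `ToyBufferedStream` is a hypothetical name used only for illustration.

```python
class ToyBufferedStream:
    """Toy model: data reaches the client only once a threshold has filled.

    The 100 KB default mirrors the slide's estimate; the real behavior is
    undocumented and may differ.
    """

    def __init__(self, threshold: int = 100 * 1024) -> None:
        self.threshold = threshold
        self._pending = bytearray()
        self.delivered = []  # chunks the client has received so far

    def write(self, chunk: bytes) -> None:
        self._pending += chunk
        if len(self._pending) >= self.threshold:
            self._flush()

    def end(self) -> None:
        # Closing the stream delivers whatever is still buffered.
        if self._pending:
            self._flush()

    def _flush(self) -> None:
        self.delivered.append(bytes(self._pending))
        self._pending.clear()
```

In this model, a handler that writes a few short LLM tokens at a time shows nothing to the user until either the threshold fills or the stream ends, which matches the symptom described above.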
  10. Runtime-specific Limitations Each runtime has its own limitations; let's go through them one by one. • Managed Runtime • Custom Runtime • Managed or Container Runtime with Lambda Web Adapter
  11. Managed Runtime • Available in Node.js 14 and later. • Not supported in other languages. • Wrap your handler with `awslambda.streamifyResponse()`, the decorator provided by this managed runtime.
  12. Write metadata: awslambda.HttpResponseStream.from(stream, metadata) Use cases: • custom HTTP response status codes • custom HTTP headers • cookie data sent to the client.
  13. Custom Runtime Overview • Customizable environment: lets you define your own runtime environment instead of using AWS's managed runtimes. • bootstrap script: central to custom runtimes; handles initialization, request processing, and responses. • Runtime API communication: interacts with the Lambda Runtime API for event handling and response delivery. https://docs.aws.amazon.com/lambda/latest/dg/runtimes-api.html
  14. Custom Runtime Overview After the execution environment starts, the runtime and function exchange requests and responses via the Lambda Runtime API. https://docs.aws.amazon.com/lambda/latest/dg/runtimes-api.html
  15. Custom Runtime Overview Request: a GET to the Runtime API at /runtime/invocation/next fetches the event, which the runtime and function then process. https://docs.aws.amazon.com/lambda/latest/dg/runtimes-api.html curl -X GET "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next"
  16. Custom Runtime Overview Response: the runtime and function process the event, then POST the result to the Runtime API at /runtime/invocation/AwsRequestId/response. https://docs.aws.amazon.com/lambda/latest/dg/runtimes-api.html curl -X POST "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/${AwsRequestId}/response"
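The GET/POST cycle can be sketched as a minimal bootstrap loop in Python using only the standard library. The documented Runtime API paths carry a `2018-06-01` version segment, and the request ID comes back in the `Lambda-Runtime-Aws-Request-Id` response header. This is a sketch, not a production runtime: the handler is whatever function you pass in, and error reporting through the Runtime API's error endpoints is left out.

```python
import json
import os
import urllib.request
from typing import Any, Callable

API_VERSION = "2018-06-01"  # version segment in the documented Runtime API paths


def next_url(api: str) -> str:
    return f"http://{api}/{API_VERSION}/runtime/invocation/next"


def response_url(api: str, request_id: str) -> str:
    return f"http://{api}/{API_VERSION}/runtime/invocation/{request_id}/response"


def handle_one(api: str, handler: Callable[[Any], Any]) -> None:
    """Fetch one event, run the handler, and POST its JSON result in one piece."""
    # The GET blocks until Lambda has an event for this execution environment.
    with urllib.request.urlopen(next_url(api)) as resp:
        request_id = resp.headers["Lambda-Runtime-Aws-Request-Id"]
        event = json.loads(resp.read())
    body = json.dumps(handler(event)).encode("utf-8")
    req = urllib.request.Request(response_url(api, request_id), data=body, method="POST")
    with urllib.request.urlopen(req):
        pass


def run_forever(handler: Callable[[Any], Any]) -> None:
    """Event loop of a minimal bootstrap; Lambda sets AWS_LAMBDA_RUNTIME_API."""
    api = os.environ["AWS_LAMBDA_RUNTIME_API"]
    while True:
        handle_one(api, handler)
```

A bootstrap would call something like `run_forever(lambda event: {"echo": event})`. This version still buffers the whole result; streaming changes only the POST step.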
  17. How to Enable Response Streaming in a Custom Runtime When you POST to the Runtime API at /runtime/invocation/AwsRequestId/response, do the following: • Add these headers ◦ Lambda-Runtime-Function-Response-Mode: streaming ◦ Transfer-Encoding: chunked • Send the response in chunks, then close the connection. • Add error handling and the other remaining pieces.
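A sketch of the streaming version of that POST, assuming `api` is the host:port from `AWS_LAMBDA_RUNTIME_API` and `chunks` is any iterable of bytes your function produces. Python's `http.client` handles the chunk framing itself when `encode_chunked=True` is combined with an explicit `Transfer-Encoding: chunked` header. Error handling is omitted.

```python
import http.client
from typing import Iterable

STREAMING_HEADERS = {
    "Lambda-Runtime-Function-Response-Mode": "streaming",
    "Transfer-Encoding": "chunked",
}


def stream_response(api: str, request_id: str, chunks: Iterable[bytes]) -> int:
    """POST a streamed response for one invocation and return the HTTP status.

    http.client frames each non-empty item of `chunks` as an HTTP/1.1 chunk
    and sends the terminating zero-length chunk once the iterable is exhausted.
    """
    conn = http.client.HTTPConnection(api)  # api is "host:port"
    try:
        conn.request(
            "POST",
            f"/2018-06-01/runtime/invocation/{request_id}/response",
            body=chunks,
            headers=STREAMING_HEADERS,
            encode_chunked=True,
        )
        resp = conn.getresponse()
        resp.read()
        return resp.status
    finally:
        conn.close()  # closing the connection ends this invocation's response
```

Passing a generator, for example `(f"line {i}\n".encode() for i in range(3))`, means each piece is sent as soon as it is produced rather than after the whole body is ready.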
  18. How to Enable Response Streaming in a Custom Runtime • The implementation so far covers the equivalent of `awslambda.streamifyResponse`. • To implement the equivalent of `awslambda.HttpResponseStream.from`, you also need to: ◦ Set Content-Type to `application/vnd.awslambda.http-integration-response`. ◦ Send the custom metadata (status code, headers, cookies) as JSON. ◦ Add 8 NULL characters as a separator. ◦ Encode the response using HTTP/1.1 chunked transfer encoding. • Ref: https://aws.amazon.com/jp/blogs/compute/using-response-streaming-with-aws-lambda-web-adapter-to-optimize-performance/
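For the `HttpResponseStream.from` equivalent, the body sent to the Runtime API starts with a JSON metadata prelude and the 8-byte NUL separator, followed by the actual response bytes. The sketch assumes the prelude keys `statusCode`, `headers`, and `cookies`, mirroring the metadata object passed to `awslambda.HttpResponseStream.from`. Check the referenced AWS blog post or the Rust runtime before relying on the exact shape. The `application/vnd.awslambda.http-integration-response` Content-Type goes on the POST request itself and is not built here. The `chunk` helper only shows what HTTP/1.1 chunk framing looks like on the wire. If your HTTP client already chunk-encodes the body, do not apply it a second time.

```python
import json
from typing import Dict, Iterable, Iterator, List, Optional

HTTP_INTEGRATION_CONTENT_TYPE = "application/vnd.awslambda.http-integration-response"
SEPARATOR = b"\x00" * 8  # 8 NUL bytes between the JSON prelude and the body


def prelude(status: int, headers: Dict[str, str],
            cookies: Optional[List[str]] = None) -> bytes:
    """JSON metadata (status code, headers, cookies) followed by the separator."""
    meta = {"statusCode": status, "headers": headers, "cookies": cookies or []}
    return json.dumps(meta).encode("utf-8") + SEPARATOR


def http_stream(status: int, headers: Dict[str, str], body: Iterable[bytes],
                cookies: Optional[List[str]] = None) -> Iterator[bytes]:
    """Yield the prelude first, then the body chunks unchanged."""
    yield prelude(status, headers, cookies)
    yield from body


def chunk(data: bytes) -> bytes:
    """Frame one HTTP/1.1 chunk: hex length, CRLF, data, CRLF."""
    return f"{len(data):X}\r\n".encode("ascii") + data + b"\r\n"


LAST_CHUNK = b"0\r\n\r\n"  # zero-length chunk that terminates the body
```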
  19. Implementation Example Refer to the "Rust Runtime for AWS Lambda" implementation for guidance. https://github.com/awslabs/aws-lambda-rust-runtime/blob/fbf212f4eef8c0fd8bd87f87998239fa17bc2b23/lambda-runtime/src/streaming.rs
  20. What is Lambda Web Adapter (LWA)? • Run web apps on AWS Lambda using familiar frameworks like Express.js, Flask, and Spring Boot. • Deploy the same Docker image across AWS Lambda, EC2, Fargate, and local environments. https://github.com/awslabs/aws-lambda-web-adapter
  21. Why Does Response Streaming Work with LWA? Lambda Web Adapter includes a "Runtime Interface Client." https://github.com/awslabs/aws-lambda-web-adapter
  22. Why Does Response Streaming Work with LWA? Lambda Web Adapter connects the "Web App" and the "Runtime API." https://aws.amazon.com/jp/blogs/compute/using-response-streaming-with-aws-lambda-web-adapter-to-optimize-performance/ Triggers the app to start upon receiving an HTTP request. Converts events fetched from the Runtime API into HTTP requests.
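To illustrate the conversion step, here is a simplified, hypothetical mapping from a Function URL event (payload format 2.0: `requestContext.http.method`, `rawPath`, `rawQueryString`, `headers`, `body`, `isBase64Encoded`) to a plain HTTP request aimed at the local web app. LWA itself is written in Rust and handles more event sources and edge cases. Port 8080 is assumed here as the app's listening port, and `event_to_request` is an illustrative name, not an LWA function.

```python
import base64
from typing import Any, Dict, Optional, Tuple


def event_to_request(event: Dict[str, Any], app_port: int = 8080
                     ) -> Tuple[str, str, Dict[str, str], Optional[bytes]]:
    """Map a payload-format-2.0 event to (method, url, headers, body)."""
    method = event["requestContext"]["http"]["method"]
    url = f"http://127.0.0.1:{app_port}{event['rawPath']}"
    if event.get("rawQueryString"):
        url += "?" + event["rawQueryString"]
    headers = dict(event.get("headers") or {})
    raw = event.get("body")
    if raw is None:
        body = None
    elif event.get("isBase64Encoded"):
        body = base64.b64decode(raw)  # binary or encoded payloads arrive as base64
    else:
        body = raw.encode("utf-8")
    return method, url, headers, body
```

Because the app only ever sees an ordinary HTTP request, it can be any framework that listens on a port, which is what lets the same image run on Lambda, EC2, or Fargate.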
  23. Why Does Response Streaming Work with LWA? Lambda Web Adapter connects the "Web App" and the "Runtime API." https://aws.amazon.com/jp/blogs/compute/using-response-streaming-with-aws-lambda-web-adapter-to-optimize-performance/ Returns the HTTP response to the Lambda Web Adapter (LWA). Converts the HTTP response and POSTs it to the Runtime API.
  24. Why Does Response Streaming Work with LWA? Two common use cases for LWA: 1. Container Runtime 2. Managed Runtime
  25. Container Runtime + LWA It works with just one additional line in the Dockerfile (the 2nd line). Nothing else is special.
  26. Managed Runtime + LWA 2. Lambda Environment Variables Required: • AWS_LAMBDA_EXEC_WRAPPER: /opt/bootstrap ◦ Required for LWA with managed runtimes. • AWS_LWA_INVOKE_MODE: RESPONSE_STREAM ◦ Important for streaming. (Configuring the Function URL alone is not enough.)
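The two settings can be applied with boto3, as a sketch. The wrapper variable is written here as `AWS_LAMBDA_EXEC_WRAPPER`, the name Lambda uses for wrapper scripts and the one given in the LWA README. The function name is a placeholder, and the LWA layer is assumed to be attached already. `update_function_configuration` replaces the function's whole environment, so merge in any variables you already use via `extra`. The Function URL's own `InvokeMode` is set as well, since the adapter setting and the URL setting work together.

```python
from typing import Dict, Optional


def lwa_streaming_env(extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Environment variables for a managed runtime + LWA streaming setup."""
    env = {
        "AWS_LAMBDA_EXEC_WRAPPER": "/opt/bootstrap",  # run LWA's wrapper script
        "AWS_LWA_INVOKE_MODE": "RESPONSE_STREAM",     # tell LWA to stream responses
    }
    env.update(extra or {})  # keep any variables the function already relies on
    return env


def apply(function_name: str) -> None:
    """Apply the settings with boto3 (needs AWS credentials)."""
    import boto3  # imported lazily so lwa_streaming_env works without boto3

    client = boto3.client("lambda")
    # Note: this replaces the function's entire set of environment variables.
    client.update_function_configuration(
        FunctionName=function_name,
        Environment={"Variables": lwa_streaming_env()},
    )
    # The Function URL must use the streaming invoke mode too.
    client.update_function_url_config(
        FunctionName=function_name,
        InvokeMode="RESPONSE_STREAM",
    )
```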
  27. Managed Runtime + LWA 2. Lambda Environment Variables Optional: see https://github.com/awslabs/aws-lambda-web-adapter
  28. Managed Runtime + LWA Appendix: The LWA GitHub repository contains examples for various frameworks. https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/
  29. Which one should you use? • If you can use Node.js, that's the best choice. ◦ Also consider using Hono. • If your app can be triggered by an HTTP request, LWA is ideal. • Custom runtimes are challenging, but examples are available.