Performing chaos in a serverless world - ServerlessDays Hamburg October 2 2020

Presented at ServerlessDays Hamburg, October 2nd, 2020.

@gunnargrosch
Serverless Chaos Demo
failure-lambda
failure-azurefunctions
failure-cloudfunctions

Chaos engineering is the practice of hypothesis testing through planned experiments to gain a better understanding of a system’s behavior. The principles of chaos engineering have been around for years, and chaos engineering has now gone from being a buzzword and a practice used by a few large organizations in very specific fields to being put into use by companies of all sizes and industries.

Planning and performing chaos experiments on traditional infrastructure with virtual machines, and on microservices running in containers, has been battle-tested by many large organizations, but serverless functions and managed services present different failure modes and levels of abstraction.

In this talk we focus on how to apply the principles of chaos engineering to serverless, both for serverless functions and managed services. This covers how hypotheses can be formed to fit serverless, what the experiments can achieve, and how to perform them in practice.

Although tools for chaos engineering, both commercial and open-source, are maturing, most of them still focus primarily on virtual machines and containers. We’ll look at what tools are out there to help with chaos experiments for serverless and managed services, but also how you can build your own.

Join us as we move from talking about the principles to performing real chaos in a serverless world!

Gunnar Grosch

October 02, 2020

Transcript

  1. Abstract
    The principles of chaos engineering have been battle-tested for years using traditional infrastructure and containerized microservices. But how do they work with serverless functions and managed services?
  2. Agenda
    • What is chaos engineering?
    • Motivations behind chaos engineering
    • Running chaos experiments
    • Challenges with serverless
    • Serverless chaos experiments
  3. About me
    • Background in development, operations, and management
    • Organizer of user groups and conferences
    • AWS Serverless Hero
    • Father of three
  4. What is chaos engineering?
    “Chaos Engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production”
    Source: principlesofchaos.org
  5. What is chaos engineering?
    Chaos engineering is about finding the weaknesses in a system and fixing them before they break
  6. What is chaos engineering?
    Chaos engineering is about building confidence in your system and in your organization
  7. Motivations behind chaos engineering
    • Are your customers getting the experience they should?
    • Are downtime or issues costing you money?
    • Are you confident in your monitoring and alerting?
    • Is your organization ready to handle outages?
    • Are you learning from incidents?
  8. Motivations behind chaos engineering
    “Chaos engineering should be done regularly”
    Source: Reliability Pillar, AWS Well-Architected Framework
  9. Challenges with serverless
    “Serverless allows you to build and run applications and services without thinking about servers”
    Source: Amazon Web Services (AWS)
  10. Challenges with serverless
    • No servers to manage
    • Less heavy lifting
    • Lots of services to choose from
    • Per function and service configuration
    • More granular architectures
  11. Serverless chaos experiments
    • Inject errors into your code
    • Remove downstream services
    • Alter the concurrency of functions
    • Restrict the capacity of tables
    [Architecture diagram labels: Client, Amazon Simple Storage Service (Amazon S3), Amazon API Gateway, AWS Lambda, Amazon DynamoDB, AWS Lambda, Amazon Simple Storage Service (Amazon S3)]
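Removing a downstream service can be approximated inside the function by refusing outbound calls to matching hostnames. The sketch below is only an illustration of that idea, not how any particular library implements it: `isDenied` and `guardedCall` are hypothetical names, and the patterns are assumed to be regular expressions in the style of the denylist in the failure-lambda configuration.

```javascript
// Hypothetical denylist of hostname patterns, written as regular expressions.
const denylist = ['s3.*.amazonaws.com', 'dynamodb.*.amazonaws.com'];

// Returns true when the hostname matches any denylist pattern.
function isDenied(hostname, patterns) {
  return patterns.some((pattern) => new RegExp(pattern).test(hostname));
}

// Wraps an outbound call so that denied hosts fail before any request is made.
function guardedCall(hostname, call, patterns = denylist) {
  if (isDenied(hostname, patterns)) {
    throw new Error(`Chaos experiment: connection to ${hostname} blocked`);
  }
  return call();
}
```

With this in place, a call guarded for a host such as `dynamodb.eu-west-1.amazonaws.com` throws immediately, which lets you observe how the rest of the function copes when DynamoDB appears to be gone.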
  12. Serverless chaos experiments
    • Security policy errors
    • CORS configuration errors
    • Service configuration errors
    • Function disk space failure
    [Architecture diagram labels: Client, Amazon Simple Storage Service (Amazon S3), Amazon API Gateway, AWS Lambda, Amazon DynamoDB, AWS Lambda, Amazon Simple Storage Service (Amazon S3)]
  13. Serverless chaos experiments
    • Add latency to your functions
    • Cold starts
    • Cloud provider issues
    • Runtime or code issues
    • Integration issues
    • Timeouts
    [Architecture diagram labels: Client, Amazon Simple Storage Service (Amazon S3), Amazon API Gateway, AWS Lambda, Amazon DynamoDB, AWS Lambda, Amazon Simple Storage Service (Amazon S3)]
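Adding latency to a function can be sketched as a delay drawn between a minimum and a maximum before the real handler runs. This is a minimal illustration under stated assumptions: `pickLatency`, `sleep`, and `withLatency` are hypothetical helpers, the uniform distribution is an assumption, and the random source is passed in so the choice can be made deterministic when testing.

```javascript
// Picks a delay in milliseconds between minLatency and maxLatency (inclusive).
// `random` is injectable so the draw can be controlled in tests.
function pickLatency(minLatency, maxLatency, random = Math.random) {
  return Math.floor(minLatency + random() * (maxLatency - minLatency + 1));
}

// Resolves after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Wraps an async handler so every invocation waits before the handler runs.
function withLatency(handler, { minLatency, maxLatency }, random = Math.random) {
  return async (event, context) => {
    await sleep(pickLatency(minLatency, maxLatency, random));
    return handler(event, context);
  };
}
```

Wrapping a handler with `withLatency(handler, { minLatency: 100, maxLatency: 400 })` makes it possible to check whether upstream timeouts and retries behave as expected when the function slows down.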
  14. Failure-lambda
    • NPM package for Node.js Lambdas
    • https://github.com/gunnargrosch/failure-lambda
    • Configuration using Parameter Store
    • Several failure modes: latency, status code, exception, disk space, denylist

    const failureLambda = require('failure-lambda')
    exports.handler = failureLambda(async (event, context) => { ... })

    {
      "isEnabled": false,
      "failureMode": "latency",
      "rate": 1,
      "minLatency": 100,
      "maxLatency": 400,
      "exceptionMsg": "Exception message!",
      "statusCode": 404,
      "diskSpace": 100,
      "denylist": [ "s3.*.amazonaws.com", "dynamodb.*.amazonaws.com" ]
    }
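To make the configuration fields concrete, here is a minimal sketch of a wrapper that reads a configuration object of the same shape and decides whether to inject a failure. It is an illustration written for this transcript, not the failure-lambda source: `injectFailure` and `getConfig` are hypothetical names, treating `rate` as an injection probability is an assumption, the mode strings `statuscode` and `exception` are assumed, and only those two modes are covered. A real setup would load the configuration from Parameter Store rather than from an in-memory function.

```javascript
// Wraps an async handler. `getConfig` returns (a promise of) the configuration
// object; `random` is injectable so behavior can be made deterministic.
function injectFailure(handler, getConfig, random = Math.random) {
  return async (event, context) => {
    const config = await getConfig();
    // Inject only when enabled and the random draw falls below the rate.
    if (config.isEnabled && random() < config.rate) {
      if (config.failureMode === 'statuscode') {
        // Short-circuit with the configured status code.
        return { statusCode: config.statusCode };
      }
      if (config.failureMode === 'exception') {
        // Fail the invocation with the configured message.
        throw new Error(config.exceptionMsg);
      }
    }
    // No injection: run the real handler untouched.
    return handler(event, context);
  };
}
```

Because the configuration is fetched on every invocation, flipping `isEnabled` turns the experiment on or off without redeploying the function, which is the main appeal of keeping the configuration outside the code.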
  15. Serverless chaos demo
    [Architecture diagram labels: Client, Amazon S3, Amazon API Gateway, AWS Lambda, Amazon DynamoDB, AWS Lambda, AWS Lambda]
  16. Serverless chaos demo
    [Architecture diagram labels: Client, Amazon S3, Amazon API Gateway, AWS Lambda, Amazon DynamoDB, AWS Lambda, AWS Lambda]
    • What if my function takes an extra 300 ms for each invocation?
    • What if my function returns an error code?
    • What if I can’t get data from DynamoDB?
    Hypothesis: If we inject failures into functions, then my application will use graceful degradation.
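The hypothesis about graceful degradation can be written down as a small executable check. The sketch below is an assumption about what "graceful" might mean for a read path, namely serving an empty, flagged response instead of an error; `createHandler`, `healthyFetch`, and `failingFetch` are hypothetical names, and the fetch functions are stand-ins for a real DynamoDB read.

```javascript
// Builds a handler that degrades gracefully: if the data source fails,
// it serves a fallback response instead of propagating the error.
function createHandler(fetchItems) {
  return async () => {
    try {
      const items = await fetchItems();
      return { statusCode: 200, body: JSON.stringify({ items, degraded: false }) };
    } catch (err) {
      // Fallback path: empty result, flagged so clients can tell the difference.
      return { statusCode: 200, body: JSON.stringify({ items: [], degraded: true }) };
    }
  };
}

// Stand-ins for a healthy and a failing DynamoDB read.
const healthyFetch = async () => ['a', 'b'];
const failingFetch = async () => { throw new Error('Injected failure'); };
```

Running `createHandler(failingFetch)` during an experiment and checking that the response is still a success with `degraded` set to true is one way to confirm or refute the hypothesis.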
  17. What’s next?
    “Chaos engineering should be done regularly, and be part of your CI/CD cycle”
    Source: Reliability Pillar, AWS Well-Architected Framework
  18. Summary
    • Serverless doesn’t make your application resilient
    • Chaos engineering helps us find weaknesses and fix them
    • Chaos engineering is about building confidence
    • Chaos engineering should be done regularly
    • It’s not rocket science; you can do it!
  19. Do you want more?
    • Follow @serverlesschaos on Twitter
    • Serverless Chaos Demo app: https://demo.serverlesschaos.com
    • Failure-lambda: https://github.com/gunnargrosch/failure-lambda
    • Failure-cloudfunctions: https://github.com/gunnargrosch/failure-cloudfunctions
    • Failure-azurefunctions: https://github.com/gunnargrosch/failure-azurefunctions
    • Chaos-lambda: https://github.com/adhorn/aws-lambda-chaos-injection/
    • Serverless chaos lab: https://github.com/jpbarto/serverless-chaos-lab
    • YouTube videos and repositories: https://grosch.se