Operations Concepts & Tools

An intro to some important high-level aspects of code running in production given to some NCSU students -- mostly intended as "these are things you may encounter and may wish to look up" rather than something in-depth. (Not supremely valuable without the accompanying talk but maybe useful, idk)

Michael DeHaan

February 24, 2017

Transcript

  1. WHY
    • Familiarity with how your code is deployed and managed in production leads to better code
    • Better understanding of performance and failure modes
    • It’s good to be friends with the ops team (inverse: even more true!)
    • Increasing shifts in shared responsibility, sometimes filed under overloaded umbrella-term “DevOps”.
  2. THINGS TO COVER
    • Classic vs Microservices Architectures
    • IaaS / Cloud APIs and Tools
    • Configuration Management / Immutable Systems
    • Monitoring / Log Collection
    • Load Balancers / Update Strategies
    • Backup / Disaster Recovery
    • Security Policy
    • Continuous Integration / Continuous Deployment
  3. IN THE BEGINNING
    • In the beginning (and still a lot today), software installs were largely run by systems administrators writing their own custom scripts
    • These scripts grew unmaintainable over time
    • Scripts could fail
    • Much of the install process was not fully automated even if some scripts existed
    • Upgrades were a frequent cause of widespread system failure
  4. IAAS / CLOUD
    • Misleading assumption that Cloud services (ex: Amazon, GCE) are primarily about renting IP addresses
    • ALSO: storage, databases, load balancers, firewalls/security, messaging, etc
    • Cloud topology control examples: CloudFormation (AWS), Terraform (generic)
    • Cloud API examples: Boto (AWS Python)
    • CLI Tools
  5. CONFIGURATION MANAGEMENT
    • Declarative description of what should be on a system
    • “Idempotence” & the GPS Analogy: F(x) = F(F(x))
    • Typically “push” or “pull” based
    • Designed around Pull: Puppet, Chef
    • Designed around Push: Ansible
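The idempotence property F(x) = F(F(x)) can be illustrated with a small sketch. This is not how Puppet, Chef, or Ansible are implemented; `ensure_line` and `append_line` are hypothetical helpers, and the "system" is just a list of config lines assumed for illustration.

```python
def ensure_line(config_lines, wanted):
    """Idempotent: describe the desired state ("this line is present")."""
    if wanted in config_lines:
        return list(config_lines)  # already in the desired state, change nothing
    return list(config_lines) + [wanted]


def append_line(config_lines, wanted):
    """Not idempotent: describe an action ("add this line") instead of a state."""
    return list(config_lines) + [wanted]


state = ["PermitRootLogin no"]  # hypothetical starting config

once = ensure_line(state, "MaxAuthTries 3")
twice = ensure_line(once, "MaxAuthTries 3")
print(once == twice)  # applying it again changes nothing

script_once = append_line(state, "MaxAuthTries 3")
script_twice = append_line(script_once, "MaxAuthTries 3")
print(script_once == script_twice)  # the naive script duplicates the line
```

Running the idempotent version any number of times converges on the same result, which is why such a run can be repeated safely after a partial failure.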
  6. IMMUTABLE SYSTEMS
    • Alternative strategy to configuration management
    • New images replace old images, rather than upgrading systems in place
    • Increases reliability and potentially decreases upgrade times
    • Cannot be as easily applied to stateful servers (databases, etc)
    • Can slow down development process
    • Image building: Packer, Docker
    • Image management: EC2, Mesos/Kubernetes
  7. MONITORING
    • On-site: Graphite, Ganglia, Nagios, Cacti, Munin
    • Hosted / Off-site: New Relic
    • Alerting vs trending
    • Application Performance Management (APM): AppDynamics
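One way to read "alerting vs trending": alerting asks whether the latest value needs attention right now, while trending asks where the values are heading over time. The sketch below is an assumption-laden illustration, not the API of any tool named above; the metric name, sample values, and thresholds are all made up.

```python
from statistics import mean


def should_alert(samples, threshold):
    """Alerting: does the most recent sample cross a hard threshold?"""
    return samples[-1] > threshold


def moving_average(samples, window):
    """Trending: smooth the series so the long-term direction is visible."""
    return [
        mean(samples[i - window + 1 : i + 1])
        for i in range(window - 1, len(samples))
    ]


# Hypothetical disk usage readings (percent), oldest first.
disk_used_pct = [61, 62, 64, 67, 71, 76]

alert_now = should_alert(disk_used_pct, threshold=90)
trend = moving_average(disk_used_pct, window=3)
rising = all(later > earlier for earlier, later in zip(trend, trend[1:]))

print(alert_now)  # nothing to page anyone about yet
print(rising)     # but the trend says the disk is filling up
```

Here no alert fires, yet the trend is steadily rising, which is the kind of signal a trending graph surfaces before an alert would.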
  8. LOAD BALANCERS & AUTO SCALING
    • Typically more than one instance of a service is deployed
    • Routes requests between services
    • Closely related: auto-scaling groups
    • Warming up problems and solutions
    • TV show voting example
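A minimal sketch of how requests might be spread across several instances, assuming the simplest policy (round robin). Real load balancers also do health checks, connection draining, and weighting; `RoundRobinBalancer` and the backend names are hypothetical.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends in a fixed rotation, one per request."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._rotation = cycle(backends)

    def pick(self):
        """Return the backend that should serve the next request."""
        return next(self._rotation)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])

# Simulate nine incoming requests and count where they land.
assignments = Counter(balancer.pick() for _ in range(9))
print(assignments)
```

With three backends and nine requests, each backend ends up with three. An auto-scaling group would, in effect, change the list of backends as load rises and falls.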
  9. BACKUP / DISASTER RECOVERY
    • You must be able to restore everything from backup
    • Minimize number/types of data sources
    • If backups are not tested they do not exist
    • Understanding multi-region and multi-datacenter
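"If backups are not tested they do not exist" can be made concrete with a restore check. The sketch below assumes a backup is a plain file copy and that a matching SHA-256 checksum after restoring is good enough evidence; real restore tests usually go further (for example, loading a database dump and querying it). The file names and `backup_is_restorable` are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def backup_is_restorable(original, backup):
    """Restore the backup to a scratch location and compare it to the original."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / "restored"
        shutil.copyfile(backup, restored)
        return sha256_of(restored) == sha256_of(original)


with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "data.db"
    original.write_bytes(b"customer records")

    backup = Path(workdir) / "data.db.bak"
    shutil.copyfile(original, backup)
    good_backup_ok = backup_is_restorable(original, backup)

    # Simulate a silently truncated backup.
    backup.write_bytes(b"custom")
    damaged_backup_ok = backup_is_restorable(original, backup)

print(good_backup_ok, damaged_backup_ok)
```

The damaged backup existed as a file the whole time; only the restore test revealed that it was useless.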
  10. HIDDEN MANAGEMENT COMPLEXITY
    • As you add management software, the management software often needs management
    • Be aware what happens when you lose a shard or key server
    • Some software upgrades “weird”
    • Holes in bucket: This software requires ZooKeeper, which requires etcd, …
  11. SECURITY POLICY
    • As the number of teams doing “self-service” type deployments grows…
    • Security scans increasingly need to happen at build-time
    • Consistency is mandatory
    • Code-review checks need to be in place and not simply rubber-stamps
  12. CONTINUOUS INTEGRATION
    • Automatically build code when checked-in
    • Ideally: run unit tests as part of build step.
    • Typically: Jenkins. Also Travis CI, CircleCI, TeamCity, Bamboo, others.
    • Dangers of inconsistent build job rules.
  13. CONTINUOUS DEPLOYMENT
    • Can’t get here overnight - This is a spectrum.
    • First requires full automation of a deploy, and a solid C.I. setup
    • When C.I. completes, at least deploy to stage and run functional tests (FTs)
    • Next step: if FTs pass, consider a deploy to prod
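The progression on this slide can be sketched as a gated pipeline: nothing deploys unless C.I. passed, stage always comes first, and prod is only reached when functional tests pass and automatic promotion is switched on. This is a sketch under stated assumptions, not the behavior of any particular C.I. tool; `run_pipeline`, its parameters, and the step labels are hypothetical.

```python
def run_pipeline(ci_passed, deploy, run_functional_tests, auto_promote=False):
    """Run the deploy stages in order and return the list of steps taken."""
    steps = []
    if not ci_passed:
        return steps  # a failed build never reaches any environment

    deploy("stage")
    steps.append("deploy:stage")

    if not run_functional_tests("stage"):
        steps.append("functional-tests:failed")
        return steps  # stop here, prod is untouched
    steps.append("functional-tests:passed")

    if auto_promote:  # the far end of the spectrum
        deploy("prod")
        steps.append("deploy:prod")
    return steps


# Stand-ins for real deploy and test tooling.
deployed = []
steps = run_pipeline(
    ci_passed=True,
    deploy=deployed.append,
    run_functional_tests=lambda env: True,
    auto_promote=True,
)
print(steps)
print(deployed)
```

Leaving `auto_promote` off gives the intermediate point on the spectrum: every green build lands on stage and is tested, while the prod deploy stays a human decision.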
  14. ADDITIONAL RESOURCES
    • Unfortunately, this field moves fast.
    • Latest tech, but advice of varying quality:
    • news.ycombinator.com
    • Reddit.com/r/devops
    • Reddit.com/r/sysadmin