applies scientific know-how to a component, product, or process to ensure that it performs the function it was designed for, without failure, for a specified time.
However, past a certain point, increasing reliability is worse for a service rather than better! Extreme reliability comes at a cost! Embrace the Risk! I don't know … I like the adrenaline. EMBRACING RISK
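As a rough worked example of why each additional "nine" gets expensive, the sketch below converts an availability target into the downtime it permits per year. The targets in the loop and the name `allowed_downtime_minutes` are illustrative assumptions, not figures taken from these slides.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years are ignored


def allowed_downtime_minutes(availability: float,
                             period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Return the downtime (in minutes) that an availability target permits."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_minutes


if __name__ == "__main__":
    # Hypothetical targets: each extra nine cuts the permitted downtime tenfold.
    for target in (0.99, 0.999, 0.9999, 0.99999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target:.3%} -> {minutes:8.1f} min/year")
```

The tenfold shrink per nine is the point: the room left for failures, and for risky changes, narrows quickly while the effort needed to stay inside it grows.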
If a human operator needs to touch your system during normal operations, you have a bug. At Google, at least 50% of each SRE’s time should be spent on engineering project work that will either reduce future toil or add service features. ELIMINATING MANUAL WORK
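The 50% rule can be checked with simple arithmetic. The following is a minimal sketch, assuming time is tracked in hours and split into only two buckets (toil and engineering project work); the function names and the sample figures are hypothetical.

```python
def toil_fraction(toil_hours: float, engineering_hours: float) -> float:
    """Return the share of tracked time that went to toil."""
    total = toil_hours + engineering_hours
    if total <= 0:
        raise ValueError("no tracked time to evaluate")
    return toil_hours / total


def exceeds_toil_cap(toil_hours: float, engineering_hours: float,
                     cap: float = 0.5) -> bool:
    """True when toil takes more than `cap` of the tracked time."""
    return toil_fraction(toil_hours, engineering_hours) > cap


if __name__ == "__main__":
    # Hypothetical week: 25 h of manual work against 15 h of project work.
    print(exceeds_toil_cap(toil_hours=25, engineering_hours=15))
```

A team that trips this check would be spending less than half its time on engineering work, which is the situation the rule is meant to flag.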
system, such as query counts and types, error counts and types, processing times, and server lifetimes. A Google SRE team with 10–12 members typically has one or sometimes two members whose primary assignment is to build and maintain monitoring systems for their service. MONITORING
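To make the kinds of data listed above concrete, here is a minimal in-process sketch that records query counts by type, error counts by type, processing times, and server lifetime. The class `ServiceMetrics` and its method names are hypothetical; a real service would more likely export such values to a dedicated monitoring system than keep them in memory.

```python
import time
from collections import Counter, defaultdict


class ServiceMetrics:
    """Collects basic quantitative data about one server process."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock                          # injectable for testing
        self.started_at = clock()                    # basis for server lifetime
        self.queries = Counter()                     # query type -> count
        self.errors = Counter()                      # error type -> count
        self.processing_seconds = defaultdict(list)  # query type -> durations

    def record_query(self, query_type, seconds, error_type=None):
        """Record one handled query and, if it failed, the kind of error."""
        self.queries[query_type] += 1
        self.processing_seconds[query_type].append(seconds)
        if error_type is not None:
            self.errors[error_type] += 1

    def uptime_seconds(self):
        """Seconds elapsed since this collector (and its server) started."""
        return self._clock() - self.started_at

    def snapshot(self):
        """Return an aggregated view suitable for display or export."""
        return {
            "queries": dict(self.queries),
            "errors": dict(self.errors),
            "mean_processing_seconds": {
                qtype: sum(times) / len(times)
                for qtype, times in self.processing_seconds.items()
            },
            "uptime_seconds": self.uptime_seconds(),
        }


if __name__ == "__main__":
    metrics = ServiceMetrics()
    metrics.record_query("read", 0.012)
    metrics.record_query("write", 0.250, error_type="timeout")
    print(metrics.snapshot())
```

Keeping raw durations per query type is the simplest choice for a sketch; a long-running server would normally use histograms or bounded buffers instead.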
• User account creation.
• Software or hardware installation preparation and decommissioning.
• Rollouts of new software versions.
• Runtime configuration changes.
Automate the Resilience! AUTOMATING