Upgrade to Pro — share decks privately, control downloads, hide ads and more …

fourkeys基盤を構築した話

uncle
September 12, 2022

 fourkeys基盤を構築した話

uncle

September 12, 2022
Tweet

More Decks by uncle

Other Decks in Technology

Transcript

  1. Agenda • 生産性の可視化 • fourkeys基盤の概要 • デプロイ頻度の計測 • リードタイムの計測 •

    Incident flowの整備 • incidents view • merged_pull_requests view • 変更障害率の計測 • サービス復元時間の計測 • さいごに
  2. そもそもなんで計測するの? 計測するにあたり指標をどうするかを考えfourkeysを採用した 理由は下記 • 定量的に計測できる ◦ 定義を明確にすれば計測できる • 世の中の水準で現状を評価できる ◦

    Google Cloudで実行されているDevOps組織の有効性を評価する にもあるように、それぞれの KeyごとにElite、High、Medium、Lowの4レベルで評価できる • 自動化できる ◦ GitHubやSlack、CircleCI等のAPIを使って自動計測が可能
  3. 計測対象 fourkeysのそれぞれの定義は各社・各Teamで決めてよいとされてるが、Kyashでは DORA(DevOps Research and Assessment)に合わせて下記のように定義してる 指標 概要 デプロイ頻度 組織による正常な本番へのリリース頻度

    変更のリードタイム commitから本番環境稼働までの所要時間 変更障害率 デプロイが原因で本番環境で障害が発生する割合 (%) サービス復元時間 組織が本番環境での障害から回復するのに掛かる時間
  4. SELECT day, IFNULL(ANY_VALUE(med_time_to_change)/60, 0) AS median_time_to_change, # Hours FROM (

    SELECT d.deploy_id, TIMESTAMP_TRUNC(d.time_created, DAY) AS day, PERCENTILE_CONT( # Ignore automated pushes IF(TIMESTAMP_DIFF(d.time_created, c.time_created, MINUTE) > 0, TIMESTAMP_DIFF(d.time_created, c.time_created, MINUTE), NULL), 0.5) # Median OVER (PARTITION BY TIMESTAMP_TRUNC(d.time_created, DAY)) AS med_time_to_change, # Minutes FROM four_keys.deployments d, d.changes LEFT JOIN four_keys.changes c ON changes = c.change_id ) GROUP BY day ORDER BY day; Query
  5. merged_pull_requests View # Merged Pull Request Table SELECT JSON_EXTRACT_SCALAR(metadata, '$.pull_request.html_url')

    AS url, PARSE_TIMESTAMP('%Y-%m-%dT%H:%M:%SZ',JSON_EXTRACT_SCALAR(metadata, '$.pull_request.created_at')) AS time_created, PARSE_TIMESTAMP('%Y-%m-%dT%H:%M:%SZ',JSON_EXTRACT_SCALAR(metadata, '$.pull_request.merged_at')) AS time_merged, JSON_EXTRACT_SCALAR(metadata, '$.pull_request.merge_commit_sha') AS commit_sha, JSON_EXTRACT_SCALAR(metadata, '$.pull_request.base.repo.name') AS repository FROM `four_keys.events_raw` WHERE source = 'github' AND event_type = 'pull_request' AND json_extract_scalar(metadata, '$.pull_request.merged') = 'true' AND json_extract_scalar(metadata, '$.action') = 'closed' GROUP BY 1,2,3,4,5
  6. Query SELECT TIMESTAMP_TRUNC(d.time_created, DAY) as day, IF(COUNT(DISTINCT d.deploy_id) = 0,

    0, SUM(IF(i.incident_id is NULL, 0, 1)) / COUNT(DISTINCT d.deploy_id)) as change_fail_rate, d.repository as metric, FROM four_keys.deployments d, d.changes LEFT JOIN four_keys.merged_pull_requests m ON changes = m.commit_sha and m.repository = d.repository LEFT JOIN four_keys.incidents i ON i.cause_pr_url = m.url GROUP BY day, d.repository ORDER BY day
  7. Query SELECT day, label as metric, daily_med_time_to_restore FROM ( SELECT

    TIMESTAMP_TRUNC(time_created, DAY) AS day, label, #### Median time to resolve PERCENTILE_CONT( TIMESTAMP_DIFF(time_resolved, time_created, HOUR), 0.5) OVER(PARTITION BY TIMESTAMP_TRUNC(time_created, DAY), label ) AS daily_med_time_to_restore, FROM four_keys.incidents, UNNEST(label_names) AS label ) GROUP BY day, label, daily_med_time_to_restore ORDER BY day