need of users willing to untangle the complexity coming from more distributed and independent software components and their interactions.
We need to understand:
- Interactions & Correlations
- Operational deviations
- Failure modes
- Critical components/paths
different languages and models (client/server, messaging, etc.)
- Working examples
- Right sampling rate
Intermediate:
- More, and more meaningful, metadata (tags, logs, etc.)
- Correlation across observability tools
- Dependency graphs
- Right sampling strategy
Advanced:
- Post facto processing and aggregation
- Proactive feedback
- More configurable sampling
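To make the "right sampling rate" item concrete, here is a minimal sketch of rate-based sampling. It assumes trace IDs are uniformly random 64-bit integers; the names `make_rate_sampler` and `is_sampled` are hypothetical and are not taken from any tracing library. Deciding from the trace ID rather than from a fresh random draw means every service that sees the same ID reaches the same decision.

```python
import random


def make_rate_sampler(rate: float):
    """Return a function that decides whether a trace ID is sampled.

    Assumption: trace IDs are uniformly distributed 64-bit integers.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    # IDs below this threshold are kept; the threshold scales with the rate.
    threshold = int(rate * (1 << 64))

    def is_sampled(trace_id: int) -> bool:
        # Only the low 64 bits are considered, so the same ID always
        # produces the same decision on every service.
        return (trace_id & ((1 << 64) - 1)) < threshold

    return is_sampled


# Usage: keep roughly 10% of traces.
sampler = make_rate_sampler(0.1)
trace_ids = [random.getrandbits(64) for _ in range(10_000)]
kept = [t for t in trace_ids if sampler(t)]
print(f"kept {len(kept)} of {len(trace_ids)} traces")
```

A fixed rate like this is the simplest option; the later stages on the slide (sampling strategy, configurable sampling) would replace the single `rate` with rules per service, endpoint or request attribute.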
request to be traced
- that log field with the malformed data to be included
- that deviating metric to be emitted
- that alert to be configured on the deviating metric
- that thing they need when they need it
CERTAINTY
data is not an option for every single company due to scale issues.
- High cardinality is expensive and probably useless in many cases.
- 100% availability is the new 100% coverage.
- The transition from a reactive to a proactive model is still a work in progress.
and inspired by Google Dapper (2010). It was open sourced by Twitter (2012).
- A mature tracing model that emerged from users' use cases and thousands of hours of support.
- Used by large companies like LINE, Netflix, SoundCloud and Yelp, but also by small ones.
- Strong and heterogeneous community
to:
- Understand request latency sources
- Identify the critical path in a request that traverses many components
- Get an overview of your service dependencies
- Pinpoint the service at fault when an error occurs
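The uses listed above can be sketched on a toy trace. This is a simplified illustration, not Zipkin's data model or API: the `Span` class, its fields and the three helper functions are hypothetical, and self time is computed under the assumption that child spans run one after another rather than in parallel.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical, simplified span: one unit of work in one service."""
    span_id: str
    parent_id: Optional[str]
    service: str
    duration_ms: int
    error: bool = False


def dependency_links(spans):
    """Count calls between services (caller, callee) found in the spans."""
    by_id = {s.span_id: s for s in spans}
    links = Counter()
    for s in spans:
        parent = by_id.get(s.parent_id)
        if parent is not None and parent.service != s.service:
            links[(parent.service, s.service)] += 1
    return links


def self_times(spans):
    """Time each span spent on its own work, excluding direct children.

    Assumption: children run sequentially, so their durations add up.
    """
    child_total = Counter()
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] += s.duration_ms
    return {s.span_id: max(0, s.duration_ms - child_total[s.span_id])
            for s in spans}


def deepest_errors(spans):
    """Failed spans with no failed child: likely origin of the error."""
    failing_parents = {s.parent_id for s in spans if s.error}
    return [s for s in spans if s.error and s.span_id not in failing_parents]


# A made-up request: frontend calls backend and cache, backend calls db.
TRACE = [
    Span("a", None, "frontend", 120, error=True),
    Span("b", "a", "backend", 90, error=True),
    Span("c", "b", "db", 60, error=True),
    Span("d", "a", "cache", 10),
]

print(dict(dependency_links(TRACE)))
print(self_times(TRACE))
print([s.service for s in deepest_errors(TRACE)])
```

In this made-up trace the error surfaces in three services, but only the `db` span has no failing child, which is the sense in which a trace helps pinpoint the service at fault.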
frameworks/libraries (26+ official ones ONLY in Java).
- Various exporters to different storages
- Comprehensive UI
- Knowledge spreading (RATIONALEs, site docs)
- Supporting community
- Integration with other observability tools for both server and instrumentation (e.g. loggers and metrics ingestion).
- Versatile instrumentation API, embracing interop with other tracing libraries (e.g. OpenTracing, AWS X-Ray, Haystack, etc.)
(no sampling, by Yelp)
- Secondary Sampling (sampling triggers, by Netflix)
- Kafka Storage & Aggregations (post facto sampling, ipso facto aggregations)
- VoltDB storage (post facto sampling)
- Storage forwarder (multi storage)
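As a rough illustration of what "post facto sampling" means in general, the sketch below collects every span first and decides which traces to keep only afterwards, once errors and durations are known. It is not how the Kafka or VoltDB storage components work internally; the `Span` class, the `keep_after_the_fact` function and the 500 ms threshold are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical, simplified span record."""
    trace_id: str
    name: str
    duration_ms: int
    error: bool = False


def keep_after_the_fact(spans, slow_ms=500):
    """Group collected spans by trace and keep only interesting traces.

    A trace is kept when any of its spans failed or took at least
    slow_ms milliseconds. The decision is made after collection, so it
    can use information that is unknown when the request starts.
    """
    traces = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)

    kept = {}
    for trace_id, trace in traces.items():
        has_error = any(s.error for s in trace)
        slowest = max(s.duration_ms for s in trace)
        if has_error or slowest >= slow_ms:
            kept[trace_id] = trace
    return kept


# Made-up collected data: one healthy trace, one failed, one slow.
COLLECTED = [
    Span("t1", "GET /users", 40),
    Span("t1", "SELECT users", 15),
    Span("t2", "GET /orders", 80, error=True),
    Span("t3", "GET /report", 900),
    Span("t3", "render", 850),
]

print(sorted(keep_after_the_fact(COLLECTED)))
```

The trade-off compared with up-front sampling is that all data must be collected and buffered before the decision, which is where scale becomes the limiting factor.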
Different users have different scales and different needs; either way, you need to know those needs.
- Data collection is foundational for observability
- Analysis and processing of data is becoming more and more important