Jitin
December 18, 2025

What Gets Measured Gets Fixed: Observability for Android at Scale

Monitoring performance and stability on mobile apps is often limited to crash rates and app launch times. But as apps become critical business drivers, metrics must evolve to capture more complex and granular signals—like page load latency and mean time to complete critical flows.
As you measure deeper, cracks begin to appear in happy paths, helping uncover those elusive P99 problems and hard-to-reproduce edge cases.
This talk will focus on building an observability mindset for Android apps—covering what, when, and how to measure. We'll also explore emerging solutions like OpenTelemetry and show how Android apps can integrate into mature observability backends used by modern infrastructures.

Transcript

  1. Agenda
    What is observability
    Issues faced by mobile apps
    What should be measured
    How to measure
    Interpreting measurements
    Fixing issues
  2. Understanding Observability in Software Systems
    Observability is the ability to understand the internal state of a system by examining its outputs. In software, it means having enough signals to diagnose issues without deploying new code or adding instrumentation after the fact. Unlike monitoring, which tells you when something breaks, observability helps you understand why it broke and how to fix it.
  3. Mobile Observability?
    Crashes: The obvious metrics everyone tracks first
    ANRs: Application Not Responding events that frustrate users
  4. Crashes
    kotlin.UninitializedPropertyAccessException: lateinit property listener has not been initialized
    java.util.ConcurrentModificationException
        at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
    java.lang.RuntimeException: java.lang.Throwable: A WebView method was called on thread 'DefaultDispatcher-worker-1'. All WebView methods must be called on the same thread. (Expected Looper Looper (main, tid 1) {730c4d78} called on Looper (null))
  5. Crashes
    kotlin.UninitializedPropertyAccessException: lateinit property listener has not been initialized
        Fix: var listener = null
    java.util.ConcurrentModificationException
        at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
        Fix: CopyOnWriteArrayList
    java.lang.RuntimeException: java.lang.Throwable: A WebView method was called on thread 'DefaultDispatcher-worker-1'. All WebView methods must be called on the same thread. (Expected Looper Looper (main, tid 1) {730c4d78} called on Looper (null))
        Fix: Handler(Looper.getMainLooper()).post { }
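A ConcurrentModificationException like the one on this slide typically shows up when a list is modified while it is being iterated, for example when a listener unregisters itself from inside its own callback. The following is a minimal plain-Java sketch of that situation and of why swapping in `CopyOnWriteArrayList` avoids it. The class and method names (`ListenerListDemo`, `removeWhileIterating`, `sample`) are hypothetical and not from the talk.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class; the names are illustrative only.
final class ListenerListDemo {

    // Iterates the list and removes the first entry mid-iteration, which is
    // what a listener unregistering itself inside a callback amounts to.
    // Returns true if the iteration finished without an exception.
    static boolean removeWhileIterating(List<String> listeners) {
        try {
            for (String listener : listeners) {
                if (listener.equals("first")) {
                    listeners.remove(listener);
                }
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    // Fills the given list with three placeholder listeners.
    static List<String> sample(List<String> target) {
        target.add("first");
        target.add("second");
        target.add("third");
        return target;
    }

    public static void main(String[] args) {
        // ArrayList's iterator detects the structural change and fails fast.
        System.out.println(removeWhileIterating(sample(new ArrayList<>())));
        // CopyOnWriteArrayList iterates over a snapshot, so removal is safe.
        System.out.println(removeWhileIterating(sample(new CopyOnWriteArrayList<>())));
    }
}
```

The trade-off is that every write to a `CopyOnWriteArrayList` copies the backing array, so it suits listener lists that are read far more often than they change.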
  6. ANR

  7. What does your app look like?
    99.5% crash free rate | >4.0 rating
    99.5% crash free rate | >4.5 rating
    99.8% crash free rate | >4.8 rating
    100% crash free rate | 5.0 rating
  8. The Hidden Problems Users Face
    Slow Page Loads: Screens that take longer to render content, causing users to tap repeatedly or abandon
    Silent Errors: Failed API calls that show empty states instead of useful content, without any error message
    Janky Animations: Stuttering scrolls and frame drops that make the app feel unpolished
    Incomplete Flows: Users getting stuck mid-journey through critical features like checkout or registration
  9. Measuring What Matters
    Page Load: Measure initial and full load times
    Error Rates: Track page failures
    Transactions: Monitor successful completions
    App Smoothness: Observe frame drops and jank
  10. Page Load Latency
    1. Fragment/Activity load: Latency measured between onCreate, onAttach, and onViewCreated. Captures setup latencies.
    2. ViewModel success state: Latency from the loading state to the success state, involving I/O calls, serialisation, etc.
    3. UI render latency: From the success state to UI render. Captures latencies around adapters, view groups, etc.
    4. Pagination etc.: Any latency in subsequent steps.
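The phases on this slide can be captured with a small timer that records the time elapsed between consecutive marks. This is a sketch only: `PageLoadTimer`, its methods, and the phase names in the usage note are assumptions, not an API from the talk. The clock is injected so the class runs off-device; on Android a monotonic source such as `SystemClock.elapsedRealtime()` would be a natural choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical helper: records how long each page-load phase took.
final class PageLoadTimer {
    private final LongSupplier clockMillis; // injected so it can be faked in tests
    private final Map<String, Long> phaseDurations = new LinkedHashMap<>();
    private final long startedAt;
    private long lastMark;

    PageLoadTimer(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
        this.startedAt = clockMillis.getAsLong();
        this.lastMark = startedAt;
    }

    // Closes the current phase: stores the time elapsed since the previous mark.
    void mark(String phase) {
        long now = clockMillis.getAsLong();
        phaseDurations.put(phase, now - lastMark);
        lastMark = now;
    }

    // Phase durations in the order they were marked.
    Map<String, Long> phases() {
        return phaseDurations;
    }

    // Time from construction to the most recent mark.
    long totalMillis() {
        return lastMark - startedAt;
    }
}
```

A screen might create the timer when it is created and then call `mark("fragment_setup")`, `mark("viewmodel_success")`, and `mark("ui_render")` at the matching points, reporting `phases()` once the final phase completes.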
  11. Page Error Rates
    Failures to display the full page
    Failures to display partial sections
    Errors due to network fluctuations
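One way to turn these failure categories into a rate is to count every page load by outcome and divide by the total. The sketch below assumes a hypothetical `PageErrorStats` class with an `Outcome` enum mirroring the three categories on the slide. How a screen decides which outcome applies is left out.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical per-screen counter for page load outcomes.
final class PageErrorStats {
    enum Outcome { SUCCESS, FULL_PAGE_FAILURE, PARTIAL_FAILURE, NETWORK_ERROR }

    private final Map<Outcome, Integer> counts = new EnumMap<>(Outcome.class);
    private int total;

    // Records the outcome of one page load.
    synchronized void record(Outcome outcome) {
        counts.merge(outcome, 1, Integer::sum);
        total++;
    }

    // Share of page loads that ended with the given outcome, in [0, 1].
    synchronized double rate(Outcome outcome) {
        if (total == 0) {
            return 0.0;
        }
        return counts.getOrDefault(outcome, 0) / (double) total;
    }
}
```

Keeping partial failures separate from full-page failures matters because a page that renders with a missing section would otherwise be counted as a success.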
  12. The Reproducibility Challenge
    Reproducing mobile issues is notoriously difficult. The root causes are often elusive, hidden in the complex interplay of user sessions and environmental conditions: device specifications, network quality, OS versions, and countless other variables that differ from your development setup.
  13. User Journey
    Left the app in the background
    Went into a lift
    Got a push notification (PN) after an hour
    Opened an entirely new page
  14. Network Observability: The Missing Piece
    Network conditions dramatically impact mobile app performance, yet they're often overlooked.
    Network Type: WiFi, 4G, 5G, 3G. Each behaves differently
    Signal Strength: Weak signals cause retries and timeouts
    Network Changes: Transitions between networks disrupt connections
  15. What is observability / Issues faced by mobile apps / What should be measured / How to measure / Interpreting measurements / Fixing issues
  16. User Session Timeline
    To truly understand the user experience and debug complex issues, it's crucial to visualize a user's entire session as a timeline of events. This includes not just app interactions, but also underlying system and network conditions.
    1. App Start & Initial Load: User launches the app, home screen displays. Essential data fetched from api.example.com/init.
    2. Network Fluctuation & API Errors: User navigates to Product List. Network switches from Wi-Fi to cellular (low signal). Subsequent API call to api.example.com/products times out.
    3. High Resource Consumption: User scrolls rapidly through images. CPU spikes to 90%, memory usage increases significantly, leading to UI jank.
    4. Background Process & Crash: A background sync operation starts, consuming more CPU. User taps on an item, triggering a NullPointerException and app crash.
    This detailed timeline approach helps pinpoint the exact sequence of events that led to a problem, revealing hidden correlations between app behavior, device state, and user actions.
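A session timeline like the one described here can be approximated with a bounded in-memory buffer that app, network, and device signals all write into. The sketch below is an assumption about how that could look, not the speaker's implementation. `SessionTimeline`, `Event`, and the category strings are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical bounded timeline of session events.
final class SessionTimeline {

    // One entry: when it happened, which kind of signal, and a description.
    static final class Event {
        final long timestampMillis;
        final String category; // e.g. "app", "network", "device"
        final String message;

        Event(long timestampMillis, String category, String message) {
            this.timestampMillis = timestampMillis;
            this.category = category;
            this.message = message;
        }

        @Override
        public String toString() {
            return timestampMillis + " [" + category + "] " + message;
        }
    }

    private final int capacity;
    private final Deque<Event> events = new ArrayDeque<>();

    SessionTimeline(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    // Appends an event, dropping the oldest one once the buffer is full.
    synchronized void record(long timestampMillis, String category, String message) {
        if (events.size() == capacity) {
            events.removeFirst();
        }
        events.addLast(new Event(timestampMillis, category, message));
    }

    // Copy of the events in the order they were recorded,
    // e.g. to attach to a crash or error report.
    synchronized List<Event> snapshot() {
        return new ArrayList<>(events);
    }
}
```

Because network switches and resource spikes land in the same ordered buffer as screen navigations, the snapshot attached to a crash shows what preceded it without any extra correlation step.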
  17. Data-Driven Insights
    APM
    Code
    External Systems: Backend Infrastructure
    App State: Lifecycle, Background work
    Network: Type, strength & changes
    Device: Manufacturer, OS
  18. Observability Across System Boundaries
    Mobile App: User-facing frontend
    API Gateway: Request routing
    Backend Services: Business logic
    Database: Data persistence
  19. Detecting issues faster
    Why Mobile Observability Matters
    The mobile app is a catch-all for all types of backend issues: it's the user-facing window into your entire system. Additionally, mobile apps are themselves becoming more dynamic with feature flags and server-driven UI.
  20. What is observability / Issues faced by mobile apps / What should be measured / How to measure / Interpreting measurements / Fixing issues
  21. Standardisation with OpenTelemetry
    OpenTelemetry provides a vendor-neutral, standardised way to collect telemetry (traces, metrics, and logs) that works with multiple backend providers.
    Benefits
        Works with multiple vendors
        Consistent instrumentation
        Rich ecosystem support
        Unified SDK across platforms
        Automatic instrumentation for common libraries
  22. Concepts of Observability
    Traces: Debugging data that's sampled but rich with metadata. Shows the journey of a request across services.
        Distributed tracing
        Request flow visualization
        Span-level details
    Metrics: Unsampled, low-size data ideal for alerting. Aggregated numbers that trend over time.
        Counters and gauges
        Histograms
        Real-time dashboards
    Logs: Unsampled text with large context about user sessions. Detailed event records.
        Structured logging
        Session context
        Error details
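Histograms are what make tail latencies such as P99 visible without shipping every raw sample. The sketch below is a plain-Java illustration of the idea, not the OpenTelemetry API: `LatencyHistogram` and its bucket boundaries are hypothetical, and the percentile is reported as the upper bound of the bucket that contains the nearest-rank sample.

```java
// Hypothetical fixed-bucket latency histogram.
final class LatencyHistogram {
    private final long[] upperBoundsMillis; // ascending, inclusive upper bounds
    private final long[] counts;            // one extra slot for overflow
    private long total;

    LatencyHistogram(long... upperBoundsMillis) {
        this.upperBoundsMillis = upperBoundsMillis.clone();
        this.counts = new long[upperBoundsMillis.length + 1];
    }

    // Adds one sample to the first bucket whose upper bound is >= the value.
    void record(long latencyMillis) {
        int i = 0;
        while (i < upperBoundsMillis.length && latencyMillis > upperBoundsMillis[i]) {
            i++;
        }
        counts[i]++;
        total++;
    }

    // Upper bound of the bucket holding the nearest-rank sample for the given
    // percentile (0-100]. Returns Long.MAX_VALUE if it falls in the overflow bucket.
    long percentileUpperBound(double percentile) {
        if (total == 0) {
            throw new IllegalStateException("no samples recorded");
        }
        long rank = Math.max(1L, (long) Math.ceil(percentile * total / 100.0));
        long seen = 0;
        for (int i = 0; i < counts.length; i++) {
            seen += counts[i];
            if (seen >= rank) {
                return i < upperBoundsMillis.length ? upperBoundsMillis[i] : Long.MAX_VALUE;
            }
        }
        return Long.MAX_VALUE; // not reached when total > 0
    }
}
```

With buckets at 100, 250, 500, and 1000 ms, a screen where almost every load is fast can still report a P99 several buckets above its median, which is the kind of gap an average hides.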
  23. Instrumentations
    Auto: Screen navigations, Network calls, Frame drops, App launch, Crash/ANR
    Manual: Screen metrics (latency, errors), Custom tracing
  24. Telemetry Data Handling
    (Slide labels: Trace; Metric; Time series DB; Logs; File in bucket; trigger; Manual/Notification; In memory collection; Disk collection; Alerts)
  25. What is observability / Issues faced by mobile apps / What should be measured / How to measure / Interpreting measurements / Fixing issues
  26. Observability
    Debugging: Understanding the internal state of a system to pinpoint issues.
        Metrics
        Traces
        Logs
    Monitoring: Collecting and analyzing data about the system's performance and health.
        Metrics
  27. What is observability / Issues faced by mobile apps / What should be measured / How to measure / Interpreting measurements / Fixing issues
  28. Optimise Caching and Retries
    Smart Headers
        Persistent cache with max-age
        Cache + invalidation with ETags
    Smart Retries
        Exponential backoff
        Circuit breakers
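The two retry techniques named on this slide can be sketched in a few lines of plain Java. Everything here is illustrative: `RetryPolicy`, `CircuitBreaker`, and their parameters are hypothetical names, timestamps are passed in as non-negative milliseconds, and jitter is left out of the backoff to keep the sketch deterministic, although production clients usually add it.

```java
// Hypothetical exponential backoff helper.
final class RetryPolicy {
    // Delay before retry number `attempt` (0-based): base * 2^attempt, capped at max.
    // Assumes positive baseMillis and maxMillis.
    static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        long factor = 1L << Math.min(Math.max(attempt, 0), 30);
        if (baseMillis > maxMillis / factor) {
            return maxMillis; // the uncapped delay would exceed the cap
        }
        return baseMillis * factor;
    }
}

// Hypothetical circuit breaker: opens after N consecutive failures and lets
// a probe request through once the open window has elapsed.
final class CircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private int consecutiveFailures;
    private long openedAt = -1; // -1 means the circuit is closed

    CircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    // Whether a request may be attempted at time `now`.
    synchronized boolean allowRequest(long now) {
        if (openedAt < 0) {
            return true;
        }
        return now - openedAt >= openMillis; // half-open: allow a probe
    }

    // A success closes the circuit and resets the failure streak.
    synchronized void onSuccess() {
        consecutiveFailures = 0;
        openedAt = -1;
    }

    // A failure extends the streak and (re)opens the circuit at the threshold.
    synchronized void onFailure(long now) {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAt = now;
        }
    }
}
```

Backoff spaces out retries from a single client, while the breaker stops issuing requests altogether for a while, which helps on weak mobile networks where every retry also costs battery and data.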
  29. Building an Observability Mindset
    01 Central Measurement Framework: Create shared infrastructure for consistent metric tracking across all screens
    02 Dashboard Culture: Establish the practice of building and monitoring dashboards for every feature launch
    03 Review Rituals: Regular team reviews of metrics, anomalies, and trends to surface insights
    04 Blameless Postmortems: Use incidents as learning opportunities to improve observability and resilience
  30. The Strategic Value of Observability
    Contribute During Downtimes: When backend incidents occur, mobile observability helps you contribute meaningfully to debugging and resolution, even if the root cause isn't in your code.
    Increase Your Visibility: Comprehensive observability data improves your team's visibility across engineering leadership and demonstrates impact on business metrics.