Partial Failure
A partial failure is a failure in a process P in which a fault does not crash P but causes a safety or liveness violation, or severe slowness, for some functionality.
• It is defined at the process level, not the node level
• The process is still alive, so this is not a fail-stop failure
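To make the definition concrete, here is a minimal Python sketch of a process that stays alive while one of its functions stops making progress. Everything in it is hypothetical and chosen for illustration: the `TinyServer` class, its `heartbeat`, `inject_fault`, and `request` methods, and the injected fault itself are assumptions, not code from any of the studied systems.

```python
import queue
import threading


class TinyServer:
    """Hypothetical server: one process, two independent functions."""

    def __init__(self):
        self._requests = queue.Queue()
        self._faulty = threading.Event()      # set by inject_fault()
        self._never_set = threading.Event()   # stands in for a lock that is never released
        # Daemon thread so a stuck worker does not keep the interpreter alive.
        threading.Thread(target=self._serve, daemon=True).start()

    def heartbeat(self):
        # Process-level liveness check: still answers after the fault.
        return True

    def inject_fault(self):
        # Assumed fault for illustration: the worker blocks on its next request.
        self._faulty.set()

    def _serve(self):
        while True:
            reply_box = self._requests.get()
            if self._faulty.is_set():
                self._never_set.wait()        # indefinite blocking, no crash
            reply_box.put("ok")

    def request(self, timeout):
        # Returns "ok" on success, or None if no reply arrives within timeout seconds.
        reply_box = queue.Queue(maxsize=1)
        self._requests.put(reply_box)
        try:
            return reply_box.get(timeout=timeout)
        except queue.Empty:
            return None


if __name__ == "__main__":
    server = TinyServer()
    print("before fault:", server.request(timeout=1.0), server.heartbeat())
    server.inject_fault()
    print("after fault: ", server.request(timeout=0.2), server.heartbeat())
```

After the fault, the heartbeat still reports the process as alive, yet requests no longer complete. A detector that only watches for crashed processes would miss this, which is why the definition separates partial failures from fail-stop failures.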
Findings 1-2
Finding 1: In all five systems, partial failures appear throughout the release history (Table 1). 54% of them occur in software releases from the most recent three years.
Finding 2: The root causes of the studied failures are diverse. The top three root cause types (48% in total) are uncaught errors, indefinite blocking, and buggy error handling.