of building reliable systems cutting-edge studies assume perfect agents robot faults are common => fault-tolerance e.g., mean time between failure of one robot: 125days* *https://fr.autostoresystem.com/benefits/reliable aws.amazon.com path planning where agents may unexpectedly crash at runtime MAPPCF (w/crash faults)
non-crashed agents eventually reach their destination, regardless of crashes (up to f ) & transition rules & maximum number of crashes f defined with failure detector & execution model centralized planning followed by decentralized execution
NFD SEQ + NFD SEQ + AFD synchronous model sequential model named failure detector anonymous FD SYN SEQ NFD AFD strictly stronger SEQ+AFD SYN+AFD solvable instances weakly stronger SYN+AFD SYN+NFD SYN+AFD SYN+NFD or solvable in SYN unsolvable in SEQ
(DCRF) synchronous model + named FD number of maximum crashes f =2 example* *DCRF is applicable to other models 1. find initial paths 2. identify unresolved events 3. compute backup path & update solution 4. back to step-2
success rate #agents fixed #crashes: f =1 SYN SEQ success rate #crashes f fixed #agents: 15 SYN SEQ solving MAPPCF becomes difficult with more {agents, crashes} SEQ is harder than SYN random-32-32-10 32x32 (|V|=922) from [Stern+ SOCS-19] 30sec timeout with named FD synchronous SYN v.s. sequential SEQ
multiple agents that may crash at runtime fault solution multiple paths assuming crashes https://kei18.github.io/mappcf future directions: complete algorithms, optimization, other types of failure detectors