Advancement: Fault Detection in Byzantine Fault Tolerant Systems

Tuan Tran
Computer Science and Engineering Ph.D. Candidate
Virtual Event
Peter Alvaro

Join us on Zoom / Passcode: decent

Description: As modern applications become more complex and more widely used, there is a strong need for them to remain available in the presence of failures. Byzantine fault tolerant state machine replication (BFT) is an approach to building such highly available systems. Although BFT protocols allow systems to tolerate any kind of fault, they generally make little effort to detect the faults that occur. This is because these protocols typically assume a partially synchronous network, in which communication between processes can be delayed arbitrarily long during periods of asynchrony. As a result, the cause of an event such as a message omission is subjective, and conclusions drawn about it are unreliable.

In this work, we propose an approach to subjective fault detection and evaluate its benefits with respect to system guarantees. The key insight is that BFT protocols already contain mechanisms that allow them to tolerate faults, and that a fault detector can leverage these mechanisms when the system is in a sticky situation. We first show that detecting subjective faults allows a system configured with n = 3f + 1 replicas to retain liveness even when only f + 1 correct replicas remain. The reconfiguration protocol, Phoenix, also allows a system to make better reconfiguration decisions when replacing subjectively faulty replicas. Finally, we show that in blockchain systems, detecting subjective faults can help a client mitigate attacks that aim to make its view of the blockchain inconsistent.
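
To put the replica counts above in context, the following is a minimal sketch of the arithmetic only. It assumes the quorum size of 2f + 1 used by PBFT-style protocols, which the abstract does not state, and the function names are hypothetical. It does not implement the proposed fault detector or Phoenix; it only shows that f + 1 correct replicas fall short of such a quorum, which is why retaining liveness at that point is notable.

```python
# Illustrative arithmetic only. The 2f + 1 quorum size is an assumption
# (common in PBFT-style protocols), not something stated in the abstract.

def total_replicas(f: int) -> int:
    """Replicas in a system configured to tolerate f Byzantine faults."""
    return 3 * f + 1


def quorum_size(f: int) -> int:
    """Assumed quorum size for PBFT-style protocols."""
    return 2 * f + 1


def can_form_quorum(correct: int, f: int) -> bool:
    """True if the given number of correct replicas can fill a quorum."""
    return correct >= quorum_size(f)


if __name__ == "__main__":
    for f in (1, 2, 3):
        remaining = f + 1  # the scenario described in the abstract
        print(
            f"f={f}: n={total_replicas(f)}, quorum={quorum_size(f)}, "
            f"correct={remaining}, quorum possible={can_form_quorum(remaining, f)}"
        )
```

Under this assumption, for every f >= 1 the f + 1 remaining correct replicas cannot fill a quorum on their own, so a protocol that waits for 2f + 1 matching responses would stall unless the system is reconfigured.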