Resiliency patterns of distributed systems

September 06, 2020

I have released a new chapter of Understanding Distributed Systems, titled Resiliency Patterns. The chapter is all about failures and how to mitigate them.

At scale, any failure that can happen will eventually happen: hardware faults, software crashes, memory leaks, you name it. The more your system scales out, the more failures it will experience. Eventually, the only way to cope with them is through automated self-healing and defense mechanisms.

The chapter is packed with practical defense mechanisms that have helped the systems I have built stand the test of time and scale to millions of users. It starts by describing the most common sources of failures, like single points of failure and slow processes, and then goes on to describe various defense mechanisms, like circuit breakers, rate-limiting, and load shedding.


Written by Roberto Vitillo

Want to learn how to build scalable and fault-tolerant cloud applications?

My book explains the core principles of distributed systems that will help you design, build, and maintain cloud applications that scale and don't fall over.
