Resiliency patterns of distributed systems

September 06, 2020

I have released a new chapter of the Distributed Systems Manual: Resiliency Patterns. The chapter is all about failures and their mitigations.

Any failure that can happen will eventually happen at scale - hardware faults, software crashes, memory leaks - you name it. The more your system scales out, the more failures it will experience. Eventually, the only way to cope with them is through automated self-healing and defense mechanisms.

The chapter is packed with practical defense mechanisms that have helped the systems I have built stand the test of time and scale to millions of users. It starts by describing the most common sources of failure, like single points of failure and slow processes, and then goes on to describe various defense mechanisms, like circuit breakers, rate limiting, and load shedding.
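To give a flavor of one of these mechanisms, here is a minimal sketch of a circuit breaker in Python. It is not taken from the chapter: the class name, the `failure_threshold` and `reset_timeout` parameters, and the injectable `clock` are all assumptions made for illustration. The idea is that after a few consecutive failures the breaker "opens" and rejects calls immediately, instead of letting callers pile up on a dependency that is presumed unhealthy. Once the timeout has elapsed, it lets a trial call through to check whether the dependency has recovered.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Illustrative breaker; names and defaults are hypothetical."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self.clock = clock                          # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None                       # None means the breaker is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast without touching the downstream dependency.
                raise CircuitOpenError("failing fast; downstream presumed unhealthy")
            # Timeout elapsed: half-open, so let this call through as a trial.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the breaker.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the breaker and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each network call, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is a hypothetical function, and treat `CircuitOpenError` as a signal to fall back to a default or cached response. A production breaker would also need to be thread-safe and would typically limit how many trial calls run concurrently.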

Written by Roberto Vitillo

I am writing a book about distributed systems

Do you write applications that make network calls? If so, congratulations - you are a distributed systems engineer! My book teaches the core principles of distributed systems that will help you design, build, and maintain scalable cloud applications that will stand the test of time.

Want a sneak peek? Subscribe to receive a sample and be notified when a new chapter is released.
