Chaos Engineering Overview

Principles of Chaos Engineering

Chaos engineering is the discipline of experimenting on a system in order to build confidence in the system's capability to withstand turbulent conditions in production.

Modern large-scale software systems are complex, with many components and services operating together as a distributed system. Interactions between services can cause unpredictable outcomes that affect production environments.

Systems need to be tested for weaknesses such as improper fallback settings, unavailable services, outages from traffic overload, cascading failures from a single point of failure, and many more. Rigorous testing measures the stability of a complex system in its production deployment and identifies areas to improve so that the system can deal with potential chaos.

Practising Chaos

Define the 'steady state' as a measurable output of the system that indicates normal behaviour
Hypothesise that this steady state will continue in both the control group and the experiment group
Introduce variables that reflect real-world events, such as service failures or network overload
Try to disprove the hypothesis by looking for a difference in steady state between the control group and the experiment group

The more difficult it is to disrupt the steady state, the more confidence we can have in the system's resilience.
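
The four steps under Practising Chaos can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a real chaos tool: the steady state is taken to be the request success rate, the injected real-world event is a dependency that fails for a fraction of requests, and every name here (handle_request, success_rate, run_experiment, failure_rate, tolerance) is hypothetical and chosen for illustration.

```python
import random


def handle_request(rng, inject_failure, failure_rate, has_fallback):
    """Simulate one request; return True if the caller gets a good response."""
    # Step 3: the injected variable is a dependency that fails at random.
    dependency_failed = inject_failure and rng.random() < failure_rate
    if not dependency_failed:
        return True
    # With the dependency down, only a fallback can save the request.
    return has_fallback


def success_rate(rng, n_requests, inject_failure, failure_rate, has_fallback):
    """Step 1: the steady state is measured as the fraction of successful requests."""
    ok = sum(
        handle_request(rng, inject_failure, failure_rate, has_fallback)
        for _ in range(n_requests)
    )
    return ok / n_requests


def run_experiment(has_fallback, n_requests=10_000, failure_rate=0.3,
                   tolerance=0.01, seed=42):
    """Compare a control group with an experiment group that has failures injected."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    control = success_rate(rng, n_requests, False, failure_rate, has_fallback)
    experiment = success_rate(rng, n_requests, True, failure_rate, has_fallback)
    # Steps 2 and 4: the hypothesis is that both groups show the same steady
    # state; a deviation larger than the tolerance disproves it.
    return {
        "control": control,
        "experiment": experiment,
        "hypothesis_holds": abs(control - experiment) <= tolerance,
    }


if __name__ == "__main__":
    print("with fallback:   ", run_experiment(has_fallback=True))
    print("without fallback:", run_experiment(has_fallback=False))
```

In this toy model the service with a fallback keeps its steady state under the injected failure, so the hypothesis survives. The service without a fallback loses roughly the injected failure rate in successful requests, which disproves the hypothesis and points at a weakness to fix. A real experiment would measure production metrics and limit the blast radius rather than simulate requests.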