ChaosKit

A modular Go framework for chaos engineering, fault injection, and reliability testing of distributed systems, libraries, and services.



Overview

ChaosKit enables systematic testing of system reliability and resilience through controlled fault injection and invariant validation.

Important:

ChaosKit is designed for proactive chaos engineering and works best when integrated into your code from the start. Most injection methods require adding chaos hooks to your code; the exception is ToxiProxy, which injects network chaos without touching the code under test.

Philosophy

Build resilient systems by designing them to be testable with chaos from day one, rather than retrofitting chaos testing into existing code.

When to Use

New libraries and frameworks
Workflow engine testing
Rollback mechanism validation
Resource leak detection
Network resilience testing

Key Capabilities

Powerful tools for testing the reliability of your systems

Controlled Chaos Injection

Introduce delays, panics, resource pressure, and network faults to test system resilience.

Multiple Injection Methods

Context-based, monkey patching, failpoints, and network proxies for various testing scenarios.

Invariant Validation

Verify system properties such as bounded recursion, absence of infinite loops, and absence of resource leaks.

Continuous Testing

Run long-duration stress tests to discover edge cases and hidden issues.

Comprehensive Metrics

Automatic collection of execution statistics and performance data for analysis.

Flexible Configuration

Customize behavior through the functional options pattern, or use the builder pattern for convenient scenario construction.

Injection Methods

Choose the right method for your use case

1. Context-Based Injection (Recommended)

Recommended for new code

Explicitly call chaos functions in your code. Fine-grained control, low overhead, works in production.

Requires code changes

2. Failpoint Injection (Recommended)

Recommended for production

Compile-time injection points. Production-safe: they compile to a no-op unless the build tag is set.

Requires code changes

3. ToxiProxy (Recommended)

Recommended for network chaos

Proxy network connections. Works without code changes, real network conditions.

No code changes

4. Monkey Patching

Limited use cases

Runtime function replacement. Only works with package-level function variables.

Requires building with -gcflags=all=-l (disables inlining)

Comparison Table

Method	Code Changes	Works with Existing Code	Production-Safe	Performance	Recommendation
Context-Based	Required	No	Yes	Excellent	New projects
Failpoints	Required	No	Yes (no-op)	Excellent	Production
ToxiProxy	None	Yes	Yes	Good	Network
Monkey Patch	Specific	Rarely	Never	Poor	Avoid

Core Components

Modular architecture for flexible testing

Chaos Injectors

DelayInjector - random delays
PanicInjector - random panics
CPUInjector - CPU stress
MemoryInjector - memory pressure
ToxiProxy Injectors - network chaos
CompositeInjector - combine injectors

Validators

PanicRecoveryValidator - recovery validation
RecursionDepthValidator - recursion depth
GoroutineLeakValidator - goroutine leaks
SlowIterationValidator - slow iterations
ExecutionTimeValidator - execution time
MemoryLimitValidator - memory limits

Quick Start

Get started with ChaosKit in minutes

package main

import (
    "context"
    "log"
    "time"

    "github.com/rom8726/chaoskit"
    "github.com/rom8726/chaoskit/injectors"
    "github.com/rom8726/chaoskit/validators"
)

// Define your system under test
type WorkflowEngine struct{}

func (w *WorkflowEngine) Name() string { return "workflow-engine" }
func (w *WorkflowEngine) Setup(ctx context.Context) error { return nil }
func (w *WorkflowEngine) Teardown(ctx context.Context) error { return nil }

// Define execution step with context-based chaos
func ExecuteWorkflow(ctx context.Context, target chaoskit.Target) error {
    // Inject chaos directly in your code
    if chaoskit.MaybePanic(ctx) {
        panic("chaos: intentional panic")
    }
    
    chaoskit.MaybeDelay(ctx) // May add delay here
    
    // Your workflow execution logic
    time.Sleep(10 * time.Millisecond)
    return nil
}

func main() {
    engine := &WorkflowEngine{}

    scenario := chaoskit.NewScenario("reliability-test").
        WithTarget(engine).
        Step("execute-workflow", ExecuteWorkflow).
        Inject("delay", injectors.RandomDelay(5*time.Millisecond, 25*time.Millisecond)).
        Inject("panic", injectors.PanicProbability(0.01)).
        Assert("goroutine-limit", validators.GoroutineLimit(200)).
        Assert("recursion-depth", validators.RecursionDepthLimit(100)).
        Assert("no-infinite-loop", validators.NoInfiniteLoop(5*time.Second)).
        Repeat(100).
        Build()

    if err := chaoskit.Run(context.Background(), scenario); err != nil {
        log.Fatalf("Scenario execution failed: %v", err)
    }
}

Installation

go get github.com/rom8726/chaoskit

Requires Go 1.25 or later.

Examples

simple - basic reliability testing
continuous - continuous testing
chaos_context - context-based injection
monkey_patch - monkey patching
toxiproxy - network chaos
floxy_stress_test - comprehensive stress test

Usage Patterns

Different ways to use ChaosKit for reliability testing

Basic Reliability Testing

scenario := chaoskit.NewScenario("basic-test").
    WithTarget(workflowEngine).
    Step("execute", ExecuteWorkflow).
    Assert("no-leaks", validators.GoroutineLimit(100)).
    Repeat(1000).
    Build()

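// Check the result instead of discarding it
if err := chaoskit.Run(ctx, scenario); err != nil {
    log.Fatalf("Scenario execution failed: %v", err)
}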

Continuous Testing

scenario := chaoskit.NewScenario("continuous-test").
    WithTarget(engine).
    Step("execute", ExecuteWorkflow).
    Inject("chaos", injectors.CompositeInjector(...)).
    Assert("stability", validators.GoroutineLimit(500)).
    RunFor(24*time.Hour).
    Build()

Load Testing with Validation

scenario := chaoskit.NewScenario("load-test").
    WithTarget(engine).
    Step("execute", ExecuteWorkflow).
    Inject("cpu-stress", injectors.NewCPUInjector(4, 500*time.Millisecond)).
    Assert("performance", validators.NewExecutionTimeValidator(100*time.Millisecond)).
    Repeat(10000).
    Build()