ChaosKit
A modular Go framework for chaos engineering, fault injection, and reliability testing of distributed systems, libraries, and services.
Overview
ChaosKit enables systematic testing of system reliability and resilience through controlled fault injection and invariant validation.
ChaosKit is designed for proactive chaos engineering: it works best when integrated into your code from the start. Most injection methods require adding chaos hooks to your code. The exception is ToxiProxy, which handles network chaos without code changes.
Philosophy
Build resilient systems by designing them to be testable with chaos from day one, rather than retrofitting chaos testing into existing code.
When to Use
- New libraries and frameworks
- Workflow engine testing
- Rollback mechanism validation
- Resource leak detection
- Network resilience testing
Key Capabilities
Powerful tools for testing the reliability of your systems
Controlled Chaos Injection
Introduce delays, panics, resource pressure, and network faults to test system resilience.
Multiple Injection Methods
Context-based hooks, monkey patching, failpoints, and network proxies, each suited to different testing scenarios.
Invariant Validation
Verify system properties such as bounded recursion, the absence of infinite loops, and freedom from resource leaks.
Continuous Testing
Run long-duration stress tests to discover edge cases and hidden issues.
Comprehensive Metrics
Automatic collection of execution statistics and performance data for analysis.
Flexible Configuration
Customize behavior through the functional options pattern, or use the builder pattern for convenient, chainable scenario setup.
Injection Methods
Choose the right method for your use case
1. Context-Based Injection (Recommended)
Recommended for new code
Explicitly call chaos functions in your code. Fine-grained control, low overhead, works in production.
Requires code changes
2. Failpoint Injection (Recommended)
Recommended for production
Compile-time injection points. Production-safe: failpoints compile to no-ops unless the build tag is set.
Requires code changes
3. ToxiProxy (Recommended)
Recommended for network chaos
Proxies network connections, so it works without code changes and exercises real network conditions.
No code changes
4. Monkey Patching
Limited use cases
Runtime function replacement. Only works with package-level function variables.
Requires -gcflags=all=-l (disables inlining)
Comparison Table
| Method | Code Changes | Works with Existing Code | Production-Safe | Performance | Recommendation |
|---|---|---|---|---|---|
| Context-Based | Required | No | Yes | Excellent | New projects |
| Failpoints | Required | No | Yes (no-op) | Excellent | Production |
| ToxiProxy | None | Yes | Yes | Good | Network |
| Monkey Patch | Some (package-level function variables) | Rarely | No | Poor | Avoid |
Core Components
Modular architecture for flexible testing
Chaos Injectors
- DelayInjector - random delays
- PanicInjector - random panics
- CPUInjector - CPU stress
- MemoryInjector - memory pressure
- ToxiProxy Injectors - network chaos
- CompositeInjector - combine injectors
Validators
- PanicRecoveryValidator - recovery validation
- RecursionDepthValidator - recursion depth
- GoroutineLeakValidator - goroutine leaks
- SlowIterationValidator - slow iterations
- ExecutionTimeValidator - execution time
- MemoryLimitValidator - memory limits
Quick Start
Get started with ChaosKit in minutes
```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/rom8726/chaoskit"
	"github.com/rom8726/chaoskit/injectors"
	"github.com/rom8726/chaoskit/validators"
)

// Define your system under test
type WorkflowEngine struct{}

func (w *WorkflowEngine) Name() string { return "workflow-engine" }
func (w *WorkflowEngine) Setup(ctx context.Context) error { return nil }
func (w *WorkflowEngine) Teardown(ctx context.Context) error { return nil }

// Define execution step with context-based chaos
func ExecuteWorkflow(ctx context.Context, target chaoskit.Target) error {
	// Inject chaos directly in your code
	if chaoskit.MaybePanic(ctx) {
		panic("chaos: intentional panic")
	}
	chaoskit.MaybeDelay(ctx) // May add delay here

	// Your workflow execution logic
	time.Sleep(10 * time.Millisecond)
	return nil
}

func main() {
	engine := &WorkflowEngine{}

	scenario := chaoskit.NewScenario("reliability-test").
		WithTarget(engine).
		Step("execute-workflow", ExecuteWorkflow).
		Inject("delay", injectors.RandomDelay(5*time.Millisecond, 25*time.Millisecond)).
		Inject("panic", injectors.PanicProbability(0.01)).
		Assert("goroutine-limit", validators.GoroutineLimit(200)).
		Assert("recursion-depth", validators.RecursionDepthLimit(100)).
		Assert("no-infinite-loop", validators.NoInfiniteLoop(5*time.Second)).
		Repeat(100).
		Build()

	if err := chaoskit.Run(context.Background(), scenario); err != nil {
		log.Fatalf("Scenario execution failed: %v", err)
	}
}
```
Installation
```shell
go get github.com/rom8726/chaoskit
```
Requires Go 1.25 or later.
Examples
- simple - basic reliability testing
- continuous - continuous testing
- chaos_context - context-based injection
- monkey_patch - monkey patching
- toxiproxy - network chaos
- floxy_stress_test - comprehensive stress test
Usage Patterns
Different ways to use ChaosKit for reliability testing
Basic Reliability Testing
```go
scenario := chaoskit.NewScenario("basic-test").
	WithTarget(workflowEngine).
	Step("execute", ExecuteWorkflow).
	Assert("no-leaks", validators.GoroutineLimit(100)).
	Repeat(1000).
	Build()

if err := chaoskit.Run(ctx, scenario); err != nil {
	log.Fatalf("Scenario execution failed: %v", err)
}
```
Continuous Testing
```go
scenario := chaoskit.NewScenario("continuous-test").
	WithTarget(engine).
	Step("execute", ExecuteWorkflow).
	Inject("chaos", injectors.CompositeInjector(...)).
	Assert("stability", validators.GoroutineLimit(500)).
	RunFor(24*time.Hour).
	Build()
```
Load Testing with Validation
```go
scenario := chaoskit.NewScenario("load-test").
	WithTarget(engine).
	Step("execute", ExecuteWorkflow).
	Inject("cpu-stress", injectors.NewCPUInjector(4, 500*time.Millisecond)).
	Assert("performance", validators.NewExecutionTimeValidator(100*time.Millisecond)).
	Repeat(10000).
	Build()
```