Getting Started with FailSafe
FailSafe is a chaos engineering control system that enables you to test system resilience through controlled fault injection across backend, frontend, and mobile platforms.
What is FailSafe?
FailSafe is an adaptive chaos engineering platform designed to systematically test your system's resilience. It enables controlled fault injection with real-time monitoring and automatic rollback capabilities.
How It Works
FailSafe operates through a systematic experiment lifecycle that ensures safe and controlled chaos injection:
Baseline Phase
System metrics are collected to establish normal behavior patterns before any faults are injected.
Injecting Phase
Faults are progressively injected according to the configured intensity model. In adaptive mode, intensity adjusts based on system response.
Recovering Phase
Faults are removed and the system is monitored to ensure it returns to baseline behavior.
Completed Phase
Experiment results are compiled and resilience scores are calculated based on system behavior.
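The four phases above run in a fixed order, which can be sketched as a tiny state machine. This is only an illustration: the lowercase phase names and the `next_phase` function are hypothetical and are not taken from FailSafe's API.

```shell
# Hypothetical helper: phase names mirror the lifecycle described above,
# lowercased for illustration; FailSafe's own identifiers may differ.

# Print the phase that follows the given one.
# Returns 1 (and prints nothing) for the final or an unknown phase.
next_phase() {
  case "$1" in
    baseline)   echo injecting ;;
    injecting)  echo recovering ;;
    recovering) echo completed ;;
    *)          return 1 ;;
  esac
}

# Walk the whole lifecycle, starting from the baseline phase.
phase=baseline
while [ -n "$phase" ]; do
  echo "$phase"
  phase=$(next_phase "$phase") || phase=""
done
```

The loop prints each phase once and stops after the completed phase, since no phase follows it.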
Your First Experiment
Follow these steps to run your first chaos experiment with FailSafe:
1. Configure your environment
Ensure Docker is running for backend tests, or configure the metrics collector for frontend tests.
docker ps # Verify Docker is running
2. Create an API key
Generate an API key from the Settings page, or through the API, to authenticate your requests.
curl -X POST http://localhost:8000/api-keys \
-H "Content-Type: application/json" \
-d '{"name": "my-key", "role": "engineer"}'
3. Start an experiment
Use the API or dashboard to create and start your first experiment.
curl -X POST http://localhost:8000/experiments/backend/start \
-H "x-api-key: fs_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"targets": ["api-service"],
"fault_type": "latency",
"duration": 60,
"adaptive": true
}'
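When the same request is run repeatedly, it may help to keep the URL and key in variables and build the payload from arguments. The following is a minimal sketch under stated assumptions: the `FAILSAFE_URL` and `FAILSAFE_API_KEY` variables and the `build_payload` and `start_experiment` functions are hypothetical names, and only the endpoint, header, and payload fields shown above come from this guide.

```shell
# Assumed environment variables (hypothetical names, not defined by FailSafe).
FAILSAFE_URL="${FAILSAFE_URL:-http://localhost:8000}"
FAILSAFE_API_KEY="${FAILSAFE_API_KEY:-fs_your_key_here}"

# Build the JSON body for a single-target backend experiment.
# Arguments: target, fault type, duration, adaptive (true|false).
# Values are inserted as-is, so they must not contain quotes or backslashes.
build_payload() {
  printf '{"targets": ["%s"], "fault_type": "%s", "duration": %s, "adaptive": %s}' \
    "$1" "$2" "$3" "$4"
}

# Send the request; --fail makes curl exit non-zero on HTTP error responses.
start_experiment() {
  curl --fail --silent --show-error \
    -X POST "$FAILSAFE_URL/experiments/backend/start" \
    -H "x-api-key: $FAILSAFE_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(build_payload "$@")"
}

# Usage: start_experiment api-service latency 60 true
```

Keeping the key in an environment variable also keeps it out of scripts that may end up in version control.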