Instance Creation
1. Client → Gateway: POST /api/v1/instances {deployment_id, config}
2. Gateway → Valkey: Find least-loaded coordinator
3. Gateway → Coordinator: POST /instances {deployment_id, secrets}
4. Coordinator:
   a. ensure_deployment(): download from S3 if missing
   b. Create instance directory structure
   c. Mount Chronicle FUSE filesystem
   d. Spawn agent process (Bun for TS, PyO3 shim for Python)
   e. Initialize IPC channel (Unix socket)
5. Coordinator → Gateway: {instance_id, agent_id}
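Step 2 of the flow above can be sketched as follows. This is a minimal illustration only: it assumes the gateway has already read a per-coordinator load figure out of Valkey into a plain mapping, and the coordinator IDs, the load metric (active instance count), and the `pick_least_loaded` helper are all hypothetical names, not part of the documented system.

```python
from typing import Mapping


def pick_least_loaded(loads: Mapping[str, int]) -> str:
    """Return the coordinator ID with the smallest load value.

    Ties are broken by coordinator ID so the choice is deterministic.
    """
    if not loads:
        raise LookupError("no coordinators registered")
    return min(loads, key=lambda cid: (loads[cid], cid))


# Hypothetical snapshot of per-coordinator active instance counts.
loads = {"coord-a": 12, "coord-b": 4, "coord-c": 9}
chosen = pick_least_loaded(loads)  # "coord-b"
```

How the load figures are actually stored in Valkey (a sorted set, a hash, heartbeat keys) is not specified here, so the lookup itself is left out of the sketch.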
State Machine
Agent instances follow a lifecycle state machine:
    ┌──────────────────┐
    │    Deploying     │
    └────────┬─────────┘
             │
    ┌────────▼─────────┐
    │   Initializing   │ onInit() called
    └────────┬─────────┘
             │
    ┌────────▼─────────┐
    │      Ready       │ Waiting for messages
    └────────┬─────────┘
             │ message arrives
    ┌────────▼─────────┐
    │     Running      │ onMessage() executing
    └────────┬─────────┘
             │ response complete
             ▼
        Ready (loop)
             │ idle timeout
    ┌────────▼─────────┐
    │   Scaling Down   │ onIdleTimeout() called
    └────────┬─────────┘
             │
    ┌────────▼─────────┐
    │  Shutting Down   │ onShutdown() called, state checkpointed
    └────────┬─────────┘
             │
    ┌────────▼─────────┐
    │    Terminated    │ Process exited, Chronicle unmounted
    └──────────────────┘
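The lifecycle above can be expressed as a transition table. The sketch below encodes only the edges drawn in the diagram; the `State` enum, `TRANSITIONS` map, and `advance` helper are illustrative names, not the coordinator's actual types. The diagram does not show an edge for an agent deferring its idle timeout, so none is included here.

```python
from enum import Enum


class State(Enum):
    DEPLOYING = "Deploying"
    INITIALIZING = "Initializing"
    READY = "Ready"
    RUNNING = "Running"
    SCALING_DOWN = "Scaling Down"
    SHUTTING_DOWN = "Shutting Down"
    TERMINATED = "Terminated"


# Edges taken from the diagram; any other transition is rejected.
TRANSITIONS = {
    State.DEPLOYING: {State.INITIALIZING},
    State.INITIALIZING: {State.READY},
    State.READY: {State.RUNNING, State.SCALING_DOWN},  # message arrives / idle timeout
    State.RUNNING: {State.READY},                      # response complete
    State.SCALING_DOWN: {State.SHUTTING_DOWN},
    State.SHUTTING_DOWN: {State.TERMINATED},
    State.TERMINATED: set(),
}


def advance(current: State, target: State) -> State:
    """Return target if the diagram allows current -> target, else raise."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping the edges in one table makes it easy to reject out-of-order events, such as a message delivered to an instance that is already shutting down.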
IPC Protocol
Communication between the coordinator and agent processes uses Unix domain sockets with length-prefixed JSON messages.
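A minimal sketch of length-prefixed JSON framing is shown below. The prefix format is an assumption: this section does not state the prefix width or byte order, so a 4-byte unsigned big-endian length is used for illustration. The `encode_frame` and `decode_frames` helpers and the `"type"` field name are likewise hypothetical.

```python
import json
import struct

# Assumption: 4-byte unsigned big-endian length prefix (not specified in this document).
_HEADER = struct.Struct(">I")


def encode_frame(message: dict) -> bytes:
    """Serialize a message as <length prefix><UTF-8 JSON payload>."""
    payload = json.dumps(message, separators=(",", ":")).encode("utf-8")
    return _HEADER.pack(len(payload)) + payload


def decode_frames(buffer: bytes) -> tuple[list[dict], bytes]:
    """Decode every complete frame in buffer; return (messages, leftover bytes)."""
    messages = []
    offset = 0
    while len(buffer) - offset >= _HEADER.size:
        (length,) = _HEADER.unpack_from(buffer, offset)
        end = offset + _HEADER.size + length
        if end > len(buffer):
            break  # partial frame: keep the bytes until more data arrives
        messages.append(json.loads(buffer[offset + _HEADER.size:end]))
        offset = end
    return messages, buffer[offset:]
```

Because a stream socket can deliver a frame in several reads, the decoder returns the undecoded tail so the caller can prepend it to the next chunk read from the socket.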
Message Types (Coordinator → Agent)
| Type | Description |
|---|---|
| init | Initialize with config, state, workspace path |
| message | Deliver user message for processing |
| config_update | Runtime config change |
| steer | Mid-turn guidance |
| shutdown | Graceful shutdown signal |
Message Types (Agent → Coordinator)
| Type | Description |
|---|---|
| stream_update | Streaming token/content update |
| state_delta | State mutation delta (JSON Patch) |
| log | Structured log entry |
| tool_request | Request tool execution |
| complete | Message processing complete |
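The two tables above define which side may send which message type. The sketch below shows one way to enforce that when building messages. The type names come from the tables; the `make_message` helper, the `"type"` envelope field, and the sender labels are assumptions made for illustration.

```python
# Message types from the tables above, grouped by direction.
COORDINATOR_TO_AGENT = frozenset(
    {"init", "message", "config_update", "steer", "shutdown"}
)
AGENT_TO_COORDINATOR = frozenset(
    {"stream_update", "state_delta", "log", "tool_request", "complete"}
)

_ALLOWED = {"coordinator": COORDINATOR_TO_AGENT, "agent": AGENT_TO_COORDINATOR}


def make_message(sender: str, msg_type: str, **fields) -> dict:
    """Build a message dict, rejecting types the sender may not emit."""
    allowed = _ALLOWED.get(sender)
    if allowed is None:
        raise ValueError(f"unknown sender: {sender!r}")
    if msg_type not in allowed:
        raise ValueError(f"{sender} cannot send {msg_type!r}")
    # "type" is written last so a stray field cannot overwrite it.
    return {**fields, "type": msg_type}
```

For example, `make_message("coordinator", "steer", text="focus on the failing test")` succeeds, while `make_message("agent", "shutdown")` raises `ValueError`.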
Agent Configuration
Agents receive their configuration at init time and can receive runtime updates through config_update messages:
{
  "agent_id": "agent_abc123",
  "agent_type": "claude-agent",
  "workspace": "/instances/agent_abc123/chronicle/mount",
  "ensemble_url": "https://staging-ensemble.example.internal",
  "ensemble_api_key": "ens_...",
  "model": "claude-sonnet-4-20250514",
  "custom_config": { ... }
}
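An agent written in Python might load this configuration into a typed object along the following lines. This is a sketch under assumptions: the field names mirror the example above, but the `AgentConfig` class, the `load_config` and `apply_update` helpers, and the choice to treat `custom_config` as optional are illustrative. Which fields may actually change at runtime is not stated here.

```python
import json
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class AgentConfig:
    agent_id: str
    agent_type: str
    workspace: str
    ensemble_url: str
    ensemble_api_key: str
    model: str
    # Assumption: custom_config may be omitted; default to an empty dict.
    custom_config: dict = field(default_factory=dict)


def load_config(raw: str) -> AgentConfig:
    """Parse init-time JSON; a missing required key or an unknown key raises TypeError."""
    return AgentConfig(**json.loads(raw))


def apply_update(config: AgentConfig, changes: dict) -> AgentConfig:
    """Return a new config with the changed fields; the original is left untouched."""
    return replace(config, **changes)
```

Making the dataclass frozen means a runtime update produces a new object, so code that is mid-turn keeps a consistent view of the configuration it started with.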
Scale-to-Zero
When an agent is idle beyond the configured timeout:
1. Coordinator calls onIdleTimeout(); agent can return false to defer
2. If allowed, coordinator calls onShutdown()
3. State is checkpointed to SQLite
4. SQLite is replicated to S3 via Litestream
5. Chronicle FUSE filesystem is unmounted
6. Agent process is terminated
7. Instance directory is cleaned up (or preserved for fast restart)
On the next request, the agent is rehydrated from the state stored in S3.
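The ordering of the scale-down steps, including the deferral in step 1, can be sketched as a small driver. Everything here is illustrative: `scale_to_zero` is a hypothetical helper, the two hook arguments stand in for the agent's onIdleTimeout() and onShutdown() callbacks, and the teardown steps are passed in as opaque callables because their real implementations (SQLite checkpoint, Litestream replication, FUSE unmount) are outside the scope of this sketch.

```python
from typing import Callable, Sequence

Step = tuple[str, Callable[[], None]]


def scale_to_zero(
    on_idle_timeout: Callable[[], bool],
    on_shutdown: Callable[[], None],
    teardown_steps: Sequence[Step],
) -> list[str]:
    """Run the idle-timeout sequence; return the names of the steps that ran."""
    # Assumption: only an explicit False defers; any other return value allows shutdown.
    if on_idle_timeout() is False:
        return []  # agent deferred; nothing else runs this round
    on_shutdown()
    done = ["onShutdown"]
    for name, step in teardown_steps:
        step()  # an exception here aborts the remaining steps
        done.append(name)
    return done


# Hypothetical wiring with no-op steps, in the documented order.
steps: list[Step] = [
    ("checkpoint", lambda: None),
    ("replicate", lambda: None),
    ("unmount", lambda: None),
    ("terminate", lambda: None),
    ("cleanup", lambda: None),
]
ran = scale_to_zero(lambda: True, lambda: None, steps)
```

Running the steps strictly in order matters: the state has to be checkpointed and replicated before the filesystem is unmounted and the process is terminated, otherwise there is nothing to rehydrate from.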