Configuration
YAML configuration reference for the Ensemble inference gateway
Ensemble is configured through a YAML file passed with the --config flag. All operational tuning happens in this file, so no code changes are required.
Config Structure
```yaml
server:
  port: "8080"
  read_timeout: 35m            # Long for reasoning models (o3, GPT-5.2)
  write_timeout: 35m
  idle_timeout: 120s
  coalescence_window: 50ms     # Batch streaming tokens (0 = disabled)

redis:
  address: "localhost:6379"
  username: ""
  password: ""
  database: 0
  pool_size: 10
  max_retries: 3
  dial_timeout: 5s

database:
  path: "./data/ensemble.db"
  max_open_conns: 25
  max_idle_conns: 5

cache:
  enable_session_affinity: true
  max_cache_entries: 100000
  crc_algorithm: crc32
  cache_wait_threshold: 0.25   # >$0.25 estimated value: strong affinity
  load_balance_threshold: 0.05 # <$0.05: prefer least-utilized endpoint
  ttls:
    anthropic: 8m
    openai: 24h
    gemini: 6m

rate_limit:
  window_size: 1m
  ttl_seconds: 65
  sync_interval: 1s            # Background Redis sync interval
  default_rpm: 1000
  default_tpm: 1000000
  redis_eval_timeout: 50ms
  redis_rollback_timeout: 25ms

# Provider configs are in separate files under config/providers/:
# anthropic.yaml, openai.yaml, gemini.yaml, xai.yaml, openrouter.yaml,
# bedrock.yaml, vertex.yaml, fireworks.yaml, self-hosted-*.yaml

# Per-model streaming timeouts (stall vs overall)
streaming_timeouts:
  "o1":
    stall_timeout: 20m
    overall_timeout: 25m
  "o3":
    stall_timeout: 20m
    overall_timeout: 30m
  "gpt-5":
    stall_timeout: 20m
    overall_timeout: 30m

# Provider HTTP client timeouts
provider_timeouts:
  default: 60s
  bedrock: 90s
  api_call_default: 15m
  api_call_extended: 50m       # GPT-5.2 Pro reasoning

# YAML-driven parameter validation per model pattern
parameter_validation:
  enable: true
  model_drop_rules:
    "gpt-5": ["temperature"]
    "o1": ["temperature", "top_p"]
  conditional_rules:
    "claude*opus*":
      - if_parameter: "temperature"
        drop_parameters: ["top_p"]
```
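The `parameter_validation` rules can be read as a small filter applied to each request before it is forwarded. The Python sketch below rests on two assumptions that this reference does not spell out. First, rule keys are shell-style globs matched against the full model name. Second, a conditional rule fires when its `if_parameter` is present in the incoming request. `validate_params` is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Rules copied from the parameter_validation example above.
MODEL_DROP_RULES = {
    "gpt-5": ["temperature"],
    "o1": ["temperature", "top_p"],
}
CONDITIONAL_RULES = {
    "claude*opus*": [
        {"if_parameter": "temperature", "drop_parameters": ["top_p"]},
    ],
}


def validate_params(model: str, params: dict) -> dict:
    """Return a copy of params with rule-listed parameters removed.

    Assumption: rule keys are shell-style globs matched against the full
    model name, so "gpt-5" matches only "gpt-5", while "claude*opus*"
    matches names such as "claude-3-opus-20240229".
    """
    out = dict(params)  # never mutate the caller's request
    # Unconditional drops: every matching pattern contributes its list.
    for pattern, names in MODEL_DROP_RULES.items():
        if fnmatchcase(model, pattern):
            for name in names:
                out.pop(name, None)
    # Conditional drops fire only if the trigger was in the original request.
    for pattern, rules in CONDITIONAL_RULES.items():
        if fnmatchcase(model, pattern):
            for rule in rules:
                if rule["if_parameter"] in params:
                    for name in rule["drop_parameters"]:
                        out.pop(name, None)
    return out
```

For example, `validate_params("claude-3-opus-20240229", {"temperature": 0.7, "top_p": 0.9})` returns `{"temperature": 0.7}`. A `gpt-4o` request passes through unchanged because no rule matches it.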
Key Config Sections
ServerConfig
| Field | Type | Default | Description |
|---|---|---|---|
| port | string | "8080" | Listen port |
| read_timeout | duration | 35m | HTTP read timeout (long for reasoning models) |
| write_timeout | duration | 35m | HTTP write timeout |
| coalescence_window | duration | 50ms | Token batching window (0 = disabled) |
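When `coalescence_window` is set, tokens that arrive from the upstream provider within the window are forwarded to the client as one chunk. This presumably trades up to one window of added latency for fewer, larger writes. The sketch below is a hypothetical offline model of that behavior, using timestamps in milliseconds. It assumes a batch is measured from its first token, and it only flushes when a later token arrives or the stream ends. A live server would also need a timer to flush a batch whose window expires while the upstream is idle.

```python
def coalesce(events, window_ms):
    """Group (timestamp_ms, token) events into chunks.

    A chunk starts at its first token and absorbs later tokens that arrive
    less than window_ms after that first token. window_ms == 0 disables
    batching, so each token is forwarded on its own.
    """
    if window_ms <= 0:
        return [token for _, token in events]
    chunks, buf, start = [], [], 0
    for ts, token in events:
        if buf and ts - start >= window_ms:
            chunks.append("".join(buf))  # window elapsed: flush the batch
            buf = []
        if not buf:
            start = ts  # first token of a new batch opens the window
        buf.append(token)
    if buf:
        chunks.append("".join(buf))  # end of stream: flush the remainder
    return chunks
```

With a 50 ms window, six upstream tokens arriving at 0, 10, 40, 60, 70 and 200 ms reach the client as three chunks instead of six.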
ProviderConfig
| Field | Type | Description |
|---|---|---|
| name | string | Display name |
| type | string | Provider type: anthropic, anthropic-bedrock, vertex, openai, gemini, openrouter |
| strategy | string | session_affinity, round_robin, least_used |
| models | string[] | Supported model names |
| pricing | PricingConfig | Per-million token pricing |
| keys | ProviderKey[] | API keys and their endpoints |
| allowed_headers | string[] | Whitelisted per-request headers |
| allowed_server_tools | ServerTool[] | Whitelisted server-side tools |
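The cache thresholds in the example config suggest how the `session_affinity` strategy weighs prompt-cache reuse against load. The Python sketch below is one reading of those config comments, not Ensemble's implementation, and it makes three assumptions:

- The estimated cache value is the prefix size times the gap between the input price and the cache-read price, both per million tokens as in PricingConfig.
- The home endpoint is chosen by a CRC32 hash of the session key, matching `crc_algorithm: crc32`.
- The behavior between the two thresholds is hypothetical, as are the `Endpoint` type and its `utilization` field.

```python
import zlib
from dataclasses import dataclass

CACHE_WAIT_THRESHOLD = 0.25    # cache.cache_wait_threshold, USD
LOAD_BALANCE_THRESHOLD = 0.05  # cache.load_balance_threshold, USD


@dataclass
class Endpoint:
    id: str
    utilization: float  # hypothetical load signal in [0, 1]


def estimated_cache_value(prefix_tokens, input_price_per_m, cached_price_per_m):
    """Assumed formula: USD saved if the prefix is billed at the cache-read price."""
    return prefix_tokens / 1_000_000 * (input_price_per_m - cached_price_per_m)


def pick_endpoint(session_key, endpoints, value, saturation=0.9):
    # Affinity target: CRC32 of the session key picks a stable "home" endpoint.
    home = endpoints[zlib.crc32(session_key.encode()) % len(endpoints)]
    least_used = min(endpoints, key=lambda e: e.utilization)
    if value > CACHE_WAIT_THRESHOLD:
        return home        # strong affinity: the cache hit is worth waiting for
    if value < LOAD_BALANCE_THRESHOLD:
        return least_used  # cache value too small to matter: spread load
    # Middle band (assumption): keep affinity unless home is saturated.
    return home if home.utilization < saturation else least_used
```

Take hypothetical prices of $15 per million input tokens and $1.50 per million cached tokens. A 100,000-token prefix is then worth about $1.35. That is well above the $0.25 `cache_wait_threshold`, so the request stays on its home endpoint even when that endpoint is busy.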
EndpointConfig
| Field | Type | Description |
|---|---|---|
| id | string | Unique endpoint identifier |
| base_url | string | Provider API base URL |
| rpm_limit | int | Requests per minute limit |
| tpm_limit | int | Tokens per minute limit |
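`rpm_limit` and `tpm_limit` cap each endpoint over the `rate_limit.window_size` period (1m in the example). The sketch below shows a single-process sliding-window log that enforces both limits at once. It is hypothetical: Ensemble keeps its counters in Redis (see `sync_interval` and `redis_eval_timeout`), and this reference does not describe its exact algorithm. The sketch also assumes that a request's token count is known, or estimated, when the request is admitted.

```python
from collections import deque


class SlidingWindowLimiter:
    """Single-process RPM/TPM limiter over a sliding window (hypothetical sketch)."""

    def __init__(self, rpm_limit, tpm_limit, window_s=60.0):
        self.rpm_limit = rpm_limit    # EndpointConfig.rpm_limit
        self.tpm_limit = tpm_limit    # EndpointConfig.tpm_limit
        self.window_s = window_s      # rate_limit.window_size (1m)
        self.admitted = deque()       # (timestamp_s, tokens) per admitted request
        self.tokens_in_window = 0

    def try_acquire(self, now_s, tokens):
        """Admit the request and return True, or return False if a limit would be exceeded."""
        # Evict requests that have slid out of the window.
        while self.admitted and now_s - self.admitted[0][0] >= self.window_s:
            _, old_tokens = self.admitted.popleft()
            self.tokens_in_window -= old_tokens
        if len(self.admitted) + 1 > self.rpm_limit:
            return False
        if self.tokens_in_window + tokens > self.tpm_limit:
            return False
        self.admitted.append((now_s, tokens))
        self.tokens_in_window += tokens
        return True
```

Rejected requests are not recorded, so a client that backs off and retries is not charged for the failed attempt.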
Environment Variables
| Variable | Description |
|---|---|
| ENSEMBLE_ENCRYPTION_KEY | AES key for encrypting stored provider API keys |
| ENSEMBLE_ADMIN_KEY | Admin API authentication key |
| ENSEMBLE_ENVIRONMENT | Environment name (dev/staging/production) for Redis namespace |
| REDIS_NAMESPACE | Explicit Redis key namespace |
| AWS_REGION | S3 region for response persistence |
| AWS_ENDPOINT_URL | S3 endpoint (for MinIO) |
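`ENSEMBLE_ENVIRONMENT` and `REDIS_NAMESPACE` both affect how Redis keys are prefixed, presumably so that several environments can share one Redis instance without colliding. The sketch below assumes that an explicit `REDIS_NAMESPACE` takes precedence. The `ensemble:<environment>` format for the derived namespace and the function names are hypothetical.

```python
import os


def resolve_redis_namespace(env=None):
    """Hypothetical precedence: REDIS_NAMESPACE, then ENSEMBLE_ENVIRONMENT, then no prefix."""
    env = os.environ if env is None else env
    explicit = env.get("REDIS_NAMESPACE", "").strip()
    if explicit:
        return explicit
    environment = env.get("ENSEMBLE_ENVIRONMENT", "").strip()
    if environment:
        return f"ensemble:{environment}"  # hypothetical derived format
    return ""


def namespaced(key, namespace):
    """Prefix a Redis key with the namespace, if any."""
    return f"{namespace}:{key}" if namespace else key
```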
Hot Reload
The configuration supports hot reload through the ConfigManager. Changes to the YAML file are detected and applied without a restart for most settings, including provider configs, pricing, and timeouts.
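The sketch below shows one way such change detection can work. It polls the file's modification time and hands the new contents to a callback that parses, validates, and swaps in the config. `PollingReloader` and its callback are hypothetical, and the ConfigManager may use filesystem notifications instead of polling. YAML parsing is left to the callback so that the example needs only the standard library. The sketch also assumes a design choice: if the callback rejects a file, the previous config stays live, so a typo in the YAML does not take the gateway down.

```python
import os


class PollingReloader:
    """Hypothetical change detector: re-reads the config file when its mtime changes."""

    def __init__(self, path, apply):
        self.path = path
        self.apply = apply       # callback that parses, validates, and swaps in the config
        self.last_mtime = None

    def poll(self):
        """Return True if a changed file was applied, False otherwise."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime == self.last_mtime:
            return False         # unchanged since the last attempt
        self.last_mtime = mtime  # remember it so a broken file is not re-parsed every poll
        with open(self.path, encoding="utf-8") as f:
            text = f.read()
        try:
            self.apply(text)
        except Exception:
            return False         # invalid file: the previously applied config stays live
        return True
```

A background loop would call `poll()` every few seconds. Catching the callback's exception keeps one bad edit from stopping that loop.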