Configuration - iGent Concert

Ensemble is configured via a YAML file passed with `--config`. All operational tuning happens in this file; no code changes are required.

Config Structure

server:
  port: "8080"
  read_timeout: 35m       # Long for reasoning models (o3, GPT-5.2)
  write_timeout: 35m
  idle_timeout: 120s
  coalescence_window: 50ms  # Batch streaming tokens (0 = disabled)

redis:
  address: "localhost:6379"
  username: ""
  password: ""
  database: 0
  pool_size: 10
  max_retries: 3
  dial_timeout: 5s

database:
  path: "./data/ensemble.db"
  max_open_conns: 25
  max_idle_conns: 5

cache:
  enable_session_affinity: true
  max_cache_entries: 100000
  crc_algorithm: crc32
  cache_wait_threshold: 0.25   # >$0.25 estimated value: strong affinity
  load_balance_threshold: 0.05 # <$0.05: prefer least-utilized endpoint
  ttls:
    anthropic: 8m
    openai: 24h
    gemini: 6m

rate_limit:
  window_size: 1m
  ttl_seconds: 65
  sync_interval: 1s        # Background Redis sync interval
  default_rpm: 1000
  default_tpm: 1000000
  redis_eval_timeout: 50ms
  redis_rollback_timeout: 25ms

# Provider configs are in separate files under config/providers/
# anthropic.yaml, openai.yaml, gemini.yaml, xai.yaml, openrouter.yaml,
# bedrock.yaml, vertex.yaml, fireworks.yaml, self-hosted-*.yaml

# Per-model streaming timeouts (stall vs overall)
streaming_timeouts:
  "o1":
    stall_timeout: 20m
    overall_timeout: 25m
  "o3":
    stall_timeout: 20m
    overall_timeout: 30m
  "gpt-5":
    stall_timeout: 20m
    overall_timeout: 30m

# Provider HTTP client timeouts
provider_timeouts:
  default: 60s
  bedrock: 90s
  api_call_default: 15m
  api_call_extended: 50m  # GPT-5.2 Pro reasoning

# YAML-driven parameter validation per model pattern
parameter_validation:
  enable: true
  model_drop_rules:
    "gpt-5": ["temperature"]
    "o1": ["temperature", "top_p"]
  conditional_rules:
    "claude*opus*":
      - if_parameter: "temperature"
        drop_parameters: ["top_p"]

Key Config Sections

ServerConfig

Field	Type	Default	Description
`port`	string	`"8080"`	Listen port
`read_timeout`	duration	`35m`	HTTP read timeout (long for reasoning models)
`write_timeout`	duration	`35m`	HTTP write timeout
`coalescence_window`	duration	`50ms`	Token batching window (0 = disabled)
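
To make the effect of `coalescence_window` concrete, the following Python sketch simulates token batching offline on timestamped tokens. It is an assumption about the behavior, not Ensemble's code: a batch opens at its first token and absorbs every token arriving less than one window later, and a window of 0 passes tokens through unbatched. A real server would presumably flush on a timer rather than on the next token's arrival. The function name `coalesce` is hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def coalesce(events: Iterable[Tuple[float, str]], window: float) -> Iterator[str]:
    """Group (arrival_time_seconds, token) pairs into batched chunks.

    A batch starts at its first token and absorbs every token that arrives
    less than `window` seconds after that start. window <= 0 disables batching.
    """
    batch: List[str] = []
    batch_start = 0.0
    for arrived, token in events:
        if window <= 0:
            yield token  # batching disabled: emit each token as-is
            continue
        if batch and arrived - batch_start >= window:
            yield "".join(batch)  # window elapsed: flush the open batch
            batch = []
        if not batch:
            batch_start = arrived
        batch.append(token)
    if batch:
        yield "".join(batch)  # flush whatever remains at end of stream
```

With a 50 ms window, five tokens arriving at 0, 10, 40, 60, and 70 ms are emitted as two chunks instead of five writes, which is the trade the setting makes: fewer, larger writes in exchange for up to one window of added latency.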

ProviderConfig

Field	Type	Description
`name`	string	Display name
`type`	string	Provider type: `anthropic`, `anthropic-bedrock`, `vertex`, `openai`, `gemini`, `openrouter`
`strategy`	string	`session_affinity`, `round_robin`, `least_used`
`models`	string[]	Supported model names
`pricing`	PricingConfig	Per-million token pricing
`keys`	ProviderKey[]	API keys and their endpoints
`allowed_headers`	string[]	Whitelisted per-request headers
`allowed_server_tools`	ServerTool[]	Whitelisted server-side tools
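
The three `strategy` values can be pictured with the Python sketch below. This document does not describe how Ensemble implements them, so everything here is an assumption: `session_affinity` is modeled as a CRC32 hash of a session identifier (suggested by `crc_algorithm: crc32` in the cache section, but not confirmed), `round_robin` as a rotating counter, and `least_used` as a minimum over in-flight request counts. All function, class, and endpoint names are hypothetical.

```python
import zlib
from itertools import count
from typing import Dict, Sequence


def pick_session_affinity(session_id: str, endpoints: Sequence[str]) -> str:
    """Map a session to a stable endpoint via CRC32 (assumed hashing scheme)."""
    digest = zlib.crc32(session_id.encode("utf-8"))
    return endpoints[digest % len(endpoints)]


class RoundRobin:
    """Cycle through endpoints in order, one per call."""

    def __init__(self, endpoints: Sequence[str]) -> None:
        self._endpoints = list(endpoints)
        self._counter = count()

    def pick(self) -> str:
        return self._endpoints[next(self._counter) % len(self._endpoints)]


def pick_least_used(in_flight: Dict[str, int]) -> str:
    """Return the endpoint with the fewest in-flight requests."""
    return min(in_flight, key=in_flight.get)
```

The point of the affinity variant is that the same session keeps landing on the same endpoint, which is what lets a provider-side prompt cache be reused; the other two spread load without regard to cache state.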

EndpointConfig

Field	Type	Description
`id`	string	Unique endpoint identifier
`base_url`	string	Provider API base URL
`rpm_limit`	int	Requests per minute limit
`tpm_limit`	int	Tokens per minute limit
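
As a rough picture of how `rpm_limit` and `tpm_limit` gate traffic to an endpoint, here is a minimal in-memory fixed-window sketch in Python. It is a simplification under stated assumptions: Ensemble's limiter is backed by Redis (see the `rate_limit` section), whereas this sketch keeps counters in process, and the class name `EndpointWindow` and the `estimated_tokens` argument are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EndpointWindow:
    """Fixed-window counters for one endpoint (in-memory only, an assumption)."""

    rpm_limit: int
    tpm_limit: int
    window_size: float = 60.0  # seconds; mirrors rate_limit.window_size: 1m
    window_start: float = 0.0
    requests: int = 0
    tokens: int = 0

    def try_acquire(self, now: float, estimated_tokens: int) -> bool:
        """Admit the request only if both limits still have room."""
        # Start a fresh window once the current one has elapsed.
        if now - self.window_start >= self.window_size:
            self.window_start = now
            self.requests = 0
            self.tokens = 0
        if self.requests + 1 > self.rpm_limit:
            return False
        if self.tokens + estimated_tokens > self.tpm_limit:
            return False
        self.requests += 1
        self.tokens += estimated_tokens
        return True
```

A request is rejected as soon as either counter would exceed its limit, so an endpoint with a generous `rpm_limit` can still be throttled by a few very large prompts.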

Environment Variables

Variable	Description
`ENSEMBLE_ENCRYPTION_KEY`	AES key for encrypting stored provider API keys
`ENSEMBLE_ADMIN_KEY`	Admin API authentication key
`ENSEMBLE_ENVIRONMENT`	Environment name (dev/staging/production), used for the Redis namespace
`REDIS_NAMESPACE`	Explicit Redis key namespace
`AWS_REGION`	S3 region for response persistence
`AWS_ENDPOINT_URL`	S3 endpoint (for MinIO)
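
Two of the variables above affect the Redis namespace. The Python sketch below shows one plausible way they could interact; the precedence (explicit `REDIS_NAMESPACE` wins), the `ensemble:<environment>` format, and the `dev` fallback are all assumptions made for illustration, not documented Ensemble behavior. The function name `resolve_redis_namespace` is hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_redis_namespace(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a Redis key namespace from environment variables.

    Assumed precedence: REDIS_NAMESPACE if set, otherwise a namespace
    derived from ENSEMBLE_ENVIRONMENT (format and default are hypothetical).
    """
    env = os.environ if env is None else env
    explicit = env.get("REDIS_NAMESPACE", "").strip()
    if explicit:
        return explicit
    environment = env.get("ENSEMBLE_ENVIRONMENT", "").strip() or "dev"
    return f"ensemble:{environment}"
```

Passing a plain dictionary instead of reading `os.environ` directly keeps the function easy to exercise, for example `resolve_redis_namespace({"ENSEMBLE_ENVIRONMENT": "staging"})`.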

Hot Reload

The ConfigManager hot-reloads the configuration: changes to the YAML file are detected and applied without a restart for most settings (provider configs, pricing, timeouts).
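
How the ConfigManager detects changes is not described here. As one common approach, the Python sketch below polls the file's modification time and invokes a callback when it changes. The class name `FileWatcher` and the callback are hypothetical, and a real implementation might use filesystem notifications instead of polling.

```python
import os
from typing import Callable, Optional


class FileWatcher:
    """Polls a file's modification time and calls on_change when it moves."""

    def __init__(self, path: str, on_change: Callable[[str], None]) -> None:
        self.path = path
        self.on_change = on_change
        self._last_mtime = self._mtime()

    def _mtime(self) -> Optional[int]:
        try:
            return os.stat(self.path).st_mtime_ns
        except FileNotFoundError:
            return None  # file temporarily missing, e.g. mid-rewrite

    def poll(self) -> bool:
        """Check once; return True if a change was detected and reported."""
        current = self._mtime()
        if current is None or current == self._last_mtime:
            return False
        self._last_mtime = current
        self.on_change(self.path)  # caller re-parses and applies the config
        return True
```

A caller would invoke `poll()` on a timer and, in the callback, re-parse and validate the file before swapping it in, so that a half-written or invalid file never replaces a working configuration.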