
Routing Engine

How Ensemble routes requests across providers, endpoints, and capacity pools

Routing Decision Flow

Request arrives
       │
       ▼
┌──────────────┐
│ Model lookup │ → Which providers support this model?
└──────┬───────┘
       │
       ▼
┌──────────────────┐
│ Cache affinity   │ → Does any endpoint have cached context for this session?
└──────┬───────────┘
       │
       ▼
┌──────────────────┐
│ Rate limit check │ → Filter out endpoints at capacity
└──────┬───────────┘
       │
       ▼
┌──────────────────┐
│ Cost/load balance│ → Among remaining, pick optimal endpoint
└──────┬───────────┘
       │
       ▼
  Selected endpoint
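
The four stages can be sketched in Go as a chain of filters followed by a final ranking. This is a minimal illustration, not Ensemble's implementation: the `Endpoint` and `Decision` types, the `Route` function, and the rule that a cached endpoint with spare capacity beats a cheaper uncached one are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// Endpoint is a hypothetical, simplified view of a routable endpoint.
type Endpoint struct {
	ID             string
	Models         map[string]bool // models this endpoint can serve
	CachedSessions map[string]bool // sessions with cached context here
	AtCapacity     bool            // true when a rate limit is exhausted
	Cost           float64         // illustrative relative cost
}

// Decision is a trimmed-down stand-in for RoutingDecision.
type Decision struct {
	EndpointID     string
	Reason         string
	CacheOptimized bool
}

// Route applies the four stages in order. The second return value is false
// when no endpoint survives the filters.
func Route(eps []Endpoint, model, session string) (Decision, bool) {
	// 1. Model lookup: keep endpoints that serve the requested model.
	var supported []Endpoint
	for _, e := range eps {
		if e.Models[model] {
			supported = append(supported, e)
		}
	}
	// 2. Cache affinity: note which endpoints hold context for the session.
	cached := map[string]bool{}
	for _, e := range supported {
		if e.CachedSessions[session] {
			cached[e.ID] = true
		}
	}
	// 3. Rate limit check: drop endpoints that are at capacity.
	var available []Endpoint
	for _, e := range supported {
		if !e.AtCapacity {
			available = append(available, e)
		}
	}
	if len(available) == 0 {
		return Decision{}, false
	}
	// 4. Cost/load balance: cached endpoints first (assumption), then cheapest.
	sort.SliceStable(available, func(i, j int) bool {
		ci, cj := cached[available[i].ID], cached[available[j].ID]
		if ci != cj {
			return ci
		}
		return available[i].Cost < available[j].Cost
	})
	best := available[0]
	reason := "lowest cost among available endpoints"
	if cached[best.ID] {
		reason = "session cache hit"
	}
	return Decision{EndpointID: best.ID, Reason: reason, CacheOptimized: cached[best.ID]}, true
}

func main() {
	eps := []Endpoint{
		{ID: "anthropic-1", Models: map[string]bool{"model-x": true}, Cost: 1.0},
		{ID: "anthropic-2", Models: map[string]bool{"model-x": true},
			CachedSessions: map[string]bool{"sess-1": true}, Cost: 2.0},
	}
	d, ok := Route(eps, "model-x", "sess-1")
	fmt.Println(d.EndpointID, d.Reason, d.CacheOptimized, ok)
}
```

Keeping each stage as a separate pass mirrors the diagram; a real router would likely fold the passes together and weigh cache value against cost rather than always preferring the cached endpoint.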

RoutingDecision

Every request produces a RoutingDecision:

type RoutingDecision struct {
    Provider       string          // "anthropic", "openai", etc.
    Endpoint       string          // Endpoint display name
    EndpointID     string          // Internal endpoint ID
    Reason         string          // Human-readable reason
    EstimatedValue decimal.Decimal // Estimated cache value
    CacheOptimized bool            // Whether cache influenced the decision
    CostPenalty    decimal.Decimal // Cost delta vs cheapest option
}
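
As an illustration of the CostPenalty field, the sketch below computes the cost delta between the chosen endpoint and the cheapest candidate. The `Candidate` type and the `CostPenalty` function are hypothetical, and integer micro-dollars stand in for `decimal.Decimal` so the example needs only the standard library.

```go
package main

import "fmt"

// Candidate is a hypothetical endpoint option with an estimated request cost
// in micro-dollars (an integer stand-in for decimal.Decimal).
type Candidate struct {
	EndpointID string
	CostMicros int64
}

// CostPenalty returns the chosen candidate's cost minus the cost of the
// cheapest candidate. It is zero when the cheapest option was chosen.
func CostPenalty(chosen Candidate, all []Candidate) int64 {
	cheapest := chosen.CostMicros
	for _, c := range all {
		if c.CostMicros < cheapest {
			cheapest = c.CostMicros
		}
	}
	return chosen.CostMicros - cheapest
}

func main() {
	all := []Candidate{
		{EndpointID: "anthropic-1", CostMicros: 1200},
		{EndpointID: "anthropic-2", CostMicros: 1500},
	}
	// Choosing the more expensive endpoint (for example, for its cache)
	// yields a positive penalty.
	fmt.Println("cost penalty (micro-dollars):", CostPenalty(all[1], all))
}
```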

Routing Strategies

Configurable per provider:

Strategy          Behavior
session_affinity  Prefer endpoint with session cache (default)
round_robin       Equal distribution across endpoints
least_used        Route to endpoint with lowest utilization

providers:
  anthropic:
    strategy: session_affinity
  openai:
    strategy: least_used
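
One way to picture the three strategies is a single selector that switches on the configured strategy name. The sketch below is hypothetical: the `Endpoint` and `Selector` types, the `Pick` method, and the fallback from session_affinity to least-used when no endpoint holds the session are assumptions, not documented Ensemble behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// Endpoint is a hypothetical, simplified endpoint record.
type Endpoint struct {
	ID          string
	Utilization float64         // fraction of capacity in use, 0.0 to 1.0
	Sessions    map[string]bool // session IDs with cached context here
}

// Selector holds the round-robin cursor. It is not safe for concurrent use.
type Selector struct {
	next int
}

// leastUsed returns the endpoint with the lowest utilization.
// The caller guarantees eps is non-empty.
func leastUsed(eps []Endpoint) Endpoint {
	best := eps[0]
	for _, e := range eps[1:] {
		if e.Utilization < best.Utilization {
			best = e
		}
	}
	return best
}

// Pick chooses an endpoint ID according to the named strategy.
func (s *Selector) Pick(strategy string, eps []Endpoint, session string) (string, error) {
	if len(eps) == 0 {
		return "", errors.New("no endpoints configured")
	}
	switch strategy {
	case "session_affinity":
		for _, e := range eps {
			if e.Sessions[session] {
				return e.ID, nil
			}
		}
		// Assumption: with no cached endpoint, fall back to least used.
		return leastUsed(eps).ID, nil
	case "round_robin":
		e := eps[s.next%len(eps)]
		s.next++
		return e.ID, nil
	case "least_used":
		return leastUsed(eps).ID, nil
	default:
		return "", fmt.Errorf("unknown strategy %q", strategy)
	}
}

func main() {
	eps := []Endpoint{
		{ID: "openai-1", Utilization: 0.7, Sessions: map[string]bool{"sess-1": true}},
		{ID: "openai-2", Utilization: 0.2},
	}
	s := &Selector{}
	for _, strategy := range []string{"session_affinity", "round_robin", "least_used"} {
		id, err := s.Pick(strategy, eps, "sess-1")
		fmt.Println(strategy, id, err)
	}
}
```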

Error-Aware Routing

When a request fails on the selected endpoint:

1. Rate limit (429): Automatically retry on the next available endpoint in the capacity pool.
2. Server error (5xx): Retry on a different provider, if one is available.
3. Permanent error (other 4xx): Return the error immediately, with no retry.

Error classification is provider-specific. For example, Ensemble distinguishes Anthropic's overloaded error (retryable) from invalid_api_key (permanent).
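
The retry rules above can be sketched as a status-code classifier with a provider-specific override table on top. Everything here is illustrative: the `Action` type, the `ClassifyStatus` and `Decide` functions, and the shape of `providerOverrides` are assumptions, and only the two Anthropic entries come from the text.

```go
package main

import "fmt"

// Action is what the router does after a failed request.
type Action int

const (
	ReturnError        Action = iota // permanent error: no retry
	RetryNextEndpoint                // retry within the capacity pool
	RetryOtherProvider               // retry on a different provider
)

func (a Action) String() string {
	return [...]string{"return error", "retry next endpoint", "retry other provider"}[a]
}

// ClassifyStatus applies the generic HTTP status rules.
func ClassifyStatus(status int) Action {
	switch {
	case status == 429:
		return RetryNextEndpoint
	case status >= 500 && status <= 599:
		return RetryOtherProvider
	default:
		return ReturnError
	}
}

// providerOverrides maps provider -> error type -> retryable.
// The lookup shape is hypothetical.
var providerOverrides = map[string]map[string]bool{
	"anthropic": {"overloaded": true, "invalid_api_key": false},
}

// Decide consults the provider-specific table first and falls back to the
// status code when the error type is unknown.
func Decide(provider, errType string, status int) Action {
	action := ClassifyStatus(status)
	retryable, known := providerOverrides[provider][errType]
	switch {
	case known && !retryable:
		return ReturnError
	case known && retryable && action == ReturnError:
		// Assumption: a retryable provider error with an unexpected status
		// is retried inside the pool first.
		return RetryNextEndpoint
	}
	return action
}

func main() {
	// Status values below are illustrative.
	fmt.Println(Decide("anthropic", "overloaded", 529))
	fmt.Println(Decide("anthropic", "invalid_api_key", 401))
	fmt.Println(Decide("openai", "", 429))
}
```

Checking the provider table before the status code lets a provider mark an error as permanent even when the HTTP status alone would suggest a retry.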

Capacity Pools

Multiple API keys for the same provider form a capacity pool:

providers:
  anthropic:
    keys:
      - name: primary
        endpoints:
          - id: anthropic-1
            rpm_limit: 1000
            tpm_limit: 100000
      - name: secondary
        endpoints:
          - id: anthropic-2
            rpm_limit: 500
            tpm_limit: 50000

The router distributes load across the pool and fails over between endpoints.
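
A rough sketch of how a pool might spread load and fail over, assuming a fixed one-minute window per endpoint and a "most remaining headroom wins" rule. The `Pool` and `PoolEndpoint` types and the `Acquire` method are hypothetical, and `tpm_limit` is ignored to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// PoolEndpoint tracks requests against an rpm limit in a fixed window.
type PoolEndpoint struct {
	ID          string
	RPMLimit    int
	used        int   // requests counted in the current window
	windowStart int64 // Unix seconds at which the current window began
}

// Pool is a set of endpoints that share load for one provider.
type Pool struct {
	Endpoints []*PoolEndpoint
}

// ErrPoolExhausted is returned when every endpoint is at its rpm limit.
var ErrPoolExhausted = errors.New("all endpoints in the pool are at their rpm limit")

// Acquire picks the endpoint with the largest unused fraction of its rpm
// budget and records one request against it. Endpoints at their limit are
// skipped, which is what provides failover.
func (p *Pool) Acquire(nowSec int64) (string, error) {
	var best *PoolEndpoint
	bestFree := 0.0
	for _, e := range p.Endpoints {
		if nowSec-e.windowStart >= 60 {
			e.windowStart, e.used = nowSec, 0 // start a new window
		}
		if e.RPMLimit <= 0 {
			continue
		}
		free := float64(e.RPMLimit-e.used) / float64(e.RPMLimit)
		if free > bestFree {
			best, bestFree = e, free
		}
	}
	if best == nil {
		return "", ErrPoolExhausted
	}
	best.used++
	return best.ID, nil
}

func main() {
	pool := &Pool{Endpoints: []*PoolEndpoint{
		{ID: "anthropic-1", RPMLimit: 1000},
		{ID: "anthropic-2", RPMLimit: 500},
	}}
	now := time.Now().Unix()
	counts := map[string]int{}
	for i := 0; i < 300; i++ {
		id, err := pool.Acquire(now)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		counts[id]++
	}
	fmt.Println(counts)
}
```

Ranking by the unused fraction rather than the absolute count keeps traffic roughly proportional to each endpoint's limit, so the smaller key is not drained first.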