Routing Engine - iGent Concert

Routing Decision Flow

Request arrives
     │
     ▼
┌─────────────┐
│ Model lookup │ → Which providers support this model?
└──────┬──────┘
       │
       ▼
┌──────────────────┐
│ Cache affinity   │ → Does any endpoint have cached context for this session?
└──────┬───────────┘
       │
       ▼
┌──────────────────┐
│ Rate limit check │ → Filter out endpoints at capacity
└──────┬───────────┘
       │
       ▼
┌──────────────────┐
│ Cost/load balance│ → Among remaining, pick optimal endpoint
└──────┬───────────┘
       │
       ▼
  Selected endpoint
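
The four stages above can be read as a filter-then-rank pipeline. The following is a minimal Go sketch under stated assumptions, not the engine's actual implementation: the `Endpoint` type, its fields, the `Route` function, and the tie-breaking rule (prefer an endpoint with cached context, then the cheaper one) are all hypothetical names and choices made for illustration.

```go
package main

import (
	"errors"
	"sort"
)

// Endpoint is a hypothetical view of one routable endpoint.
type Endpoint struct {
	ID             string
	Models         map[string]bool // models this endpoint can serve
	CachedSessions map[string]bool // sessions with cached context here
	AtCapacity     bool            // true when its rate limits are exhausted
	Cost           float64         // illustrative relative cost
}

// ErrNoEndpoint is returned when every candidate was filtered out.
var ErrNoEndpoint = errors.New("no endpoint available")

// Route walks the four stages in order and reports whether the
// choice was influenced by cache affinity.
func Route(eps []Endpoint, model, session string) (Endpoint, bool, error) {
	// Stage 1: model lookup. Keep endpoints that support the model.
	var candidates []Endpoint
	for _, e := range eps {
		if e.Models[model] {
			candidates = append(candidates, e)
		}
	}

	// Stage 2: cache affinity. Note which candidates hold cached context.
	warm := make(map[string]bool)
	for _, e := range candidates {
		if e.CachedSessions[session] {
			warm[e.ID] = true
		}
	}

	// Stage 3: rate limit check. Drop endpoints at capacity.
	var available []Endpoint
	for _, e := range candidates {
		if !e.AtCapacity {
			available = append(available, e)
		}
	}
	if len(available) == 0 {
		return Endpoint{}, false, ErrNoEndpoint
	}

	// Stage 4: cost/load balance. Assumed rule: warm endpoints first,
	// then the lower cost.
	sort.SliceStable(available, func(i, j int) bool {
		wi, wj := warm[available[i].ID], warm[available[j].ID]
		if wi != wj {
			return wi
		}
		return available[i].Cost < available[j].Cost
	})
	best := available[0]
	return best, warm[best.ID], nil
}
```

In this sketch the boolean result plays the role of `CacheOptimized` in the decision record; a real router would also weigh current load in the final stage.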

RoutingDecision

Every request produces a RoutingDecision:

type RoutingDecision struct {
    Provider       string          // "anthropic", "openai", etc.
    Endpoint       string          // Endpoint display name
    EndpointID     string          // Internal endpoint ID
    Reason         string          // Human-readable reason
    EstimatedValue decimal.Decimal // Estimated cache value
    CacheOptimized bool            // Whether cache influenced the decision
    CostPenalty    decimal.Decimal // Cost delta vs cheapest option
}

Routing Strategies

The routing strategy is configurable per provider:

Strategy	Behavior
`session_affinity`	Prefer endpoint with session cache (default)
`round_robin`	Equal distribution across endpoints
`least_used`	Route to endpoint with lowest utilization

providers:
  anthropic:
    strategy: session_affinity
  openai:
    strategy: least_used
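
To make the difference between `round_robin` and `least_used` concrete, here is a small Go sketch of the two selection rules. It is an illustration only: the `Endpoint` type, the `Utilization` field, and the `RoundRobin` and `LeastUsed` names are assumptions, and both functions expect a non-empty slice.

```go
package main

import "sync/atomic"

// Endpoint is a hypothetical endpoint with a utilization figure
// between 0.0 (idle) and 1.0 (at its limit).
type Endpoint struct {
	ID          string
	Utilization float64
}

// RoundRobin hands out endpoints in a fixed rotation.
type RoundRobin struct {
	counter uint64 // number of picks made so far
}

// Pick returns the next endpoint in rotation. It is safe for
// concurrent use. eps must be non-empty.
func (r *RoundRobin) Pick(eps []Endpoint) Endpoint {
	n := atomic.AddUint64(&r.counter, 1) - 1
	return eps[n%uint64(len(eps))]
}

// LeastUsed returns the endpoint with the lowest utilization.
// Ties go to the earlier endpoint. eps must be non-empty.
func LeastUsed(eps []Endpoint) Endpoint {
	best := eps[0]
	for _, e := range eps[1:] {
		if e.Utilization < best.Utilization {
			best = e
		}
	}
	return best
}
```

Round robin ignores load entirely, which keeps it cheap and predictable; least-used needs a utilization signal but adapts when endpoints have different limits.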

Error-Aware Routing

When a request fails on the selected endpoint:

1. Rate limit (429): Automatically retry on the next available endpoint in the capacity pool
2. Server error (5xx): Retry on a different provider if available
3. Permanent error (other 4xx): Return the error immediately (no retry)

Error classification is provider-specific: Ensemble distinguishes, for example, between Anthropic's `overloaded` (retryable) and `invalid_api_key` (permanent).
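
The retry rules above, together with provider-specific overrides, can be sketched as a small classifier. This is a hedged illustration in Go, not the engine's code: the `Action` type, the `Classify` function, and the override table are hypothetical. The error names are taken from the text above, and treating `overloaded` like a server error is an assumption.

```go
package main

// Action is what the router does after a failed request.
type Action int

const (
	RetryNextEndpoint  Action = iota // try another endpoint in the capacity pool
	RetryOtherProvider               // try a different provider if available
	Fail                             // return the error to the caller
)

// overrides maps provider -> error type -> action. The entries are
// illustrative and use the error names mentioned in the text.
var overrides = map[string]map[string]Action{
	"anthropic": {
		"overloaded":      RetryOtherProvider, // assumption: handled like a 5xx
		"invalid_api_key": Fail,
	},
}

// Classify applies provider-specific overrides first, then falls
// back to the HTTP status code.
func Classify(provider string, status int, errType string) Action {
	if action, ok := overrides[provider][errType]; ok {
		return action
	}
	switch {
	case status == 429:
		return RetryNextEndpoint
	case status >= 500 && status <= 599:
		return RetryOtherProvider
	default:
		// Remaining 4xx codes, and anything unrecognized, are permanent.
		return Fail
	}
}
```

Checking the override table before the status code lets a provider's own error type take precedence when the two disagree.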

Capacity Pools

Multiple API keys for the same provider form a capacity pool:

providers:
  anthropic:
    keys:
      - name: primary
        endpoints:
          - id: anthropic-1
            rpm_limit: 1000
            tpm_limit: 100000
      - name: secondary
        endpoints:
          - id: anthropic-2
            rpm_limit: 500
            tpm_limit: 50000

The router distributes load across the endpoints in the pool and fails over to another endpoint when one reaches its `rpm_limit` or `tpm_limit`.
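
As a rough illustration of how a pool might apply per-endpoint limits, the Go sketch below reserves capacity on the endpoint with the most request headroom and skips endpoints that would exceed either limit. Everything here is an assumption made for the example: the `Pool` and `PoolEndpoint` types, the `Acquire` and `Reset` methods, and the fixed one-minute window that a caller resets. The sketch is not safe for concurrent use.

```go
package main

import "errors"

// PoolEndpoint mirrors one entry under `endpoints` in the config.
type PoolEndpoint struct {
	ID       string
	RPMLimit int // requests per minute
	TPMLimit int // tokens per minute
	usedReqs int // requests counted in the current window
	usedToks int // tokens counted in the current window
}

// Pool groups the endpoints of all keys for one provider.
type Pool struct {
	Endpoints []*PoolEndpoint
}

// ErrPoolExhausted means no endpoint can take the request right now.
var ErrPoolExhausted = errors.New("capacity pool exhausted")

// Acquire reserves one request of the given token size on the
// endpoint with the largest share of free requests. Endpoints that
// would exceed either limit are skipped, which is the failover step.
func (p *Pool) Acquire(tokens int) (*PoolEndpoint, error) {
	var best *PoolEndpoint
	bestFree := -1.0
	for _, e := range p.Endpoints {
		if e.usedReqs+1 > e.RPMLimit || e.usedToks+tokens > e.TPMLimit {
			continue
		}
		free := 1 - float64(e.usedReqs)/float64(e.RPMLimit)
		if free > bestFree {
			best, bestFree = e, free
		}
	}
	if best == nil {
		return nil, ErrPoolExhausted
	}
	best.usedReqs++
	best.usedToks += tokens
	return best, nil
}

// Reset clears the counters. A caller would invoke it at each
// minute boundary.
func (p *Pool) Reset() {
	for _, e := range p.Endpoints {
		e.usedReqs, e.usedToks = 0, 0
	}
}
```

With the limits from the config above, a 60,000-token request can only land on `anthropic-1`, and a following 45,000-token request fails over to `anthropic-2` because `anthropic-1` would exceed its token limit.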