Batch Processing - iGent Concert

Ensemble supports provider batch APIs for asynchronous bulk processing at discounted rates.

Overview

Cost savings: batch requests are billed at a 50% discount with every provider listed under Supported Providers
Graceful degradation: requests that hit a rate limit are queued instead of failing immediately
Time-windowed batching: requests are collected over a configurable interval (e.g., 30 seconds) and submitted together as one batch
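Time-windowed batching can be illustrated with a small buffer that flushes when either the collection window elapses or a size cap is reached. This is a minimal sketch, not Ensemble's implementation: the class `WindowedBatchCollector`, its `flush` callback, and the `tick()` polling method are hypothetical names chosen for illustration. The window is assumed to open when the first request of a new batch arrives.

```python
import time
from typing import Callable, Optional


class WindowedBatchCollector:
    """Hypothetical collector: buffers requests and hands them to `flush`
    as one batch once the window elapses or max_batch_size is reached."""

    def __init__(
        self,
        flush: Callable[[list], None],
        window_s: float = 30.0,
        max_batch_size: int = 100,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush
        self._window_s = window_s
        self._max = max_batch_size
        self._clock = clock  # injectable so the window can be tested deterministically
        self._buffer: list = []
        self._window_start: Optional[float] = None

    def add(self, request: dict) -> None:
        if not self._buffer:
            # Assumption: the window opens with the first request of a new batch.
            self._window_start = self._clock()
        self._buffer.append(request)
        if len(self._buffer) >= self._max:
            self.flush()  # size cap reached: do not wait for the window

    def tick(self) -> None:
        """Call periodically; flushes if the collection window has elapsed."""
        if self._buffer and self._clock() - self._window_start >= self._window_s:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._window_start = None
            self._flush(batch)
```

Injecting the clock keeps the timing logic testable; a production version would also need locking if `add` and `tick` run on different threads.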

Supported Providers

Provider	Platform	Discount
Anthropic	Direct API	50%
Anthropic	Bedrock	50%
Anthropic	Vertex AI	50%
OpenAI	Direct API	50%
Google Gemini	Direct API	50%
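As a worked example of the discount in the table, the estimate below assumes the flat 50% applies uniformly to the whole request cost (input and output tokens alike); check each provider's pricing page for the exact terms. The function name `batch_cost` is hypothetical.

```python
def batch_cost(sync_cost_usd: float, discount: float = 0.50) -> float:
    """Estimated cost of a workload submitted through a batch API.

    Assumption: the provider discount is a flat fraction of the
    synchronous (non-batch) cost of the same requests.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return sync_cost_usd * (1.0 - discount)
```

Under that assumption, a workload that would cost $120.00 synchronously costs `batch_cost(120.0)`, i.e. $60.00, when batched.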

API

POST /api/v1/batch

Submit a batch of requests.

{
  "requests": [
    {
      "custom_id": "task-001",
      "model": "claude-sonnet-4-20250514",
      "messages": [{"role": "user", "content": "Summarize this document..."}],
      "max_tokens": 1024
    },
    {
      "custom_id": "task-002",
      "model": "claude-sonnet-4-20250514",
      "messages": [{"role": "user", "content": "Translate this text..."}],
      "max_tokens": 512
    }
  ]
}
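The request body above can be built and sent from a client with only the standard library. This is a sketch under several assumptions: `base_url` is a placeholder for wherever Ensemble is deployed, no authentication header is shown because this page does not describe one, and treating all four fields as required and `custom_id` as unique within a batch are assumptions (the latter mirrors how provider batch APIs match results to requests).

```python
import json
import urllib.request

REQUIRED_FIELDS = ("custom_id", "model", "messages", "max_tokens")  # assumed required


def build_batch_payload(requests: list[dict]) -> dict:
    """Validate individual requests and wrap them in the POST /api/v1/batch body."""
    seen: set[str] = set()
    for req in requests:
        for key in REQUIRED_FIELDS:
            if key not in req:
                raise ValueError(f"request missing required field {key!r}")
        # Assumption: custom_id must be unique so results can be matched back.
        if req["custom_id"] in seen:
            raise ValueError(f"duplicate custom_id {req['custom_id']!r}")
        seen.add(req["custom_id"])
    return {"requests": requests}


def submit_batch(base_url: str, payload: dict) -> dict:
    """POST the payload to {base_url}/api/v1/batch and return the decoded JSON reply."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/batch",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Validating locally before submitting avoids waiting on an asynchronous batch only to discover a malformed entry.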

Response:

{
  "batch_id": "batch_abc123",
  "status": "submitted",
  "request_count": 2,
  "created_at": "2025-01-15T10:30:00Z"
}
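A client will typically keep the returned `batch_id` and confirm that `request_count` matches what it submitted. The sketch below parses the response shown above into a typed record; the class name `BatchSubmission` is hypothetical, and only the fields in the sample response are handled.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BatchSubmission:
    batch_id: str
    status: str
    request_count: int
    created_at: datetime

    @classmethod
    def from_json(cls, body: dict) -> "BatchSubmission":
        # created_at ends in "Z"; datetime.fromisoformat accepts that suffix
        # only from Python 3.11, so normalize it to an explicit UTC offset.
        created = datetime.fromisoformat(body["created_at"].replace("Z", "+00:00"))
        return cls(
            batch_id=body["batch_id"],
            status=body["status"],
            request_count=int(body["request_count"]),
            created_at=created,
        )
```

For example, after submitting two requests, a caller could check `BatchSubmission.from_json(reply).request_count == 2` before storing the `batch_id`.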

Rate Limit Fallback

When synchronous endpoints are rate-limited, Ensemble can optionally queue requests for batch processing:

batch:
  rate_limit_fallback: true
  collection_window: 30s
  max_batch_size: 100
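The `batch` settings above can be loaded and validated on the client or service side. In this sketch the input is the dictionary a YAML loader such as PyYAML's `yaml.safe_load` would return for that block (`collection_window: 30s` arrives as the string `"30s"`). The names `BatchSettings` and `parse_duration`, the defaults, and support for units other than `s` are all assumptions; the fallback defaulting to off follows only from the page describing it as optional.

```python
import re
from dataclasses import dataclass

# Hypothetical duration syntax: a number followed by ms, s, m, or h.
_DURATION = re.compile(r"^(\d+(?:\.\d+)?)(ms|s|m|h)$")
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_duration(text: str) -> float:
    """Convert strings like '30s' or '500ms' to seconds."""
    match = _DURATION.match(text.strip())
    if not match:
        raise ValueError(f"invalid duration {text!r}")
    return float(match.group(1)) * _UNIT_SECONDS[match.group(2)]


@dataclass(frozen=True)
class BatchSettings:
    rate_limit_fallback: bool = False  # assumed default: fallback is opt-in
    collection_window_s: float = 30.0  # assumed default
    max_batch_size: int = 100  # assumed default

    @classmethod
    def from_dict(cls, cfg: dict) -> "BatchSettings":
        fallback = cfg.get("rate_limit_fallback", cls.rate_limit_fallback)
        if not isinstance(fallback, bool):
            raise ValueError("rate_limit_fallback must be true or false")
        window = parse_duration(str(cfg.get("collection_window", "30s")))
        size = cfg.get("max_batch_size", cls.max_batch_size)
        if not isinstance(size, int) or size < 1:
            raise ValueError("max_batch_size must be a positive integer")
        return cls(fallback, window, size)
```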

With fallback enabled, a 429 (Too Many Requests) error becomes deferred processing: the result arrives later, but the request is queued rather than dropped.
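The fallback decision can be sketched as a wrapper around the synchronous call: on a rate-limit error the request is handed to a batch queue and the caller gets a placeholder instead of an exception. Everything here is hypothetical and illustrative: `RateLimited` stands in for however a 429 surfaces, `send_sync` and `enqueue` are injected callables, and `Deferred` is a stand-in for whatever handle Ensemble actually returns. This page does not describe how deferred results are delivered, so that part is out of scope.

```python
from dataclasses import dataclass
from typing import Callable, Union


class RateLimited(Exception):
    """Hypothetical error raised by the synchronous sender on HTTP 429."""


@dataclass(frozen=True)
class Deferred:
    """Hypothetical placeholder: the request was queued for batch processing."""
    custom_id: str


def call_with_fallback(
    request: dict,
    send_sync: Callable[[dict], dict],
    enqueue: Callable[[dict], None],
    fallback_enabled: bool = True,
) -> Union[dict, Deferred]:
    try:
        return send_sync(request)  # normal path: synchronous response
    except RateLimited:
        if not fallback_enabled:
            raise  # without fallback, the 429 propagates to the caller
        enqueue(request)  # queue for the next batch instead of dropping it
        return Deferred(request["custom_id"])
```

Keeping `enqueue` as a plain callable lets the same wrapper feed any queue, for example one that flushes on the `collection_window` / `max_batch_size` rules in the configuration.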