Ensemble supports provider batch APIs for asynchronous bulk processing at discounted rates.
Overview
- 50% cost savings: batch requests receive a 50% discount on every supported provider
- Graceful degradation: Queue requests during rate limits instead of failing immediately
- Time-windowed batching: Collect requests over a configurable interval (e.g., 30 seconds)
Supported Providers
| Provider | Platform | Discount |
|---|---|---|
| Anthropic | Direct API | 50% |
| Anthropic | Bedrock | 50% |
| Anthropic | Vertex AI | 50% |
| OpenAI | Direct API | 50% |
| Google Gemini | Direct API | 50% |
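As a rough illustration of what the discount column means for a budget, the arithmetic can be sketched as below. The function name `batch_cost` and the dollar figure are hypothetical and used only for illustration; actual billing is determined by each provider.

```python
def batch_cost(sync_cost: float, discount: float = 0.50) -> float:
    """Estimate the cost of a workload when it is sent through a batch API.

    sync_cost: what the same workload would cost via synchronous endpoints.
    discount:  fractional batch discount (0.50 matches the table above).
    """
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return sync_cost * (1.0 - discount)


# Hypothetical workload that would cost $12.00 synchronously.
print(batch_cost(12.00))  # 6.0
```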
API
POST /api/v1/batch
Submit a batch of requests.
{
  "requests": [
    {
      "custom_id": "task-001",
      "model": "claude-sonnet-4-20250514",
      "messages": [{"role": "user", "content": "Summarize this document..."}],
      "max_tokens": 1024
    },
    {
      "custom_id": "task-002",
      "model": "claude-sonnet-4-20250514",
      "messages": [{"role": "user", "content": "Translate this text..."}],
      "max_tokens": 512
    }
  ]
}
Response:
{
  "batch_id": "batch_abc123",
  "status": "submitted",
  "request_count": 2,
  "created_at": "2025-01-15T10:30:00Z"
}
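A minimal client sketch for this endpoint, using only the Python standard library, might look like the following. The base URL `http://localhost:8080` and the helper names `build_batch_request` and `submit_batch` are assumptions for illustration; authentication is omitted because it is not described here.

```python
import json
import urllib.request

# Assumption: replace with the address of your own Ensemble deployment.
ENSEMBLE_URL = "http://localhost:8080"


def build_batch_request(requests_payload, base_url=ENSEMBLE_URL):
    """Build (but do not send) a POST /api/v1/batch request."""
    body = json.dumps({"requests": requests_payload}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/batch",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_batch(requests_payload, base_url=ENSEMBLE_URL):
    """Send the batch and return the decoded JSON response as a dict."""
    req = build_batch_request(requests_payload, base_url)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage (requires a reachable Ensemble server):
# result = submit_batch([
#     {
#         "custom_id": "task-001",
#         "model": "claude-sonnet-4-20250514",
#         "messages": [{"role": "user", "content": "Summarize this document..."}],
#         "max_tokens": 1024,
#     }
# ])
# print(result["batch_id"], result["status"])
```

Keeping request construction separate from sending makes the payload easy to inspect or test without network access.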
Rate Limit Fallback
When synchronous endpoints are rate-limited, Ensemble can optionally queue requests for batch processing:
batch:
  rate_limit_fallback: true
  collection_window: 30s
  max_batch_size: 100
This turns 429 errors into deferred processing: the response arrives later, but the request is never dropped.
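The interaction between `collection_window` and `max_batch_size` can be sketched as a small queue that releases a batch when either limit is reached. This is an illustrative model, not Ensemble's actual implementation; `FallbackQueue`, `handle_response`, and the injectable `clock` are hypothetical names, and the real flush rules may differ.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FallbackQueue:
    """Collects rate-limited requests and releases them as one batch."""

    collection_window: float = 30.0   # seconds, mirrors collection_window: 30s
    max_batch_size: int = 100         # mirrors max_batch_size: 100
    clock: Callable[[], float] = time.monotonic
    _pending: List[Dict] = field(default_factory=list)
    _window_start: float = 0.0

    def add(self, request: Dict) -> List[Dict]:
        """Queue a request; return a batch if one is ready, else []."""
        if not self._pending:
            # The first queued request opens a new collection window.
            self._window_start = self.clock()
        self._pending.append(request)
        return self.flush_if_ready()

    def flush_if_ready(self) -> List[Dict]:
        """Release the pending requests if the batch is full or the window has elapsed."""
        if not self._pending:
            return []
        elapsed = self.clock() - self._window_start
        if len(self._pending) >= self.max_batch_size or elapsed >= self.collection_window:
            batch, self._pending = self._pending, []
            return batch
        return []


def handle_response(status_code: int, request: Dict, queue: FallbackQueue) -> List[Dict]:
    """Defer a request that hit a 429 instead of failing it."""
    if status_code == 429:
        return queue.add(request)
    return []
```

A caller would submit any non-empty list returned by `add` or `flush_if_ready` as a batch, and call `flush_if_ready` periodically so that a partially filled window is still sent once it expires.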