Ensemble is the inference authority in the Fabric stack. Managed dev uses the company-wide staging Ensemble endpoint. Optional local checkouts can run the Go server for local-mode validation.
Legacy REST/catalog documentation still exists for historical clients, but the service is now Fabric RPC first.
What It Does
- Lists available models through inference/models.list.
- Generates responses through inference/generate.
- Adapts provider APIs behind a common Fabric inference surface.
- Streams generation events when the selected transport supports streaming.
- Reports health for hosted staging and optional local-mode deployments.
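To make the two method names above concrete, here is a minimal Go sketch that builds request envelopes for inference/models.list and inference/generate. Only the method names come from this page. The JSON-RPC 2.0 style framing, the `rpcRequest` and `generateParams` types, and every field name are assumptions made for illustration, not the confirmed Fabric wire format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest is a HYPOTHETICAL envelope using JSON-RPC 2.0 style framing.
// The real Fabric RPC format may differ.
type rpcRequest struct {
	JSONRPC string      `json:"jsonrpc"`
	ID      int         `json:"id"`
	Method  string      `json:"method"`
	Params  interface{} `json:"params,omitempty"`
}

// generateParams is a HYPOTHETICAL parameter shape for inference/generate.
type generateParams struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	Stream bool   `json:"stream"`
}

// newRequest serializes one envelope for the given method and params.
// A nil params value is omitted from the output.
func newRequest(id int, method string, params interface{}) ([]byte, error) {
	return json.Marshal(rpcRequest{JSONRPC: "2.0", ID: id, Method: method, Params: params})
}

func main() {
	list, err := newRequest(1, "inference/models.list", nil)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(list))

	gen, err := newRequest(2, "inference/generate", generateParams{Model: "mock", Prompt: "hi"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(gen))
}
```

The sketch stops at serialization because the transport (and whether it supports streaming) depends on the deployment.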
Active Providers
Current active adapter coverage includes:
| Provider family | Notes |
|---|---|
| Anthropic | Native Anthropic adapter |
| OpenAI-compatible | OpenAI and compatible APIs |
| Gemini | Google Gemini adapter |
| OpenRouter | Aggregated model provider |
| Mock | Development and validation path |
Provider availability depends on configuration and credentials in the runtime environment.
Runtime
| Aspect | Current shape |
|---|---|
| Language | Go |
| Default endpoint | SSM-hydrated staging ENSEMBLE_URL |
| Local endpoint | 127.0.0.1:8004 only when FABRIC_ENSEMBLE_MODE=local |
| Public host | https://ensemble.fabric.dev.aws.igent.ai retained for compatibility |
| Active route | Native Ensemble API routes |
| Legacy route | ensemble-rest on 127.0.0.1:8007 when ensemble/bin/ensemble.old exists |
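The endpoint rows in the table can be read as a small selection rule: use the local address only when FABRIC_ENSEMBLE_MODE=local, otherwise use the hydrated ENSEMBLE_URL. The Go sketch below illustrates that rule. It assumes ENSEMBLE_URL is exposed to the process as an environment variable after SSM hydration, and `resolveEndpoint` is a hypothetical helper name, not a function from the Ensemble codebase.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// localEndpoint is the local-mode address listed in the runtime table.
const localEndpoint = "127.0.0.1:8004"

// resolveEndpoint is a HYPOTHETICAL helper that picks the Ensemble endpoint.
// getenv is injected so the rule can be exercised without touching the
// real process environment.
func resolveEndpoint(getenv func(string) string) (string, error) {
	// Local mode is opt-in and takes precedence when explicitly requested.
	if getenv("FABRIC_ENSEMBLE_MODE") == "local" {
		return localEndpoint, nil
	}
	// ASSUMPTION: the SSM-hydrated staging URL arrives as an env variable.
	if url := getenv("ENSEMBLE_URL"); url != "" {
		return url, nil
	}
	return "", errors.New("ENSEMBLE_URL is not set and FABRIC_ENSEMBLE_MODE is not local")
}

func main() {
	endpoint, err := resolveEndpoint(os.Getenv)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("ensemble endpoint:", endpoint)
}
```

Returning an error when neither value is available, rather than silently falling back to the local address, keeps a missing staging configuration visible.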
Current Limits
The active Fabric adapter supports model list and generation. Status retrieval and cancellation surfaces are intentionally conservative placeholders unless a concrete runtime implementation is enabled.
In The System
Agents normally reach Ensemble as part of runtime work coordinated by Podium. Diminuendo may also broker model information or inference calls when the product surface needs a single gateway entry point.