Ensemble keeps provider-specific details behind a Fabric inference boundary. The active path is intentionally small: receive a Fabric RPC call, validate it, choose the provider adapter, execute the request, and return or stream a normalized result.
Active Request Path
    Fabric client or gateway
      |
      | POST /rpc
      v
    Ensemble Fabric server
      |
      +--> capability/model registry
      +--> provider adapter selection
      +--> generation execution
      +--> normalized Fabric response/events
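
The path above can be sketched in a few lines. This is a minimal illustration, not Ensemble's actual code: it assumes the Fabric RPC envelope follows JSON-RPC 2.0, and the method name `inference/generate`, the `REGISTRY` and `ADAPTERS` tables, the `demo-model` entry, and the result shape are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

# Hypothetical adapter signature: takes Fabric params, returns generated text.
Adapter = Callable[[Dict[str, Any]], str]


@dataclass
class ModelEntry:
    provider: str                  # key into the adapter table
    capabilities: Tuple[str, ...]  # e.g. ("generate",)


# Illustrative registry and adapter table; real models and providers will differ.
REGISTRY = {"demo-model": ModelEntry(provider="echo", capabilities=("generate",))}
ADAPTERS: Dict[str, Adapter] = {"echo": lambda params: params["prompt"].upper()}


def rpc_error(req_id: Any, code: int, message: str) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 error response."""
    return {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}


def handle_rpc(request: Dict[str, Any]) -> Dict[str, Any]:
    req_id = request.get("id")

    # 1. Validate the envelope (assumes JSON-RPC 2.0) and the params.
    if request.get("jsonrpc") != "2.0" or not isinstance(request.get("method"), str):
        return rpc_error(req_id, -32600, "Invalid Request")
    if request["method"] != "inference/generate":  # hypothetical method name
        return rpc_error(req_id, -32601, "Method not found")
    params = request.get("params")
    if (
        not isinstance(params, dict)
        or not isinstance(params.get("model"), str)
        or not isinstance(params.get("prompt"), str)
    ):
        return rpc_error(req_id, -32602, "Invalid params")

    # 2. Consult the capability/model registry.
    entry = REGISTRY.get(params["model"])
    if entry is None or "generate" not in entry.capabilities:
        return rpc_error(req_id, -32602, "Unknown or unsupported model")

    # 3. Select the provider adapter and execute the generation.
    text = ADAPTERS[entry.provider](params)

    # 4. Return a normalized result, independent of the provider's own format.
    return {"jsonrpc": "2.0", "id": req_id, "result": {"model": params["model"], "text": text}}
```

Keeping validation and registry lookup ahead of adapter selection means a malformed or unsupported request never reaches provider-specific code.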
Components
| Component | Job |
|---|---|
| RPC server | Accepts Fabric JSON-RPC requests on /rpc |
| Model registry | Reports configured model availability and capabilities |
| Provider adapters | Translate Fabric inference requests to provider APIs |
| Streaming layer | Emits generation deltas and terminal events where supported |
| Health/observability | Exposes service health and telemetry for the root stack |
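
As a rough illustration of the streaming layer's job, the sketch below wraps raw provider text chunks in delta events and closes the stream with a single terminal event. The event field names (`type`, `index`, `text`) and the `delta`/`done` labels are assumptions made for this example, not the Fabric event schema.

```python
from typing import Any, Dict, Iterable, Iterator


def normalize_stream(chunks: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one delta event per non-empty chunk, then one terminal event."""
    parts = []
    for chunk in chunks:
        if not chunk:
            continue  # skip empty keep-alive chunks
        # index counts emitted deltas, so it stays contiguous when chunks are skipped
        yield {"type": "delta", "index": len(parts), "text": chunk}
        parts.append(chunk)
    # Terminal event carries the full text so consumers need not reassemble it.
    yield {"type": "done", "text": "".join(parts)}
```

A consumer can iterate with `for event in normalize_stream(provider_chunks)` and stop as soon as it sees the terminal event.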
Legacy Compatibility
Some older docs and clients refer to /api/v1/generate, /api/v1/stream, /api/v1/models, and response-persistence endpoints. In the current dev stack, these endpoints belong to the optional ensemble-rest compatibility service, which starts only when ensemble/bin/ensemble.old exists.
New Fabric work should target /rpc and the inference/* method family.
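
For orientation, a client call against /rpc might be built as shown below. This is a sketch under stated assumptions: the envelope is taken to be JSON-RPC 2.0, and the method name `inference/generate`, the parameter names, the model name, and the host and port are placeholders rather than documented values.

```python
import json
import urllib.request


def build_rpc_request(base_url: str, method: str, params: dict, req_id: int = 1) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC 2.0 POST request for the /rpc endpoint."""
    body = json.dumps(
        {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
    ).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/rpc",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical method, params, and address; substitute the values your deployment uses.
req = build_rpc_request(
    "http://localhost:8080",
    "inference/generate",
    {"model": "demo-model", "prompt": "hello"},
)
# To send it: urllib.request.urlopen(req)
```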