Feature taxonomy
Capabilities grouped by category. Each feature carries a phase badge indicating whether it is available today (MVP), planned next (Phase 2), or further out (Roadmap).
**Inference endpoints** (MVP)

Deploy served models as managed endpoints with configurable scaling and health checks. Scale to zero when idle to minimize cost.
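An endpoint spec of this shape could be expressed roughly as follows. This is a minimal sketch; the field names (`min_replicas`, `health_check_path`, `idle_timeout_s`) are illustrative assumptions, not the actual InferoFabric schema.

```python
from dataclasses import dataclass

@dataclass
class EndpointSpec:
    """Hypothetical endpoint configuration; real schema may differ."""
    model: str
    min_replicas: int = 0          # 0 enables scale-to-zero when idle
    max_replicas: int = 4
    health_check_path: str = "/healthz"
    idle_timeout_s: int = 300      # suspend replicas after 5 idle minutes

# Example: an endpoint that can scale down to zero between bursts.
spec = EndpointSpec(model="llama-3-8b")
```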
**Autoscaling policies** (MVP)

Define min/max replicas and scaling triggers (RPS, latency, or custom metrics). Scale up for traffic spikes and down during quiet periods.
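The core of an RPS-based policy like this can be sketched as a small pure function: size the fleet so each replica serves roughly a target request rate, clamped to the configured bounds. The function name and parameters are illustrative assumptions, not a published API.

```python
import math

def desired_replicas(current_rps: float,
                     target_rps_per_replica: float,
                     min_replicas: int,
                     max_replicas: int) -> int:
    """Pick a replica count so each replica serves ~target RPS,
    clamped to the configured min/max bounds."""
    if target_rps_per_replica <= 0:
        return min_replicas
    raw = math.ceil(current_rps / target_rps_per_replica)
    return max(min_replicas, min(max_replicas, raw))

# Example: 230 RPS at a 50 RPS/replica target -> 5 replicas.
print(desired_replicas(230.0, 50.0, min_replicas=1, max_replicas=10))
```

A latency or custom-metric trigger would follow the same shape, substituting the observed metric and per-replica target.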
**Multiple model backends** (Phase 2)

Support for popular inference runtimes and custom containers. Run Triton, vLLM, or your own serving stack with verified blueprints.
**A/B and canary rollouts** (Roadmap)

Route a percentage of traffic to new model versions for safe rollouts. Compare latency and error rates before full cutover.
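Percentage-based routing is commonly implemented by hashing a stable request or user identifier into buckets, so each caller sticks to one version for the rollout's duration. A minimal sketch under that assumption (the function and labels are hypothetical, not the actual router):

```python
import hashlib

def route(request_id: str, canary_percent: int) -> str:
    """Deterministically assign a request to 'canary' or 'stable'.

    Hashing the id into 100 buckets keeps routing sticky per caller,
    so latency and error rates can be compared between cohorts.
    """
    digest = hashlib.sha256(request_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "canary" if bucket < canary_percent else "stable"
```

Sticky assignment matters for comparison: if the same user bounced between versions per request, per-cohort latency and error metrics would be muddied.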
Compare approaches
A factual comparison of capabilities across InferoFabric, generic GPU VMs, and DIY Kubernetes; no specific competitors are named.
| Capability | InferoFabric | Generic GPU VM | DIY Kubernetes |
|---|---|---|---|
| Single control plane across on-prem and cloud | Yes | No | Manual (multi-cluster) |
| Cost caps and auto-suspend | Yes | Limited (billing alerts only) | DIY only |
| Verified stack blueprints | Yes | Marketplace / images | DIY only |
| Hybrid placement (on-prem first, cloud burst) | Yes | No | DIY only |
| Inference autoscaling (scale to zero) | Yes | Per-cloud offerings | DIY (e.g. KEDA) |
| Unified usage and cost attribution | Yes | Per-cloud console | DIY only |
| Operator installer and upgrades | Yes | N/A | DIY only |