Core use cases
Inference at scale
ML platform and product teams who need to serve models in production with predictable latency and cost.
Benefits
- Managed inference endpoints with health checks and rolling updates
- Autoscaling, including scale-to-zero, so you pay only while traffic is being served
- Multiple backends and verified stack blueprints (vLLM, Triton, custom)
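The scale-to-zero behavior can be sketched as a small decision function: target enough replicas to cover in-flight requests, and drop to zero only after a sustained idle window. This is a toy illustration, not the platform's actual autoscaler; the function name and the 300-second default are assumptions.

```python
def desired_replicas(in_flight: int, per_replica_capacity: int,
                     idle_seconds: float, scale_to_zero_after: float = 300.0) -> int:
    """Return the replica count a scale-to-zero autoscaler would target.

    Scales with in-flight requests; after a sustained idle window it
    drops to zero so no cost accrues without traffic.
    """
    if in_flight == 0:
        # Keep one warm replica until the idle window elapses.
        return 0 if idle_seconds >= scale_to_zero_after else 1
    # Ceiling division: enough replicas to cover the current load.
    return -(-in_flight // per_replica_capacity)

print(desired_replicas(25, 8, idle_seconds=0.0))   # 25 requests, 8 each -> 4
print(desired_replicas(0, 8, idle_seconds=600.0))  # 10 min idle -> 0
```

A real autoscaler would also smooth over short traffic dips to avoid flapping; the idle window here plays that role in miniature.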
Training jobs
Data science and ML teams running distributed training that demands reliability and fair sharing of resources.
Benefits
- Checkpoint durability: persist to object storage and resume from last state
- Quotas and fair-share scheduling so teams get predictable capacity
- Preemption-aware scheduling (roadmap): use spot/preemptible with automatic resume
- Single control plane for both inference and training workloads
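The checkpoint-and-resume pattern behind the first bullet can be sketched in a few lines. This toy uses a local JSON file as a stand-in for object storage; the function names are illustrative, not the platform's API.

```python
import json
import os
import tempfile

def save_checkpoint(path: str, step: int, state: dict) -> None:
    # Write to a temp file, then rename: a crash mid-write never
    # corrupts the last good checkpoint.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, path)

def resume(path: str) -> tuple[int, dict]:
    # Resume from the last persisted step, or start fresh at step 0.
    if not os.path.exists(path):
        return 0, {}
    with open(path) as f:
        ckpt = json.load(f)
    return ckpt["step"], ckpt["state"]

ckpt_path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
step, state = resume(ckpt_path)                 # fresh run: step 0
save_checkpoint(ckpt_path, 100, {"loss": 0.42})
step, state = resume(ckpt_path)                 # after a restart: step 100
```

With durable checkpoints in object storage, a preempted or failed job restarts from its last step instead of from scratch, which is also what makes spot/preemptible capacity viable.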
Hybrid enterprise
Enterprises that need data locality, region/zone controls, and on-prem-first with cloud burst.
Benefits
- Data locality: place workloads where your data lives (on-prem or specific region)
- Region and zone controls via placement policies
- On-prem first: use your GPU clusters before bursting to cloud
- Single pane of glass across all environments
Blueprint spotlight
Verified stacks you can deploy in minutes. Pre-validated hardware targets and runtime versions.
vLLM inference
High-throughput LLM serving with PagedAttention and continuous batching.
Hardware target
NVIDIA GPU (A100, H100, L4, T4)
Runtime versions
- vLLM 0.4.x
- CUDA 12.x
- Python 3.10+
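Continuous batching, the scheduling idea behind vLLM's throughput, admits a new request the moment any sequence in the running batch finishes, instead of waiting for the whole batch to drain. A toy scheduler makes the difference concrete (requests are just counts of remaining decode steps; this is not vLLM's implementation):

```python
from collections import deque

def continuous_batching(requests, max_batch: int) -> int:
    """Toy continuous-batching loop; returns total decode steps.

    Each request needs `tokens` decode steps. New requests join the
    running batch as soon as a slot frees, rather than waiting for the
    whole batch to finish as in static batching.
    """
    waiting = deque(requests)
    running: list[int] = []
    steps = 0
    while waiting or running:
        # Fill any free slots from the wait queue.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        # One decode step for every running sequence.
        running = [t - 1 for t in running]
        steps += 1
        # Finished sequences release their slots immediately.
        running = [t for t in running if t > 0]
    return steps

# -> 4 steps; static batches of [3, 1] then [3] would take 3 + 3 = 6.
print(continuous_batching([3, 1, 3], max_batch=2))
```

PagedAttention complements this on the memory side: KV-cache blocks are allocated on demand, so short and long sequences can share a batch without reserving worst-case memory for each.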
Triton Inference Server
Multi-framework inference (TensorRT, ONNX, PyTorch) with dynamic batching.
Hardware target
NVIDIA GPU (Ampere or newer)
Runtime versions
- Triton 2.40+
- CUDA 12.x
- cuDNN 8.x
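Triton's dynamic batching groups individual requests into a batch server-side, flushing when the batch is full or when the oldest request has waited long enough (the `max_queue_delay` idea). A simplified sketch that batches by arrival time rather than a live clock, so it is a toy, not Triton's algorithm:

```python
def dynamic_batches(arrivals, max_batch: int, max_delay: float):
    """Group sorted request arrival times (ms) into batches.

    A batch is flushed when it reaches max_batch, or when the next
    request would make the oldest queued one exceed max_delay.
    """
    batches, current = [], []
    for t in arrivals:
        if current and (len(current) == max_batch or t - current[0] > max_delay):
            batches.append(current)
            current = [t]
        else:
            current.append(t)
    if current:
        batches.append(current)
    return batches

# Four requests within 5 ms batch together; a straggler rides alone.
print(dynamic_batches([0, 1, 2, 5, 100], max_batch=8, max_delay=10))
# -> [[0, 1, 2, 5], [100]]
```

The trade-off is a small, bounded queueing delay in exchange for much better GPU utilization, since the backend sees one batched inference call instead of five single-item ones.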
ComfyUI
Stable Diffusion and image generation workflows with node-based UI.
Hardware target
NVIDIA GPU (8GB+ VRAM)
Runtime versions
- ComfyUI latest
- PyTorch 2.x
- CUDA 12.x
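A node-based workflow like ComfyUI's is a DAG: each node consumes the outputs of its upstream nodes, so execution is a topological traversal. A minimal sketch of that evaluation order, using a hypothetical three-node graph (not ComfyUI's actual node or serialization format):

```python
from graphlib import TopologicalSorter

def run_workflow(nodes, edges):
    """Evaluate a node graph in dependency order.

    nodes: {name: fn(upstream_results_dict) -> value}
    edges: {name: [upstream node names]}  (must form a DAG)
    """
    order = TopologicalSorter(edges).static_order()
    results = {}
    for name in order:
        upstream = {dep: results[dep] for dep in edges.get(name, [])}
        results[name] = nodes[name](upstream)
    return results

# Hypothetical workflow: load -> process -> save.
results = run_workflow(
    nodes={
        "load":    lambda ins: "pixels",
        "process": lambda ins: ins["load"].upper(),
        "save":    lambda ins: f"saved({ins['process']})",
    },
    edges={"process": ["load"], "save": ["process"]},
)
print(results["save"])  # -> saved(PIXELS)
```

The same structure is why workflow graphs cache well: a node only needs re-running when one of its upstream inputs changes.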
Whisper transcription
OpenAI Whisper for speech-to-text at scale with batch and streaming.
Hardware target
NVIDIA GPU (T4, L4, A10)
Runtime versions
- Whisper (large-v3)
- faster-whisper / CTranslate2
- Python 3.10+
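Whisper-family models transcribe fixed windows of roughly 30 seconds, so batching a long recording means splitting it into windows, with a small overlap so words are not cut at chunk boundaries. A sketch of the splitting step only (the window and overlap values are illustrative defaults, not the blueprint's configuration):

```python
def chunk_spans(duration_s: float, window_s: float = 30.0,
                overlap_s: float = 1.0):
    """Split an audio duration into overlapping (start, end) windows.

    Each chunk can then be transcribed independently, enabling
    batch transcription of long recordings.
    """
    spans, start = [], 0.0
    while start < duration_s:
        end = min(start + window_s, duration_s)
        spans.append((start, end))
        if end == duration_s:
            break
        start = end - overlap_s  # overlap avoids clipping words
    return spans

print(chunk_spans(70.0))  # -> [(0.0, 30.0), (29.0, 59.0), (58.0, 70.0)]
```

For streaming, the same idea applies incrementally: transcribe each window as it closes and reconcile the overlapping second with the previous chunk's tail.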