Roadmap
Here. Next. Later.
High-level phases. We version by capability, not by date. Each phase ships when it's ready.
V1
v1 · Shipped
Available today on managed and self-hosted deployments.
- inferenceCache-aware routing across GPU nodes
- runtimeOpenAI-compatible gateway with BYO runtime (LangGraph, Temporal, custom)
- Real meteringReal token + cost attribution, observability dashboards
- Tenant isolationPer-project API keys, rate limits, model ACLs
V2
v2 · Coming Q3 2026
In active development. Available to design partners first.
- stateFull run replay. Every step, tool call, and model call, queryable by one API call
- recoveryCheckpoint-aware resume with KV cache warmup, plus run-level budgets with HTTP 429 enforcement
- Kova Serve runtimeLayer 3 runtime for coding/terminal/tool-heavy agents with sandboxes, workspaces, and artifacts
- Native S D KFirst-class Python and TypeScript SDKs
V3
v3 · Later
On the horizon. Exact sequencing will respond to customer signal.
- Checkpoint branchingFork a run from any checkpoint to explore alternatives
- Tool policiesGovernance over which tools each tenant can invoke
- Cross-cloud portabilitySame semantics across managed, self-hosted, and neocloud deployments
A word on honesty
The Agentic Inference Cloud ships in layers. Inference and Runtime (cache-aware routing, OpenAI-compatible gateway, BYO runtime, real metering, tenant isolation) are live today on our managed deployment and verified by the eval scenarios in our public repo. State (full run replay) and Recovery (checkpoint-aware resume, budget-enforced halts) are labeled “Coming Q3” on the home page, and the animations that preview them are simulated.