Roadmap

Here. Next. Later.

High-level phases. We version by capability, not by date. Each phase ships when it's ready.

v1 · Shipped

Available today on managed and self-hosted deployments.

inference
Cache-aware routing across GPU nodes
runtime
OpenAI-compatible gateway with BYO runtime (LangGraph, Temporal, custom)
Real metering
Real token + cost attribution, observability dashboards
Tenant isolation
Per-project API keys, rate limits, model ACLs

v2 · Coming Q3 2026

In active development. Available to design partners first.

state
Full run replay. Every step, tool call, and model call, queryable by one API call
recovery
Checkpoint-aware resume with KV cache warmup, plus run-level budgets with HTTP 429 enforcement
Kova Serve runtime
Layer 3 runtime for coding/terminal/tool-heavy agents with sandboxes, workspaces, and artifacts
Native S D K
First-class Python and TypeScript SDKs

v3 · Later

On the horizon. Exact sequencing will respond to customer signal.

Checkpoint branching
Fork a run from any checkpoint to explore alternatives
Tool policies
Governance over which tools each tenant can invoke
Cross-cloud portability
Same semantics across managed, self-hosted, and neocloud deployments

A word on honesty

The Agentic Inference Cloud ships in layers. Inference and Runtime (cache-aware routing, OpenAI-compatible gateway, BYO runtime, real metering, tenant isolation) are live today on our managed deployment and verified by the eval scenarios in our public repo. State (full run replay) and Recovery (checkpoint-aware resume, budget-enforced halts) are labeled “Coming Q3” on the home page, and the animations that preview them are simulated.