KovaServeKovaServe
Roadmap

Here. Next. Later.

High-level phases. We version by capability, not by date. Each phase ships when it's ready.

V1

v1 · Shipped

Available today on managed and self-hosted deployments.

  • inference
    Cache-aware routing across GPU nodes
  • runtime
    OpenAI-compatible gateway with BYO runtime (LangGraph, Temporal, custom)
  • Real metering
    Real token + cost attribution, observability dashboards
  • Tenant isolation
    Per-project API keys, rate limits, model ACLs
V2

v2 · Coming Q3 2026

In active development. Available to design partners first.

  • state
    Full run replay. Every step, tool call, and model call, queryable by one API call
  • recovery
    Checkpoint-aware resume with KV cache warmup, plus run-level budgets with HTTP 429 enforcement
  • Kova Serve runtime
    Layer 3 runtime for coding/terminal/tool-heavy agents with sandboxes, workspaces, and artifacts
  • Native S D K
    First-class Python and TypeScript SDKs
V3

v3 · Later

On the horizon. Exact sequencing will respond to customer signal.

  • Checkpoint branching
    Fork a run from any checkpoint to explore alternatives
  • Tool policies
    Governance over which tools each tenant can invoke
  • Cross-cloud portability
    Same semantics across managed, self-hosted, and neocloud deployments
A word on honesty

The Agentic Inference Cloud ships in layers. Inference and Runtime (cache-aware routing, OpenAI-compatible gateway, BYO runtime, real metering, tenant isolation) are live today on our managed deployment and verified by the eval scenarios in our public repo. State (full run replay) and Recovery (checkpoint-aware resume, budget-enforced halts) are labeled “Coming Q3” on the home page, and the animations that preview them are simulated.