Pricing

Pay per run. Not per layer.

Usage-based pricing on the Agentic Inference Cloud. Cache savings pass through to you, not us. Every tier includes the full bundle of inference, runtime, state, and recovery.

Free

forever

Solo devs and early-stage teams.

10K runs/month
Full feature set
Community support
1 project, 1 environment

Start free

Startup

Usage-based

from $0.004/1K tokens

Shipping to production.

Unmetered runs
Cache discount on repeated prefixes
Email support
Unlimited projects + environments

Start free, upgrade later

Growth

Volume + SLA

custom

Scaled teams.

Slack channel support
Priority routing
99.9% SLA
Custom rate limits
SSO

Talk to sales

Self-Hosted

Flat annual

custom

Code-sensitive / sovereign.

Your infra, same semantics
Air-gapped option
Kubernetes or bare-metal
Dedicated onboarding

Book a pilot

Frequently asked

Questions buyers actually ask.

You do. KovaServe meters every request against real vLLM prefix-cache counters. When a prefix hits the cache, your inference cost drops, and that drop shows up as a line on your invoice, not ours.

How you start

Three steps. Most teams ship within the hour.

Sign up

Create a free account. Credit card, no sales call.

Change your base URL

One-line swap in any OpenAI-compatible client. You're now calling the Agentic Inference Cloud.

python

base_url = "https://api.kovaserve.ai/v1"

Ship

Every call is now a durable run. Inference, runtime, state, and recovery, bundled. Cache savings and cost attribution come automatically.

Start free Read the docs