Pricing
Pay per run. Not per layer.
Usage-based pricing on the Agentic Inference Cloud. Cache savings pass through to you, not us. Every tier includes the full bundle of inference, runtime, state, and recovery.
Free
$0
forever
Solo devs and early-stage teams.
- 10K runs/month
- Full feature set
- Community support
- 1 project, 1 environment
Startup
Usage-based
from $0.004/1K tokens
Shipping to production.
- Unmetered runs
- Cache discount on repeated prefixes
- Email support
- Unlimited projects + environments
Growth
Volume + SLA
custom
Scaled teams.
- Slack channel support
- Priority routing
- 99.9% SLA
- Custom rate limits
- SSO
Self-Hosted
Flat annual
custom
Code-sensitive / sovereign.
- Your infra, same semantics
- Air-gapped option
- Kubernetes or bare-metal
- Dedicated onboarding
Frequently asked
Questions buyers actually ask.
You do. KovaServe meters every request against real vLLM prefix-cache counters. When a prefix hits the cache, your inference cost drops, and that drop shows up as a line on your invoice, not ours.
How you start
Three steps. Most teams ship within the hour.
1
Sign up
Create a free account. Credit card, no sales call.
2
Change your base URL
One-line swap in any OpenAI-compatible client. You're now calling the Agentic Inference Cloud.
python
base_url = "https://api.kovaserve.ai/v1"3
Ship
Every call is now a durable run. Inference, runtime, state, and recovery, bundled. Cache savings and cost attribution come automatically.