Run open-source models at the lowest cost and highest speed.
Open models the way they should always have been: ultra-fast and too cheap to meter.
Backed by Y Combinator
The most efficient place to run open models
Cheaper, faster, and fully elastic, so you ship on open weights without managing a single GPU.
Lowest cost
Up to 50% cheaper than typical providers. We serve open weights efficiently and pass the savings straight to you.
Fastest inference
Industry-leading tokens per second on every model, with tuned kernels and warm capacity for low latency.
Pay as you go
Only pay for the tokens you use. No idle GPUs, no seats, no minimums, no commitments.
Linear pricing, billed only for the tokens you use.
Scale to infinity
Autoscaling that handles a single request or a million, with no capacity planning on your side.
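Under linear per-token pricing, a request costs its input tokens times the input rate plus its output tokens times the output rate, with no fixed or idle component. Here is a minimal TypeScript sketch. The per-million-token rates are hypothetical placeholders, not Scalar's actual prices, and it assumes the OpenAI-style `usage` field names `prompt_tokens` and `completion_tokens`.

```typescript
// Hypothetical per-million-token rates in USD; real prices differ per model.
interface PricePerMillion {
  input: number;
  output: number;
}

// Token counts as reported in an OpenAI-style `usage` object.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
}

// Linear pricing: cost is directly proportional to tokens consumed,
// so there is no base fee and no charge for idle capacity.
function estimateCostUSD(usage: Usage, price: PricePerMillion): number {
  return (
    (usage.prompt_tokens / 1e6) * price.input +
    (usage.completion_tokens / 1e6) * price.output
  );
}

const examplePrice: PricePerMillion = { input: 0.5, output: 1.5 }; // hypothetical rates
const cost = estimateCostUSD({ prompt_tokens: 2000, completion_tokens: 500 }, examplePrice);
console.log(cost.toFixed(6)); // 0.001750
```

Because there is no fixed term, doubling the tokens doubles the cost, and a month with no traffic costs nothing.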
Every major open-source model, one endpoint
Deploy the latest open weights without managing GPUs. Transparent per-token pricing and benchmarked throughput for each model.
GLM 5.2 (Z.ai)
Kimi 2.7 (Moonshot AI)
DeepSeek V4 Pro (DeepSeek)
DeepSeek V4 Flash (DeepSeek)
Qwen 3.6 35B A3B (Alibaba)
Nemotron 3 Ultra (NVIDIA)
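To check throughput against your own workload, tokens per second is completion tokens divided by elapsed seconds. A minimal sketch, assuming you read `usage.completion_tokens` from each OpenAI-style response and time each request yourself; the sample measurements are made up for illustration.

```typescript
// One measured request: completion tokens (from `usage.completion_tokens`)
// and wall-clock time from sending the request to receiving the last token.
interface Sample {
  completionTokens: number;
  elapsedMs: number;
}

// Output tokens per second for a single request.
function tokensPerSecond(s: Sample): number {
  if (s.elapsedMs <= 0) throw new RangeError("elapsedMs must be positive");
  return s.completionTokens / (s.elapsedMs / 1000);
}

// Aggregate over several sequential requests: total tokens over total time.
// This weights long generations more heavily than a plain mean of per-request rates.
function aggregateTokensPerSecond(samples: Sample[]): number {
  const tokens = samples.reduce((sum, s) => sum + s.completionTokens, 0);
  const ms = samples.reduce((sum, s) => sum + s.elapsedMs, 0);
  if (ms <= 0) throw new RangeError("total elapsed time must be positive");
  return tokens / (ms / 1000);
}

// Hypothetical measurements, for illustration only.
const samples: Sample[] = [
  { completionTokens: 400, elapsedMs: 2000 },
  { completionTokens: 100, elapsedMs: 1000 },
];
console.log(tokensPerSecond(samples[0])); // 200
console.log(aggregateTokensPerSecond(samples)); // 500 tokens / 3 s, about 166.67
```

Timing a request end to end includes network latency and time to first token, so it will read somewhat lower than pure generation speed.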
From zero to inference in minutes
If you can call the OpenAI API, you can run on Scalar. Three steps, no infrastructure to manage.
Pick a model
Choose from every major open model. All of them sit behind a single OpenAI-compatible endpoint, with no GPUs to provision.
Point your code
Swap your base URL and key. Keep your existing OpenAI SDK, prompts and tooling exactly as they are.
```typescript
import OpenAI from "openai";

// Reuse the official OpenAI SDK; only the base URL and API key change.
const client = new OpenAI({
  baseURL: "https://api.scalar.ai/v1",
  apiKey: process.env.SCALAR_KEY,
});
```
Scale automatically
We handle batching, warm capacity and autoscaling, so you can focus on shipping, from the first request to the millionth.
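Under the hood, the three steps reduce to one HTTP call. This sketch builds by hand the chat-completions request the OpenAI SDK would send, assuming Scalar serves the standard OpenAI `/chat/completions` route at `https://api.scalar.ai/v1`. `buildChatRequest` and `ask` are hypothetical helper names, and the model ID you pass must be one Scalar actually lists.

```typescript
// Minimal sketch of an OpenAI-compatible chat request, built without the SDK.
// Assumption: the standard `/chat/completions` route exists at the base URL.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildChatRequest(
  baseURL: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): ChatRequest {
  return {
    // Drop any trailing slash so the base URL works with or without one.
    url: `${baseURL.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Switching models changes only the `model` string; the endpoint stays the same.
async function ask(apiKey: string, model: string, prompt: string): Promise<string> {
  const { url, init } = buildChatRequest("https://api.scalar.ai/v1", apiKey, model, [
    { role: "user", content: prompt },
  ]);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`HTTP ${res.status}: ${await res.text()}`);
  const data = (await res.json()) as {
    choices: { message: { content: string | null } }[];
  };
  return data.choices[0]?.message.content ?? "";
}
```

Moving from, say, a DeepSeek model to a Qwen model changes only the `model` argument; the URL, headers and message format stay the same.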
Ship faster on open-source models, for less
Get an API key in minutes. Bring your existing OpenAI code, then point it at Scalar.