Scalar

Run open-source models at the lowest cost and highest speed.

Open models the way they should always have been: ultra-fast and too cheap to meter.

Backed by Y Combinator

DeepSeek, Qwen, Kimi, GLM, Nemotron, MiniMax, MiMo

Why Scalar

The most efficient place to run open models

Cheaper, faster, and fully elastic, so you ship on open weights without managing a single GPU.

Lowest cost

Up to 50% cheaper than typical providers. We serve open weights efficiently and pass the savings straight to you.

Fastest inference

Industry-leading tokens per second on every model, with tuned kernels and warm capacity for low latency.

Pay as you go

Only pay for the tokens you use. No idle GPUs, no seats, no minimums, no commitments.

1M tokens: $0.39
10M tokens: $3.90
100M tokens: $39.00
1B tokens: $390.00

Linear pricing, billed only for tokens used.
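
The linear pricing above can be sketched as a small cost estimator. This is only an illustration: it assumes the flat rate of $0.39 per 1M tokens implied by the example figures, and the names `USD_PER_MILLION_TOKENS` and `estimateCostUSD` are hypothetical. Actual rates may differ per model.

```typescript
// Assumed flat rate, taken from the example figures above: $0.39 per 1M tokens.
// Real per-model rates may differ; pass your own rate as the second argument.
const USD_PER_MILLION_TOKENS = 0.39;

// Hypothetical helper: linear cost, with no minimums and no idle charges.
function estimateCostUSD(
  tokens: number,
  usdPerMillion: number = USD_PER_MILLION_TOKENS,
): number {
  if (!Number.isFinite(tokens) || tokens < 0) {
    throw new RangeError("tokens must be a non-negative finite number");
  }
  return (tokens / 1_000_000) * usdPerMillion;
}

// Print the same four tiers as the pricing figures.
for (const tokens of [1e6, 1e7, 1e8, 1e9]) {
  console.log(`${tokens} tokens: $${estimateCostUSD(tokens).toFixed(2)}`);
}
```

Because the cost is a straight multiple of token count, doubling usage doubles the bill and zero usage costs nothing.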

Scale to infinity

Autoscaling that handles a single request or a million, with no capacity planning on your side.

Models & pricing

Every major open-source model, one endpoint

Deploy the latest open weights without managing GPUs. Transparent per-token pricing and benchmarked throughput for each model.

How it works

From zero to inference in minutes

If you can call the OpenAI API, you can run on Scalar. Three steps, no infrastructure to manage.

01

Pick a model

Choose from every major open model. They all sit behind one OpenAI-compatible endpoint, with no GPUs to provision.

DeepSeek V4 Pro
GLM 5.2
Kimi 2.7
Qwen 3.6

02

Point your code

Swap your base URL and key. Keep your existing OpenAI SDK, prompts and tooling exactly as they are.

import OpenAI from "openai"

const client = new OpenAI({
  baseURL: "https://api.scalar.ai/v1",
  apiKey: process.env.SCALAR_KEY,
})
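
To show what "OpenAI-compatible" means on the wire, here is a minimal sketch of the HTTP request a chat call would produce. It assumes Scalar mirrors the OpenAI `/chat/completions` route under the base URL shown above. The helper `buildChatRequest`, the key `sk-placeholder`, and the model ID `example-model-id` are hypothetical placeholders, not confirmed Scalar values.

```typescript
// Message shape used by the OpenAI chat completions format.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface BuiltRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the request without sending it, so the
// URL, headers and JSON body can be inspected or logged first.
function buildChatRequest(
  baseURL: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): BuiltRequest {
  return {
    // Strip any trailing slash so the path joins cleanly.
    url: `${baseURL.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

const request = buildChatRequest(
  "https://api.scalar.ai/v1",
  "sk-placeholder", // assumption: substitute your real key
  "example-model-id", // assumption: substitute a real model ID
  [{ role: "user", content: "Hello" }],
);
console.log(request.url);
```

Passing `request.url` and `request.init` to `fetch` would send the call. In practice the OpenAI SDK client does this for you once `baseURL` and `apiKey` are set.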

03

Scale automatically

We handle batching, warm capacity and autoscaling. You ship, from the first request to millions.

Ship faster on open-source models, for less

Get an API key in minutes. Bring your existing OpenAI code, then point it at Scalar.