Scalar

Run open-source models at the lowest cost and highest speed.

Open models the way they should always have been: ultra-fast and too cheap to meter.

Backed by Y Combinator

DeepSeek, Qwen, Kimi, GLM, Nemotron, MiniMax, MiMo

Why Scalar

The most efficient place to run open models

Cheaper, faster, and fully elastic, so you ship on open weights without managing a single GPU.

Lowest cost

Up to 50% cheaper than typical providers. We serve open weights efficiently and pass the savings straight to you.

Fastest inference

Industry-leading tokens per second on every model, with tuned kernels and warm capacity for low latency.

Pay as you go

Only pay for the tokens you use. No idle GPUs, no seats, no minimums, no commitments.

Tokens          Cost
1M tokens       $0.39
10M tokens      $3.90
100M tokens     $39.00
1B tokens       $390.00

Linear pricing, billed only for tokens used.
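
The pricing tiers above amount to a flat per-token rate. A minimal sketch of that arithmetic, assuming the $0.39 per 1M tokens figure shown above applies uniformly (actual per-model rates may differ). `estimateCostUSD` is a hypothetical helper written for illustration, not part of any Scalar SDK.

```typescript
// Illustrative rate taken from the pricing tiers above: $0.39 per 1M tokens.
// Assumption: one flat rate for all tokens; real per-model rates may differ.
const USD_PER_MILLION_TOKENS = 0.39;

// Hypothetical helper: linear cost estimate, rounded to whole cents.
function estimateCostUSD(
  tokens: number,
  usdPerMillion: number = USD_PER_MILLION_TOKENS,
): number {
  if (!Number.isFinite(tokens) || tokens < 0) {
    throw new RangeError("tokens must be a non-negative finite number");
  }
  const cost = (tokens / 1_000_000) * usdPerMillion;
  // Round to cents to hide floating-point noise in the product.
  return Math.round(cost * 100) / 100;
}

// Print the estimate for each tier listed above.
for (const tokens of [1e6, 1e7, 1e8, 1e9]) {
  const label = tokens.toLocaleString("en-US");
  console.log(`${label} tokens: $${estimateCostUSD(tokens).toFixed(2)}`);
}
```

Because the pricing is linear, doubling the token count doubles the estimate; there is no tier boundary or minimum to account for.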

Scale to infinity

Autoscaling that handles a single request or a million, with no capacity planning on your side.

Models & pricing

Every major open-source model, one endpoint

Deploy the latest open weights without managing GPUs. Transparent per-token pricing and benchmarked throughput for each model.

Model                Provider
GLM 5.2              Z.ai
Kimi 2.7             Moonshot AI
DeepSeek V4 Pro      DeepSeek
DeepSeek V4 Flash    DeepSeek
Qwen 3.6 35B A3B     Alibaba
Nemotron 3 Ultra     NVIDIA

How it works

From zero to inference in minutes

If you can call the OpenAI API, you can run on Scalar. Three steps, no infrastructure to manage.

Pick a model

Choose from every major open model. They all sit behind one OpenAI-compatible endpoint, with no GPUs to provision.

DeepSeek V4 Pro

GLM 5.2

Kimi 2.7

Qwen 3.6

Point your code

Swap your base URL and key. Keep your existing OpenAI SDK, prompts and tooling exactly as they are.

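import OpenAI from "openai"

// Reuse the standard OpenAI SDK; only the base URL and API key change.
const client = new OpenAI({
  baseURL: "https://api.scalar.ai/v1",
  apiKey: process.env.SCALAR_KEY, // Scalar API key, read from the environment
})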
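
To make "OpenAI-compatible" concrete, here is a rough sketch of the same request at the HTTP level, without the SDK. It assumes the endpoint follows the OpenAI convention of a `/chat/completions` path under the base URL and a Bearer token header; that is an assumption, not something this page confirms. `buildChatRequest` and `chat` are hypothetical helper names, and any model ID passed in is a placeholder to be replaced with one from the model list.

```typescript
// Base URL from the snippet above. The /chat/completions path is an assumption
// based on the OpenAI API convention, not confirmed by this page.
const BASE_URL = "https://api.scalar.ai/v1";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical helper: build the HTTP request without sending it,
// so it can be inspected or tested offline.
function buildChatRequest(apiKey: string, model: string, messages: ChatMessage[]) {
  return {
    url: `${BASE_URL}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Hypothetical helper: send one user prompt and return the first reply's text.
// Uses the global fetch available in Node 18+ and browsers.
async function chat(apiKey: string, model: string, prompt: string): Promise<string> {
  const { url, init } = buildChatRequest(apiKey, model, [
    { role: "user", content: prompt },
  ]);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Request failed: ${res.status} ${res.statusText}`);
  }
  // Response shape assumed to match the OpenAI chat completions format.
  const data = (await res.json()) as {
    choices: { message: { content: string } }[];
  };
  return data.choices[0].message.content;
}
```

With the OpenAI SDK client configured above, the equivalent call would be `client.chat.completions.create({ model, messages })`; the sketch only spells out what that call would send over the wire under the stated assumptions.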

Scale automatically

We handle batching, warm capacity and autoscaling. You ship, from the first request to millions.

Ship faster on open-source models, for less

Get an API key in minutes. Bring your existing OpenAI code, then point it at Scalar.