Qwen 2.5 32B

+11.3%

~$QW32

High-capacity 32B language model. Best-in-class for complex instruction following, long-context tasks, and enterprise workflows. Runs on all 5 universal-b nodes.

35.0k stars · 6.2M HF downloads

Deploy Now

#7 overall · 5 deploys

Cold Start: 1.8s (avg on Ion)

Avg Inference: 420ms (per request)

Active Replicas

right now

Total Deployments

all time

7-Day Trend: +11.3% this week

Cold Start vs Competitors

Cold start comparison vs similar models. Lower is better.

~$QW32: 1.8s (this model)
~$QW3-8: 1.2s
~$QW30: 1.7s
~$QW14: 1.4s

Cost Estimate

Cost per 1K Inferences: $1.60

Est. Daily (1K req/day): $1.60

Est. Monthly (30K req): $48.00

No subscriptions. Buy credits, pay per inference. Scale to zero when idle.
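The estimates above follow from the listed rate of $1.60 per 1K inferences. A minimal sketch of that arithmetic, assuming strictly linear pay-per-inference pricing with no minimums or volume tiers (the page does not state either way); `estimateCostUsd` is a hypothetical helper, not part of cumulus-sdk:

```typescript
// Rate taken from the listing: $1.60 per 1,000 inferences.
const COST_PER_1K_USD = 1.6;

// Hypothetical helper: estimate spend in USD for a number of inferences.
// Assumption: cost scales linearly with request count.
function estimateCostUsd(
  inferences: number,
  costPer1kUsd: number = COST_PER_1K_USD
): number {
  if (!Number.isFinite(inferences) || inferences < 0) {
    throw new RangeError("inferences must be a non-negative finite number");
  }
  // Round to whole cents to avoid floating-point noise.
  return Math.round((inferences / 1000) * costPer1kUsd * 100) / 100;
}

const daily = estimateCostUsd(1000);    // 1K req/day
const monthly = estimateCostUsd(30000); // 30K req/month
console.log(`daily: $${daily.toFixed(2)}, monthly: $${monthly.toFixed(2)}`);
```

Under these assumptions the helper reproduces the listed figures: $1.60 for 1K requests and $48.00 for 30K.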

Quick Deploy

cumulus-sdk

import cumulus from "cumulus-sdk"

// Deploy Qwen 2.5 32B on Ion
const client = await cumulus.deploy("qwen2-5-32b")

// Run inference
const result = await client.run({
  prompt: "Your prompt here",
  // model-specific params...
})
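To compare your own deployment against the listed cold-start and average-inference figures, you could time individual calls. The sketch below is an assumption-laden illustration: `timed` is a hypothetical helper and `fakeRun` is a stand-in for `client.run`, so the block runs without the SDK and makes no claims about the SDK's actual behavior.

```typescript
// Hypothetical helper (not part of cumulus-sdk): times any async call
// and returns its result together with the elapsed wall-clock milliseconds.
async function timed<T>(
  fn: () => Promise<T>
): Promise<{ result: T; ms: number }> {
  const start = Date.now();
  const result = await fn();
  return { result, ms: Date.now() - start };
}

// Stand-in for client.run so this sketch is self-contained.
async function fakeRun(prompt: string): Promise<string> {
  return `echo: ${prompt}`;
}

// Time one call; with a real client, the first call after an idle
// period would be expected to include any cold-start delay.
timed(() => fakeRun("Your prompt here")).then(({ result, ms }) => {
  console.log(result, `${ms}ms`);
});
```

With a real deployment, replacing `fakeRun(...)` with `client.run({ prompt: "..." })` from the snippet above would give a rough per-request latency to set against the page's averages.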

More in LLMs

DeepSeek R1 14B · ~$DSR1 · 1.4s · +18.9%
