Qwen 3.5 35B A3B
Hybrid Mamba-SSM and Transformer architecture with 256 experts and 3B active parameters. 128K context window with near-zero KV cache memory, quantized to FP8 for maximum throughput.
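The "3B active" figure is a property of mixture-of-experts routing: only a small subset of the network's weights is exercised per token. A back-of-envelope sketch using the numbers above (the snippet is illustrative only, not Cumulus SDK code, and assumes a 35B total parameter count implied by the model name):

```typescript
// Illustrative arithmetic: share of weights active per token in an
// MoE model. Numbers come from the blurb above; 35B total is assumed
// from the model name, not confirmed by this page.
const totalParams = 35e9;  // ~35B total parameters (assumed)
const activeParams = 3e9;  // 3B active per token (stated above)
const numExperts = 256;    // experts in the MoE layers (stated above)

const activeFraction = activeParams / totalParams;
console.log(
  `${numExperts} experts, ~${(activeFraction * 100).toFixed(1)}% of weights active per token`
);
// → "256 experts, ~8.6% of weights active per token"
```

This sparsity is what lets a 35B-class model run with roughly the per-token compute of a 3B dense model.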
Figure: cold-start comparison against similar models (lower is better).
No subscriptions. Buy credits, pay per inference. Scale to zero when idle.
import cumulus from "cumulus-sdk"

// Deploy Qwen 3.5 35B A3B on Ion
const client = await cumulus.deploy("qwen3-5-35b-a3b")

// Run inference
const result = await client.run({
  prompt: "Your prompt here",
  // model-specific params...
})