Qwen 3.5 35B A3B
Hybrid Mamba-SSM and Transformer architecture with 256 experts and 3B active parameters. 128K context window with near-zero KV cache memory, quantized to FP8 for maximum throughput.
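The "3B active" figure is a property of mixture-of-experts routing: only a small subset of the network's weights is exercised per token. A back-of-envelope sketch using the numbers above (the snippet is illustrative only, not Cumulus SDK code, and assumes a 35B total parameter count implied by the model name):

```typescript
// Illustrative arithmetic: share of weights active per token in an
// MoE model. Numbers come from the blurb above; 35B total is assumed
// from the model name, not confirmed by this page.
const totalParams = 35e9;  // ~35B total parameters (assumed)
const activeParams = 3e9;  // 3B active per token (stated above)
const numExperts = 256;    // experts in the MoE layers (stated above)

const activeFraction = activeParams / totalParams;
console.log(
  `${numExperts} experts, ~${(activeFraction * 100).toFixed(1)}% of weights active per token`
);
// → "256 experts, ~8.6% of weights active per token"
```

This sparsity is what lets a 35B-class model run with roughly the per-token compute of a 3B dense model.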
Figure: cold-start comparison against similar models (lower is better).
No subscriptions. Buy credits, pay per inference. Scale to zero when idle.
import cumulus from "cumulus-sdk"

// Deploy Qwen 3.5 35B A3B on Ion
const client = await cumulus.deploy("qwen3-5-35b-a3b")

// Run inference
const result = await client.run({
  prompt: "Your prompt here",
  // model-specific params...
})