~$QW25-7
A fast 7B-parameter LLM with strong instruction following. Optimized for high-throughput workloads where speed matters. Ideal for real-time agentic pipelines.
Cold-start time comparison vs. similar models. Lower is better.
No subscriptions. Buy credits, pay per inference. Scale to zero when idle.
import cumulus from "cumulus-sdk"

// Deploy Qwen 2.5 7B on Ion
const client = await cumulus.deploy("qwen2-5-7b")

// Run inference
const result = await client.run({
  prompt: "Your prompt here",
  // model-specific params...
})
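A minimal follow-up sketch of reading the completion and releasing the deployment so it can scale back to zero while idle. The `result.output` field and `client.shutdown()` method are illustrative assumptions, not confirmed parts of the cumulus-sdk API shown above.

// Read the completion text (field name assumed for illustration)
console.log(result.output)

// Hypothetical teardown: releasing the client lets the deployment scale
// to zero while idle, so you only pay credits for the inferences you ran.
await client.shutdown()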