Model family timeline
Last updated Mar 2, 2026
Qwen3.5 model releases
A source-backed timeline for the Qwen3.5 model family, collecting release dates, labs, access details, context windows, and major lifecycle changes.
5 models
Qwen3.5-9B
Available
The flagship of Alibaba's small dense Qwen3.5 models. Independent analysis by Artificial Analysis rated it the most intelligent model under 10B parameters at launch, with roughly double the score of the next-closest sub-10B models, and the most intelligent multimodal model under 15B, leading peers on MMMU-Pro (~69%). It is a dense 9B model with native vision, a 262K-token context, and the Qwen3.5 family's unified hybrid thinking / non-thinking mode. Native weights are BF16; at 4-bit it needs ~6GB, within reach of consumer laptops. The high intelligence comes with heavy reasoning-token usage (~260M output tokens to run the Intelligence Index).
Qwen3.5-4B
Available
A dense 4B model in Alibaba's small Qwen3.5 family, rated by Artificial Analysis as the most intelligent model under 5B parameters at launch, outscoring several 7B–9B peers despite roughly half the parameters. It has native vision, a 262K-token context, and the family's hybrid thinking / non-thinking mode, and is Apache-2.0 licensed. It scores ~65% on MMMU-Pro multimodal reasoning and runs in ~3GB at 4-bit, making it suitable for lightweight on-device agents.
Qwen3.5-2B
Available
A dense 2B Qwen3.5 model built for high-throughput, low-latency edge and on-device use. Despite its size, it matches a 7B-class peer on Artificial Analysis's Intelligence Index. It is Apache-2.0 licensed, with native vision, a 262K-token context, and the family's hybrid thinking / non-thinking mode; it runs in under 2GB at 4-bit, fitting laptops and smartphones.
Qwen3.5-0.8B
Available
The smallest Qwen3.5 model: a dense 0.8B designed for the most constrained on-device deployments. It supports the family's hybrid thinking / non-thinking mode but operates in non-thinking (instruct) mode by default. It is Apache-2.0 licensed, with native vision and a 262K-token context; it needs roughly 2GB of VRAM, and under 2GB at 4-bit, targeting smartphones and embedded hardware. Notably for a sub-1B model, it still scores ~26% on MMMU-Pro multimodal reasoning.
Qwen3.5-397B
Available
Native vision-language MoE supporting 201 languages with a 1M-token context.
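As a rough cross-check on the 4-bit memory figures quoted for the small dense models, the weights-only footprint can be estimated as parameters × bits per weight ÷ 8. The sketch below is illustrative only: the helper name `weight_memory_gb` is hypothetical, and the parameter counts are assumed to be the nominal sizes in the model names (9B, 4B, 2B, 0.8B), not exact counts from the source.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Weights-only footprint in GB (10^9 bytes).

    Excludes KV cache, activations, quantization metadata, and runtime overhead.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumption: nominal parameter counts taken from the model names.
NOMINAL_PARAMS_B = {
    "Qwen3.5-9B": 9.0,
    "Qwen3.5-4B": 4.0,
    "Qwen3.5-2B": 2.0,
    "Qwen3.5-0.8B": 0.8,
}

for name, size_b in NOMINAL_PARAMS_B.items():
    bf16 = weight_memory_gb(size_b, 16)  # native BF16 weights
    q4 = weight_memory_gb(size_b, 4)     # 4-bit quantized weights
    print(f"{name}: BF16 ~{bf16:.1f} GB, 4-bit ~{q4:.1f} GB (weights only)")
```

Under these assumptions the 9B model comes to about 4.5 GB of weights at 4-bit, below the ~6GB quoted above. The difference presumably covers the KV cache, quantization metadata, and runtime overhead, though the source does not break the figure down.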