LLM Releases

Model family timeline

Last updated Mar 2, 2026

Qwen3.5 model releases

A source-backed timeline for the Qwen3.5 model family, collecting release dates, labs, access details, context windows, and major lifecycle changes.

5 models | 1 lab | 5 open | 4 recent

Qwen3.5-9B

Available
Alibaba (Qwen) | Open source

The flagship of Alibaba's small dense Qwen3.5 models. At launch, Artificial Analysis rated it the most intelligent model under 10B parameters, with roughly double the score of the next-closest sub-10B models, and the most intelligent multimodal model under 15B, leading peers on MMMU-Pro (~69%). It is a dense 9B model with native vision, a 262K-token context, and the Qwen3.5 family's unified hybrid thinking / non-thinking mode. Native weights are BF16; quantized to 4-bit it needs ~6GB, within reach of consumer laptops. The high score comes with heavy reasoning-token usage: ~260M output tokens to run the Intelligence Index.

Dense | 9B | 262K ctx | Mar 2, 2026
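
The memory figures quoted for Qwen3.5-9B can be sanity-checked with simple arithmetic. The sketch below is a rough estimate only: it assumes the nominal "9B" label means exactly 9e9 parameters, counts the weights alone, and uses a hypothetical helper name (`weight_memory_gb`) that does not come from the source.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumption: the nominal "9B" label is treated as exactly 9e9 parameters.
bf16_gb = weight_memory_gb(9.0, 16)  # BF16 stores 16 bits per weight
int4_gb = weight_memory_gb(9.0, 4)   # plain 4-bit storage, no overhead counted

print(f"BF16 weights:  {bf16_gb:.1f} GB")
print(f"4-bit weights: {int4_gb:.1f} GB")
```

Under these assumptions the weights come to about 18 GB in BF16 and 4.5 GB at 4-bit. The gap between 4.5 GB and the quoted ~6GB plausibly covers quantization metadata, any layers kept at higher precision, and runtime buffers such as the KV cache, but the source does not break the figure down.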

Qwen3.5-4B

Available
Alibaba (Qwen) | Open source

A dense 4B model in Alibaba's small Qwen3.5 family, rated by Artificial Analysis as the most intelligent model under 5B parameters at launch, outscoring several 7B–9B peers despite having roughly half the parameters. It has native vision, a 262K-token context, and the family's hybrid thinking / non-thinking mode, and is Apache-2.0 licensed. It scores ~65% on MMMU-Pro multimodal reasoning and runs in ~3GB at 4-bit, making it suitable for lightweight on-device agents.

Dense | 4B | 262K ctx | Mar 2, 2026

Qwen3.5-2B

Available
Alibaba (Qwen) | Open source

A dense 2B Qwen3.5 model built for high-throughput, low-latency edge and on-device use. Despite its size, it matches a 7B-class peer on Artificial Analysis's Intelligence Index. It is Apache-2.0 licensed, with native vision, a 262K-token context, and the family's hybrid thinking / non-thinking mode. It runs in under 2GB at 4-bit, fitting laptops and smartphones.

Dense | 2B | 262K ctx | Mar 2, 2026

Qwen3.5-0.8B

Available
Alibaba (Qwen) | Open source

The smallest Qwen3.5 model: a dense 0.8B designed for the most constrained on-device deployments. It is Apache-2.0 licensed, with native vision and a 262K-token context. It supports the family's hybrid thinking / non-thinking mode but operates in non-thinking (instruct) mode by default. It needs roughly 2GB of VRAM, or under 2GB at 4-bit, and targets smartphones and embedded hardware. Notably for a sub-1B model, it still scores ~26% on MMMU-Pro multimodal reasoning.

Dense | 0.8B | 262K ctx | Mar 2, 2026

Qwen3.5-397B

Available
Alibaba (Qwen) | Frontier | Open source

A native vision-language MoE model supporting 201 languages, with a 1M-token context.

MoE | 397B | 1M ctx | Feb 20, 2026