Model family timeline

Last updated Mar 2, 2026

Qwen3.5 model releases

A source-backed timeline for the Qwen3.5 model family, collecting release dates, labs, access details, context windows, and major lifecycle changes.

Most recent in this set: Qwen3.5-0.8B (Qwen), Mar 2, 2026

5 models

Qwen3.5-9B

Available

Alibaba (Qwen) · Open source

The flagship of Alibaba's small dense Qwen3.5 models. Independent analysis (Artificial Analysis) rated it the most intelligent model under 10B parameters at launch — roughly double the score of the next-closest sub-10B models — and the most intelligent multimodal model under 15B, leading peers on MMMU-Pro (~69%). A dense 9B with native vision, a 262K-token context, and the Qwen3.5 family's unified hybrid thinking / non-thinking mode. Native weights are BF16; in 4-bit it needs ~6GB, within reach of consumer laptops. High intelligence comes with heavy reasoning token usage (~260M output tokens to run the Intelligence Index).

Dense · 9B · 262K ctx · Mar 2, 2026

Qwen3.5-4B

Available

Alibaba (Qwen) · Open source

A dense 4B in Alibaba's small Qwen3.5 family, rated by Artificial Analysis as the most intelligent model under 5B parameters at launch — outscoring several 7B–9B peers despite roughly half the parameters. Native vision, a 262K-token context, and the family's hybrid thinking / non-thinking mode; Apache-2.0 licensed. Scores ~65% on MMMU-Pro multimodal reasoning and runs in ~3GB at 4-bit, suitable for lightweight on-device agents.

Dense · 4B · 262K ctx · Mar 2, 2026

Qwen3.5-2B

Available

Alibaba (Qwen) · Open source

A dense 2B Qwen3.5 model built for high-throughput, low-latency edge and on-device use. Despite its size, it matches a 7B-class peer on the Artificial Analysis Intelligence Index. Apache-2.0, with native vision, a 262K-token context, and the family's hybrid thinking / non-thinking mode; runs in under 2GB at 4-bit, fitting laptops and smartphones.

Dense · 2B · 262K ctx · Mar 2, 2026

Qwen3.5-0.8B

Available

Alibaba (Qwen) · Open source

The smallest Qwen3.5 model: a dense 0.8B designed for the most constrained on-device deployments. It supports the family's hybrid thinking / non-thinking mode but operates in non-thinking (instruct) mode by default. Apache-2.0, with native vision and a 262K-token context; needs roughly 2GB of VRAM and runs in under 2GB at 4-bit, targeting smartphones and embedded hardware. It scores ~26% on MMMU-Pro multimodal reasoning, notable for a sub-1B model.

Dense · 0.8B · 262K ctx · Mar 2, 2026
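
The 4-bit memory figures quoted in the entries above can be sanity-checked with simple arithmetic: parameters times bits per weight, divided by eight. The sketch below is illustrative only. It assumes the nominal parameter counts implied by the model names (9B, 4B, 2B, 0.8B), which may differ from the exact counts, and it covers weights alone. KV cache, activations, quantization metadata, and runtime overhead are ignored, so real requirements such as the ~6GB and ~3GB quoted above will sit higher than these weight-only estimates. The names `weight_memory_gb` and `footprint_table` are hypothetical helpers, not part of any Qwen tooling.

```python
# Weight-only memory estimate: params * bits_per_weight / 8 bytes.
# Assumption: nominal parameter counts are taken from the model names.
NOMINAL_PARAMS_BILLIONS = {
    "Qwen3.5-9B": 9.0,
    "Qwen3.5-4B": 4.0,
    "Qwen3.5-2B": 2.0,
    "Qwen3.5-0.8B": 0.8,
}


def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def footprint_table(models=NOMINAL_PARAMS_BILLIONS):
    """Return BF16 (16-bit) and 4-bit weight-only estimates per model."""
    return {
        name: {
            "bf16_gb": round(weight_memory_gb(params, 16), 2),
            "int4_gb": round(weight_memory_gb(params, 4), 2),
        }
        for name, params in models.items()
    }


if __name__ == "__main__":
    for name, row in footprint_table().items():
        print(f"{name}: BF16 ~{row['bf16_gb']} GB, 4-bit ~{row['int4_gb']} GB (weights only)")
```

The gap between these estimates and the quoted figures is a rough indication of how much headroom a runtime needs beyond the weights themselves.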