LLM Releases

Last updated May 26, 2026

Local LLM releases

Downloadable models that are plausible candidates for local or small-cluster use, emphasizing compact total size or low active-parameter MoE designs.

104 models · 34 labs · 104 open · 4 recent

MiniMax-M2.7

Available
MiniMax · Frontier · Open weights

Open-weight agentic model from MiniMax focused on real-world software engineering, office tasks, tool use, and self-improving training workflows.

MoE · 229.9B · May 26, 2026

Qwen3.6-27B

Available
Alibaba (Qwen) · Open source

Dense 27B that punches far above its weight on agentic coding — easy to self-host on a single GPU node.

Dense · 27B · 256K ctx · May 12, 2026
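
A rough way to judge whether a dense model of this size fits on a given node is to estimate the memory taken by the weights alone. The sketch below uses a hypothetical helper, `weight_memory_gb`, and illustrative precisions (16, 8, and 4 bits per weight). It ignores KV cache, activations, and runtime overhead, so real requirements will be higher.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes.

    Assumption: every parameter is stored at the same precision.
    KV cache, activations, and framework overhead are NOT included.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # 27B is the parameter count listed for this entry; the bit widths
    # are illustrative choices, not vendor recommendations.
    for bits in (16, 8, 4):
        print(f"27B at {bits}-bit: {weight_memory_gb(27, bits):.1f} GB")
```

Under these assumptions, 27B parameters take about 54 GB at 16 bits per weight and about 13.5 GB at 4 bits.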

DeepSeek V4-Flash

Preview
DeepSeek · Open source

Efficient V4 companion model with 284B total / 13B active parameters and the same one-million-token context window.

MoE · 284B · 1M ctx · Apr 24, 2026
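
The total/active splits quoted for MoE models in this list can be compared with a small calculation. This is a sketch that uses only the parameter counts stated in the entries on this page; the `MoEModel` class and its `active_fraction` property are illustrative names, not part of any library. As a rough rule, all weights still have to be stored, while per-token compute tracks the active count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEModel:
    name: str
    total_b: float   # total parameters, in billions (as listed on this page)
    active_b: float  # active parameters per token, in billions (as listed)

    @property
    def active_fraction(self) -> float:
        """Share of the parameters used for each token."""
        return self.active_b / self.total_b


# Figures copied from the descriptions in this list.
MODELS = [
    MoEModel("DeepSeek V4-Flash", 284, 13),
    MoEModel("Hunyuan-A13B-Instruct", 80, 13),
    MoEModel("Qwen3-Coder-Next", 80, 3),
    MoEModel("gpt-oss-120b", 117, 5.1),
    MoEModel("GLM-4.5-Air", 106, 12),
]

if __name__ == "__main__":
    # Sort from sparsest to densest activation.
    for m in sorted(MODELS, key=lambda m: m.active_fraction):
        print(f"{m.name}: {m.active_fraction:.1%} active")
```

Sorting by `active_fraction` puts the sparsest designs first, which is one way to shortlist models where per-token compute is small relative to download size.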

Hunyuan-A13B-Instruct

Available
Tencent Hunyuan · Open weights

Tencent Hunyuan open-weight fine-grained MoE model with 80B total parameters and 13B active parameters, optimized for agentic tool use.

MoE · 80B · Apr 22, 2026

Gemma 4 31B

Available
Google DeepMind · Open source

Google DeepMind's Gemma 4 advanced-reasoning open model for personal computers, part of the April 2026 Gemma 4 family.

Dense · 31B · Apr 2, 2026

Nemotron 3 Super 120B-A12B

Available
NVIDIA · Frontier · Open weights

Open-weight hybrid Mamba-Transformer MoE designed for collaborative agents and high-volume enterprise workflows.

Hybrid · 120B · 1M ctx · Mar 16, 2026

Step-3.5-Flash

Available
StepFun · Open source

StepFun's Apache-licensed sparse MoE model for fast agentic execution, coding, math, browsing, and tool-use workflows.

MoE · 196B · 256K ctx · Mar 14, 2026

Sarvam-105B

Available
Sarvam AI · Open source

Apache-licensed Indian-context MoE from Sarvam AI, optimized for reasoning, coding, agentic tasks, and 22 Indian languages.

MoE · 105B · 128K ctx · Mar 6, 2026

Qwen3-Coder-Next

Available
Alibaba (Qwen) · Open source

Apache-licensed Qwen3-Next coding-agent model with 80B total / 3B active parameters, 256K context, and long-horizon tool-use training.

Hybrid · 80B · 262K ctx · Feb 3, 2026

OLMo 3 Think 32B

Available
Allen Institute for AI (Ai2) · Open source

Ai2's fully open thinking model with public weights, code, data, checkpoints, and training details across the OLMo 3 pipeline.

Dense · 32B · Dec 15, 2025

Nemotron 3 Nano 30B-A3B

Available
NVIDIA · Open weights

Efficient Nemotron 3 MoE checkpoint for agentic reasoning and coding, activating about 3B parameters while supporting 1M-token contexts.

Hybrid · 30B · 1M ctx · Dec 15, 2025

LFM2 1.2B

Available
Liquid AI · Open weights

Liquid AI hybrid model for efficient CPU/GPU/NPU local deployment, using short convolutions plus attention blocks.

Hybrid · 1.17B · 33K ctx · Nov 28, 2025

Kimi-Linear-48B-A3B-Instruct

Available
Moonshot AI · Open source

MIT-licensed hybrid linear-attention model using Kimi Delta Attention, built for million-token contexts with much lower KV-cache usage.

Hybrid · 48B · 1.0M ctx · Oct 31, 2025
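
To see why KV-cache usage matters at million-token contexts, the following sketch applies the usual per-token KV-cache arithmetic for standard attention layers. Every configuration value here (32 layers, 8 KV heads, head dimension 128, 16-bit cache entries, and the one-in-four ratio of full-attention layers) is a hypothetical illustration and not the configuration of Kimi-Linear or any other model in this list. The sketch also does not model Kimi Delta Attention itself, and it ignores the fixed-size state that linear-attention layers keep.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """KV-cache size for layers that store full keys and values.

    The leading factor of 2 accounts for one key tensor and one value
    tensor per layer. Only layers with a full KV cache should be counted.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


if __name__ == "__main__":
    seq_len = 2**20  # about one million tokens

    # Hypothetical model where all 32 layers use standard attention.
    full = kv_cache_bytes(32, 8, 128, seq_len)

    # Hypothetical hybrid where only 8 of the 32 layers keep a full KV cache.
    hybrid = kv_cache_bytes(8, 8, 128, seq_len)

    print(f"all-attention: {full / 2**30:.0f} GiB")
    print(f"hybrid (8 of 32 layers): {hybrid / 2**30:.0f} GiB")
```

With these made-up numbers the all-attention case needs 128 GiB of cache for a single sequence and the hybrid case 32 GiB, which illustrates why designs that limit full-attention layers are attractive for long contexts.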

Gemma 3 27B

Available
Google DeepMind · Open weights

Google's open multimodal model: 128k context, 140+ languages, runs on a single GPU.

Dense · 27B · 128K ctx · Sep 4, 2025

Seed-OSS-36B-Instruct

Available
ByteDance Seed · Open source

ByteDance Seed's Apache-licensed long-context reasoning and agent model, with controllable thinking budgets and a native 512K context.

Dense · 36B · 512K ctx · Aug 20, 2025

GLM-4.5V

Available
Z.ai (Zhipu AI) · Open source

Vision-language GLM based on GLM-4.5-Air, covering image, video, document, grounding, and GUI-agent tasks.

MoE · 106B · Aug 11, 2025

gpt-oss-20b

Available
OpenAI · Open source

Smaller gpt-oss reasoning model optimized for local inference on systems with about 16GB of memory.

MoE · 21B · 128K ctx · Aug 5, 2025

gpt-oss-120b

Available
OpenAI · Open source

OpenAI's larger open-weight reasoning model, a 117B-total / 5.1B-active MoE with 128K context for local and self-hosted deployment.

MoE · 117B · 128K ctx · Aug 5, 2025

Falcon-H1 34B

Available
Technology Innovation Institute · Open weights

A hybrid attention + state-space-model (SSM) design that TII reports as matching 70B-class models with fewer parameters.

Hybrid · 34B · 256K ctx · Jul 31, 2025

GLM-4.5-Air

Available
Z.ai (Zhipu AI) · Open source

Compact GLM-4.5 companion with 106B total / 12B active parameters for efficient agentic reasoning and coding.

MoE · 106B · 128K ctx · Jul 28, 2025

EXAONE 4.0 32B

Available
LG AI Research · Open weights

LG AI Research's unified model with non-reasoning and reasoning modes, agentic tool use, and English, Korean, and Spanish support.

Dense · 32B · Jul 15, 2025

SmolLM3 3B

Available
Hugging Face · Open source

Hugging Face's fully open 3B multilingual long-context model with optional reasoning mode and 128K context.

Dense · 3B · 128K ctx · Jul 8, 2025

Kimi-VL-A3B-Thinking-2506

Available
Moonshot AI · Open source

Updated MIT-licensed Kimi-VL reasoning model with better multimodal reasoning, video understanding, high-resolution perception, and lower thinking-token use.

MoE · 16B · 128K ctx · Jun 21, 2025

Kimi-Dev-72B

Available
Moonshot AI · Open source

MIT-licensed coding LLM trained with repository-level reinforcement learning for software issue resolution.

Dense · 73B · Jun 17, 2025

Magistral Small

Available
Mistral AI · Open weights

Open-weight 24B reasoning model from Mistral's Magistral family, popular for local reasoning experiments.

Dense · 24B · 40K ctx · Jun 10, 2025

Phi-4 Reasoning

Available
Microsoft · Open weights

Phi-4 reasoning-specialized model family for math, science, and chain-of-thought style tasks.

Dense · 14B · Apr 30, 2025

Granite 3.3 8B

Available
IBM · Open source

Granite 3.3 text update for enterprise chat, RAG, and instruction-following workflows.

Dense · 8B · 128K ctx · Apr 30, 2025

Kimi-Audio-7B-Instruct

Available
Moonshot AI · Open source

Open audio foundation model for audio understanding, generation, speech recognition, audio QA, captioning, and speech conversation.

Hybrid · 10B · Apr 25, 2025

Kimi-VL-A3B-Instruct

Available
Moonshot AI · Open source

Efficient MIT-licensed vision-language MoE for OCR, image/video understanding, long documents, and OS-style agent tasks.

MoE · 16B · 128K ctx · Apr 17, 2025

Llama-3.3-Nemotron-Super-49B

Available
NVIDIA · Open weights

Open Llama Nemotron reasoning model from NVIDIA's 2025 Nemotron family.

Dense · 49B · 128K ctx · Apr 2, 2025

Qwen2.5-Omni-7B

Available
Alibaba (Qwen) · Open weights

Local omni-modal Qwen model that supports text, image, audio, video, and speech generation in a 7B package.

Dense · 7B · Mar 26, 2025

Mistral Small 3.1

Available
Mistral AI · Open source

Apache-licensed Small update adding vision and a 128K context window to the efficient 24B line.

Dense · 24B · 128K ctx · Mar 17, 2025

OLMo 2 32B

Available
Allen Institute for AI (Ai2) · Open source

A fully open model, with weights, data, and training code all public, and the first such model reported to outperform GPT-3.5 and GPT-4o mini.

Dense · 32B · 4K ctx · Mar 13, 2025

Granite 3.2 8B

Available
IBM · Open source

Granite 3.2 update with reasoning controls and multimodal/document-oriented Granite variants.

Dense · 8B · 128K ctx · Feb 26, 2025

Moonlight-16B-A3B-Instruct

Available
Moonshot AI · Open source

MIT-licensed 16B/3B-active MoE trained with Moonshot's scalable Muon optimizer experiments.

MoE · 16B · 8K ctx · Feb 24, 2025

DeepHermes 3 Llama 3 8B

Available
Nous Research · Open weights

Nous reasoning-oriented Hermes model trained to combine concise answers with optional deep reasoning traces.

Dense · 8B · 8K ctx · Feb 18, 2025

Dolphin 3.0 Llama 3.1 8B

Available
Cognitive Computations · Open weights

Popular local assistant model tuned for coding, math, function calling, and agentic workflows.

Dense · 8B · 128K ctx · Feb 2, 2025

Mistral Small 3

Available
Mistral AI · Open source

A latency-optimized 24B dense model under Apache-2.0 — a popular local-deployment workhorse.

Dense · 24B · 32K ctx · Jan 30, 2025

Qwen2.5-VL-72B

Available
Alibaba (Qwen) · Open weights

Vision-language Qwen2.5 model for image, document, video, and agentic visual grounding tasks.

Dense · 72B · 128K ctx · Jan 26, 2025

Granite 3.1 8B

Available
IBM · Open source

IBM's enterprise-focused open model with a 128k context, Apache-2.0 licensed.

Dense · 8B · 128K ctx · Dec 18, 2024

Falcon 3 10B

Available
Technology Innovation Institute · Open weights

Open model from the UAE's TII, designed to run on light infrastructure, including laptops.

Dense · 10B · 32K ctx · Dec 17, 2024

Command R7B

Available
Cohere · Open weights

Cohere's smallest, fastest R-series model, tuned for RAG and tool use on modest hardware.

Dense · 8B · 128K ctx · Dec 13, 2024

Phi-4

Available
Microsoft · Open source

A 14B dense model that rivals far larger ones on math and reasoning, under a permissive MIT license.

Dense · 14B · 16K ctx · Dec 12, 2024

EXAONE 3.5 32B

Available
LG AI Research · Open weights

EXAONE 3.5 32B open-weight model for bilingual reasoning, coding, and long-context tasks.

Dense · 32B · 32K ctx · Dec 9, 2024

Llama 3.3 70B

Available
Meta AI · Open weights

Late-2024 70B Llama update delivering much of the 405B instruction-following quality at lower serving cost.

Dense · 70B · 128K ctx · Dec 6, 2024

QwQ-32B-Preview

Available
Alibaba (Qwen) · Open source

Qwen's first public reasoning-preview model, aimed at math, coding, and deliberate problem solving.

Dense · 32B · 32K ctx · Nov 28, 2024

Qwen2.5-Coder-32B

Available
Alibaba (Qwen) · Open source

Code-specialized Qwen2.5 model family, with the 32B checkpoint as the flagship open coding model.

Dense · 32B · 128K ctx · Nov 12, 2024

SmolLM2 1.7B

Available
Hugging Face · Open source

Compact on-device model family trained on 11T tokens, popular for lightweight local chat and experimentation.

Dense · 1.7B · Nov 4, 2024

Sarvam-1

Available
Sarvam AI · Open weights

Sarvam's 2B open model trained for ten major Indian languages.

Dense · 2B · Oct 22, 2024

Granite 3.0 8B

Available
IBM · Open source

Apache-licensed Granite 3.0 text model, part of IBM's push toward enterprise-friendly open models.

Dense · 8B · 4K ctx · Oct 21, 2024

Llama-3.1-Nemotron-70B

Available
NVIDIA · Open weights

NVIDIA-tuned Llama 3.1 70B instruction model optimized with Nemotron reward and alignment recipes.

Dense · 70B · 128K ctx · Oct 15, 2024

Molmo 72B

Available
Allen Institute for AI (Ai2) · Open weights

Open multimodal model family trained for strong image understanding, pointing, and visual grounding.

Dense · 72B · Sep 25, 2024

Qwen2.5-72B

Available
Alibaba (Qwen) · Open weights

Broad Qwen2.5 foundation-model update spanning general, coding, math, and multimodal descendants.

Dense · 72B · 128K ctx · Sep 19, 2024

Pixtral 12B

Available
Mistral AI · Open source

Mistral's first open multimodal model, adding image understanding to a Mistral text backbone.

Dense · 12B · 128K ctx · Sep 17, 2024

Yi-Coder-9B

Available
01.AI · Open weights

01.AI's compact code model trained for repository-scale programming and code completion tasks.

Dense · 9B · 128K ctx · Sep 5, 2024

OLMoE 1B-7B

Available
Allen Institute for AI (Ai2) · Open source

Fully open sparse MoE model with 7B total and about 1B active parameters.

MoE · 7B · Sep 3, 2024

Phi-3.5 MoE

Available
Microsoft · Open weights

Phi-3.5 mixture-of-experts model, scaling Microsoft's small-model line while preserving efficient active parameters.

MoE · 42B · 128K ctx · Aug 20, 2024

EXAONE 3.0 7.8B

Available
LG AI Research · Open weights

LG's first open-weight EXAONE model, a compact bilingual instruction model for Korean and English.

Dense · 7.8B · Aug 7, 2024

MiniCPM-V 2.6

Available
OpenBMB · Open weights

8B vision-language model for local image, multi-image, OCR, and video understanding, with llama.cpp and Ollama support.

Dense · 8B · Aug 2, 2024

Mistral NeMo

Available
Mistral AI · Open source

Apache-licensed 12B model co-developed with NVIDIA, including a 128K context window and strong multilingual tokenization.

Dense · 12B · 128K ctx · Jul 18, 2024

Gemma 2 27B

Available
Google DeepMind · Open weights

Second-generation Gemma model, improving open-weight quality and efficiency at 9B and 27B sizes.

Dense · 27B · 8K ctx · Jun 27, 2024

Qwen2-72B

Available
Alibaba (Qwen) · Open weights

Qwen2's largest dense model, introducing stronger multilingual support, coding/math gains, and long-context variants.

Dense · 72B · 128K ctx · Jun 7, 2024

GLM-4-9B

Available
Z.ai (Zhipu AI) · Open weights

Open GLM-4 9B model family, covering chat, long-context, and code-oriented variants.

Dense · 9B · 128K ctx · Jun 5, 2024

Codestral 22B

Available
Mistral AI · Open weights

Mistral's first code-specialized model, trained for code generation, fill-in-the-middle, and multi-language programming tasks.

Dense · 22B · 32K ctx · May 29, 2024

Aya 23 35B

Available
Cohere · Open weights

Open multilingual research model covering 23 languages, released by Cohere For AI.

Dense · 35B · May 23, 2024

Yi-1.5-34B

Available
01.AI · Open weights

Yi 1.5 update with stronger instruction following, coding, math, and multilingual performance.

Dense · 34B · 4K ctx · May 13, 2024

Falcon 2 11B

Available
Technology Innovation Institute · Open weights

Falcon 2 generation, including text and vision-language 11B models under a permissive TII license.

Dense · 11B · 8K ctx · May 13, 2024

Granite Code 34B

Available
IBM · Open source

Apache-2.0 code model from IBM's Granite Code family, used for local code generation and enterprise coding assistants.

Dense · 34B · 8K ctx · May 6, 2024

Phi-3 Mini

Available
Microsoft · Open weights

3.8B-parameter Phi-3 model released as a phone-capable small model with 4K and 128K variants.

Dense · 3.8B · 128K ctx · Apr 23, 2024

Llama 3 70B

Available
Meta AI · Open weights

First Llama 3 release, with 8B and 70B open models and a stronger tokenizer, data mix, and post-training stack.

Dense · 70B · 8K ctx · Apr 18, 2024

CodeGemma 7B

Available
Google DeepMind · Open weights

Open code-specialized Gemma model for local code completion, generation, and instruction-following.

Dense · 7B · 8K ctx · Apr 9, 2024

Jamba

Available
AI21 Labs · Open weights

First Jamba hybrid Transformer-Mamba MoE model with open weights and a 256K context length.

Hybrid · 52B · 256K ctx · Mar 28, 2024

StarCoder2 15B

Available
BigCode · Open weights

Next-generation BigCode code model trained on 4T+ tokens and 600+ programming languages, with 16K context.

Dense · 16B · 16K ctx · Feb 28, 2024

Gemma 7B

Available
Google DeepMind · Open weights

First Gemma open-weight text model family, derived from the same research lineage as Gemini.

Dense · 7B · 8K ctx · Feb 21, 2024

OLMo 7B

Available
Allen Institute for AI (Ai2) · Open source

Ai2's first fully open language model release, including weights, training data, code, logs, and intermediate checkpoints.

Dense · 7B · 4K ctx · Feb 1, 2024

Stable LM 2 1.6B

Available
Stability AI · Open weights

Small multilingual Stable LM release built for low hardware barriers and local experimentation.

Dense · 1.6B · Jan 19, 2024

DeepSeekMoE 16B

Available
DeepSeek · Open source

Early DeepSeek sparse MoE research model that foreshadowed the later V2/V3 architecture direction.

MoE · 16B · 4K ctx · Jan 11, 2024

Nous Hermes 2 Mixtral

Available
Nous Research · Open source

Nous instruction-tuned Mixtral model with strong open-chat and tool-use adoption.

MoE · 47B · 32K ctx · Jan 11, 2024

OpenChat 3.5

Available
OpenChat · Open source

Compact Mistral-based local chat model trained with C-RLFT, popular in early 2024 local leaderboards.

Dense · 7B · Jan 6, 2024

TinyLlama 1.1B Chat

Available
TinyLlama · Open source

Compact Llama-style 1.1B chat model trained for local experimentation and low-memory deployments.

Dense · 1.1B · Jan 1, 2024

Phi-2

Available
Microsoft · Open weights

2.7B-parameter Phi model showing strong reasoning and language understanding at small scale.

Dense · 2.7B · Dec 12, 2023

OpenHathi-7B

Available
Sarvam AI · Open weights

Sarvam AI's first open Indic language model, adapted from Llama 2 for Hindi and Indian-language work.

Dense · 7B · Dec 12, 2023

Mixtral 8x7B

Available
Mistral AI · Open source

The open sparse Mixture-of-Experts that brought MoE efficiency to the open ecosystem.

MoE · 47B · 32K ctx · Dec 11, 2023

Qwen-72B

Available
Alibaba (Qwen) · Open weights

Alibaba's first major open Qwen model and the start of a prolific open-weight line.

Dense · 72B · 32K ctx · Nov 30, 2023

DeepSeek LLM 67B

Available
DeepSeek · Open source

First general DeepSeek language model family, with 7B and 67B base/chat checkpoints.

Dense · 67B · 4K ctx · Nov 29, 2023

Yi-34B

Available
01.AI · Open weights

01.AI's strong bilingual open model, with a 200k-context variant.

Dense · 34B · 200K ctx · Nov 6, 2023

DeepSeek Coder 33B

Available
DeepSeek · Open source

DeepSeek's first public code-model family, released before the general DeepSeek LLM line.

Dense · 33B · 16K ctx · Nov 2, 2023

LLaVA 1.5 13B

Available
LLaVA · Open weights

Open vision-language assistant and one of the most widely run early local multimodal models.

Hybrid · 13B · Sep 30, 2023

Mistral 7B

Available
Mistral AI · Open source

The 7B that punched far above its weight and put Mistral on the map.

Dense · 7B · 8K ctx · Sep 27, 2023

Qwen-14B

Available
Alibaba (Qwen) · Open weights

Second open Qwen size, expanding the first-generation Qwen language-model lineup.

Dense · 14B · 8K ctx · Sep 25, 2023

Granite 13B

Available
IBM · Open weights

IBM's early Granite foundation model family for enterprise language and code tasks.

Dense · 13B · Sep 7, 2023

Code Llama 34B

Available
Meta AI · Open weights

Meta's first code-specialized Llama model family, released in base, Python, and instruction-tuned variants.

Dense · 34B · 16K ctx · Aug 24, 2023

Qwen-7B

Available
Alibaba (Qwen) · Open weights

Alibaba's first open Qwen checkpoint and the start of the Qwen open-model line.

Dense · 7B · 32K ctx · Aug 3, 2023

Nous-Hermes-Llama2-13B

Available
Nous Research · Open weights

Early Nous Hermes instruction model on Llama 2, widely used in the open-model fine-tuning ecosystem.

Dense · 13B · 4K ctx · Jul 24, 2023

Llama 2 70B

Available
Meta AI · Open weights

The release that made capable open-weight models genuinely usable for production.

Dense · 70B · 4K ctx · Jul 18, 2023

ChatGLM2-6B

Available
Z.ai (Zhipu AI) · Open weights

Second open ChatGLM generation, improving long context, inference efficiency, and bilingual chat quality.

Dense · 6B · 32K ctx · Jun 25, 2023

Phi-1

Available
Microsoft · Open weights

Microsoft's first Phi small-language-model release, demonstrating strong code performance from textbook-quality synthetic data.

Dense · 1.3B · Jun 21, 2023

Falcon 40B

Available
Technology Innovation Institute · Open weights

TII's breakout open Falcon model, released before Falcon 180B and trained on the RefinedWeb corpus.

Dense · 40B · 2K ctx · May 25, 2023

MPT-7B

Available
Databricks / MosaicML · Open source

MosaicML's permissively licensed 7B model, an early favorite for commercial local fine-tuning and long-context variants.

Dense · 7B · 2K ctx · May 5, 2023

Vicuna 13B

Available
LMSYS / SkyLab · Open weights

LMSYS instruction-tuned LLaMA model that became a landmark early local ChatGPT-style assistant.

Dense · 13B · Mar 30, 2023

ChatGLM-6B

Available
Z.ai (Zhipu AI) · Open weights

Zhipu AI and Tsinghua KEG's first widely used open bilingual ChatGLM checkpoint.

Dense · 6B · 2K ctx · Mar 14, 2023

LLaMA

Available
Meta AI · Open weights

Meta's first LLaMA, released to researchers; its leak catalyzed the open-weight movement.

Dense · 65B · 2K ctx · Feb 24, 2023

GPT-2

Available
OpenAI · Open source

Initially withheld over misuse concerns, then fully released in November 2019; an early case in the debate over limited or staged releases.

Dense · 1.5B · 1K ctx · Nov 5, 2019

BERT

Available
Google DeepMind · Open source

The bidirectional encoder that reshaped NLP and seeded the transformer era.

Dense · 0.34B · 512 ctx · Oct 11, 2018