Every large language model release, sourced and tracked over time.

—Undisc.1M ctxJul 24, 2026

Anthropic's flagship Opus model, released July 24, 2026 — positioned as the go-to model for most knowledge work and automation, approaching the capability of Claude Fable 5 in many categories at roughly half the price. Built for demanding reasoning, autonomous coding, software development, and long-horizon agentic work. Introduces a five-level 'effort' dial exposed to developers on the Claude API and Platform, letting them trade compute and tokens for capability — at lower effort it preserves much of its performance while using fewer tokens and costing less to run. 1M-token context window (available at standard token pricing, not a separate long-context surcharge) with up to 128K output tokens; text, vision, and code. Standard API pricing $5/$25 per Mtok (the same as its predecessor Opus 4.8), plus a Fast mode at $10/$50. Anthropic describes it as its most aligned Opus model and the least susceptible to being tricked into misuse. Becomes the default model for Claude Max subscribers and is available across Anthropic's paid plans; scores 61 on the Artificial Analysis Intelligence Index. Closed weights; architecture, parameter count, and training compute undisclosed.

Ling-3.0-flash

Ant Group (inclusionAI)Open weights

Ant Group's efficiency-focused Mixture-of-Experts model, released July 23 2026 by its inclusionAI lab: 124B total parameters activating only ~5.1B per token (1/64 expert activation). Ant claims it matches or beats the company's own ~1T-parameter Ling-2.6 flagship on most benchmarks it shows, at 1/8 the total and 1/12 the active parameters — a vendor claim with no public benchmark table or independent audit at launch, so treat it as unverified. Built for production-scale agents (MCP tool use, multi-agent coordination) rather than chat, with both thinking and non-thinking modes. Architecture is a native hybrid-linear attention stack interleaving Kimi Delta Attention (KDA) and Multi-head Latent Attention (MLA) at a reported 5:1 ratio, giving an economical 262,144-token (256K) context, with 1M cited as the scaling target. Ant docs claim peak inference up to 1,000 tokens/s and <100ms time-to-first-token on its own stack. Announced as open-weight under Apache 2.0, but as of July 24 no weights or model card were posted to the inclusionAI Hugging Face org — so the license and open-weight status are announced but unconfirmed (weights not yet downloadable; not self-hostable today). Usable now only via hosted API — free on OpenRouter (as inclusionai/ling-3.0-flash:free, hosted by Novita) and Vercel's AI Gateway through August 3 2026; no post-promo per-token price published at launch.

MoE124B262K ctxJul 23, 2026

Gemini 3.5 Flash Cyber

—Undisc.1M ctxJul 21, 2026

A specialized, highly efficient cybersecurity model built on Gemini 3.5 Flash and fine-tuned to find and fix software vulnerabilities at a lower price per token than larger models. Deployed inside Google's CodeMender agent, where multiple 3.5 Flash Cyber agents collaborate to produce a single combined report, reaching competitive frontier performance on the CyberGym benchmark. Given its dual-use nature, it is not generally available: access is limited to governments and trusted partners via CodeMender as part of a limited-access pilot program.

Gemini 3.5 Flash-Lite

—Undisc.1M ctxJul 21, 2026

Google DeepMind's fastest and most cost-effective 3.5-class model, released July 21, 2026 for low-latency and high-throughput agentic workloads like agentic search and document processing. Runs at ~350 output tokens/s (Artificial Analysis) with configurable thinking levels and built-in computer use, priced at $0.30 / $2.50 per 1M input/output tokens. Multimodal over a 1M-token context and a large step up on 3.1 Flash-Lite: Terminal-Bench 2.1 54% (vs 31%), GDM-MRCR v2 72.2% (vs 60.1%), GDPval-AA v2 1140 (vs 642); on several agentic and coding evals it even surpasses 3 Flash (SWE-Bench Pro 54.2% vs 49.6%, OSWorld-Verified 74.0% vs 65.1%). Available in the Gemini API (AI Studio, Android Studio), Gemini Enterprise, the Gemini app, and rolling out in Google Search.

Gemini 3.6 Flash

—Undisc.1M ctxJul 21, 2026

Google DeepMind's July 2026 workhorse Flash model, built for scaling agentic workflows. Multimodal over a 1M-token context, it improves on Gemini 3.5 Flash in coding, knowledge work, and computer use while cutting output-token usage ~17% (up to 65% on some benchmarks like DeepSWE) and taking fewer reasoning steps and tool calls. Ships at a lower price than 3.5 Flash ($1.50 / $7.50 per 1M input/output tokens). Google-reported gains: DeepSWE 49% (vs 37%), MLE-Bench 63.9% (vs 49.7%), OSWorld-Verified 83.0% (vs 78.4%), GDPval-AA v2 1421 (vs 1349); knowledge cutoff advances to March 2026. Computer use is a built-in client-side tool. Available in the Gemini API (AI Studio, Android Studio, Antigravity), Gemini Enterprise, and the Gemini app.

Laguna S 2.1

MoE118B1M ctxJul 21, 2026

PoolsideOpen source

Poolside's open-weight agentic-coding model and a scale-up of the Laguna XS family (same pre-training data as XS 2.1): a 118B-total / 8B-active Mixture-of-Experts that activates only ~6.8% of its parameters per token, giving larger-model behavior while staying cheap to serve, with a 1M-token context in both thinking and no-thinking modes. Pitched by Poolside as 'the West's most capable open-weight model' — the claim is about its weight class, not the outright frontier. Two modes (off / max, max default; the model sets its own test-time compute budget). Vendor-reported: Terminal-Bench 2.1 70.2% and SWE-bench Multilingual 78.5% (tops the published open disclosed-size table), plus SWE-bench Pro 59.4%, DeepSWE v1.1 40.4%, SWE Atlas 46.2%, Toolathlon Verified 49.7% — matching or beating models several times its size, though closed frontier models still lead outright. Trained in under nine weeks on 4,096 NVIDIA H200 GPUs (pre-training began 22 May 2026); first Poolside model with RL in FP8. Knowledge cutoff November 2025. Weights on Hugging Face under the permissive OpenMDW-1.1 license in BF16/FP8/INT4/NVFP4 with GGUF/MLX conversions and DFlash draft models; at 4-bit it runs on a single NVIDIA DGX Spark. Day-one support for vLLM, SGLang, and Ollama; hosted free at 256K context via OpenRouter and paid at the full 1M context ($0.10 / $0.20 / $0.01 per 1M input / output / cache-read tokens), also on Baseten, Kilo, Prime Intellect, and ZML.

Qwen3.8-Max-Preview

Alibaba (Qwen)FrontierProprietary

Alibaba's largest model to date and the flagship of the new Qwen3.8 line — a 2.4-trillion-parameter, fully multimodal model that Alibaba positions just behind Anthropic's Fable 5 on overall performance (vendor internal evals; no independent third-party benchmarks yet). Launched Jul 19, 2026 as Qwen3.8-Max-Preview, available to developers via Alibaba's Token Plan subscription and the Qoder / QoderWork coding platforms. Architecture is presumed sparse-MoE; active-parameter count, context window, and pricing are undisclosed. Breaking from the API-only pattern of earlier Max models, the Qwen team says it will release open weights "soon," though no timeline, license, or full specs have been announced.

MoE2.4T— ctxJul 19, 2026

Kimi K3

MoE2.8T1.0M ctxJul 16, 2026

Moonshot's flagship open-weight agentic model and the largest open model released to date: a 2.8T-parameter MoE (896 experts, 16 active per token) using Kimi Delta Attention and Attention Residuals, with native multimodal input and a 1M-token context. Launched via API on Jul 16, 2026 at $3/$15 per Mtok (cached input $0.30); full open weights published to Hugging Face on Jul 26, 2026 — a day ahead of the announced Jul 27 target — under a Modified MIT license, making it freely downloadable and self-hostable.

Inkling

Thinking Machines LabFrontierOpen source

Thinking Machines Lab's first model and the leading U.S. open-weights release: a natively multimodal Mixture-of-Experts with 975B total / 41B active parameters that reasons across text, image, and audio inputs and emits text. Pretrained on ~45T tokens; served with a 1M-token context from the Hugging Face weights (256K on the hosted Tinker API). Apache-2.0 licensed (BF16 + NVFP4 checkpoints on Hugging Face), built for developers fine-tuning on proprietary data — coding assistants, agents/tool use, chatbots, and RAG — with an explicit low-cost and censorship-resistance focus. Debuted at 41 on the Artificial Analysis Intelligence Index. Hosted pricing (256K) $3.74/$9.36 per Mtok reflects a limited-time 50% launch discount.

MoE975B1.0M ctxJul 15, 2026

KAT-Coder-Air V2.5

Kwaipilot (Kuaishou)Proprietary

The efficient, ~32B variant of Kwaipilot's KAT-Coder V2.5, optimized through multi-stage training (supervised fine-tuning plus reinforcement learning). Shares KAT-Coder-Pro's 256K-token context and 80K max output, with function calling and tool use for agentic coding, at roughly a fifth of Pro's cost: $0.15 / $0.60 per million input/output tokens. Surfaced on release trackers on July 14 2026.

MoEUndisc.262K ctxJul 14, 2026

KAT-Coder-Pro V2.5

Kwaipilot (Kuaishou)Proprietary

Kwaipilot's flagship agentic coding model, from the KAT (Kwaipilot Agentic Tuning) series at Kuaishou. A Mixture-of-Experts model with ~72B active parameters, trained through large-scale agentic reinforcement learning in reconstructed, verifiable repository environments. Supports function calling, tool use, structured JSON output, and prompt caching, with a 256K-token context and up to 80K output tokens. Served via API (StreamLake / Atlas Cloud / OpenRouter) at $0.74 / $2.96 per million input/output tokens. Surfaced on release trackers on July 14 2026.

MoEUndisc.262K ctxJul 14, 2026

GPT-5.6

MoEUndisc.1.5M ctxJul 9, 2026

OpenAI's GPT-5.6 series umbrella row. Officially previewed June 26, 2026 as three durable capability tiers — Sol (flagship), Terra (balanced, for everyday work), and Luna (fast and affordable) — introduced with a new `max` reasoning effort for deeper reasoning and an `ultra` mode that leverages subagents to accelerate complex work. In the GPT-5.6 naming system the number marks the generation while Sol/Terra/Luna are tiers that can advance on their own cadence. Initially a limited preview via the API and Codex for a small group of vetted partners (after U.S. government review), with general availability across ChatGPT, Codex, and the API planned in the following weeks.

GPT-5.6 Terra

—Undisc.1.5M ctxJul 9, 2026

Balanced, everyday-work tier of OpenAI's GPT-5.6 series, officially previewed June 26, 2026. OpenAI positions Terra as competitive with GPT-5.5 while being roughly 2x cheaper. Shares the series' new `max` reasoning effort and `ultra` subagent mode and OpenAI's GPT-5.6 safety stack. Begins as a limited preview via the API and Codex for vetted partners after U.S. government review, with broad availability planned in the following weeks. Priced at $2.50 / $15 per million input/output tokens.

GPT-5.6 Sol

—Undisc.1.5M ctxJul 9, 2026

Flagship tier of OpenAI's GPT-5.6 series, officially previewed June 26, 2026 — OpenAI's strongest model to date. Adds a new `max` reasoning effort for the deepest reasoning and an `ultra` mode that uses subagents to accelerate complex work. Sets a new state of the art on Terminal-Bench 2.1 (command-line, agentic coding) and shows broad gains in long-horizon biology (GeneBench v1) and cybersecurity (ExploitBench, ExploitGym), paired with OpenAI's most robust safety stack and a phased release. Begins as a limited preview via the API and Codex for a small group of vetted partners after U.S. government review, with general availability planned in the following weeks; also launching on Cerebras at up to 750 tokens/sec in July. Priced at $5 / $30 per million input/output tokens.

GPT-5.6 Luna

—Undisc.1.5M ctxJul 9, 2026

Fast, low-cost tier of OpenAI's GPT-5.6 series, officially previewed June 26, 2026 — the most affordable model in the family, bringing strong capability at OpenAI's lowest cost. Shares the series' capability-tier naming and GPT-5.6 safety stack. Begins as a limited preview via the API and Codex for vetted partners after U.S. government review, with broader availability planned in the following weeks. Priced at $1 / $6 per million input/output tokens.

Muse Spark 1.1

Meta AIFrontierProprietary

Meta Superintelligence Labs' first paid model, released July 9, 2026 in US public preview on the Meta Model API. A natively multimodal reasoning model (text, image, video, PDF, and audio input; text output) with explicit chain-of-thought reasoning and a 1M-token context window that the model actively compacts. Positioned for agentic work and coding — tool use, multi-step workflow coordination, and long-horizon autonomous tasks — and pitched by Meta as roughly a quarter of the price of comparable Anthropic and OpenAI models at $1.25 in / $4.25 out per Mtok (with $20 in free credits per new API account). Marks the first time Meta has charged businesses for one of its models, a departure from the open-weight Llama strategy. Closed weights, undisclosed size. Vendor and third-party benchmarks place it around the Opus 4.8 / GPT-5.5 tier — strongest as an agent/workflow model and in tool-augmented reasoning, competitive but not dominant on coding and multimodal tasks.

—Undisc.1.0M ctxJul 9, 2026

Grok 4.5

MoEUndisc.— ctxJul 8, 2026

xAIFrontierProprietary

SpaceXAI's "Opus-class" agentic flagship, released July 8, 2026 — the first Grok model trained jointly with the coding startup Cursor (Anysphere), on trillions of tokens of real Cursor usage data plus STEM tasks, research papers, and other knowledge work. Targets software engineering, agentic tasks, and knowledge work, with explicit strength in legal and finance use cases (SpaceXAI claims the top spot on the Harvey Legal Agent Benchmark). A Mixture-of-Experts model reported to be built on a ~1.5-trillion-parameter "V9" foundation. Vendor benchmarks are mixed against Anthropic's Opus 4.8 — ahead on DeepSWE 1.0 and Terminal-Bench 2.1, behind on DeepSWE 1.1 and SWE-bench Pro — but markedly more token-efficient (~15,900 output tokens on SWE-bench Pro tasks, ~4.2x fewer than Opus 4.8) and served at ~80 tokens/sec. Base pricing $2/$6 per Mtok; Cursor lists a faster variant at $4/$18. Available in Grok Build (default model), Cursor (all plans), and the SpaceXAI console; initially unavailable in the EU (expected mid-July). xAI was absorbed by SpaceX in Feb 2026 and rebranded SpaceXAI.

Hunyuan Hy3

Tencent HunyuanFrontierOpen source

The general-availability release of Tencent's third-generation Hunyuan (Hunyuan 3.0), officially launched and open-sourced on July 6, 2026 after April's "Hy3 preview". A 295B-total / 21B-active Transformer MoE with an additional 3.8B multi-token-prediction (MTP) layer and a 256K-token context, offering three selectable inference modes that blend fast and slow thinking. Positioned as a leading open model for its size and cost efficiency, with standout results in coding, search, and scientific reasoning: Tencent reports it rivals flagship open models such as GLM-5.2 and DeepSeek-V4 (at 2-5x the active parameters) and matches or surpasses GPT-5.5 on several science benchmarks. Vendor-reported scores include 78.0 on SWE-bench Verified and 57.9 on SWE-bench Pro. Now Apache-2.0 licensed (the preview used Tencent's community license), with weights on Hugging Face (tencent/Hy3) and ModelScope and a free two-week API route on OpenRouter (tencent/hy3:free) through July 21, 2026. Deeply integrated into WeChat and Tencent's core products.

MoE295B256K ctxJul 6, 2026

Nemotron-Labs-3-Puzzle-75B-A9B

Hybrid75.3B1M ctxJul 6, 2026

A deployment-optimized open-weight model from NVIDIA, released July 6, 2026 — a compressed variant of Nemotron-3-Super-120B-A12B produced with "Iterative Puzzle", a post-training compression framework that jointly prunes MoE experts, active-parameter budget, and Mamba state to boost inference efficiency while preserving accuracy. Reduces the parent from 120.7B total / 12.8B active to 75.3B total / 9.3B active, keeping the hybrid Mamba-Transformer LatentMoE architecture with Multi-Token Prediction. Delivers ~2x higher server throughput than Nemotron-3-Super on a single 8xB200 node at matched user throughput and raises sustainable 1M-token single-H100 concurrency from 1 to 8 requests. Targets collaborative agents, chatbots, RAG, complex instruction-following, and long-context reasoning across English, code, and six other languages. Shipped in BF16, FP8, and NVFP4 variants under the OpenMDW-1.1 license.

Mistral frontier open-weight MoE (unnamed)

MoEUndisc.— ctxJul 6, 2026

Rumored / early-access: Mistral AI has confirmed a new open-weight Mixture-of-Experts family — described by CEO Arthur Mensch as "fat but sparse" — aimed at closing the gap with frontier open-weight releases, with early access beginning in July 2026. Mensch confirmed the intent but disclosed almost nothing concrete: no parameter count, no benchmarks, no license terms, and no release date. Tracked as rumored until weights or an official product page land.

Laguna XS 2.1

MoE33B256K ctxJul 2, 2026

PoolsideOpen source

Poolside's open-weight small coding model: a 33B-total / 3B-active Mixture-of-Experts built for agentic coding and long-horizon work on a local machine, served at 256K context. An upgraded XS.2 (same architecture) that lifts SWE-bench Multilingual by 5.4 points to 63.1% and improves terminal-style tasks. Ships with open-weighted DFlash speculator (draft) models for each checkpoint that roughly double local tokens/sec, plus BF16/FP8/INT4/NVFP4 quantized checkpoints; supported in vLLM, SGLang, TensorRT-LLM, HF transformers, and Ollama (llama.cpp coming). Newly relicensed under the fully permissive OpenMDW-1.1. Available free on Hugging Face and via a free OpenRouter tier, with paid API pricing of $0.10 / $0.20 / $0.05 per 1M input / output / cache-read tokens. Its predecessor Laguna XS.2 sunsets on Poolside's API one week after launch.

Claude Sonnet 5

—Undisc.200K ctxJun 30, 2026

Anthropic's most agentic Sonnet model yet, with performance approaching Claude Opus 4.8 at a lower price. Built for coding, tool use (browsers and terminals), and autonomous multi-step agentic work, with selectable effort levels up to xhigh. A substantial upgrade over its predecessor Sonnet 4.6 on reasoning, tool use, coding, and knowledge work, with gains shown on SWE-bench, OSWorld-Verified, BrowseComp, and Humanity's Last Exam. The default model on the Free and Pro plans and available to Max, Team, and Enterprise users; in Claude Code and via the Claude API as claude-sonnet-5. Uses an updated tokenizer (same approach as Opus 4.7). Ships with real-time cyber safeguards enabled by default. Introductory API pricing of $2/$10 per Mtok through Aug 31, 2026, then standard $3/$15.

LongCat-2.0

Meituan (LongCat)FrontierOpen source

Meituan's open-weight flagship: a 1.6-trillion-parameter Mixture-of-Experts model (~48B active per token, dynamically routed between ~33B and ~56B) with a 1M-token context, built for agentic coding. Notable as the largest Chinese model trained — for both pre-training and inference — entirely on a ~50,000-card cluster of domestic Chinese AI chips (Meituan's use of the Huawei Collective Communication Library points to Huawei Ascend hardware), and the first trillion-parameter model Meituan claims completed full-process training on home-grown compute. Vendor-reported software-engineering results: 59.5 on SWE-bench Pro (ahead of GPT-5.5's 58.6), 70.8 on Terminal-Bench 2.1, and 77.3 on SWE-bench Multilingual, with overall quality positioned as comparable to Gemini 3.1 Pro (self-reported, not yet independently verified). Open-sourced under the MIT license with weights on Hugging Face and GitHub; follows LongCat-Flash (560B, Sep 2025) and the multimodal LongCat-Next (Mar 2026).

MoE1.6T1M ctxJun 30, 2026

Base1

—Undisc.— ctxJun 29, 2026

Base44Proprietary

Base44's first in-house model, a general-purpose agent for 'vibe coding' — turning natural-language prompts into working web apps. Fine-tuned on top of an open-source foundation model (rather than trained from scratch) and specialized on a dataset generated from tens of millions of real interactions across the Base44 platform, it holds a conversation, writes code, and handles multi-turn requests, tool use, and backend operations while being faster and cheaper to run than the frontier models it sits beside. Now selectable in Base44's model picker alongside GPT-5.5 and Claude Opus 4.8. Rolled out from June 29 2026; platform-only, with no separately published API pricing or context figure. Base44 (founded by Maor Shlomo) was acquired by Wix in 2025.

Seed 2.1 Turbo

—Undisc.256K ctxJun 23, 2026

The low-cost, low-latency tier of ByteDance's Seed 2.1 family (served as Doubao-Seed-2.1-turbo), built for large-scale production with full features and performance ByteDance positions as comparable to Seed 2.1 Pro. Shares the family's coding, long-chain agent, and multimodal-understanding focus and 256K-token context, priced for high-volume online calls. Proprietary; available via Doubao and Volcano Engine. List price ¥3 / ¥15 per million input/output tokens — roughly half the Pro tier.

Seed 2.1 Pro

ByteDance SeedFrontierProprietary

ByteDance's flagship next-generation agent model (served as Doubao-Seed-2.1-pro), built for the "coding and agent era." A deep-thinking model tuned for strong demand understanding, long-horizon planning, and continuous self-repair across complex coding, long-chain agents, and multi-step engineering delivery, with a 256K-token context. ByteDance reports its core coding, agent, and multimodal capabilities are comparable to GPT-5.5, with the highest score on GDPVal, top-tier results on the Agents' Last Exam, the highest score on MobileWorld, and SOTA results across several visual and video-understanding benchmarks (CharXiv-RQ, MeasureBench, TVBench, TOMATO). Proprietary; available via Doubao and Volcano Engine. List price ¥6 / ¥30 per million input/output tokens.

—Undisc.256K ctxJun 23, 2026

Fugu Ultra

Sakana AIFrontierProprietary

Sakana AI's frontier-class orchestration model: a single ~7B language model trained to coordinate a swappable pool of external frontier LLMs (model selection, delegation, verification, and synthesis happen internally), exposed behind one OpenAI-compatible API. The Ultra tier is tuned for maximum answer quality on hard, multi-step problems and coordinates a deeper, fixed pool of expert agents. Sakana reports it stands shoulder-to-shoulder with Anthropic's Fable 5 and Mythos Preview across coding, science, and reasoning benchmarks while routing around single-vendor/export-control risk. Current model ID fugu-ultra-20260615; vendor-reported scores include 95.5 GPQA-D, 73.7 SWE-Bench Pro, 93.2 LiveCodeBench, and 50.0 Humanity's Last Exam.

—Undisc.— ctxJun 22, 2026

Fugu

—Undisc.— ctxJun 22, 2026

Sakana AIProprietary

Sakana AI's orchestration model: a single language model that delivers a full multi-agent system behind one OpenAI-compatible API, dynamically routing tasks across a swappable pool of frontier LLMs (including recursive calls to itself). The base tier balances strong performance with low latency as an everyday default for coding, code review, and chat, and lets teams opt specific agents out of the pool for data/privacy/compliance needs. Released alongside Fugu Ultra on June 22, 2026; vendor-reported scores include 95.5 GPQA-D, 92.9 LiveCodeBench, and category-leading SciCode and long-context results.

Z.ai Fable-class model

Z.ai (Zhipu AI)FrontierProprietary

A speculative Z.ai frontier model tracked after Z.ai founder Jie Tang responded to Elon Musk's prediction of a Chinese Fable 5-class model by saying it would not take that long. Name, architecture, weights, and launch timing remain unconfirmed.

—Undisc.— ctxJun 19, 2026

Kimi K2.7 Code

MoE1T262K ctxJun 18, 2026

Moonshot's open coding-focused agentic model built on K2.6, with native vision/video input, forced thinking mode, and stronger long-horizon software-engineering performance.

GLM-5.2

MoE753B1M ctxJun 17, 2026

Z.ai's latest open flagship for long-horizon coding, agentic engineering, and million-token workflows, adding IndexShare sparse-attention reuse over GLM-5.1.

MiniMax-M3

MiniMaxFrontierOpen weights

Native multimodal MiniMax model with a one-million-token context, sparse attention, and agentic coding/cowork positioning.

MoE428B1M ctxJun 16, 2026

DiffusionGemma 26B-A4B

MoE25.2B256K ctxJun 10, 2026

An open-weight text-diffusion model built on the Gemma 4 26B-A4B MoE backbone (25.2B total / 3.8B active). Denoises text in parallel 256-token blocks for up to ~4x faster generation (1,000+ tok/s on an H100), with a 256K context and text, image, and video input. Apache-2.0.

Claude Fable 5

—Undisc.1M ctxJun 9, 2026

The public, guardrailed sibling of Claude Mythos 5 and Anthropic's most capable widely released model, built for long-horizon agentic work, coding, vision, and knowledge workflows. Launched June 9, 2026 across the Claude API, AWS, and Microsoft Foundry, then suspended three days later under a U.S. export-control directive. Anthropic says those controls were lifted June 30 and Fable 5 was restored globally on July 1 across Claude Platform, Claude.ai, Claude Code, and Claude Cowork, with cloud partner access being re-enabled. Its safeguards route flagged cybersecurity, biology/chemistry, and distillation requests to Claude Opus 4.8.

North Mini Code 1.0

MoE30B256K ctxJun 9, 2026

CohereOpen source

Cohere's first developer-focused model and the first in its North family of code agents. A 30B-total / 3B-active MoE for agentic coding with a 256K context and up to 64K output, sized to run locally for enterprise coding agents. Apache-2.0.

Unisound U2

UnisoundProprietary

Unisound's new-generation, general-purpose "native agentic" large model, built for task execution: it can autonomously decompose and advance complex real-world workflows of 100+ steps rather than single-turn Q&A. Unisound frames it around "high intelligence density x high token value" and reports ~25% lower thinking-token consumption. Available via the Unisound Token Hub.

—Undisc.— ctxJun 7, 2026

Nemotron 3 Ultra 550B-A55B

NVIDIAFrontierOpen weights

NVIDIA's largest Nemotron 3 open-weight hybrid Mamba-Transformer MoE, tuned for agentic reasoning, coding, planning, and tool calling.

Hybrid550B1M ctxJun 4, 2026

Qwen3.7-Plus

Alibaba (Qwen)Proprietary

Multimodal sibling of Qwen3.7-Max that adds vision input and GUI grounding for screen perception, browser automation, and hybrid GUI+CLI agent workflows. 1M-token context; closed-weights and API-only. Previewed at the May 2026 Alibaba Cloud Summit and reached general availability in June 2026 at a low price point ($0.40/$1.60 per 1M tokens).

MoEUndisc.1M ctxJun 3, 2026

Gemma 4 12B

Dense12B256K ctxJun 3, 2026

A dense 12B member of the Gemma 4 family with a unified, encoder-free multimodal architecture: vision and audio are projected straight into the LLM backbone. First medium-size Gemma to natively ingest audio; runs on a 16GB laptop. 256K context, Apache-2.0.

MAI-Thinking-1

MicrosoftFrontierProprietary

Microsoft's first in-house frontier reasoning model, unveiled at Build 2026. A sparse MoE (~35B active, 256K context) trained entirely on commercially licensed data without third-party distillation. Microsoft reports 97.0% on AIME 2025 and coding parity with Claude Opus 4.6 on SWE-bench Pro.

MoEUndisc.256K ctxJun 2, 2026

MAI-Code-1-Flash

MicrosoftProprietary

An inference-efficient agentic coding model from Microsoft (~5B active parameters), trained from the ground up on clean, traceable, enterprise-grade data without third-party distillation. Rolling out in GitHub Copilot; Microsoft reports a +16-point SWE-bench Pro lead over Claude Haiku 4.5.

—Undisc.— ctxJun 2, 2026

Nex-N2-Pro

MoE397B262K ctxJun 2, 2026

Nex AGIOpen source

Nex AGI's open-weight agentic flagship, post-trained on Qwen3.5-397B-A17B (397B total / ~17B active MoE) by the Shanghai Innovation Institute-led Nex alliance. Built around an "Agentic Thinking" framework for long-horizon coding, deep research, tool calling, and terminal execution; accepts text and image input and emits text with explicit reasoning traces and function calling. Apache-2.0, ~262K context. Nex reports parity with GPT-5.5 and Claude Opus 4.7 on several agentic and coding evals. A smaller Nex-N2-mini (35B/3B-active) was announced but is not yet open-sourced.

Step-3.7-Flash

MoE196B256K ctxMay 29, 2026

StepFunOpen source

StepFun's high-efficiency multimodal sparse-MoE successor to Step-3.5-Flash: a ~196B-total / ~11B-active vision-language model with native image and video understanding, a 256K context, and selectable reasoning tiers (high/medium/low). Tuned for coding agents and search workflows.

Claude Opus 4.8

—Undisc.500K ctxMay 28, 2026

Anthropic's most capable model, with strengthened agentic and long-running task performance.

LFM2.5-8B-A1B

MoE8.3B131K ctxMay 28, 2026

Liquid AIOpen weights

Liquid AI's on-device Mixture-of-Experts model: 8.3B total parameters with only ~1.5B active per forward pass (32 experts, 4 active per token). Uses Liquid's hybrid architecture — 18 double-gated LIV convolution blocks plus 6 grouped-query-attention layers — for a 131K-token context that runs in under ~6GB of memory on consumer hardware. A reasoning-only model that emits an explicit chain of thought before its answer, with strong tool-calling and agentic performance for its size. Builds on the October 2025 LFM2-8B-A1B, expanding the context window to 128K and scaling pretraining from 12T to 38T tokens. Released May 28 2026 under the LFM Open License; caught in a July catalog-gap sweep.

MiniMax-M2.7

MiniMaxFrontierOpen weights

Open-weight agentic model from MiniMax focused on real-world software engineering, office tasks, tool use, and self-improving training workflows.

MoE229.9B— ctxMay 26, 2026

Qwen3.7-Max

Alibaba (Qwen)FrontierProprietary

Alibaba's proprietary flagship in the Qwen3.7 "Agent Frontier" line — a text-only sparse-MoE model with a 1M-token context, tuned for long-horizon agentic, coding, and reasoning workloads. Parameter count is undisclosed; access is API-only via Alibaba Cloud Model Studio / DashScope (and aggregators such as OpenRouter).

MoEUndisc.1M ctxMay 20, 2026

Gemini 3.5 Pro

Google DeepMindFrontierProprietary

Announced at Google I/O 2026; emphasizes deep multimodal reasoning over a 2M-token context. Recent reporting says the broad launch slipped from June toward July while testers continue using it in Google Antigravity and LMArena.

MoEUndisc.2M ctxMay 19, 2026

Gemini 3.5 Flash

—Undisc.1M ctxMay 19, 2026

Google's fast, cost-efficient Gemini 3.5 tier, unveiled at I/O 2026. Multimodal over a 1M-token context and tuned for agentic and coding workflows; Google says it beats Gemini 3.1 Pro on coding and tool-use while running ~4x faster.

Qwen3.6-27B

Dense27B256K ctxMay 12, 2026

Dense 27B that punches far above its weight on agentic coding — easy to self-host on a single GPU node.

ERNIE 5.1

MoEUndisc.— ctxMay 8, 2026

BaiduFrontierProprietary

Baidu's flagship ERNIE 5.1, derived from ERNIE 5.0 by extracting an optimal sub-network from its elastic sub-model matrix — compressing total parameters to ~1/3 and active parameters to ~1/2 of ERNIE 5.0 while reaching leading performance at only ~6% of the pre-training compute of comparable models. A Mixture-of-Experts model trained with a disaggregated fully-asynchronous RL stack and a multi-teacher on-policy-distillation pipeline, tuned for agentic execution, reasoning, world knowledge, and creative writing. Vendor-reported results: 99.6 on AIME26 (with tools, second only to Gemini 3.1 Pro), GPQA and MMLU-Pro approaching leading closed models, surpassing DeepSeek-V4-Pro on τ³-bench and SpreadsheetBench-Verified, and ranking 1st among Chinese models / 4th globally (score 1223) on the LMArena Search Arena. Proprietary; served via ERNIE Bot, Baidu AI Studio, and the Qianfan platform.

GPT-5.5-Cyber

OpenAI's limited-preview cybersecurity model for vetted defenders in Trusted Access for Cyber. It is tuned for more permissive authorized security workflows such as vulnerability triage, patch validation, malware analysis, red teaming, and controlled exploit validation, but is not generally available.

—Undisc.— ctxMay 7, 2026

GPT-5.5

—Undisc.800K ctxMay 7, 2026

OpenAI's May 2026 GPT-5.5 release: a stronger frontier workhorse positioned for deep reasoning, coding, multimodal analysis, and long-context agent workflows. OpenAI lists an 800K-token input context and 128K-token output limit, with API pricing at $3 / $20 per million input/output tokens.

Grok 4.3

MoEUndisc.1M ctxMay 6, 2026

xAIFrontierProprietary

xAI's agentic flagship with a 1M-token context and aggressive API pricing.

DeepSeek V4-Flash

MoE284B1M ctxApr 24, 2026

Efficient V4 companion model with 284B total / 13B active parameters and the same one-million-token context window.

DeepSeek V4-Pro

MoE1.6T1M ctxApr 24, 2026

Preview-series sparse MoE flagship with a one-million-token context window and 1.6T total / 49B active parameters.

Hunyuan Hy3-preview

Tencent HunyuanFrontierOpen weights

Tencent's third-generation Hunyuan, rebuilt from scratch in ~90 days and open-sourced as the "Hy3 preview". A 295B-total / 21B-active Transformer MoE (80 layers, 192 experts with top-8 routing, plus a 3.8B multi-token-prediction layer) with a 256K-token context, positioned as a leading open reasoning-and-agent model for its size with strong cost efficiency. Vendor-reported results: 74.4 on SWE-bench Verified, 54.4 on Terminal-Bench 2.0, and 70.2 on WideSearch, with strong STEM-olympiad performance. Open weights on GitHub and Hugging Face under Tencent's community license.

MoE295B256K ctxApr 23, 2026

Hunyuan-A13B-Instruct

Tencent HunyuanOpen weights

Tencent Hunyuan open-weight fine-grained MoE model with 80B total parameters and 13B active parameters, optimized for agentic tool use.

MoE80B— ctxApr 22, 2026

MiMo-V2.5-Pro

Xiaomi (MiMo)FrontierOpen source

Xiaomi's open-weight flagship: a 1.02T-parameter Mixture-of-Experts model with ~42B active parameters, a hybrid-attention architecture, and a 1M-token context window. Tuned for frontier-class agentic coding and long-horizon tasks (sustaining 1000+ tool calls with a proper harness). Open-sourced under the MIT license with weights and tokenizer on Hugging Face.

MoE1.0T1M ctxApr 22, 2026

MiMo-V2.5

MoE310B1M ctxApr 22, 2026

Xiaomi (MiMo)Open source

Xiaomi's open-weight sparse-MoE model: ~310B total parameters with ~15B active, trained on ~48T tokens, with a 1M-token context window. Shipped alongside the larger MiMo-V2.5-Pro under the MIT license.

GPT-Rosalind

—Undisc.— ctxApr 16, 2026

OpenAI's frontier reasoning model for life sciences, named after Rosalind Franklin and built to accelerate drug discovery, genomics, protein reasoning, and scientific research workflows. Optimized for multi-step, tool-heavy tasks (literature review, experimental design, sequence-to-function interpretation) with access to 50+ scientific databases via a Codex Life Sciences plugin. A June 3, 2026 update folded in GPT-5.5's agentic coding and tool use while using ~31% fewer tokens. Available as a research preview in ChatGPT, Codex, and the API through OpenAI's trusted-access program; not openly available.

GLM-5.1

Z.ai agentic-engineering follow-up to GLM-5, with stronger coding performance and better long-horizon tool-use behavior.

MoE754B— ctxApr 8, 2026

Muse Spark

Meta AIFrontierProprietary

Meta's new frontier model behind Meta AI for U.S. users, identified in reporting as the public release of the former Avocado effort and positioned to compete with Gemini, GPT, and Claude on multimodal assistant tasks.

—Undisc.— ctxApr 8, 2026

Claude Mythos 5

The restricted sibling of Claude Fable 5, sharing the same underlying Mythos-class model with fewer safeguards for vetted defensive-security and research use. Anthropic disclosed Mythos Preview on April 7, 2026, upgraded approved Project Glasswing users to Mythos 5 on June 9, suspended access under a U.S. export-control directive on June 12, and restored limited access for approved U.S. organizations on July 1 while it continues expanding the trusted-access program.

—Undisc.— ctxApr 7, 2026

Gemma 4 31B

Google DeepMind's Gemma 4 advanced-reasoning open model for personal computers, part of the April 2026 Gemma 4 family.

Dense31B— ctxApr 2, 2026

GLM-5V-Turbo

Z.ai (Zhipu AI)Proprietary

Z.ai's native-multimodal vision agent: the first GLM model designed from the start as a multimodal agent, taking image, video, and text input and producing agent-oriented output (tool calling, task decomposition, and GUI interaction). Served via API with a ~203K-token context.

—Undisc.203K ctxApr 1, 2026

Kimi K2.6

MoE1T256K ctxMar 30, 2026

Moonshot's open native multimodal agentic model for long-horizon coding, visual interface generation, and autonomous tool orchestration.

Mistral Medium 3.5

Dense128B256K ctxMar 18, 2026

Dense 128B open-weight model with a 256k context and strong coding performance for its size.

Nemotron 3 Super 120B-A12B

NVIDIAFrontierOpen weights

Open-weight hybrid Mamba-Transformer MoE designed for collaborative agents and high-volume enterprise workflows.

Hybrid120B1M ctxMar 16, 2026

Mistral Small 4

MoE119B256K ctxMar 16, 2026

Mistral's March 2026 Small release: the first Mistral model to unify reasoning (Magistral), multimodal understanding (Pixtral), and agentic coding (Devstral) into one Apache 2.0 model. A 119B-total / ~6B-active Mixture-of-Experts (128 experts, 4 active per token) with native text+image input, a 256K context, and a configurable reasoning_effort toggle for fast or deep responses. API pricing is $0.15 / $0.60 per million input/output tokens.

Step-3.5-Flash

MoE196B256K ctxMar 14, 2026

StepFunOpen source

StepFun's Apache-licensed sparse MoE model for fast agentic execution, coding, math, browsing, and tool-use workflows.

Meta Avocado

Meta AIFrontierProprietary

Historical rumor/codename row retained for provenance. Recent reporting identifies the public productized model as Muse Spark, now tracked separately in the catalog.

—Undisc.— ctxMar 13, 2026

Sarvam-105B

MoE105B128K ctxMar 6, 2026

Sarvam AIOpen source

Apache-licensed Indian-context MoE from Sarvam AI, optimized for reasoning, coding, agentic tasks, and 22 Indian languages.

GPT-5.4

MoEUndisc.400K ctxMar 5, 2026

Workhorse GPT-5 release with a dedicated Thinking mode; widely deployed across ChatGPT and the API.

Qwen3.5-9B

Dense9B262K ctxMar 2, 2026

The flagship of Alibaba's small dense Qwen3.5 models. Independent analysis (Artificial Analysis) rated it the most intelligent model under 10B parameters at launch — roughly double the score of the next-closest sub-10B models — and the most intelligent multimodal model under 15B, leading peers on MMMU-Pro (~69%). A dense 9B with native vision, a 262K-token context, and the Qwen3.5 family's unified hybrid thinking / non-thinking mode. Native weights are BF16; in 4-bit it needs ~6GB, within reach of consumer laptops. High intelligence comes with heavy reasoning token usage (~260M output tokens to run the Intelligence Index).

Qwen3.5-4B

Dense4B262K ctxMar 2, 2026

A dense 4B in Alibaba's small Qwen3.5 family, rated by Artificial Analysis as the most intelligent model under 5B parameters at launch — outscoring several 7B–9B peers despite roughly half the parameters. Native vision, a 262K-token context, and the family's hybrid thinking / non-thinking mode; Apache-2.0 licensed. Scores ~65% on MMMU-Pro multimodal reasoning and runs in ~3GB at 4-bit, suitable for lightweight on-device agents.

Qwen3.5-2B

Dense2B262K ctxMar 2, 2026

A dense 2B Qwen3.5 model built for high-throughput, low-latency edge and on-device use. Despite its size it matches a 7B-class peer on Artificial Analysis's Intelligence Index. Apache-2.0, with native vision, a 262K-token context, and the family's hybrid thinking / non-thinking mode; runs in under 2GB at 4-bit, fitting laptops and smartphones.

Qwen3.5-0.8B

Dense0.8B262K ctxMar 2, 2026

The smallest Qwen3.5 model — a dense 0.8B designed for the most constrained on-device deployments, operating in non-thinking (instruct) mode by default. Apache-2.0, with native vision, a 262K-token context, and the family's hybrid thinking / non-thinking mode; needs roughly 2GB of VRAM and runs under 2GB at 4-bit, targeting smartphones and embedded hardware. Notable for a sub-1B model, it still scores ~26% on MMMU-Pro multimodal reasoning.

Qwen3.5-397B

Alibaba (Qwen)FrontierOpen source

Native vision-language MoE supporting 201 languages with a 1M-token context.

MoE397B1M ctxFeb 20, 2026

Gemini 3.1 Pro

Google DeepMindFrontierProprietary

Generally available multimodal flagship with native tool use and a 2M-token context.

MoEUndisc.2M ctxFeb 19, 2026

GLM-5

Z.ai flagship for complex systems engineering and long-horizon agentic tasks, scaling the GLM line to 744B total / 40B active parameters.

MoE744B— ctxFeb 11, 2026

Claude Opus 4.6

—Undisc.200K ctxFeb 5, 2026

Introduced genuinely autonomous multi-file coding and stronger computer use.

GPT-5.3-Codex

—Undisc.400K ctxFeb 5, 2026

OpenAI's February 2026 Codex update, optimized for agentic software engineering in ChatGPT, Codex, and the API. GPT-5.3-Codex improves code quality, patch reliability, repository-scale reasoning, and long-running autonomous coding workflows while keeping the 400K-token input context and 128K-token output limit of the Codex line.

Qwen3-Coder-Next

Hybrid80B262K ctxFeb 3, 2026

Apache-licensed Qwen3-Next coding-agent model with 80B total / 3B active parameters, 256K context, and long-horizon tool-use training.

Kimi K2.5

MoE1T256K ctxJan 27, 2026

Open multimodal Kimi model that adds native visual agentic intelligence, instant and thinking modes, and agent-swarm workflows on top of the K2 base.

GLM-4.7

Coding-focused GLM release with improved multilingual agentic coding, terminal tasks, tool use, and interface generation.

MoE358B— ctxJan 8, 2026

GPT-5.2-Codex

—Undisc.400K ctxDec 18, 2025

OpenAI's December 2025 Codex model for agentic coding, released after GPT-5.2 with stronger repository understanding, code generation, and tool-use behavior for software engineering agents. The API model is listed with a 400K-token input context, 128K-token output limit, and $1.25 / $10 per million input/output tokens.

OLMo 3 Think 32B

Dense32B— ctxDec 15, 2025

Ai2's fully open thinking model with public weights, code, data, checkpoints, and training details across the OLMo 3 pipeline.

Nemotron 3 Nano 30B-A3B

Hybrid30B1M ctxDec 15, 2025

Efficient Nemotron 3 MoE checkpoint for agentic reasoning and coding, activating about 3B parameters while supporting 1M-token contexts.

GPT-5.2

—Undisc.400K ctxDec 11, 2025

OpenAI's December 2025 GPT-5.2 general model release, positioned as a stronger default for reasoning, coding, vision, instruction following, and long-context analysis. OpenAI lists a 400K-token input context, 128K-token output limit, and $2 / $12 per million input/output tokens.

GLM-4.6V

Z.ai (Zhipu AI)Open source

Open 106B-class vision-language model with native multimodal function calling for visual agents.

MoE106B128K ctxDec 8, 2025

Mistral Large 3

Mistral AIFrontierOpen weights

Mistral's largest open-weight MoE, aimed at frontier reasoning while remaining self-hostable.

MoE675B256K ctxDec 2, 2025

DeepSeek-V3.2

MoE685B128K ctxDec 1, 2025

Reasoning-first agent model that adds DeepSeek Sparse Attention and thinking directly inside tool-use workflows.

DeepSeek-V3.2-Speciale

MoE685B128K ctxDec 1, 2025

High-compute reasoning variant of V3.2, positioned for olympiad-level math, programming, and other deep reasoning tasks.

LFM2 1.2B

Hybrid1.17B33K ctxNov 28, 2025

Liquid AIOpen weights

Liquid AI hybrid model for efficient CPU/GPU/NPU local deployment, using short convolutions plus attention blocks.

Kimi K2 Thinking

Open K2 reasoning-agent variant that interleaves step-by-step thinking with tool calls and supports stable 200-300 step tool-use trajectories.

MoE1T256K ctxNov 6, 2025

Kimi-Linear-48B-A3B-Instruct

Hybrid48B1.0M ctxOct 31, 2025

MIT-licensed hybrid linear-attention model using Kimi Delta Attention, built for million-token contexts with much lower KV-cache usage.

Claude Haiku 4.5

—Undisc.200K ctxOct 15, 2025

Anthropic's fast, low-cost Claude 4.5 model, released in October 2025 for latency-sensitive coding, tool-use, and customer-facing agent workloads. Anthropic positions it as bringing near-Sonnet capability to the Haiku tier at substantially lower cost and higher speed.

GLM-4.6

MoE357B200K ctxSep 30, 2025

Agentic reasoning and coding upgrade over GLM-4.5, expanding the text context window from 128K to 200K tokens.

DeepSeek-V3.2-Exp

MoE685B128K ctxSep 29, 2025

Experimental checkpoint that introduced DeepSeek Sparse Attention as an efficiency bridge between V3.1-Terminus and V3.2.

Claude Sonnet 4.5

—Undisc.200K ctxSep 29, 2025

Anthropic's September 2025 Sonnet release, positioned as its strongest model for coding, agents, and computer-use workflows at launch. Proprietary API model with text, vision, and code capabilities, 200K context, and Sonnet-tier list pricing.

DeepSeek-V3.1-Terminus

MoE685B128K ctxSep 22, 2025

Stability update to V3.1 focused on language consistency, code-agent reliability, and search-agent behavior.

Kimi K2 Instruct 0905

September 2025 K2 update with stronger agentic coding, better frontend generation, and a doubled 256K context window.

MoE1T256K ctxSep 5, 2025

Gemma 3 27B

Dense27B128K ctxSep 4, 2025

Google's open multimodal model: 128k context, 140+ languages, runs on a single GPU.

DeepSeek-V3.1

MoE671B128K ctxAug 21, 2025

Hybrid thinking/non-thinking release that upgraded tool calling, long-context training, and agent task performance.

Seed-OSS-36B-Instruct

ByteDance SeedOpen source

ByteDance Seed's Apache-licensed long-context reasoning and agent model, with controllable thinking budgets and a native 512K context.

Dense36B512K ctxAug 20, 2025

DeepSeek R2

—Undisc.— ctxAug 14, 2025

Rumored successor to DeepSeek R1. Reports say development and launch timing were affected by hardware constraints around Huawei Ascend training and Nvidia availability; final specs, license, and release date remain unconfirmed.

GLM-4.5V

Z.ai (Zhipu AI)Open source

Vision-language GLM based on GLM-4.5-Air, covering image, video, document, grounding, and GUI-agent tasks.

MoE106B— ctxAug 11, 2025

Grok 5

xAIFrontierProprietary

Rumored next major Grok model. Elon Musk said after GPT-5's launch that Grok 5 would arrive before the end of 2025, but no broad public Grok 5 release is logged in this catalog yet.

—Undisc.— ctxAug 8, 2025

gpt-oss-20b

MoE21B128K ctxAug 5, 2025

OpenAIOpen source

Smaller gpt-oss reasoning model optimized for local inference on systems with about 16GB of memory.

gpt-oss-120b

MoE117B128K ctxAug 5, 2025

OpenAIOpen source

OpenAI's larger open-weight reasoning model, a 117B-total / 5.1B-active MoE with 128K context for local and self-hosted deployment.

Claude Opus 4.1

—Undisc.200K ctxAug 5, 2025

Anthropic's August 2025 Opus point release, focused on stronger coding, reasoning, and agentic reliability over Claude Opus 4. Proprietary API model with text, vision, and code capabilities.

Gemini 2.5 Deep Think

Google DeepMindFrontierProprietary

Google's enhanced Gemini 2.5 reasoning mode for harder math, science, coding, and multimodal analysis. Previewed at Google I/O 2025 and later made available to Gemini app subscribers, Deep Think uses more deliberative reasoning for complex prompts.

—Undisc.1M ctxAug 1, 2025

Falcon-H1 34B

Hybrid34B256K ctxJul 31, 2025

A hybrid attention + state-space-model (SSM) design that matches 70B-class models with fewer parameters.

GLM-4.5

MoE355B128K ctxJul 28, 2025

Open agentic, reasoning, and coding foundation model that marked Z.ai international rebrand and MIT-licensed GLM push.

GLM-4.5-Air

Z.ai (Zhipu AI)Open source

Compact GLM-4.5 companion with 106B total / 12B active parameters for efficient agentic reasoning and coding.

MoE106B128K ctxJul 28, 2025

Gemini 2.5 Flash-Lite

—Undisc.1M ctxJul 22, 2025

Google's lowest-latency, lowest-cost Gemini 2.5 tier, designed for summarization, classification, extraction, routing, and other high-volume production tasks. Proprietary API model with a 1M-token context and multimodal support.

Qwen3-Coder-480B-A35B-Instruct

MoE480B262K ctxJul 22, 2025

Alibaba Qwen's large open coding-agent model: a 480B-total / 35B-active MoE released under Apache-2.0, tuned for code generation, repository-level software engineering, tool calling, and long-horizon agent workflows with a 256K-token native context.

EXAONE 4.0 32B

LG AI ResearchOpen weights

LG AI Research's unified model with non-reasoning and reasoning modes, agentic tool use, and English, Korean, and Spanish support.

Dense32B— ctxJul 15, 2025

Kimi K2 Instruct

MoE1T128K ctxJul 11, 2025

Original open K2 post-trained model: a 1T-parameter MoE optimized for coding, reasoning, and tool-using agentic workflows.

Grok 4

xAI's fourth-generation Grok line, preceding the later 4.x API updates already tracked in the catalog.

—Undisc.— ctxJul 9, 2025

SmolLM3 3B

Dense3B128K ctxJul 8, 2025

Hugging FaceOpen source

Hugging Face's fully open 3B multilingual long-context model with optional reasoning mode and 128K context.

ERNIE-4.5-300B-A47B

MoE300B128K ctxJun 30, 2025

BaiduOpen source

Baidu's open ERNIE 4.5 language MoE, part of a 10-variant Apache-licensed model family built with heterogeneous multimodal MoE training.

ERNIE-4.5-VL-424B-A47B

MoE424B128K ctxJun 30, 2025

BaiduOpen source

Baidu's largest ERNIE 4.5 vision-language MoE, supporting text, image, and video inputs with thinking and non-thinking modes.

Kimi-VL-A3B-Thinking-2506

MoE16B128K ctxJun 21, 2025

Updated MIT-licensed Kimi-VL reasoning model with better multimodal reasoning, video understanding, high-resolution perception, and lower thinking-token use.

Kimi-Dev-72B

Dense73B— ctxJun 17, 2025

MIT-licensed coding LLM trained with repository-level reinforcement learning for software issue resolution.

Gemini 2.5 Flash

—Undisc.1M ctxJun 17, 2025

Google's faster, lower-cost Gemini 2.5 model for high-throughput multimodal and agentic workloads. It brought Gemini 2.5's reasoning improvements to a production Flash tier with a 1M-token context and broad text, image, audio, video, and coding support.

MiniMax-M1-80k

MiniMaxFrontierOpen source

Open Apache-licensed hybrid-attention reasoning model with 456B total / 45.9B active parameters and a native 1M-token context.

Hybrid456B1M ctxJun 16, 2025

Magistral Medium

—Undisc.— ctxJun 10, 2025

Mistral's first dedicated reasoning model family, released in Small open-weight and Medium enterprise/API tiers.

Magistral Small

Dense24B40K ctxJun 10, 2025

Open-weight 24B reasoning model from Mistral's Magistral family, popular for local reasoning experiments.

DeepSeek-R1-0528

MoE671B128K ctxMay 28, 2025

Major R1 reasoning update with stronger math, programming, general logic, function calling, and reduced hallucinations.

Claude Opus 4

—Undisc.200K ctxMay 22, 2025

First Claude 4 Opus model, positioned for long-running agentic and coding work before the 4.x point releases.

Seed Thinking v1.5

—Undisc.— ctxMay 22, 2025

ByteDance Seed reasoning model focused on long-horizon thinking and problem solving.

Sarvam-M

DenseUndisc.— ctxMay 21, 2025

Sarvam AIOpen weights

Sarvam's medium-scale open model for multilingual Indian-language chat, reasoning, and translation tasks.

Devstral Small 2505

Dense24B128K ctxMay 21, 2025

Mistral and All Hands AI's open coding-agent model, released as a 24B Apache-2.0 research preview for software engineering tasks. Devstral is optimized for repository navigation, issue resolution, and agentic coding and is available via Hugging Face and Mistral's API.

Gemma 3n E4B

Hybrid8B32K ctxMay 20, 2025

Google's mobile-first Gemma 3n model variant, built with a MatFormer-style architecture for efficient on-device multimodal inference. The E4B variant has roughly 4B effective parameters, supports text, vision, audio, and video-oriented use cases, and is released under Gemma terms.

Mistral Medium 3

Mistral's May 2025 enterprise workhorse model, positioned as a high-performance, lower-cost alternative to larger proprietary systems for coding, STEM, enterprise search, and multilingual workloads. Mistral lists API pricing at $0.40 / $2 per million input/output tokens and offers hosted and enterprise deployment paths.

—Undisc.— ctxMay 7, 2025

Phi-4 Reasoning

Dense14B— ctxApr 30, 2025

Phi-4 reasoning-specialized model family for math, science, and chain-of-thought style tasks.

Granite 3.3 8B

Dense8B128K ctxApr 30, 2025

Granite 3.3 text update for enterprise chat, RAG, and instruction-following workflows.

Qwen3-235B-A22B

MoE235B128K ctxApr 28, 2025

Largest open Qwen3 MoE, introducing hybrid thinking/non-thinking modes and 119-language coverage.

Kimi-Audio-7B-Instruct

Hybrid10B— ctxApr 25, 2025

Open audio foundation model for audio understanding, generation, speech recognition, audio QA, captioning, and speech conversation.

Kimi-VL-A3B-Instruct

MoE16B128K ctxApr 17, 2025

Efficient MIT-licensed vision-language MoE for OCR, image/video understanding, long documents, and OS-style agent tasks.

OpenAI o3

—Undisc.— ctxApr 16, 2025

Reasoning model released alongside o4-mini with tool use, image reasoning, and stronger agentic problem solving.

GPT-4.1

—Undisc.1M ctxApr 14, 2025

API model family focused on coding, instruction following, and one-million-token long-context work.

Llama 4 Maverick

Meta AIFrontierOpen weights

Meta's flagship open-weight MoE; highest MMLU among open models at release.

MoE400B1M ctxApr 5, 2025

Llama 4 Scout

MoE109B10M ctxApr 5, 2025

Efficient open-weight MoE designed for very long context on modest hardware.

Llama 4 Behemoth

Announced

Meta AIFrontierOpen weights

Meta's announced but unreleased Llama 4 teacher model: a multimodal MoE with 288B active parameters and nearly 2T total parameters. Meta says it was still training when Scout and Maverick shipped and that those released models were distilled from Behemoth.

MoE2T— ctxApr 5, 2025

Llama-3.3-Nemotron-Super-49B

Dense49B128K ctxApr 2, 2025

Open Llama Nemotron reasoning model from NVIDIA's 2025 Nemotron family.

Qwen2.5-Omni-7B

Local omni-modal Qwen model that supports text, image, audio, video, and speech generation in a 7B package.

Dense7B— ctxMar 26, 2025

DeepSeek-V3-0324

MoE671B128K ctxMar 25, 2025

Post-R1 V3 update with improved reasoning, front-end coding, Chinese writing, search, and function calling.

Gemini 2.5 Pro

—Undisc.1M ctxMar 25, 2025

Reasoning-focused Gemini 2.5 model that made thinking a core part of Google's flagship model line.

Mistral Small 3.1

Dense24B128K ctxMar 17, 2025

Apache-licensed Small update adding vision and a 128K context window to the efficient 24B line.

ERNIE X1

—Undisc.— ctxMar 16, 2025

Baidu's reasoning model released alongside ERNIE 4.5 before the open ERNIE 4.5 weights.

OLMo 2 32B

Dense32B4K ctxMar 13, 2025

A fully open model — weights, data, and training code all public — and the first such to beat GPT-3.5 / GPT-4o mini.

Command A

Dense111B256K ctxMar 13, 2025

CohereOpen weights

Enterprise-grade model tuned for RAG, tool use, and multilingual business workloads.

Granite 3.2 8B

Dense8B128K ctxFeb 26, 2025

Granite 3.2 update with reasoning controls and multimodal/document-oriented Granite variants.

Claude 3.7 Sonnet

—Undisc.200K ctxFeb 24, 2025

Anthropic's first hybrid-reasoning Sonnet. Shut down May 11, 2026 as the 4.x line matured.

Moonlight-16B-A3B-Instruct

MIT-licensed 16B/3B-active MoE trained with Moonshot's scalable Muon optimizer experiments.

MoE16B8K ctxFeb 24, 2025

DeepHermes 3 Llama 3 8B

Nous ResearchOpen weights

Nous reasoning-oriented Hermes model trained to combine concise answers with optional deep reasoning traces.

Dense8B8K ctxFeb 18, 2025

Grok 3

—Undisc.— ctxFeb 17, 2025

xAI's third-generation model family, introduced with stronger reasoning, search, and coding modes.

Dolphin 3.0 Llama 3.1 8B

Cognitive ComputationsOpen weights

Popular local assistant model tuned for coding, math, function calling, and agentic workflows.

Dense8B128K ctxFeb 2, 2025

Mistral Small 3

Dense24B32K ctxJan 30, 2025

A latency-optimized 24B dense model under Apache-2.0 — a popular local-deployment workhorse.

Qwen2.5-Max

Alibaba (Qwen)Proprietary

Proprietary MoE flagship for the Qwen2.5 generation, released through Qwen Chat and Alibaba Cloud APIs.

MoEUndisc.— ctxJan 29, 2025

Qwen2.5-VL-72B

Dense72B128K ctxJan 26, 2025

Vision-language Qwen2.5 model for image, document, video, and agentic visual grounding tasks.

Doubao-1.5-pro

—Undisc.— ctxJan 22, 2025

Doubao 1.5 Pro update positioned for stronger multimodal, reasoning, and agentic work in Volcano Engine.

DeepSeek-R1

MoE671B128K ctxJan 20, 2025

Breakout open reasoning model trained with large-scale reinforcement learning and released with weights under MIT.

Kimi k1.5

—Undisc.— ctxJan 20, 2025

Moonshot AIProprietary

Moonshot's multimodal reinforcement-learning reasoning model, reported as matching OpenAI o1 on math, coding, and multimodal reasoning.

MiniMax-01

Hybrid456B4M ctxJan 15, 2025

MiniMaxOpen weights

Open MiniMax generation with MiniMax-Text-01 and MiniMax-VL-01 long-context models.

DeepSeek-V3

MoE671B128K ctxDec 26, 2024

The 671B/37B-active MoE release that made DeepSeek a central open-model lab before the R1 breakthrough.

Step-2

—Undisc.— ctxDec 23, 2024

StepFunProprietary

Second-generation StepFun foundation model line with larger-scale multimodal and reasoning ambitions.

Granite 3.1 8B

Dense8B128K ctxDec 18, 2024

IBM's enterprise-focused open model with a 128k context, Apache-2.0 licensed.

Falcon 3 10B

Dense10B32K ctxDec 17, 2024

UAE's TII open model designed to run on light infrastructure, including laptops.

Command R7B

Dense8B128K ctxDec 13, 2024

CohereOpen weights

Cohere's smallest, fastest R-series model, tuned for RAG and tool use on modest hardware.

Phi-4

Dense14B16K ctxDec 12, 2024

MicrosoftOpen source

A 14B dense model that rivals far larger ones on math and reasoning, under a permissive MIT license.

Gemini 2.0 Flash

—Undisc.1M ctxDec 11, 2024

First Gemini 2.0 release, built for native multimodal input/output, tool use, and agentic product integrations.

EXAONE 3.5 32B

LG AI ResearchOpen weights

EXAONE 3.5 32B open-weight model for bilingual reasoning, coding, and long-context tasks.

Dense32B32K ctxDec 9, 2024

Llama 3.3 70B

Dense70B128K ctxDec 6, 2024

Late-2024 70B Llama update delivering much of the 405B instruction-following quality at lower serving cost.

OpenAI o1

General release of OpenAI's o1 reasoning model with stronger deliberative reasoning and multimodal ChatGPT integration.

—Undisc.— ctxDec 5, 2024

Amazon Nova Pro

—Undisc.300K ctxDec 3, 2024

AWS-native multimodal model with a 300k context; size and architecture undisclosed.

Amazon Nova Lite

—Undisc.300K ctxDec 3, 2024

Lower-cost multimodal Nova understanding model for text, image, and video inputs.

QwQ-32B-Preview

Dense32B32K ctxNov 28, 2024

Qwen's first public reasoning-preview model, aimed at math, coding, and deliberate problem solving.

Tulu 3 405B

Allen Institute for AI (Ai2)Open weights

Ai2's post-trained open instruction model line, scaling the Tulu recipe to Llama 3.1 405B.

Dense405B128K ctxNov 21, 2024

DeepSeek-R1-Lite-Preview

—Undisc.— ctxNov 20, 2024

DeepSeekProprietary

Reasoning-preview model exposed in DeepSeek Chat ahead of the open DeepSeek-R1 release.

Qwen2.5-Coder-32B

Dense32B128K ctxNov 12, 2024

Code-specialized Qwen2.5 model family, with the 32B checkpoint as the flagship open coding model.

Hunyuan-Large

Tencent HunyuanOpen weights

Tencent's 389B total / 52B active open-weight Transformer MoE, released with a 256K pretraining context and 128K instruct context.

MoE389B128K ctxNov 4, 2024

SmolLM2 1.7B

Dense1.7B— ctxNov 4, 2024

Hugging FaceOpen source

Compact on-device model family trained on 11T tokens, popular for lightweight local chat and experimentation.

Claude 3.5 Haiku

—Undisc.200K ctxOct 22, 2024

Fast, lower-cost Claude 3.5 model for latency-sensitive coding, tool-use, and customer-facing workloads.

Sarvam-1

Sarvam AIOpen weights

Sarvam's 2B open model trained for ten major Indian languages.

Dense2B— ctxOct 22, 2024

Granite 3.0 8B

Dense8B4K ctxOct 21, 2024

Apache-licensed Granite 3.0 text model, part of IBM's push toward enterprise-friendly open models.

Yi-Lightning

MoEUndisc.— ctxOct 16, 2024

01.AIProprietary

01.AI's MoE API model that reached the global top-10 on Chatbot Arena, strong in Chinese, math, and coding.

Ministral 8B

Dense8B128K ctxOct 16, 2024

Small Mistral model line optimized for edge and low-latency workloads.

Llama-3.1-Nemotron-70B

Dense70B128K ctxOct 15, 2024

NVIDIA-tuned Llama 3.1 70B instruction model optimized with Nemotron reward and alignment recipes.

Llama 3.2 90B Vision

Dense90B128K ctxSep 25, 2024

First Llama family release with native vision models, alongside smaller edge-oriented 1B and 3B text models.

Molmo 72B

Allen Institute for AI (Ai2)Open weights

Open multimodal model family trained for strong image understanding, pointing, and visual grounding.

Dense72B— ctxSep 25, 2024

Qwen2.5-72B

Dense72B128K ctxSep 19, 2024

Broad Qwen2.5 foundation-model update spanning general, coding, math, and multimodal descendants.

Pixtral 12B

Dense12B128K ctxSep 17, 2024

Mistral's first open multimodal model, adding image understanding to a Mistral text backbone.

OpenAI o1-preview

—Undisc.— ctxSep 12, 2024

OpenAI's first public reasoning-model preview, optimized to spend more inference time on hard math, coding, and science tasks.

Yi-Coder-9B

Dense9B128K ctxSep 5, 2024

01.AI's compact code model trained for repository-scale programming and code completion tasks.

DeepSeek-V2.5

MoE236B128K ctxSep 5, 2024

Unified DeepSeek V2 generation combining general-chat and coding strengths before the V3 series.

Hunyuan Turbo

Tencent HunyuanProprietary

Tencent's faster, lower-cost Hunyuan update before the open Hunyuan-Large model card.

—Undisc.— ctxSep 5, 2024

OLMoE 1B-7B

Fully open sparse MoE model with 7B total and about 1B active parameters.

MoE7B— ctxSep 3, 2024

Jamba 1.5 Large

Hybrid398B256K ctxAug 22, 2024

AI21 LabsOpen weights

Israel's AI21 hybrid Mamba-Transformer MoE, with a 256k context and strong long-document throughput.

Phi-3.5 MoE

MoE42B128K ctxAug 20, 2024

Phi-3.5 mixture-of-experts model, scaling Microsoft's small-model line while preserving efficient active parameters.

Hermes 3 Llama 3.1 405B

Nous ResearchOpen weights

Large Hermes 3 instruction-tuned model built on Meta's Llama 3.1 405B.

Dense405B128K ctxAug 15, 2024

Grok-2

—Undisc.— ctxAug 13, 2024

Second-generation Grok release with Grok-2 and Grok-2 mini for chat, coding, reasoning, and image-enabled product experiences.

EXAONE 3.0 7.8B

LG AI ResearchOpen weights

LG's first open-weight EXAONE model, a compact bilingual instruction model for Korean and English.

Dense7.8B— ctxAug 7, 2024

MiniCPM-V 2.6

OpenBMBOpen weights

8B vision-language model for local image, multi-image, OCR, and video understanding, with llama.cpp and Ollama support.

Dense8B— ctxAug 2, 2024

Llama 3.1 405B

Dense405B128K ctxJul 23, 2024

Meta's first frontier-scale open Llama model, with 405B parameters, 128K context, multilingual support, and tool-use improvements.

Mistral NeMo

Dense12B128K ctxJul 18, 2024

Apache-licensed 12B model co-developed with NVIDIA, including a 128K context window and strong multilingual tokenization.

Gemma 2 27B

Dense27B8K ctxJun 27, 2024

Second-generation Gemma model, improving open-weight quality and efficiency at 9B and 27B sizes.

Claude 3.5 Sonnet

—Undisc.200K ctxJun 20, 2024

Major Sonnet upgrade that became Anthropic's default high-intelligence workhorse for coding, writing, and visual reasoning.

DeepSeek-Coder-V2

MoE236B128K ctxJun 17, 2024

Open code-focused MoE built from DeepSeek-V2, expanding programming-language coverage and coding benchmark performance.

Nemotron-4 340B

Dense340B4K ctxJun 14, 2024

NVIDIA's large open model family for synthetic data generation and reward modeling.

Qwen2-72B

Dense72B128K ctxJun 7, 2024

Qwen2's largest dense model, introducing stronger multilingual support, coding/math gains, and long-context variants.

GLM-4-9B

Z.ai (Zhipu AI)Open weights

Open GLM-4 9B model family, covering chat, long-context, and code-oriented variants.

Dense9B128K ctxJun 5, 2024

Codestral 22B

Dense22B32K ctxMay 29, 2024

Mistral's first code-specialized model, trained for code generation, fill-in-the-middle, and multi-language programming tasks.

Aya 23 35B

Dense35B— ctxMay 23, 2024

CohereOpen weights

Open multilingual research model covering 23 languages, released by Cohere For AI.

Doubao-pro

—Undisc.— ctxMay 15, 2024

ByteDance's commercial Doubao foundation model line for text, code, and assistant workloads.

GPT-4o

—Undisc.128K ctxMay 13, 2024

The 2024 omni-modal model that defined a generation of assistants. Deprecated in Feb 2026 and fully retired across ChatGPT on April 3, 2026.

Yi-1.5-34B

Dense34B4K ctxMay 13, 2024

Yi 1.5 update with stronger instruction following, coding, math, and multilingual performance.

Falcon 2 11B

Dense11B8K ctxMay 13, 2024

Falcon 2 generation, including text and vision-language 11B models under a permissive TII license.

DeepSeek-V2

MoE236B128K ctxMay 7, 2024

DeepSeek's first major MoE general model with Multi-head Latent Attention and low-cost API positioning.

Granite Code 34B

Dense34B8K ctxMay 6, 2024

Apache-2.0 code model from IBM's Granite Code family, used for local code generation and enterprise coding assistants.

Amazon Titan Text Premier

—Undisc.— ctxApr 30, 2024

Larger Titan text model for enterprise RAG, summarization, and agent workflows in Amazon Bedrock.

Snowflake Arctic

Snowflake AI ResearchOpen source

Apache-2.0 enterprise LLM with 480B total / 17B active parameters, optimized for SQL, code, and instruction following.

MoE480B— ctxApr 24, 2024

Phi-3 Mini

Dense3.8B128K ctxApr 23, 2024

3.8B-parameter Phi-3 model released as a phone-capable small model with 4K and 128K variants.

Llama 3 70B

Dense70B8K ctxApr 18, 2024

First Llama 3 release, with 8B and 70B open models and a stronger tokenizer, data mix, and post-training stack.

Mixtral 8x22B

MoE141B64K ctxApr 17, 2024

Larger open Mixtral sparse MoE with 141B total and 39B active parameters, released under Apache-2.0.

abab6.5

—Undisc.1M ctxApr 17, 2024

MiniMaxProprietary

MiniMax's commercial long-context abab model generation before the open MiniMax-01 and M series.

WizardLM-2 8x22B

MoE141B66K ctxApr 15, 2024

Microsoft's WizardLM-2 MoE chat model, widely mirrored and run locally after its model-card release.

Step-1V

—Undisc.— ctxApr 12, 2024

StepFunProprietary

StepFun's first major vision-language model, released after the Step-1 language model.

CodeGemma 7B

Open code-specialized Gemma model for local code completion, generation, and instruction-following.

Dense7B8K ctxApr 9, 2024

Command R+

—Undisc.128K ctxApr 4, 2024

CohereProprietary

Higher-capability RAG and tool-use model in Cohere's Command R family.

Grok-1.5

—Undisc.128K ctxMar 28, 2024

Grok update with stronger reasoning and a 128K context window.

Jamba

Hybrid52B256K ctxMar 28, 2024

AI21 LabsOpen weights

First Jamba hybrid Transformer-Mamba MoE model with open weights and a 256K context length.

DBRX Instruct

Databricks / MosaicMLOpen weights

Databricks' 132B-total / 36B-active open MoE model for code, math, RAG, and enterprise self-hosted workloads.

MoE132B32K ctxMar 27, 2024

Step-1

—Undisc.— ctxMar 23, 2024

StepFunProprietary

StepFun's first public foundation model generation, introduced as a trillion-parameter Chinese model line.

Kimi 1M

—Undisc.— ctxMar 18, 2024

Moonshot AIProprietary

Long-context Kimi upgrade advertised with support for million-character document and conversation contexts.

Command R

—Undisc.128K ctxMar 11, 2024

CohereProprietary

Enterprise RAG-focused model with tool use, citations, multilingual retrieval, and long-context support.

Claude 3 Opus

—Undisc.200K ctxMar 4, 2024

Highest-capability Claude 3 model, launched with Sonnet and Haiku and Anthropic's first major vision-capable Claude family.

StarCoder2 15B

Dense16B16K ctxFeb 28, 2024

BigCodeOpen weights

Next-generation BigCode code model trained on 4T+ tokens and 600+ programming languages, with 16K context.

Mistral Large

—Undisc.32K ctxFeb 26, 2024

Mistral's first proprietary flagship API model, introduced alongside Le Chat and stronger multilingual/coding performance.

Gemma 7B

Dense7B8K ctxFeb 21, 2024

First Gemma open-weight text model family, derived from the same research lineage as Gemini.

Gemini 1.5 Pro

MoEUndisc.2M ctxFeb 15, 2024

Gemini generation that introduced production-scale long context, eventually expanding to a two-million-token window.

Qwen1.5-110B

Dense110B32K ctxFeb 5, 2024

Largest Qwen1.5 model, released as the bridge from the original Qwen line to Qwen2.

Qwen1.5-72B-Chat

Dense72B33K ctxFeb 4, 2024

Largest chat-tuned Qwen1.5 dense checkpoint, released with stronger human-preference alignment, multilingual support, and 32K context.

OLMo 7B

Ai2's first fully open language model release, including weights, training data, code, logs, and intermediate checkpoints.

Dense7B4K ctxFeb 1, 2024

Stable LM 2 1.6B

Dense1.6B— ctxJan 19, 2024

Stability AIOpen weights

Small multilingual Stable LM release built for low hardware barriers and local experimentation.

GLM-4

Z.ai (Zhipu AI)Proprietary

Zhipu's GLM-4 flagship generation, launched as the successor to ChatGLM3 with stronger tool use and multimodal variants.

—Undisc.128K ctxJan 16, 2024

DeepSeekMoE 16B

Early DeepSeek sparse MoE research model that foreshadowed the later V2/V3 architecture direction.

MoE16B4K ctxJan 11, 2024

Nous Hermes 2 Mixtral

MoE47B32K ctxJan 11, 2024

Nous ResearchOpen source

Nous instruction-tuned Mixtral model with strong open-chat and tool-use adoption.

OpenChat 3.5

OpenChatOpen source

Compact Mistral-based local chat model trained with C-RLFT, popular in early 2024 local leaderboards.

Dense7B— ctxJan 6, 2024

TinyLlama 1.1B Chat

Dense1.1B— ctxJan 1, 2024

TinyLlamaOpen source

Compact Llama-style 1.1B chat model trained for local experimentation and low-memory deployments.

Phi-2

Dense2.7B— ctxDec 12, 2023

2.7B-parameter Phi model showing strong reasoning and language understanding at small scale.

OpenHathi-7B

Sarvam AIOpen weights

Sarvam AI's first open Indic language model, adapted from Llama 2 for Hindi and Indian-language work.

Dense7B— ctxDec 12, 2023

Mixtral 8x7B

MoE47B32K ctxDec 11, 2023

The open sparse Mixture-of-Experts that brought MoE efficiency to the open ecosystem.

Gemini 1.0 Ultra

—Undisc.32K ctxDec 6, 2023

Google's first natively multimodal Gemini flagship, since superseded by the 1.5/2/3 lines.

Qwen-72B

Dense72B32K ctxNov 30, 2023

Alibaba's first major open Qwen model and the start of a prolific open-weight line.

DeepSeek LLM 67B

Dense67B4K ctxNov 29, 2023

First general DeepSeek language model family, with 7B and 67B base/chat checkpoints.

Yi-34B-Chat

Dense34B4K ctxNov 23, 2023

Chat-tuned Yi-34B checkpoint from 01.AI, released alongside quantized chat variants for bilingual open-weight assistants.

Claude 2.1

—Undisc.200K ctxNov 21, 2023

Claude update with a 200K context window, lower hallucination rates, and improved tool-use beta support.

Yi-34B

Dense34B200K ctxNov 6, 2023

01.AI's strong bilingual open model, with a 200k-context variant.

GPT-4 Turbo

—Undisc.128K ctxNov 6, 2023

Lower-cost GPT-4 generation with a 128K context window, introduced at OpenAI DevDay.

Grok-1

xAIOpen source

xAI's first Grok model, later released as open weights with a 314B-parameter MoE checkpoint.

MoE314B— ctxNov 4, 2023

DeepSeek Coder 33B

Dense33B16K ctxNov 2, 2023

DeepSeek's first public code-model family, released before the general DeepSeek LLM line.

ERNIE 4.0

—Undisc.— ctxOct 17, 2023

Baidu's fourth-generation ERNIE flagship, announced with stronger understanding, generation, reasoning, and memory.

Kimi Chat

Moonshot AIProprietary

Moonshot's first Kimi assistant release, establishing the long-context product line before the open Kimi model cards.

—Undisc.— ctxOct 9, 2023

LLaVA 1.5 13B

Hybrid13B— ctxSep 30, 2023

LLaVAOpen weights

Open vision-language assistant and one of the most widely run early local multimodal models.

Amazon Titan Text Express

—Undisc.— ctxSep 28, 2023

Amazon's first-party Titan text generation model exposed through Bedrock, initially alongside embeddings and image models.

Mistral 7B

Dense7B8K ctxSep 27, 2023

The 7B that punched far above its weight and put Mistral on the map.

Qwen-14B

Dense14B8K ctxSep 25, 2023

Second open Qwen size, expanding the first-generation Qwen language-model lineup.

Granite 13B

IBMOpen weights

IBM's early Granite foundation model family for enterprise language and code tasks.

Dense13B— ctxSep 7, 2023

Hunyuan

Tencent HunyuanProprietary

Tencent's first Hunyuan foundation model release, introduced as a general-purpose Chinese enterprise model.

—Undisc.— ctxSep 7, 2023

Falcon 180B

Dense180B2K ctxSep 6, 2023

At launch the largest openly available model, from the UAE's TII.

Code Llama 34B

Dense34B16K ctxAug 24, 2023

Meta's first code-specialized Llama model family, released in base, Python, and instruction-tuned variants.

Qwen-7B

Dense7B32K ctxAug 3, 2023

Alibaba's first open Qwen checkpoint and the start of the Qwen open-model line.

Nous-Hermes-Llama2-13B

Nous ResearchOpen weights

Early Nous Hermes instruction model on Llama 2, widely used in the open-model fine-tuning ecosystem.

Dense13B4K ctxJul 24, 2023

EXAONE 2.0

LG AI ResearchProprietary

Second EXAONE generation, improving bilingual Korean-English performance and enterprise deployment options.

—Undisc.— ctxJul 19, 2023

Llama 2 70B

Dense70B4K ctxJul 18, 2023

The release that made capable open-weight models genuinely usable for production.

Claude 2

—Undisc.100K ctxJul 11, 2023

Anthropic's first widely-available Claude, notable for an early 100k-token context window.

ChatGLM2-6B

Z.ai (Zhipu AI)Open weights

Second open ChatGLM generation, improving long context, inference efficiency, and bilingual chat quality.

Dense6B32K ctxJun 25, 2023

Phi-1

Dense1.3B— ctxJun 21, 2023

Microsoft's first Phi small-language-model release, demonstrating strong code performance from textbook-quality synthetic data.

Falcon 40B

Dense40B2K ctxMay 25, 2023

TII's breakout open Falcon model, released before Falcon 180B and trained on the RefinedWeb corpus.

PaLM 2

DenseUndisc.— ctxMay 10, 2023

Google's improved multilingual, reasoning, and coding foundation model family introduced at I/O 2023.

MPT-7B

Databricks / MosaicMLOpen source

MosaicML's permissively licensed 7B model, an early favorite for commercial local fine-tuning and long-context variants.

Dense7B2K ctxMay 5, 2023

Vicuna 13B

LMSYS / SkyLabOpen weights

LMSYS instruction-tuned LLaMA model that became a landmark early local ChatGPT-style assistant.

Dense13B— ctxMar 30, 2023

ERNIE Bot

—Undisc.— ctxMar 16, 2023

Baidu's public chat assistant launch, built on the ERNIE foundation-model line.

GPT-4

—Undisc.8K ctxMar 14, 2023

The model that brought reliable multi-step reasoning to the mainstream; size never disclosed.

ChatGLM-6B

Z.ai (Zhipu AI)Open weights

Zhipu AI and Tsinghua KEG's first widely used open bilingual ChatGLM checkpoint.

Dense6B2K ctxMar 14, 2023

Claude 1

—Undisc.— ctxMar 14, 2023

Anthropic's first broadly announced Claude assistant model, launched through an API and select product partners.

Jurassic-2 Ultra

AI21 LabsProprietary

Second-generation Jurassic model with better multilingual support, lower latency, and instruction following.

—Undisc.— ctxMar 9, 2023

GPT-3.5 Turbo

—Undisc.4K ctxMar 1, 2023

OpenAI's first ChatGPT API model, bringing the GPT-3.5 chat-tuned line to developers at much lower cost than text-davinci-003.

LLaMA

Dense65B2K ctxFeb 24, 2023

Meta's first LLaMA, released to researchers; its leak catalyzed the open-weight movement.

Galactica

Withdrawn

Dense120B2K ctxNov 15, 2022

A science-focused model whose public demo was withdrawn after just three days over confidently wrong outputs — an early, instructive retraction.

BLOOM

Dense176B2K ctxJul 12, 2022

BigScienceOpen weights

An open, multilingual 176B model (46 languages) from a global research collaboration.

PaLM

Dense540B— ctxApr 4, 2022

Google's 540B Pathways model; the API was later deprecated in favor of Gemini.

EXAONE 1.0

LG AI ResearchProprietary

LG AI Research's first EXAONE foundation model generation, introduced as a large multimodal expert AI.

—Undisc.— ctxDec 14, 2021

ERNIE 3.0 Titan

Dense260B— ctxDec 8, 2021

Baidu's 260B-parameter ERNIE 3.0 Titan model, an early Chinese frontier-scale language model.

Jurassic-1 Jumbo

Dense178B— ctxAug 11, 2021

AI21 LabsProprietary

AI21's first major API language model, launched through AI21 Studio.

GPT-3

Dense175B2K ctxJun 11, 2020

The 175B model that proved in-context learning at scale; its base API models were retired in 2024.

GPT-2

Dense1.5B1K ctxNov 5, 2019

OpenAIOpen source

Initially withheld over misuse fears, then fully released in Nov 2019 — an early 'limited release' debate.

BERT