Methodology

How we track releases

The goal is a record you can trust: sourced, versioned, and corrected in the open. Detection is automated; publication is not.

01
Detection
A scheduled crawler watches a registry of primary sources — official lab blogs, model cards, API changelogs, release notes, and a handful of reputable leaderboards. Each source is fingerprinted so we only act on genuine changes, not re-renders.
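As a rough illustration of fingerprinting, the sketch below hashes only the visible text of a page, with whitespace collapsed, so that markup churn and script changes do not register as a change. The function names `fingerprint` and `has_changed` are hypothetical, and the choice of SHA-256 over normalized text is an assumption, not a description of the production crawler.

```python
import hashlib
import re
from html.parser import HTMLParser
from typing import Optional


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def fingerprint(html: str) -> str:
    """Hash the visible text only (assumed normalization: collapse whitespace)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint: Optional[str], html: str) -> bool:
    """True when the page's visible text differs from the stored fingerprint."""
    return fingerprint(html) != previous_fingerprint
```

Two renders of the same post with different class names, whitespace, or inline scripts produce the same fingerprint under this scheme; an edit to the visible wording does not.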
02
Extraction
When a source changes, the new content is passed to a language model with a strict schema. It returns a structured candidate — model name, lab, license, size, context window, dates, and any benchmark claims — along with the exact text it drew each field from.
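One way to picture the extraction output is a mapping from field name to a value plus the exact text it came from. The sketch below does not call a language model; it only shows a possible shape for the candidate and a grounding check that each quoted snippet really appears in the source. `FieldValue`, `REQUIRED_FIELDS`, and `ungrounded_fields` are hypothetical names, and the example lab and model are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldValue:
    value: str
    evidence: str  # exact source text the value was drawn from


# Assumption: these two fields are mandatory; the real schema may differ.
REQUIRED_FIELDS = ("model_name", "lab")


def ungrounded_fields(candidate: dict, source_text: str) -> list:
    """Return field names that are missing or whose evidence is not in the source."""
    problems = [name for name in REQUIRED_FIELDS if name not in candidate]
    for name, field in candidate.items():
        if not field.evidence or field.evidence not in source_text:
            problems.append(name)
    return problems


source = "Example Lab today released Example-7B under the Apache 2.0 license."
candidate = {
    "model_name": FieldValue("Example-7B", "released Example-7B"),
    "lab": FieldValue("Example Lab", "Example Lab today"),
    "license": FieldValue("Apache-2.0", "under the Apache 2.0 license"),
}
print(ungrounded_fields(candidate, source))
```

A field whose evidence cannot be found verbatim in the source is a strong hint that the model invented it, which is why the quote travels with the value.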
03
Automated validation
Every candidate runs through a battery of checks before a human sees it. Is the source on an official domain? Is the date plausible? Does the model already exist (update vs. new entry)? Do benchmark claims cite an eval? Does the license resolve to a known family? Each check carries a severity and feeds a confidence score.
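The sketch below shows how severity-tagged checks might roll up into a single score. It covers three of the checks named above; the domain allowlist, the severity weights, and the subtract-from-one scoring rule are all assumptions made for illustration, and `validate`, `confidence`, and `Flag` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlparse

OFFICIAL_DOMAINS = {"example-lab.com"}  # hypothetical allowlist
SEVERITY_PENALTY = {"info": 0.0, "warning": 0.15, "error": 0.5}  # assumed weights


@dataclass(frozen=True)
class Flag:
    check: str
    severity: str


def validate(candidate: dict, today: date) -> list:
    """Run a few checks and return one Flag per failure."""
    flags = []
    host = urlparse(candidate["source_url"]).hostname or ""
    # Match the domain itself or a true subdomain, not a lookalike suffix.
    if not any(host == d or host.endswith("." + d) for d in OFFICIAL_DOMAINS):
        flags.append(Flag("official_domain", "error"))
    if candidate["release_date"] > today:
        flags.append(Flag("plausible_date", "warning"))
    for claim in candidate.get("benchmarks", []):
        if not claim.get("eval"):
            flags.append(Flag("benchmark_cites_eval", "warning"))
    return flags


def confidence(flags: list) -> float:
    """Start at 1.0 and subtract a penalty per flag, floored at 0.0."""
    score = 1.0
    for flag in flags:
        score -= SEVERITY_PENALTY[flag.severity]
    return max(0.0, round(score, 2))
```

The flags themselves are kept alongside the score, so a reviewer can see which checks failed rather than just a number.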
04
Human review
Nothing publishes automatically. Candidates land in a review queue, sorted by confidence and surfaced with their evidence and validation flags. A human approves, edits, requests more info, or rejects. Approval merges the candidate into the catalog, attributed to the reviewer.
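A minimal sketch of the review step follows. The ordering (highest confidence first), the status names, and the rule that an edited candidate stays in the queue are assumptions; `Decision`, `review_queue`, and `apply_decision` are hypothetical names. The one property taken directly from the text is that only an explicit approval by a named reviewer puts anything into the catalog.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    APPROVE = "approved"
    EDIT = "pending"            # assumption: edited candidates stay in the queue
    REQUEST_INFO = "needs_info"
    REJECT = "rejected"


@dataclass
class Candidate:
    model_name: str
    confidence: float
    status: str = "pending"
    reviewed_by: Optional[str] = None


def review_queue(candidates: list) -> list:
    """Pending candidates, highest confidence first (assumed ordering)."""
    pending = [c for c in candidates if c.status == "pending"]
    return sorted(pending, key=lambda c: c.confidence, reverse=True)


def apply_decision(candidate: Candidate, decision: Decision,
                   reviewer: str, catalog: dict) -> None:
    """Record the decision; only an approval merges into the catalog."""
    if not reviewer:
        raise ValueError("every decision needs a named reviewer")
    candidate.status = decision.value
    candidate.reviewed_by = reviewer
    if decision is Decision.APPROVE:
        catalog[candidate.model_name] = candidate
```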
05
Change logging
Every field-level change — including silent edits and lifecycle transitions like deprecation or withdrawal — is written to an append-only audit log. That is what powers the per-model timeline and the site-wide changelog. Nothing is overwritten without a trace.
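To make the append-only idea concrete, here is a small in-memory sketch that diffs two versions of a record and appends one immutable entry per changed field. The class and field names are hypothetical, and a real log would presumably live in durable storage rather than a Python list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class ChangeEntry:
    model_id: str
    field: str
    old: Any
    new: Any
    changed_by: str
    at: datetime


class AuditLog:
    """Entries can be appended and read, never edited or removed."""

    def __init__(self):
        self._entries = []

    def record(self, model_id: str, before: dict, after: dict, changed_by: str) -> list:
        """Append one entry per field whose value differs between the two versions."""
        now = datetime.now(timezone.utc)
        added = []
        for name in sorted(before.keys() | after.keys()):
            old, new = before.get(name), after.get(name)
            if old != new:
                added.append(ChangeEntry(model_id, name, old, new, changed_by, now))
        self._entries.extend(added)
        return added

    def timeline(self, model_id: str) -> tuple:
        """All entries for one model, in the order they were written."""
        return tuple(e for e in self._entries if e.model_id == model_id)
```

Because each entry keeps both the old and the new value, a lifecycle transition such as a status moving to deprecated shows up in the timeline like any other field change.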

On benchmark numbers

Most figures vendors publish are self-reported, run under conditions they choose. We record them as claims and label them self-reported until an independent evaluation confirms the result, at which point it becomes verified. We never silently merge a self-reported number into a verified one.
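The distinction can be sketched as a status on each claim that only changes through an explicit confirmation step, which also records who confirmed it. The tolerance used to decide whether an independent result "confirms" a claim is purely an assumption here, as are the names `BenchmarkClaim` and `confirm` and the made-up benchmark.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class BenchmarkClaim:
    benchmark: str
    score: float
    status: str = "self-reported"
    verified_by: Optional[str] = None


def confirm(claim: BenchmarkClaim, independent_score: float,
            evaluator: str, tolerance: float = 0.5) -> BenchmarkClaim:
    """Return a verified copy if the independent score agrees within tolerance.

    Otherwise the claim is returned unchanged and stays self-reported.
    """
    if abs(claim.score - independent_score) <= tolerance:
        return replace(claim, status="verified", verified_by=evaluator)
    return claim


claim = BenchmarkClaim("ExampleBench", 80.0)
print(confirm(claim, 79.8, "independent-evaluator").status)
print(confirm(claim, 70.0, "independent-evaluator").status)
```

Returning a new record instead of mutating the old one keeps the original self-reported claim intact, so the two labels are never blended.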

Corrections

If something is wrong or missing, tell us. Because every change is logged, corrections are fast and auditable. See the about page for how to reach us and how to use the public data API.
